Obtaining earliest change for one colum?

Question

Post reply

Obtaining earliest change for one colum?

Rob-350472

SSCertifiable

Points: 7197
More actions
April 11, 2013 at 10:07 am

#272933

Guys,
I've been looking at this problem on and off for a while and I can't for the life of me find a solution. Basically we have an audit system where by every time someone changes one field in one table it appends the entire row into another table, a slight simplification but you get the idea.
What I want to get is a historical record for one field over time - so say this was a list of individuals and it had their wealth in dollars, their job title, their address, their age etc. It may be that their wealth in dollars didn't change over the course of say 10 rows in the table but other variables did change, I'm trying to build up a picture of their wealth changing. Totally random example but I thought it might help!
With this test data:
SELECT * INTO #test
FROM
(
SELECT 1 as Audit_ID,123 as EntID,100 as Field_A,10 as Field_B,'01-Jan-12' as ActionDate UNION
SELECT 2 as Audit_ID,123 as EntID,110 as Field_A,10 as Field_B,'15-Feb-12' as ActionDate UNION
SELECT 3 as Audit_ID,123 as EntID,150 as Field_A,15 as Field_B,'20-Mar-12' as ActionDate UNION
SELECT 4 as Audit_ID,123 as EntID,100 as Field_A,10 as Field_B,'12-Jun-12' as ActionDate UNION
SELECT 5 as Audit_ID,445 as EntID,50 as Field_A,4 as Field_B,'02-Apr-12' as ActionDate UNION
SELECT 6 as Audit_ID,445 as EntID,60 as Field_A,3 as Field_B,'15-May-12' as ActionDate UNION
SELECT 7 as Audit_ID,445 as EntID,30 as Field_A,3 as Field_B,'18-Aug-12' as ActionDate UNION
SELECT 8 as Audit_ID,552 as EntID,500 as Field_A,1 as Field_B,'10-Jan-12' as ActionDate UNION
SELECT 9 as Audit_ID,552 as EntID,450 as Field_A,3 as Field_B,'15-Mar-12' as ActionDate UNION
SELECT 10 as Audit_ID,552 as EntID,320 as Field_A,4 as Field_B,'29-Aug-12' as ActionDate
)x
Say I was trying to track field B over time for Entity (EntID) 123, I'd want a results set like this:
01-Jan-12 10
20-Mar-12 15
12-Jun-12 10
I thought I could do some kind of ranking whereby I partitioned on the field I was interested on and ranked on date, and chose those with rank 1, however, this doesn't take account for the same value at different time points - e.g. using the wealth example perhaps in Jan 2010 I had $200, in Feb 2010 this went up to 350 but then it fell to $200 again in September 2010, with my ranking idea I'd just have Feb 2010 and 200, not September too.
I thought about some form of 'double join' type thing along hte lines of Field B <> Field B, I've also got pratically no experience of them but for some reason I'm wondering if a CTE is called for?!
Hoepfully I've explained this not too bad and my test data gives you kind of an idea!
Thanks for reading 🙂

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

natevdh@gmail.com Valued Member Points: 69 More actions · Answer 1

You were on the right track with the self join

SELECT t1.ActionDate, t1.Field_B

FROM #test t1

LEFT JOIN #test t2 ON t1.Audit_ID = t2.Audit_ID-1

WHERE t1.Field_B != t2.Field_B OR t2.Audit_ID IS NULL

natevdh@gmail.com Valued Member Points: 69 More actions · Answer 2

Forgot the entity chunk. Yes you will need a CTE and a Row_Number function.

[Code]

DECLARE @EntityID INT = 123;

WITH cte

AS (

SELECT row_Number() OVER (ORDER BY Audit_ID) AS RN

,t.Field_B

,t.ActionDate

FROM #test t

WHERE @EntityID = t.EntID

)

SELECT t1.ActionDate

,t1.Field_B

FROM cte t1

LEFT JOIN cte t2 ON t1.RN = t2.RN - 1

WHERE t1.Field_B != t2.Field_B

OR t2.RN IS NULL

[/code]

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 3

Here is another option:

SELECT

Audit_ID,

EntID,

Field_A,

Field_B,

ActionDate,

rn3 = ROW_NUMBER() OVER (PARTITION BY EntID ORDER BY ActionDate) - ROW_NUMBER() OVER (PARTITION BY EntID, Field_B ORDER BY ActionDate)

INTO #test

FROM

(

SELECT 1 as Audit_ID, 123 as EntID, 100 as Field_A, 10 as Field_B, CAST('01-Jan-12' AS DATETIME) as ActionDate UNION

SELECT 2 as Audit_ID, 123 as EntID, 110 as Field_A, 10 as Field_B, CAST('15-Feb-12' AS DATETIME) as ActionDate UNION

SELECT 3 as Audit_ID, 123 as EntID, 150 as Field_A, 15 as Field_B, CAST('20-Mar-12' AS DATETIME) as ActionDate UNION

SELECT 4 as Audit_ID, 123 as EntID, 100 as Field_A, 10 as Field_B, CAST('12-Jun-12' AS DATETIME) as ActionDate UNION

SELECT 5 as Audit_ID, 445 as EntID, 50 as Field_A, 4 as Field_B, CAST('02-Apr-12' AS DATETIME) as ActionDate UNION

SELECT 6 as Audit_ID, 445 as EntID, 60 as Field_A, 3 as Field_B, CAST('15-May-12' AS DATETIME) as ActionDate UNION

SELECT 7 as Audit_ID, 445 as EntID, 30 as Field_A, 3 as Field_B, CAST('18-Aug-12' AS DATETIME) as ActionDate UNION

SELECT 8 as Audit_ID, 552 as EntID, 500 as Field_A, 1 as Field_B, CAST('10-Jan-12' AS DATETIME) as ActionDate UNION

SELECT 9 as Audit_ID, 552 as EntID, 450 as Field_A, 3 as Field_B, CAST('15-Mar-12' AS DATETIME) as ActionDate UNION

SELECT 10 as Audit_ID, 552 as EntID, 320 as Field_A, 4 as Field_B, CAST('29-Aug-12' AS DATETIME) as ActionDate

)x

GO

WITH BaseData AS (

SELECT

Audit_ID,

EntID,

Field_A,

Field_B,

ActionDate,

rn3 = ROW_NUMBER() OVER (PARTITION BY EntID ORDER BY ActionDate) - ROW_NUMBER() OVER (PARTITION BY EntID, Field_B ORDER BY ActionDate)

FROM

#test

), IntermediateData AS (

SELECT

*,

rn = ROW_NUMBER() OVER (PARTITION BY EntID, rn3 ORDER BY ActionDate)

FROM

BaseData

)

SELECT

*

FROM

IntermediateData

WHERE

rn = 1

ORDER BY

EntID,

ActionDate;

GO

DROP TABLE #test;

GO

Rob-350472 SSCertifiable Points: 7197 More actions · Answer 4

Thanks for the responses guys, Lynn, I went with your method in the end, it's exactly what I was looking for!

I'm glad I was on the right track knowing some form of ranking and or self joins and CTEs were called for, I just couldn't quite get it down. When I get some time (who know's when that'll be!) i really need to get stuck into some CTE testing to see what they can do.

Thanks again 🙂

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 5

Here's an alternative with no CTE or ranking function:

SELECT

s.*,

y.ReasonToKeep

FROM Sampledata s -- CI scan

OUTER APPLY ( -- CI seek possible

SELECT TOP 1 *

FROM Sampledata i

WHERE i.EntID = s.EntID

AND i.ActionDate < s.ActionDate

ORDER BY i.EntID, i.ActionDate DESC) x

CROSS APPLY ( -- compute scalar

SELECT

ReasonToKeep = CASE

WHEN x.Field_B IS NULL THEN 'FIRST'

WHEN x.Field_B <> s.Field_B THEN 'CHANGED'

END

) y

WHERE y.ReasonToKeep IS NOT NULL

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 6

The CTE and NoCTE versions are comparable on a small data set. I'd like to try them both on a million row data set and see how they compare.

Perhaps this evening I will have a chance to set it up.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 7

Lynn Pettis (4/12/2013)
The CTE and NoCTE versions are comparable on a small data set. I'd like to try them both on a million row data set and see how they compare.
Perhaps this evening I will have a chance to set it up.

Heh if it's a race Lynn, I reserve the right to change shoes!!

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Rob-350472 SSCertifiable Points: 7197 More actions · Answer 8

I'd like to have time to try the above non CTE approach, time doesn't allow at the moment but hopefully at some point.

FYI the #test replica on my 'real' data yields about 600,000 rows so it's not an insignificant data set, however, it's probably quite a rare request (well, I've not had to dig into these tables for this info ever and we've had them on the go since 2008!)

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 9

Rob-350472 (4/12/2013)
I'd like to have time to try the above non CTE approach, time doesn't allow at the moment but hopefully at some point.
FYI the #test replica on my 'real' data yields about 600,000 rows so it's not an insignificant data set, however, it's probably quite a rare request (well, I've not had to dig into these tables for this info ever and we've had them on the go since 2008!)

I know from recent experiments with similar Gaps'n'Islands solutions that the query I posted will work well providing there's an index to support seeks where shown - preferably unique. It performs best when the number of rows returned is significantly less than the total number of rows in the table and works well over many millions of rows.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden