April 11, 2013 at 10:07 am
Guys,
I've been looking at this problem on and off for a while and I can't for the life of me find a solution. Basically we have an audit system where by every time someone changes one field in one table it appends the entire row into another table, a slight simplification but you get the idea.
What I want to get is a historical record for one field over time - so say this was a list of individuals and it had their wealth in dollars, their job title, their address, their age etc. It may be that their wealth in dollars didn't change over the course of say 10 rows in the table but other variables did change, I'm trying to build up a picture of their wealth changing. Totally random example but I thought it might help!
With this test data:
SELECT * INTO #test
FROM
(
SELECT 1 as Audit_ID,123 as EntID,100 as Field_A,10 as Field_B,'01-Jan-12' as ActionDate UNION
SELECT 2 as Audit_ID,123 as EntID,110 as Field_A,10 as Field_B,'15-Feb-12' as ActionDate UNION
SELECT 3 as Audit_ID,123 as EntID,150 as Field_A,15 as Field_B,'20-Mar-12' as ActionDate UNION
SELECT 4 as Audit_ID,123 as EntID,100 as Field_A,10 as Field_B,'12-Jun-12' as ActionDate UNION
SELECT 5 as Audit_ID,445 as EntID,50 as Field_A,4 as Field_B,'02-Apr-12' as ActionDate UNION
SELECT 6 as Audit_ID,445 as EntID,60 as Field_A,3 as Field_B,'15-May-12' as ActionDate UNION
SELECT 7 as Audit_ID,445 as EntID,30 as Field_A,3 as Field_B,'18-Aug-12' as ActionDate UNION
SELECT 8 as Audit_ID,552 as EntID,500 as Field_A,1 as Field_B,'10-Jan-12' as ActionDate UNION
SELECT 9 as Audit_ID,552 as EntID,450 as Field_A,3 as Field_B,'15-Mar-12' as ActionDate UNION
SELECT 10 as Audit_ID,552 as EntID,320 as Field_A,4 as Field_B,'29-Aug-12' as ActionDate
)x
Say I was trying to track field B over time for Entity (EntID) 123, I'd want a results set like this:
01-Jan-12 10
20-Mar-12 15
12-Jun-12 10
I thought I could do some kind of ranking whereby I partitioned on the field I was interested on and ranked on date, and chose those with rank 1, however, this doesn't take account for the same value at different time points - e.g. using the wealth example perhaps in Jan 2010 I had $200, in Feb 2010 this went up to 350 but then it fell to $200 again in September 2010, with my ranking idea I'd just have Feb 2010 and 200, not September too.
I thought about some form of 'double join' type thing along hte lines of Field B <> Field B, I've also got pratically no experience of them but for some reason I'm wondering if a CTE is called for?!
Hoepfully I've explained this not too bad and my test data gives you kind of an idea!
Thanks for reading 🙂
April 11, 2013 at 12:05 pm
You were on the right track with the self join
SELECT t1.ActionDate, t1.Field_B
FROM #test t1
LEFT JOIN #test t2 ON t1.Audit_ID = t2.Audit_ID-1
WHERE t1.Field_B != t2.Field_B OR t2.Audit_ID IS NULL
April 11, 2013 at 12:26 pm
Forgot the entity chunk. Yes you will need a CTE and a Row_Number function.
[Code]
DECLARE @EntityID INT = 123;
WITH cte
AS (
SELECT row_Number() OVER (ORDER BY Audit_ID) AS RN
,t.Field_B
,t.ActionDate
FROM #test t
WHERE @EntityID = t.EntID
)
SELECT t1.ActionDate
,t1.Field_B
FROM cte t1
LEFT JOIN cte t2 ON t1.RN = t2.RN - 1
WHERE t1.Field_B != t2.Field_B
OR t2.RN IS NULL
[/code]
April 11, 2013 at 1:18 pm
Here is another option:
SELECT
Audit_ID,
EntID,
Field_A,
Field_B,
ActionDate,
rn3 = ROW_NUMBER() OVER (PARTITION BY EntID ORDER BY ActionDate) - ROW_NUMBER() OVER (PARTITION BY EntID, Field_B ORDER BY ActionDate)
INTO #test
FROM
(
SELECT 1 as Audit_ID, 123 as EntID, 100 as Field_A, 10 as Field_B, CAST('01-Jan-12' AS DATETIME) as ActionDate UNION
SELECT 2 as Audit_ID, 123 as EntID, 110 as Field_A, 10 as Field_B, CAST('15-Feb-12' AS DATETIME) as ActionDate UNION
SELECT 3 as Audit_ID, 123 as EntID, 150 as Field_A, 15 as Field_B, CAST('20-Mar-12' AS DATETIME) as ActionDate UNION
SELECT 4 as Audit_ID, 123 as EntID, 100 as Field_A, 10 as Field_B, CAST('12-Jun-12' AS DATETIME) as ActionDate UNION
SELECT 5 as Audit_ID, 445 as EntID, 50 as Field_A, 4 as Field_B, CAST('02-Apr-12' AS DATETIME) as ActionDate UNION
SELECT 6 as Audit_ID, 445 as EntID, 60 as Field_A, 3 as Field_B, CAST('15-May-12' AS DATETIME) as ActionDate UNION
SELECT 7 as Audit_ID, 445 as EntID, 30 as Field_A, 3 as Field_B, CAST('18-Aug-12' AS DATETIME) as ActionDate UNION
SELECT 8 as Audit_ID, 552 as EntID, 500 as Field_A, 1 as Field_B, CAST('10-Jan-12' AS DATETIME) as ActionDate UNION
SELECT 9 as Audit_ID, 552 as EntID, 450 as Field_A, 3 as Field_B, CAST('15-Mar-12' AS DATETIME) as ActionDate UNION
SELECT 10 as Audit_ID, 552 as EntID, 320 as Field_A, 4 as Field_B, CAST('29-Aug-12' AS DATETIME) as ActionDate
)x
GO
WITH BaseData AS (
SELECT
Audit_ID,
EntID,
Field_A,
Field_B,
ActionDate,
rn3 = ROW_NUMBER() OVER (PARTITION BY EntID ORDER BY ActionDate) - ROW_NUMBER() OVER (PARTITION BY EntID, Field_B ORDER BY ActionDate)
FROM
#test
), IntermediateData AS (
SELECT
*,
rn = ROW_NUMBER() OVER (PARTITION BY EntID, rn3 ORDER BY ActionDate)
FROM
BaseData
)
SELECT
*
FROM
IntermediateData
WHERE
rn = 1
ORDER BY
EntID,
ActionDate;
GO
DROP TABLE #test;
GO
April 12, 2013 at 4:16 am
Thanks for the responses guys, Lynn, I went with your method in the end, it's exactly what I was looking for!
I'm glad I was on the right track knowing some form of ranking and or self joins and CTEs were called for, I just couldn't quite get it down. When I get some time (who know's when that'll be!) i really need to get stuck into some CTE testing to see what they can do.
Thanks again 🙂
April 12, 2013 at 6:35 am
Here's an alternative with no CTE or ranking function:
SELECT
s.*,
y.ReasonToKeep
FROM Sampledata s -- CI scan
OUTER APPLY ( -- CI seek possible
SELECT TOP 1 *
FROM Sampledata i
WHERE i.EntID = s.EntID
AND i.ActionDate < s.ActionDate
ORDER BY i.EntID, i.ActionDate DESC) x
CROSS APPLY ( -- compute scalar
SELECT
ReasonToKeep = CASE
WHEN x.Field_B IS NULL THEN 'FIRST'
WHEN x.Field_B <> s.Field_B THEN 'CHANGED'
END
) y
WHERE y.ReasonToKeep IS NOT NULL
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
April 12, 2013 at 7:59 am
The CTE and NoCTE versions are comparable on a small data set. I'd like to try them both on a million row data set and see how they compare.
Perhaps this evening I will have a chance to set it up.
April 12, 2013 at 8:02 am
Lynn Pettis (4/12/2013)
The CTE and NoCTE versions are comparable on a small data set. I'd like to try them both on a million row data set and see how they compare.Perhaps this evening I will have a chance to set it up.
Heh if it's a race Lynn, I reserve the right to change shoes!!
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
April 12, 2013 at 8:04 am
I'd like to have time to try the above non CTE approach, time doesn't allow at the moment but hopefully at some point.
FYI the #test replica on my 'real' data yields about 600,000 rows so it's not an insignificant data set, however, it's probably quite a rare request (well, I've not had to dig into these tables for this info ever and we've had them on the go since 2008!)
April 12, 2013 at 8:11 am
Rob-350472 (4/12/2013)
I'd like to have time to try the above non CTE approach, time doesn't allow at the moment but hopefully at some point.FYI the #test replica on my 'real' data yields about 600,000 rows so it's not an insignificant data set, however, it's probably quite a rare request (well, I've not had to dig into these tables for this info ever and we've had them on the go since 2008!)
I know from recent experiments with similar Gaps'n'Islands solutions that the query I posted will work well providing there's an index to support seeks where shown - preferably unique. It performs best when the number of rows returned is significantly less than the total number of rows in the table and works well over many millions of rows.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply