November 18, 2015 at 5:40 pm
Hello,
I'm looking for a SQL UPDATE statement to update all the historical class4 entries with the most recent class4 for a given identifier...
Original table:
asOfDate identifierclass4
3/31/2015 1 Demo
4/30/2015 1 Demo
5/31/2015 1 New
3/31/2015 2 Wrong
4/30/2015 2 Wrong
5/31/2015 2 Correct
After UPDATE statement, would be this:
asOfDate identifierclass4
3/31/2015 1 New
4/30/2015 1 New
5/31/2015 1 New
3/31/2015 2 Correct
4/30/2015 2 Correct
5/31/2015 2 Correct
November 18, 2015 at 9:15 pm
For future posts, help make it easy for people to help you. Please see the first link under "Helpful Links" in my signature line at the end of this post.
Here's a solution to your problem, test code included. Details are documented in the code.
--===== If it exists, drop the test table to make reruns easier in SSMS.
-- This is NOT a part of the solution.
IF OBJECT_ID('tempdb..#TestTable','U') IS NOT NULL
DROP TABLE #TestTable
;
--===== Create the test table. This is NOT a part of the solution.
-- We're just creating test data here.
SELECT td.*
INTO #TestTable
FROM (
SELECT '3/31/2015',1,'Demo' UNION ALL
SELECT '4/30/2015',1,'Demo' UNION ALL
SELECT '5/31/2015',1,'New' UNION ALL
SELECT '3/31/2015',2,'Wrong' UNION ALL
SELECT '4/30/2015',2,'Wrong' UNION ALL
SELECT '5/31/2015',2,'Correct'
) td (asOfDate, identifier, class4)
;
--===== Show the "BEFORE" content of the table.
SELECT * FROM #TestTable
;
--===== Solve the problem using a self-joined UPDATE
-- SELECT tgt.*,dt.class4
UPDATE tgt
SET tgt.class4 = dt.class4
FROM #TestTable tgt
JOIN ( --=== Numbers the rows for each identifier. Latest row for each will be numbered "1".
SELECT RowNum = ROW_NUMBER() OVER (PARTITION BY identifier ORDER BY asOfDate DESC)
,asOfDate
,identifier
,class4
FROM #TestTable
) dt
ON tgt.identifier = dt.identifier
AND tgt.asOfDate <> dt.asOfDate
WHERE dt.RowNum = 1
;
--===== Show the "AFTER" content of the table.
SELECT * FROM #TestTable;
I believe that using something like "Lead/Lag" would do the same thing faster but I don't have 2012 on the machine that I'm currently working on and won't post code that I haven't tested.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 19, 2015 at 1:39 am
Jeff,
I dont think Lead/Lag will help in this case as it will only update by a given offset, so using the example
SELECT
#TestTable.*
,LEAD(class4,1,class4) OVER (PARTITION BY identifier ORDER BY asofdate) LEAD
FROM #TestTable
Which gives the result
asOfDate identifierclass4LEAD fn
3/31/20151DemoDemo
4/30/20151DemoNew
5/31/20151NewNew
3/31/20152WrongWrong
4/30/20152WrongCorrect
5/31/20152CorrectCorrect
The LAST_VALUE function will probably work, but will need an sort on the columns
SELECT
tgt.*
,LAST_VALUE(Class4) OVER (PARTITION BY identifier ORDER BY identifier)
FROM #TestTable tgt
Order by Identifier,AsOfDate ASC
I've attached the Plans for both your solution and the LAST_VALUE and there is very little difference in them, the last value has a single table scan with a 77% sort, your solution has 2 tables scansm and a sort.
One consideration is that the LAST_VALUE will update all rows as there is no where to check if the columns are the same, to get that you would probably need a CTE as you cant use the LAST_VALUE In a where clause, eg something like this
WITH CTE
AS
(
SELECT TOP 100 PERCENT
tgt.*
,LAST_VALUE(Class4) OVER (PARTITION BY identifier ORDER BY identifier) lastClass4
FROM #TestTable tgt
Order by Identifier,AsOfDate ASC
)
SELECT * FROM CTE WHERE lastClass4<>class4
But it doesnt seem to impact the actual plan.
_________________________________________________________________________
SSC Guide to Posting and Best Practices
November 19, 2015 at 5:41 am
Shamelessly nicking Jeff's sample data:
--===== If it exists, drop the test table to make reruns easier in SSMS.
-- This is NOT a part of the solution.
IF OBJECT_ID('tempdb..#TestTable','U') IS NOT NULL DROP TABLE #TestTable;
--===== Create the test table. This is NOT a part of the solution.
-- We're just creating test data here.
SELECT td.*
INTO #TestTable
FROM (
SELECT '3/31/2015',1,'Demo' UNION ALL
SELECT '4/30/2015',1,'Demo' UNION ALL
SELECT '5/31/2015',1,'New' UNION ALL
SELECT '3/31/2015',2,'Wrong' UNION ALL
SELECT '4/30/2015',2,'Wrong' UNION ALL
SELECT '5/31/2015',2,'Correct'
) td (asOfDate, identifier, class4);
--===== Show the "BEFORE" content of the table.
SELECT * FROM #TestTable;
-- See if an updatable CTE might work
WITH Updater AS (
SELECT t.asOfDate, t.identifier, t.class4, x.NewClass4
FROM #TestTable t
CROSS APPLY (SELECT TOP 1 NewClass4 = class4 FROM #TestTable ti WHERE ti.identifier = t.identifier ORDER BY asOfDate DESC) x
WHERE t.class4 <> x.NewClass4)
SELECT *
FROM Updater;
-- Looks promising so give it a go
WITH Updater AS (
SELECT t.asOfDate, t.identifier, t.class4, x.NewClass4
FROM #TestTable t
CROSS APPLY (SELECT TOP 1 NewClass4 = class4 FROM #TestTable ti WHERE ti.identifier = t.identifier ORDER BY asOfDate DESC) x
WHERE t.class4 <> x.NewClass4
)
UPDATE Updater SET class4 = NewClass4
--===== Show the "AFTER" content of the table: looks good to me.
SELECT * FROM #TestTable;
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
November 19, 2015 at 6:12 am
Jeff Moden (11/18/2015)
I believe that using something like "Lead/Lag" would do the same thing faster but I don't have 2012 on the machine that I'm currently working on and won't post code that I haven't tested.
FIRST_VALUE() i suppose. Have no 2012 at hand too.
November 19, 2015 at 6:51 am
Jason-299789 (11/19/2015)
Jeff,I dont think Lead/Lag will help in this case as it will only update by a given offset, so using the example
SELECT
#TestTable.*
,LEAD(class4,1,class4) OVER (PARTITION BY identifier ORDER BY asofdate) LEAD
FROM #TestTable
Which gives the result
asOfDate identifierclass4LEAD fn
3/31/20151DemoDemo
4/30/20151DemoNew
5/31/20151NewNew
3/31/20152WrongWrong
4/30/20152WrongCorrect
5/31/20152CorrectCorrect
The LAST_VALUE function will probably work, but will need an sort on the columns
SELECT
tgt.*
,LAST_VALUE(Class4) OVER (PARTITION BY identifier ORDER BY identifier)
FROM #TestTable tgt
Order by Identifier,AsOfDate ASC
I've attached the Plans for both your solution and the LAST_VALUE and there is very little difference in them, the last value has a single table scan with a 77% sort, your solution has 2 tables scansm and a sort.
One consideration is that the LAST_VALUE will update all rows as there is no where to check if the columns are the same, to get that you would probably need a CTE as you cant use the LAST_VALUE In a where clause, eg something like this
WITH CTE
AS
(
SELECT TOP 100 PERCENT
tgt.*
,LAST_VALUE(Class4) OVER (PARTITION BY identifier ORDER BY identifier) lastClass4
FROM #TestTable tgt
Order by Identifier,AsOfDate ASC
)
SELECT * FROM CTE WHERE lastClass4<>class4
But it doesnt seem to impact the actual plan.
Agreed on Lead/Lag. I don't know why (except that it's shorter to say) I continue to confuse those terms with ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which is what both running totals and "data smears" (like what this actually is) can use. We could use a Quirky Update for really high performance (single pass UPDATE involved) but it requires a bit more attention to detail and would also require a reverse order Clustered Index.
Shifting gears, don't use TOP 100 PERCENT. From 2005 and on, it may not work as expected. You have to use TOP somereallylargenumber instead to have it work as expected. I typically use 2000000000 simply because it's easy to remember, easy to type, is still within the limits of an INT, and still really big. If you want to guarantee that you won't overrun it for most applications, use the upper limit of INT, which is 2147483647.
Totally forgot about the possibility of LAST_VALUE because we're still stuck with 2005 at work (hopefully updating on Q1 2016) and haven't had the opportunity to work much with it.
Thank you for taking the time to post your suggestions.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 19, 2015 at 6:54 am
serg-52 (11/19/2015)
Jeff Moden (11/18/2015)
I believe that using something like "Lead/Lag" would do the same thing faster but I don't have 2012 on the machine that I'm currently working on and won't post code that I haven't tested.FIRST_VALUE() i suppose. Have no 2012 at hand too.
Thanks, Serg. I really need to get out of the habit of saying "Lead/Lag". Seems like the natural thing but isn't right for things like this. Appreciate the suggestion on FIRST_VALUE(), as well.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 19, 2015 at 7:00 am
ChrisM@Work (11/19/2015)
Shamelessly nicking Jeff's sample data:
--===== If it exists, drop the test table to make reruns easier in SSMS.
-- This is NOT a part of the solution.
IF OBJECT_ID('tempdb..#TestTable','U') IS NOT NULL DROP TABLE #TestTable;
--===== Create the test table. This is NOT a part of the solution.
-- We're just creating test data here.
SELECT td.*
INTO #TestTable
FROM (
SELECT '3/31/2015',1,'Demo' UNION ALL
SELECT '4/30/2015',1,'Demo' UNION ALL
SELECT '5/31/2015',1,'New' UNION ALL
SELECT '3/31/2015',2,'Wrong' UNION ALL
SELECT '4/30/2015',2,'Wrong' UNION ALL
SELECT '5/31/2015',2,'Correct'
) td (asOfDate, identifier, class4);
--===== Show the "BEFORE" content of the table.
SELECT * FROM #TestTable;
-- See if an updatable CTE might work
WITH Updater AS (
SELECT t.asOfDate, t.identifier, t.class4, x.NewClass4
FROM #TestTable t
CROSS APPLY (SELECT TOP 1 NewClass4 = class4 FROM #TestTable ti WHERE ti.identifier = t.identifier ORDER BY asOfDate DESC) x
WHERE t.class4 <> x.NewClass4)
SELECT *
FROM Updater;
-- Looks promising so give it a go
WITH Updater AS (
SELECT t.asOfDate, t.identifier, t.class4, x.NewClass4
FROM #TestTable t
CROSS APPLY (SELECT TOP 1 NewClass4 = class4 FROM #TestTable ti WHERE ti.identifier = t.identifier ORDER BY asOfDate DESC) x
WHERE t.class4 <> x.NewClass4
)
UPDATE Updater SET class4 = NewClass4
--===== Show the "AFTER" content of the table: looks good to me.
SELECT * FROM #TestTable;
Heh, you know me... I never mind someone nicking test data to demonstrate alternate methods. That's why I post it.
If I get the time, I'll try the PRECEEDING thing at work on one of the currently empty 2012 boxes because it would be nice if we could avoid the second scan.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 19, 2015 at 7:02 am
Jeff,
Thank you for the feedback on the TOP 100 Percent, I do try and avoid it whenever possible, but I know ORDER BY is not allowed in CTE's without a TOP.
Good luck with the Q1 Update.
_________________________________________________________________________
SSC Guide to Posting and Best Practices
November 19, 2015 at 7:05 am
Jason-299789 (11/19/2015)
Jeff,Thank you for the feedback on the TOP 100 Percent, I do try and avoid it whenever possible, but I know ORDER BY is not allowed in CTE's without a TOP.
Good luck with the Q1 Update.
Logically, if you were to use ORDER BY without TOP in a CTE, there's a high probability that it will sting you with a sort.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
November 19, 2015 at 7:18 am
Jason-299789 (11/19/2015)
The LAST_VALUE function will probably work, but will need an sort on the columns
SELECT
tgt.*
,LAST_VALUE(Class4) OVER (PARTITION BY identifier ORDER BY identifier)
FROM #TestTable tgt
Order by Identifier,AsOfDate ASC
I thing it should be just
SELECT
tgt.*
,LAST_VALUE(Class4) OVER (PARTITION BY identifier ORDER BY AsOfDate)
FROM #TestTable tgt
No more sorts but in OVER() is needed.
November 19, 2015 at 7:31 am
serg-52 (11/19/2015)
Jason-299789 (11/19/2015)
The LAST_VALUE function will probably work, but will need an sort on the columns
SELECT
tgt.*
,LAST_VALUE(Class4) OVER (PARTITION BY identifier ORDER BY identifier)
FROM #TestTable tgt
Order by Identifier,AsOfDate ASC
I thing it should be just
SELECT
tgt.*
,LAST_VALUE(Class4) OVER (PARTITION BY identifier ORDER BY AsOfDate)
FROM #TestTable tgt
No more sorts but in OVER() is needed.
Serg if you run your code then you get the following data set
asOfDate identifierclass4LASTVALUE
3/31/20151DemoDemo
4/30/20151DemoDemo
5/31/20151NewNew
3/31/20152WrongWrong
4/30/20152WrongWrong
5/31/20152CorrectCorrect
Its a quirk with the LAST_VALUE/FIRST_VALUE window function in that the ORDER BY forms part of the Partition clause, if you had say dates the same you would get this output
asOfDate identifierclass4LASTVALUE
3/31/20151DemoDemo
4/30/20151DemoDemo
5/31/20151NewNew
3/31/20152WrongWrong
4/30/20152WrongCORRECT
4/30/20152CORRECTCORRECT
5/31/20152CorrectCorrect
_________________________________________________________________________
SSC Guide to Posting and Best Practices
November 19, 2015 at 8:59 am
This is not a problem with FIRST_VALUE, only LAST_VALUE. If you do not specify a window, it defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. For FIRST_VALUE, the desired value falls in the default range, but for LAST_VALUE, the desired value only falls in that range when the CURRENT ROW is the last row. There's a fix and a work around. The fix is to (always) specify the window. The workaround is to use FIRST_VALUE and then sort in reverse order, but this is a bit counter-intuitive.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
November 19, 2015 at 9:27 am
Thanks drew, I've just tried with First_value and an ORDER BY AsOFDate DESC) and that works perfectly.
_________________________________________________________________________
SSC Guide to Posting and Best Practices
November 19, 2015 at 10:53 am
Ya just gotta love the way people come together to simplify things on some of these posts. Thanks folks.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply