Running date update

Question

Running date update

Phil Parkin

SSC Guru

Points: 246960
More actions
February 13, 2017 at 7:16 am

#321657

Hi all, I can't think of a fast way of doing this. I would like a query which updates a date column whenever status becomes T, and for all future rows, for that Id (by Date1, ascending, in the code which follows). The complication is that if there is a later row with status T, that date should be used for all subsequent rows, unless there is another T (in which case, use that row's date) and so on.
Here is some setup code:
IF OBJECT_ID('tempdb..#SomeTab', 'U') IS NOT NULL DROP TABLE #SomeTab;
CREATE TABLE #SomeTab ( SomeId INT , Date1 DATETIME , Stat CHAR(1) , Date2 DATETIME );
INSERT #SomeTab ( SomeId , Date1 , Stat , Date2 ) VALUES ( 1 ,'20150101' ,'A' ,NULL ) ,( 1 , '20150201' , 'A' , NULL ) ,( 1 , '20150301' , 'T' , NULL ) ,( 1 , '20150401' , 'A' , NULL ) ,( 1 , '20150501' , 'A' , NULL ) ,( 1 , '20150601' , 'T' , NULL ) ,( 1 , '20150701' , 'A' , NULL );
SELECT * FROM #SomeTab st ORDER BY st.SomeId, st.Date1

And here is how I would like the results to look:
IF OBJECT_ID('tempdb..#Results', 'U') IS NOT NULL DROP TABLE #Results;
CREATE TABLE #Results ( SomeId INT , Date1 DATETIME , Stat CHAR(1) , Date2 DATETIME );
INSERT #Results ( SomeId , Date1 , Stat , Date2 ) VALUES ( 1 ,'20150101' ,'A' ,NULL ) ,( 1 , '20150201' , 'A' , NULL ) ,( 1 , '20150301' , 'T' , '20150301' ) ,( 1 , '20150401' , 'A' , '20150301' ) ,( 1 , '20150501' , 'A' , '20150301' ) ,( 1 , '20150601' , 'T' , '20150601' ) ,( 1 , '20150701' , 'A' , '20150601' );
SELECT * FROM #Results r ORDER BY r.SomeId, r.Date1
--Edit: The datetimes should be dates in the code above.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

LinksUp SSCertifiable Points: 6526 More actions · Answer 1

This is similar to a problem I had last week. The TOP 1 option was the answer when looking for a "sliding" result.

Don't know if this fulfills your requirement that it be fast, but it does work. And it may point you in a new direction that may lead to fast.

with cte as ( select SomeID, Date1, Stat from #someTab where Stat = 'T' ) Update st SET st.Date2 = (Select top 1 c.Date1 from cte c where st.Date1 >= c.Date1 and st.someID = c.someID order by c.Date1 desc ) from #someTab st;

select * from #someTab;

__________________________________________________________________________________________________________
How to Post to get the most: http://www.sqlservercentral.com/articles/Best+Practices/61537/

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 2

This doesn't use a correlated subquery, so it should perform better, especially if you have an index on Date1. Parts of it are shamelessly borrowed from this article.

WITH RangeStarts AS ( SELECT SomeId , Date1 , Stat , CASE WHEN Stat = 'A' THEN NULL ELSE ROW_NUMBER() OVER (PARTITION BY SomeId, Stat ORDER BY Date1) END AS PtnNo FROM #SomeTab ) , Partitioned AS ( SELECT SomeId , Date1 , Stat , COUNT(PtnNo) OVER (PARTITION BY SomeId ORDER BY Date1 ROWS UNBOUNDED PRECEDING) AS PtnNo FROM RangeStarts ) SELECT SomeId , Date1 , Stat , CASE WHEN PtnNo = 0 THEN NULL ELSE FIRST_VALUE(Date1) OVER (PARTITION BY SomeId, PtnNo ORDER BY Date1) END AS Date2 FROM Partitioned ORDER BY Date1

John

Luis Cazares SSC Guru Points: 183706 More actions · Answer 3

Yet another option.
WITH OnlyTs AS( SELECT Date1, LEAD(Date1, 1, '9999') OVER(ORDER BY Date1) NextDate FROM #SomeTab WHERE Stat = 'T' ) --SELECT * UPDATE st SET Date2 = t.Date1 FROM #SomeTab st JOIN OnlyTs t ON st.Date1 >= t.Date1 AND st.Date1 < t.NextDate;

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Phil Parkin SSC Guru Points: 246960 More actions · Answer 4

Thanks very much to those who responded, I appreciate the help.
I've ended up with a somewhat more complex version of Luis' query, which performed the best on my data. My version was more complex because I forgot to mention the special case where multiple rows (in Date1 order) have status 'T'. In these cases, I need to set the date to the date of the first occurrence of the status T cluster. I did this by introducing another CTE. The final version of my query looks like this:

WITH BaseData AS ( SELECT r.PolicyNumber , r.ValuationDate , PrevStatus = LAG(r.PolicyStatus, 1, '_') OVER (PARTITION BY r.PolicyNumber ORDER BY r.ValuationDate ) , r.PolicyStatus FROM #Results r ) , OnlyTs AS ( SELECT bd.PolicyNumber , bd.ValuationDate , NextDate = LEAD(bd.ValuationDate, 1, '9999') OVER (PARTITION BY bd.PolicyNumber ORDER BY bd.ValuationDate ) FROM BaseData bd WHERE bd.PolicyStatus = 'T' AND bd.PrevStatus <> 'T' ) UPDATE r SET r.DecrementDate = t.ValuationDate FROM #Results r JOIN OnlyTs t ON r.ValuationDate >= t.ValuationDate AND r.ValuationDate < t.NextDate AND r.PolicyNumber = t.PolicyNumber;

This updates approximately 1 million rows of a 6.5 million row temp table in 50 seconds or so.

The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin

drew.allen SSC Guru Points: 76985 More actions · Answer 5

The problem with that solution is that it requires a JOIN which is expensive. You can do it without a JOIN.

; WITH T_Dates AS ( SELECT *, CASE WHEN LAG(st.PolicyStatus) OVER(PARTITION BY st.PolicyNumber ORDER BY st.ValuationDate) = 'T' THEN NULL WHEN st.PolicyStatus = 'T' THEN st.ValuationDate END AS t_date FROM #SomeTab st ) SELECT PolicyNumber, ValuationDate, PolicyStatus, MAX(t_date) OVER(PARTITION BY PolicyNumber ORDER BY ValuationDate ROWS UNBOUNDED PRECEDING) AS DecrementDate FROM T_Dates

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

Phil Parkin SSC Guru Points: 246960 More actions · Answer 6

drew.allen - Monday, February 13, 2017 4:14 PM
The problem with that solution is that it requires a JOIN which is expensive. You can do it without a JOIN.
; WITH T_Dates AS ( SELECT *, CASE WHEN LAG(st.PolicyStatus) OVER(PARTITION BY st.PolicyNumber ORDER BY st.ValuationDate) = 'T' THEN NULL WHEN st.PolicyStatus = 'T' THEN st.ValuationDate END AS t_date FROM #SomeTab st ) SELECT PolicyNumber, ValuationDate, PolicyStatus, MAX(t_date) OVER(PARTITION BY PolicyNumber ORDER BY ValuationDate ROWS UNBOUNDED PRECEDING) AS DecrementDate FROM T_Dates

Looks like I have more testing to do tomorrow, thanks, Drew!

The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin

Phil Parkin SSC Guru Points: 246960 More actions · Answer 7

Phil Parkin - Monday, February 13, 2017 4:47 PM
drew.allen - Monday, February 13, 2017 4:14 PM
The problem with that solution is that it requires a JOIN which is expensive. You can do it without a JOIN.
; WITH T_Dates AS ( SELECT *, CASE WHEN LAG(st.PolicyStatus) OVER(PARTITION BY st.PolicyNumber ORDER BY st.ValuationDate) = 'T' THEN NULL WHEN st.PolicyStatus = 'T' THEN st.ValuationDate END AS t_date FROM #SomeTab st ) SELECT PolicyNumber, ValuationDate, PolicyStatus, MAX(t_date) OVER(PARTITION BY PolicyNumber ORDER BY ValuationDate ROWS UNBOUNDED PRECEDING) AS DecrementDate FROM T_Dates
Looks like I have more testing to do tomorrow, thanks, Drew!

While the SELECT version of this query is appealingly succinct, the UPDATE version is less so. At least, my UPDATE version is:

WITH T_Dates AS ( SELECT r.PolicyNumber , r.ValuationDate , r.PolicyStatus , t_date = CASE WHEN LAG(r.PolicyStatus) OVER (PARTITION BY r.PolicyNumber ORDER BY r.ValuationDate ) = 'T' THEN NULL WHEN r.PolicyStatus = 'T' THEN r.ValuationDate END FROM #Results r ) , DecDates AS ( SELECT T_Dates.PolicyNumber , T_Dates.ValuationDate , T_Dates.PolicyStatus , DecrementDate = MAX(T_Dates.t_date) OVER (PARTITION BY T_Dates.PolicyNumber ORDER BY T_Dates.ValuationDate ROWS UNBOUNDED PRECEDING ) FROM T_Dates ) UPDATE r SET r.DecrementDate = DecDates.DecrementDate FROM #Results r JOIN DecDates ON r.PolicyNumber = DecDates.PolicyNumber AND r.ValuationDate = DecDates.ValuationDate;

In performance terms, this is roughly the same as the solution I posted earlier, despite having a cleaner-looking execution plan.

The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin

whenriksen SSCarpal Tunnel Points: 4752 More actions · Answer 8

Phil Parkin - Tuesday, February 14, 2017 7:13 AM
While the SELECT version of this query is appealingly succinct, the UPDATE version is less so. At least, my UPDATE version is:
WITH T_Dates AS ( SELECT r.PolicyNumber , r.ValuationDate , r.PolicyStatus , t_date = CASE WHEN LAG(r.PolicyStatus) OVER (PARTITION BY r.PolicyNumber ORDER BY r.ValuationDate ) = 'T' THEN NULL WHEN r.PolicyStatus = 'T' THEN r.ValuationDate END FROM #Results r ) , DecDates AS ( SELECT T_Dates.PolicyNumber , T_Dates.ValuationDate , T_Dates.PolicyStatus , DecrementDate = MAX(T_Dates.t_date) OVER (PARTITION BY T_Dates.PolicyNumber ORDER BY T_Dates.ValuationDate ROWS UNBOUNDED PRECEDING ) FROM T_Dates ) UPDATE r SET r.DecrementDate = DecDates.DecrementDate FROM #Results r JOIN DecDates ON r.PolicyNumber = DecDates.PolicyNumber AND r.ValuationDate = DecDates.ValuationDate;

In performance terms, this is roughly the same as the solution I posted earlier, despite having a cleaner-looking execution plan.

You can update through a CTE, you don't need the join back to the table at the end. You're using different columns names than in the test code you posted, so I'm not certain this will execute, however I have used this approach many times. Build a CTE that returns the field you want to update and a field with the new value, then just simply execute the update on the cte. Just make sure you are only affecting columns from a single source table.
WITH T_Dates AS ( SELECT r.PolicyNumber , r.ValuationDate , r.PolicyStatus , r.DecrementDate , t_date = CASE WHEN LAG(r.PolicyStatus) OVER (PARTITION BY r.PolicyNumber ORDER BY r.ValuationDate ) = 'T' THEN NULL WHEN r.PolicyStatus = 'T' THEN r.ValuationDate END FROM #Results r ) , DecDates AS ( SELECT T_Dates.PolicyNumber , T_Dates.ValuationDate , T_Dates.PolicyStatus , T_Dates.DecrementDate , NewDecrementDate = MAX(T_Dates.t_date) OVER (PARTITION BY T_Dates.PolicyNumber ORDER BY T_Dates.ValuationDate ROWS UNBOUNDED PRECEDING ) FROM T_Dates ) UPDATE DecDates SET DecrementDate = NewDecrementDate

Wes
(A solid design is always preferable to a creative workaround)

Phil Parkin SSC Guru Points: 246960 More actions · Answer 9

whenriksen - Tuesday, February 14, 2017 7:42 AM
You can update through a CTE, you don't need the join back to the table at the end. You're using different columns names than in the test code you posted, so I'm not certain this will execute, however I have used this approach many times. Build a CTE that returns the field you want to update and a field with the new value, then just simply execute the update on the cte. Just make sure you are only affecting columns from a single source table.
WITH T_Dates AS ( SELECT r.PolicyNumber , r.ValuationDate , r.PolicyStatus , r.DecrementDate , t_date = CASE WHEN LAG(r.PolicyStatus) OVER (PARTITION BY r.PolicyNumber ORDER BY r.ValuationDate ) = 'T' THEN NULL WHEN r.PolicyStatus = 'T' THEN r.ValuationDate END FROM #Results r ) , DecDates AS ( SELECT T_Dates.PolicyNumber , T_Dates.ValuationDate , T_Dates.PolicyStatus , T_Dates.DecrementDate , NewDecrementDate = MAX(T_Dates.t_date) OVER (PARTITION BY T_Dates.PolicyNumber ORDER BY T_Dates.ValuationDate ROWS UNBOUNDED PRECEDING ) FROM T_Dates ) UPDATE DecDates SET DecrementDate = NewDecrementDate

This version works well & produces a nice clean non-parallel plan. Thanks for helping (and reminding me about updateable CTEs).

The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin

Jeff Moden SSC Guru Points: 1004354 More actions · Answer 10

So how long does the update through the CTE take for 1 million out of the 6.5 million rows now?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Luis Cazares SSC Guru Points: 183706 More actions · Answer 11

Jeff Moden - Tuesday, February 14, 2017 9:12 AM
So how long does the update through the CTE take for 1 million out of the 6.5 million rows now?

I might know where this is going.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Phil Parkin SSC Guru Points: 246960 More actions · Answer 12

Jeff Moden - Tuesday, February 14, 2017 9:12 AM
So how long does the update through the CTE take for 1 million out of the 6.5 million rows now?

Takes around 38 seconds.
Out of interest, if I add a WHERE DecDates.DecrementDate <> DecDates.NewDecrementDate filter to the update, it takes around 12 seconds on second and subsequent iterations.

The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin