February 13, 2017 at 7:16 am
Hi all, I can't think of a fast way of doing this. I would like a query which updates a date column whenever status becomes T, and for all future rows, for that Id (by Date1, ascending, in the code which follows). The complication is that if there is a later row with status T, that date should be used for all subsequent rows, unless there is another T (in which case, use that row's date) and so on.
Here is some setup code:
IF OBJECT_ID('tempdb..#SomeTab', 'U') IS NOT NULL
DROP TABLE #SomeTab;
CREATE TABLE #SomeTab
(
SomeId INT
, Date1 DATETIME
, Stat CHAR(1)
, Date2 DATETIME
);
INSERT #SomeTab
(
SomeId
, Date1
, Stat
, Date2
)
VALUES
(
1
,'20150101'
,'A'
,NULL
)
,(
1
, '20150201'
, 'A'
, NULL
)
,(
1
, '20150301'
, 'T'
, NULL
)
,(
1
, '20150401'
, 'A'
, NULL
)
,(
1
, '20150501'
, 'A'
, NULL
)
,(
1
, '20150601'
, 'T'
, NULL
)
,(
1
, '20150701'
, 'A'
, NULL
);
SELECT *
FROM #SomeTab st
ORDER BY st.SomeId, st.Date1
And here is how I would like the results to look:
IF OBJECT_ID('tempdb..#Results', 'U') IS NOT NULL
DROP TABLE #Results;
CREATE TABLE #Results
(
SomeId INT
, Date1 DATETIME
, Stat CHAR(1)
, Date2 DATETIME
);
INSERT #Results
(
SomeId
, Date1
, Stat
, Date2
)
VALUES
(
1
,'20150101'
,'A'
,NULL
)
,(
1
, '20150201'
, 'A'
, NULL
)
,(
1
, '20150301'
, 'T'
, '20150301'
)
,(
1
, '20150401'
, 'A'
, '20150301'
)
,(
1
, '20150501'
, 'A'
, '20150301'
)
,(
1
, '20150601'
, 'T'
, '20150601'
)
,(
1
, '20150701'
, 'A'
, '20150601'
);
SELECT * FROM #Results r
ORDER BY r.SomeId, r.Date1
--Edit: The datetimes should be dates in the code above.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
February 13, 2017 at 8:02 am
This is similar to a problem I had last week. The TOP 1 option was the answer when looking for a "sliding" result.
Don't know if this fulfills your requirement that it be fast, but it does work. And it may point you in a new direction that may lead to fast.
with cte as
(
select SomeID, Date1, Stat
from #someTab
where Stat = 'T'
)
Update st SET
st.Date2 = (Select top 1 c.Date1
from cte c
where st.Date1 >= c.Date1
and st.someID = c.someID
order by c.Date1 desc
)
from #someTab st;
select * from #someTab;
__________________________________________________________________________________________________________
How to Post to get the most: http://www.sqlservercentral.com/articles/Best+Practices/61537/
February 13, 2017 at 8:41 am
This doesn't use a correlated subquery, so it should perform better, especially if you have an index on Date1. Parts of it are shamelessly borrowed from this article.
WITH RangeStarts AS (
SELECT
SomeId
, Date1
, Stat
, CASE
WHEN Stat = 'A' THEN NULL
ELSE ROW_NUMBER() OVER (PARTITION BY SomeId, Stat ORDER BY Date1)
END AS PtnNo
FROM #SomeTab
)
, Partitioned AS (
SELECT
SomeId
, Date1
, Stat
, COUNT(PtnNo) OVER (PARTITION BY SomeId ORDER BY Date1 ROWS UNBOUNDED PRECEDING) AS PtnNo
FROM RangeStarts
)
SELECT
SomeId
, Date1
, Stat
, CASE
WHEN PtnNo = 0 THEN NULL
ELSE FIRST_VALUE(Date1) OVER (PARTITION BY SomeId, PtnNo ORDER BY Date1)
END AS Date2
FROM Partitioned
ORDER BY Date1
John
February 13, 2017 at 9:27 am
Yet another option.
WITH OnlyTs AS(
SELECT Date1, LEAD(Date1, 1, '9999') OVER(ORDER BY Date1) NextDate
FROM #SomeTab
WHERE Stat = 'T'
)
--SELECT *
UPDATE st SET
Date2 = t.Date1
FROM #SomeTab st
JOIN OnlyTs t ON st.Date1 >= t.Date1 AND st.Date1 < t.NextDate;
February 13, 2017 at 12:00 pm
Thanks very much to those who responded, I appreciate the help.
I've ended up with a somewhat more complex version of Luis' query, which performed the best on my data. My version was more complex because I forgot to mention the special case where multiple rows (in Date1 order) have status 'T'. In these cases, I need to set the date to the date of the first occurrence of the status T cluster. I did this by introducing another CTE. The final version of my query looks like this:
WITH BaseData
AS
(
SELECT
r.PolicyNumber
, r.ValuationDate
, PrevStatus = LAG(r.PolicyStatus, 1, '_') OVER (PARTITION BY r.PolicyNumber
ORDER BY r.ValuationDate
)
, r.PolicyStatus
FROM #Results r
)
, OnlyTs
AS
(
SELECT
bd.PolicyNumber
, bd.ValuationDate
, NextDate = LEAD(bd.ValuationDate, 1, '9999') OVER (PARTITION BY bd.PolicyNumber
ORDER BY bd.ValuationDate
)
FROM BaseData bd
WHERE
bd.PolicyStatus = 'T' AND
bd.PrevStatus <> 'T'
)
UPDATE r
SET r.DecrementDate = t.ValuationDate
FROM
#Results r
JOIN OnlyTs t ON r.ValuationDate >= t.ValuationDate AND
r.ValuationDate < t.NextDate AND
r.PolicyNumber = t.PolicyNumber;
This updates approximately 1 million rows of a 6.5 million row temp table in 50 seconds or so.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
February 13, 2017 at 4:14 pm
The problem with that solution is that it requires a JOIN which is expensive. You can do it without a JOIN.
;
WITH T_Dates AS
(
SELECT *,
CASE
WHEN LAG(st.PolicyStatus) OVER(PARTITION BY st.PolicyNumber ORDER BY st.ValuationDate) = 'T' THEN NULL
WHEN st.PolicyStatus = 'T' THEN st.ValuationDate
END AS t_date
FROM #SomeTab st
)
SELECT PolicyNumber, ValuationDate, PolicyStatus, MAX(t_date) OVER(PARTITION BY PolicyNumber ORDER BY ValuationDate ROWS UNBOUNDED PRECEDING) AS DecrementDate
FROM T_Dates
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
February 13, 2017 at 4:47 pm
drew.allen - Monday, February 13, 2017 4:14 PMThe problem with that solution is that it requires a JOIN which is expensive. You can do it without a JOIN.
;
WITH T_Dates AS
(
SELECT *,
CASE
WHEN LAG(st.PolicyStatus) OVER(PARTITION BY st.PolicyNumber ORDER BY st.ValuationDate) = 'T' THEN NULL
WHEN st.PolicyStatus = 'T' THEN st.ValuationDate
END AS t_date
FROM #SomeTab st
)
SELECT PolicyNumber, ValuationDate, PolicyStatus, MAX(t_date) OVER(PARTITION BY PolicyNumber ORDER BY ValuationDate ROWS UNBOUNDED PRECEDING) AS DecrementDate
FROM T_Dates
Looks like I have more testing to do tomorrow, thanks, Drew!
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
February 14, 2017 at 7:13 am
Phil Parkin - Monday, February 13, 2017 4:47 PMdrew.allen - Monday, February 13, 2017 4:14 PMThe problem with that solution is that it requires a JOIN which is expensive. You can do it without a JOIN.
;
WITH T_Dates AS
(
SELECT *,
CASE
WHEN LAG(st.PolicyStatus) OVER(PARTITION BY st.PolicyNumber ORDER BY st.ValuationDate) = 'T' THEN NULL
WHEN st.PolicyStatus = 'T' THEN st.ValuationDate
END AS t_date
FROM #SomeTab st
)
SELECT PolicyNumber, ValuationDate, PolicyStatus, MAX(t_date) OVER(PARTITION BY PolicyNumber ORDER BY ValuationDate ROWS UNBOUNDED PRECEDING) AS DecrementDate
FROM T_DatesLooks like I have more testing to do tomorrow, thanks, Drew!
While the SELECT version of this query is appealingly succinct, the UPDATE version is less so. At least, my UPDATE version is:
WITH T_Dates
AS
(
SELECT
r.PolicyNumber
, r.ValuationDate
, r.PolicyStatus
, t_date = CASE
WHEN LAG(r.PolicyStatus) OVER (PARTITION BY r.PolicyNumber
ORDER BY r.ValuationDate
) = 'T' THEN
NULL
WHEN r.PolicyStatus = 'T' THEN
r.ValuationDate
END
FROM #Results r
)
, DecDates
AS
(
SELECT
T_Dates.PolicyNumber
, T_Dates.ValuationDate
, T_Dates.PolicyStatus
, DecrementDate = MAX(T_Dates.t_date) OVER (PARTITION BY T_Dates.PolicyNumber
ORDER BY T_Dates.ValuationDate
ROWS UNBOUNDED PRECEDING
)
FROM T_Dates
)
UPDATE r
SET r.DecrementDate = DecDates.DecrementDate
FROM
#Results r
JOIN DecDates ON r.PolicyNumber = DecDates.PolicyNumber AND
r.ValuationDate = DecDates.ValuationDate;
In performance terms, this is roughly the same as the solution I posted earlier, despite having a cleaner-looking execution plan.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
February 14, 2017 at 7:42 am
Phil Parkin - Tuesday, February 14, 2017 7:13 AMWhile the SELECT version of this query is appealingly succinct, the UPDATE version is less so. At least, my UPDATE version is:
WITH T_Dates
AS
(
SELECT
r.PolicyNumber
, r.ValuationDate
, r.PolicyStatus
, t_date = CASE
WHEN LAG(r.PolicyStatus) OVER (PARTITION BY r.PolicyNumber
ORDER BY r.ValuationDate
) = 'T' THEN
NULL
WHEN r.PolicyStatus = 'T' THEN
r.ValuationDate
END
FROM #Results r
)
, DecDates
AS
(
SELECT
T_Dates.PolicyNumber
, T_Dates.ValuationDate
, T_Dates.PolicyStatus
, DecrementDate = MAX(T_Dates.t_date) OVER (PARTITION BY T_Dates.PolicyNumber
ORDER BY T_Dates.ValuationDate
ROWS UNBOUNDED PRECEDING
)
FROM T_Dates
)
UPDATE r
SET r.DecrementDate = DecDates.DecrementDate
FROM
#Results r
JOIN DecDates ON r.PolicyNumber = DecDates.PolicyNumber AND
r.ValuationDate = DecDates.ValuationDate;
In performance terms, this is roughly the same as the solution I posted earlier, despite having a cleaner-looking execution plan.
You can update through a CTE, you don't need the join back to the table at the end. You're using different columns names than in the test code you posted, so I'm not certain this will execute, however I have used this approach many times. Build a CTE that returns the field you want to update and a field with the new value, then just simply execute the update on the cte. Just make sure you are only affecting columns from a single source table. WITH T_Dates
AS
(
SELECT
r.PolicyNumber
, r.ValuationDate
, r.PolicyStatus
, r.DecrementDate
, t_date = CASE
WHEN LAG(r.PolicyStatus) OVER (PARTITION BY r.PolicyNumber
ORDER BY r.ValuationDate
) = 'T' THEN
NULL
WHEN r.PolicyStatus = 'T' THEN
r.ValuationDate
END
FROM #Results r
)
, DecDates
AS
(
SELECT
T_Dates.PolicyNumber
, T_Dates.ValuationDate
, T_Dates.PolicyStatus
, T_Dates.DecrementDate
, NewDecrementDate = MAX(T_Dates.t_date) OVER (PARTITION BY T_Dates.PolicyNumber
ORDER BY T_Dates.ValuationDate
ROWS UNBOUNDED PRECEDING
)
FROM T_Dates
)
UPDATE DecDates
SET DecrementDate = NewDecrementDate
Wes
(A solid design is always preferable to a creative workaround)
February 14, 2017 at 8:00 am
whenriksen - Tuesday, February 14, 2017 7:42 AMYou can update through a CTE, you don't need the join back to the table at the end. You're using different columns names than in the test code you posted, so I'm not certain this will execute, however I have used this approach many times. Build a CTE that returns the field you want to update and a field with the new value, then just simply execute the update on the cte. Just make sure you are only affecting columns from a single source table.
WITH T_Dates
AS
(
SELECT
r.PolicyNumber
, r.ValuationDate
, r.PolicyStatus
, r.DecrementDate
, t_date = CASE
WHEN LAG(r.PolicyStatus) OVER (PARTITION BY r.PolicyNumber
ORDER BY r.ValuationDate
) = 'T' THEN
NULL
WHEN r.PolicyStatus = 'T' THEN
r.ValuationDate
END
FROM #Results r
)
, DecDates
AS
(
SELECT
T_Dates.PolicyNumber
, T_Dates.ValuationDate
, T_Dates.PolicyStatus
, T_Dates.DecrementDate
, NewDecrementDate = MAX(T_Dates.t_date) OVER (PARTITION BY T_Dates.PolicyNumber
ORDER BY T_Dates.ValuationDate
ROWS UNBOUNDED PRECEDING
)
FROM T_Dates
)
UPDATE DecDates
SET DecrementDate = NewDecrementDate
This version works well & produces a nice clean non-parallel plan. Thanks for helping (and reminding me about updateable CTEs).
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
February 14, 2017 at 9:12 am
So how long does the update through the CTE take for 1 million out of the 6.5 million rows now?
--Jeff Moden
Change is inevitable... Change for the better is not.
February 14, 2017 at 9:44 am
Jeff Moden - Tuesday, February 14, 2017 9:12 AMSo how long does the update through the CTE take for 1 million out of the 6.5 million rows now?
I might know where this is going.
February 14, 2017 at 10:01 am
Jeff Moden - Tuesday, February 14, 2017 9:12 AMSo how long does the update through the CTE take for 1 million out of the 6.5 million rows now?
Takes around 38 seconds.
Out of interest, if I add a WHERE DecDates.DecrementDate <> DecDates.NewDecrementDate filter to the update, it takes around 12 seconds on second and subsequent iterations.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply