Running date update

  • Hi all, I can't think of a fast way of doing this. I would like a query which updates a date column whenever status becomes T, and for all future rows, for that Id (by Date1, ascending, in the code which follows). The complication is that if there is a later row with status T, that date should be used for all subsequent rows, unless there is another T (in which case, use that row's date) and so on.

    https://www.sqlservercentral.com/Forums/Uploads/Images/d38d9dcf-6273-41e3-a167-1ad1.PNG

    Here is some setup code:

    IF OBJECT_ID('tempdb..#SomeTab', 'U') IS NOT NULL
      DROP TABLE #SomeTab;

    CREATE TABLE #SomeTab
    (
      SomeId INT
    , Date1 DATETIME
    , Stat CHAR(1)
    , Date2 DATETIME
    );

    INSERT #SomeTab
    (
      SomeId
    , Date1
    , Stat
    , Date2
    )
    VALUES
    (
      1
      ,'20150101'
      ,'A'
      ,NULL
    )
    ,(
      1
    ,  '20150201'
    ,  'A'
    ,  NULL
    )
    ,(
      1
    ,  '20150301'
    ,  'T'
    ,  NULL
    )
    ,(
      1
    ,  '20150401'
    ,  'A'
    ,  NULL
    )
    ,(
      1
    ,  '20150501'
    ,  'A'
    ,  NULL
    )
    ,(
      1
    ,  '20150601'
    ,  'T'
    ,  NULL
    )
    ,(
      1
    ,  '20150701'
    ,  'A'
    ,  NULL
    );

    SELECT *
    FROM #SomeTab st
    ORDER BY st.SomeId, st.Date1


    And here is how I would like the results to look:

    IF OBJECT_ID('tempdb..#Results', 'U') IS NOT NULL
      DROP TABLE #Results;

    CREATE TABLE #Results
    (
      SomeId INT
    , Date1 DATETIME
    , Stat CHAR(1)
    , Date2 DATETIME
    );

    INSERT #Results
    (
      SomeId
    , Date1
    , Stat
    , Date2
    )
    VALUES
    (
      1
      ,'20150101'
      ,'A'
      ,NULL
    )
    ,(
      1
    ,  '20150201'
    ,  'A'
    ,  NULL
    )
    ,(
      1
    ,  '20150301'
    ,  'T'
    , '20150301'
    )
    ,(
      1
    ,  '20150401'
    ,  'A'
    ,  '20150301'
    )
    ,(
      1
    ,  '20150501'
    ,  'A'
    ,  '20150301'
    )
    ,(
      1
    ,  '20150601'
    ,  'T'
    ,  '20150601'
    )
    ,(
      1
    ,  '20150701'
    ,  'A'
    ,  '20150601'
    );

    SELECT * FROM #Results r
    ORDER BY r.SomeId, r.Date1

    --Edit: The datetimes should be dates in the code above.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • This is similar to a problem I had last week. The TOP 1 option was the answer when looking for a "sliding" result.

    Don't know if this fulfills your requirement that it be fast, but it does work. And it may point you in a new direction that may lead to fast.

    with cte as
    (
     select SomeID, Date1, Stat
      from #someTab
      where Stat = 'T'
     
    )
    Update st SET
     st.Date2 = (Select top 1 c.Date1
          from cte c
          where st.Date1 >= c.Date1
          and st.someID = c.someID
          order by c.Date1 desc
         )
     from #someTab st;

    select * from #someTab;

    __________________________________________________________________________________________________________
    How to Post to get the most: http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • This doesn't use a correlated subquery, so it should perform better, especially if you have an index on Date1.  Parts of it are shamelessly borrowed from this article.

    WITH RangeStarts AS (
        SELECT
            SomeId
        ,    Date1
        ,    Stat
        ,    CASE
                WHEN Stat = 'A' THEN NULL
                ELSE ROW_NUMBER() OVER (PARTITION BY SomeId, Stat ORDER BY Date1)
            END AS PtnNo
        FROM #SomeTab
        )
    , Partitioned AS (
        SELECT
            SomeId
        ,    Date1
        ,    Stat
        ,    COUNT(PtnNo) OVER (PARTITION BY SomeId ORDER BY Date1 ROWS UNBOUNDED PRECEDING) AS PtnNo
        FROM RangeStarts
        )
    SELECT
        SomeId
    ,    Date1
    ,    Stat
    ,    CASE
            WHEN PtnNo = 0 THEN NULL
            ELSE FIRST_VALUE(Date1) OVER (PARTITION BY SomeId, PtnNo ORDER BY Date1)
        END AS Date2
    FROM Partitioned
    ORDER BY Date1

    John

  • Yet another option.

    WITH OnlyTs AS(
      SELECT Date1, LEAD(Date1, 1, '9999') OVER(ORDER BY Date1) NextDate
      FROM #SomeTab
      WHERE Stat = 'T'
    )
    --SELECT *
    UPDATE st SET
      Date2 = t.Date1
    FROM #SomeTab st
    JOIN OnlyTs t ON st.Date1 >= t.Date1 AND st.Date1 < t.NextDate;

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Thanks very much to those who responded, I appreciate the help.
    I've ended up with a somewhat more complex version of Luis' query, which performed the best on my data. My version was more complex because I forgot to mention the special case where multiple rows (in Date1 order) have status 'T'. In these cases, I need to set the date to the date of the first occurrence of the status T cluster. I did this by introducing another CTE. The final version of my query looks like this:

    WITH BaseData
    AS
    (
      SELECT
       r.PolicyNumber
      ,  r.ValuationDate
      ,  PrevStatus = LAG(r.PolicyStatus, 1, '_') OVER (PARTITION BY r.PolicyNumber
                       ORDER BY r.ValuationDate
                       )
      ,  r.PolicyStatus
      FROM #Results r
    )
    ,  OnlyTs
    AS
    (
      SELECT
       bd.PolicyNumber
      ,  bd.ValuationDate
      ,  NextDate = LEAD(bd.ValuationDate, 1, '9999') OVER (PARTITION BY bd.PolicyNumber
                        ORDER BY bd.ValuationDate
                        )
      FROM BaseData bd
      WHERE
       bd.PolicyStatus = 'T' AND
       bd.PrevStatus <> 'T'
    )
    UPDATE r
    SET  r.DecrementDate = t.ValuationDate
    FROM
       #Results r
    JOIN OnlyTs t ON r.ValuationDate >= t.ValuationDate AND
           r.ValuationDate < t.NextDate AND
           r.PolicyNumber = t.PolicyNumber;

    This updates approximately 1 million rows of a 6.5 million row temp table in 50 seconds or so.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • The problem with that solution is that it requires a JOIN which is expensive.  You can do it without a JOIN.

    ;
    WITH T_Dates AS
    (
        SELECT *,
            CASE
                WHEN LAG(st.PolicyStatus) OVER(PARTITION BY st.PolicyNumber ORDER BY st.ValuationDate) = 'T' THEN NULL
                WHEN st.PolicyStatus = 'T' THEN st.ValuationDate
            END AS t_date
        FROM #SomeTab st
    )
    SELECT PolicyNumber, ValuationDate, PolicyStatus, MAX(t_date) OVER(PARTITION BY PolicyNumber ORDER BY ValuationDate ROWS UNBOUNDED PRECEDING) AS DecrementDate
    FROM T_Dates

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Monday, February 13, 2017 4:14 PM

    The problem with that solution is that it requires a JOIN which is expensive.  You can do it without a JOIN.

    ;
    WITH T_Dates AS
    (
        SELECT *,
            CASE
                WHEN LAG(st.PolicyStatus) OVER(PARTITION BY st.PolicyNumber ORDER BY st.ValuationDate) = 'T' THEN NULL
                WHEN st.PolicyStatus = 'T' THEN st.ValuationDate
            END AS t_date
        FROM #SomeTab st
    )
    SELECT PolicyNumber, ValuationDate, PolicyStatus, MAX(t_date) OVER(PARTITION BY PolicyNumber ORDER BY ValuationDate ROWS UNBOUNDED PRECEDING) AS DecrementDate
    FROM T_Dates

    Looks like I have more testing to do tomorrow, thanks, Drew!

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Phil Parkin - Monday, February 13, 2017 4:47 PM

    drew.allen - Monday, February 13, 2017 4:14 PM

    The problem with that solution is that it requires a JOIN which is expensive.  You can do it without a JOIN.

    ;
    WITH T_Dates AS
    (
        SELECT *,
            CASE
                WHEN LAG(st.PolicyStatus) OVER(PARTITION BY st.PolicyNumber ORDER BY st.ValuationDate) = 'T' THEN NULL
                WHEN st.PolicyStatus = 'T' THEN st.ValuationDate
            END AS t_date
        FROM #SomeTab st
    )
    SELECT PolicyNumber, ValuationDate, PolicyStatus, MAX(t_date) OVER(PARTITION BY PolicyNumber ORDER BY ValuationDate ROWS UNBOUNDED PRECEDING) AS DecrementDate
    FROM T_Dates

    Looks like I have more testing to do tomorrow, thanks, Drew!

    While the SELECT version of this query is appealingly succinct, the UPDATE version is less so. At least, my UPDATE version is:

    WITH T_Dates
    AS
    (
      SELECT
       r.PolicyNumber
      ,  r.ValuationDate
      ,  r.PolicyStatus
      ,  t_date = CASE
            WHEN LAG(r.PolicyStatus) OVER (PARTITION BY r.PolicyNumber
                      ORDER BY r.ValuationDate
                      ) = 'T' THEN
             NULL
            WHEN r.PolicyStatus = 'T' THEN
             r.ValuationDate
          END
      FROM #Results r
    )
    ,  DecDates
    AS
    (
      SELECT
       T_Dates.PolicyNumber
      ,  T_Dates.ValuationDate
      ,  T_Dates.PolicyStatus
      ,  DecrementDate = MAX(T_Dates.t_date) OVER (PARTITION BY T_Dates.PolicyNumber
                     ORDER BY T_Dates.ValuationDate
                     ROWS UNBOUNDED PRECEDING
                     )
      FROM T_Dates
    )
    UPDATE r
    SET  r.DecrementDate = DecDates.DecrementDate
    FROM
       #Results r
    JOIN DecDates ON r.PolicyNumber = DecDates.PolicyNumber AND
           r.ValuationDate = DecDates.ValuationDate;


    In performance terms, this is roughly the same as the solution I posted earlier, despite having a cleaner-looking execution plan.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Phil Parkin - Tuesday, February 14, 2017 7:13 AM

    While the SELECT version of this query is appealingly succinct, the UPDATE version is less so. At least, my UPDATE version is:

    WITH T_Dates
    AS
    (
      SELECT
       r.PolicyNumber
      ,  r.ValuationDate
      ,  r.PolicyStatus
      ,  t_date = CASE
            WHEN LAG(r.PolicyStatus) OVER (PARTITION BY r.PolicyNumber
                      ORDER BY r.ValuationDate
                      ) = 'T' THEN
             NULL
            WHEN r.PolicyStatus = 'T' THEN
             r.ValuationDate
          END
      FROM #Results r
    )
    ,  DecDates
    AS
    (
      SELECT
       T_Dates.PolicyNumber
      ,  T_Dates.ValuationDate
      ,  T_Dates.PolicyStatus
      ,  DecrementDate = MAX(T_Dates.t_date) OVER (PARTITION BY T_Dates.PolicyNumber
                     ORDER BY T_Dates.ValuationDate
                     ROWS UNBOUNDED PRECEDING
                     )
      FROM T_Dates
    )
    UPDATE r
    SET  r.DecrementDate = DecDates.DecrementDate
    FROM
       #Results r
    JOIN DecDates ON r.PolicyNumber = DecDates.PolicyNumber AND
           r.ValuationDate = DecDates.ValuationDate;


    In performance terms, this is roughly the same as the solution I posted earlier, despite having a cleaner-looking execution plan.

    You can update through a CTE, you don't need the join back to the table at the end.  You're using different columns names than in the test code you posted, so I'm not certain this will execute, however I have used this approach many times.  Build a CTE that returns the field you want to update and a field with the new value, then just simply execute the update on the cte.  Just make sure you are only affecting columns from a single source table.
    WITH T_Dates
    AS
    (
      SELECT
       r.PolicyNumber
      ,  r.ValuationDate
      ,  r.PolicyStatus
      , r.DecrementDate
      ,  t_date = CASE
            WHEN LAG(r.PolicyStatus) OVER (PARTITION BY r.PolicyNumber
                       ORDER BY r.ValuationDate
                      ) = 'T' THEN
              NULL
            WHEN r.PolicyStatus = 'T' THEN
              r.ValuationDate
          END
      FROM #Results r
    )
    ,  DecDates
    AS
    (
      SELECT
       T_Dates.PolicyNumber
      ,  T_Dates.ValuationDate
      ,  T_Dates.PolicyStatus
      , T_Dates.DecrementDate
      ,  NewDecrementDate = MAX(T_Dates.t_date) OVER (PARTITION BY T_Dates.PolicyNumber
                     ORDER BY T_Dates.ValuationDate
                     ROWS UNBOUNDED PRECEDING
                      )
      FROM T_Dates
    )
    UPDATE DecDates
    SET  DecrementDate = NewDecrementDate

    Wes
    (A solid design is always preferable to a creative workaround)

  • whenriksen - Tuesday, February 14, 2017 7:42 AM

    You can update through a CTE, you don't need the join back to the table at the end.  You're using different columns names than in the test code you posted, so I'm not certain this will execute, however I have used this approach many times.  Build a CTE that returns the field you want to update and a field with the new value, then just simply execute the update on the cte.  Just make sure you are only affecting columns from a single source table.
    WITH T_Dates
    AS
    (
      SELECT
       r.PolicyNumber
      ,  r.ValuationDate
      ,  r.PolicyStatus
      , r.DecrementDate
      ,  t_date = CASE
            WHEN LAG(r.PolicyStatus) OVER (PARTITION BY r.PolicyNumber
                       ORDER BY r.ValuationDate
                      ) = 'T' THEN
              NULL
            WHEN r.PolicyStatus = 'T' THEN
              r.ValuationDate
          END
      FROM #Results r
    )
    ,  DecDates
    AS
    (
      SELECT
       T_Dates.PolicyNumber
      ,  T_Dates.ValuationDate
      ,  T_Dates.PolicyStatus
      , T_Dates.DecrementDate
      ,  NewDecrementDate = MAX(T_Dates.t_date) OVER (PARTITION BY T_Dates.PolicyNumber
                     ORDER BY T_Dates.ValuationDate
                     ROWS UNBOUNDED PRECEDING
                      )
      FROM T_Dates
    )
    UPDATE DecDates
    SET  DecrementDate = NewDecrementDate

    This version works well & produces a nice clean non-parallel plan. Thanks for helping (and reminding me about updateable CTEs).

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • So how long does the update through the CTE take for 1 million out of the 6.5 million rows now?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Tuesday, February 14, 2017 9:12 AM

    So how long does the update through the CTE take for 1 million out of the 6.5 million rows now?

    I might know where this is going.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Jeff Moden - Tuesday, February 14, 2017 9:12 AM

    So how long does the update through the CTE take for 1 million out of the 6.5 million rows now?

    Takes around 38 seconds.
    Out of interest, if I add a WHERE DecDates.DecrementDate <> DecDates.NewDecrementDate filter to the update, it takes around 12 seconds on second and subsequent iterations.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply