HELP,TSQL, - How do I write the SELECT stmt to condense multiple time spans

  • My attempts failed. I need a SELECT STMT that will give me 2 rows as output. The 2 rows must have 2 spans as follows

    --Desired Output for member       20160701 - 20171231
    --                                                  20180105 - 29991231

    You know what i am trying here.. I am trying to condense nearby spans that are continuous together.  Hope this explains.. You may assume that the table has data for many members. But for sake of simplicity lets have data for just one buddy.

    If object_id('tempdb..#DLT') IS NOT NULL DROP TABLE #DLT;

    CREATE TABLE #DLT( emp VARCHAR(13), St VARCHAR(8), Et VARCHAR(8), RSN VARCHAR(100) );

    INSERT INTO #DLT( emp, St, Et, RSN )
    VALUES
    ('0010970115','20160701',    '20170131',    'Member Eligibility Change'),
    ('0010970115','20170201',    '20171130',    'Member Eligibility Change'),
    ('0010970115','20171201',    '20171231',    'Member Eligibility Change'),

    ('0010970115','20180105',    '20180531',    'Member Eligibility Change'),
    ('0010970115','20180601',    '29991231',    'Member Eligibility Change')

    ;
    With C as
    (
    Select emp, St, Et ,
    LEAD(St) OVER(PARTITION BY Emp ORDER BY emp, St) AS NextSt
    ,DENSE_RANK() OVER (PARTITION BY emp ORDER BY St) AS Grp2
    , DATEDIFF(D, Et , (LEAD(St) OVER(PARTITION BY Emp ORDER BY emp, St)) ) as DIFF
    FROM
    #DLT
    )
    Select *
    ,CASE WHEN DATEADD(D, 1, CAST( ET as DATETIME)) = CAST( NEXTST as DATETIME) THEN 'OK' ELSE NULL END as DEC
    FROM
    C

    --Desired Output for member  20160701 - 20171231
    --                                20180105 - 29991231

  • There are 11 possible relationships between two intervals, requiring at least 22 rows to represent completely.  You have provided less than a quarter of the minimum to adequately test anything.

    Furthermore, you have another thread of this very subject where you posted just 59 minutes ago that "This is perfect".  Did you read the article that I linked to in that thread?  It walks through exactly what you are trying to do.

    This is just another example of you moving the goal posts.  Read the article that I mentioned above and post code that reflects the techniques in that article, and then I will help you if you are still having problems.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • mw_sql_developer - Tuesday, July 31, 2018 2:37 PM

    My attempts failed. I need a SELECT STMT that will give me 2 rows as output. The 2 rows must have 2 spans as follows

    --Desired Output for member       20160701 - 20171231
    --                                                  20180105 - 29991231

    You know what i am trying here.. I am trying to condense nearby spans that are continuous together.  Hope this explains.. You may assume that the table has data for many members. But for sake of simplicity lets have data for just one buddy.

    If object_id('tempdb..#DLT') IS NOT NULL DROP TABLE #DLT;

    CREATE TABLE #DLT( emp VARCHAR(13), St VARCHAR(8), Et VARCHAR(8), RSN VARCHAR(100) );

    INSERT INTO #DLT( emp, St, Et, RSN )
    VALUES
    ('0010970115','20160701',    '20170131',    'Member Eligibility Change'),
    ('0010970115','20170201',    '20171130',    'Member Eligibility Change'),
    ('0010970115','20171201',    '20171231',    'Member Eligibility Change'),

    ('0010970115','20180105',    '20180531',    'Member Eligibility Change'),
    ('0010970115','20180601',    '29991231',    'Member Eligibility Change')

    ;
    With C as
    (
    Select emp, St, Et ,
    LEAD(St) OVER(PARTITION BY Emp ORDER BY emp, St) AS NextSt
    ,DENSE_RANK() OVER (PARTITION BY emp ORDER BY St) AS Grp2
    , DATEDIFF(D, Et , (LEAD(St) OVER(PARTITION BY Emp ORDER BY emp, St)) ) as DIFF
    FROM
    #DLT
    )
    Select *
    ,CASE WHEN DATEADD(D, 1, CAST( ET as DATETIME)) = CAST( NEXTST as DATETIME) THEN 'OK' ELSE NULL END as DEC
    FROM
    C

    --Desired Output for member  20160701 - 20171231
    --                                20180105 - 29991231

    Does this question have anything to do with this thread: https://www.sqlservercentral.com/Forums/1977582/SQL-Syntax-HELP-on-merging-enrollment-spans#bm1979663

  • A sample solution (not an efficient or "good" one but it works with the sample data you gave, but no promises it would work with "real" data, and I imagine there are better/faster ways to solve this) could be:

    IF OBJECT_ID('tempdb..#DLT') IS NOT NULL
    DROP TABLE [#DLT];

    CREATE TABLE [#DLT]
    (
      [emp] VARCHAR(13)
      , [St] VARCHAR(8)
      , [Et] VARCHAR(8)
      , [RSN] VARCHAR(100)
    );

    INSERT INTO [#DLT]
    (
      [emp]
      , [St]
      , [Et]
      , [RSN]
    )
    VALUES
    (
      '0010970115'
      , '20160701'
      , '20170131'
      , 'Member Eligibility Change'
    )
    , (
      '0010970115'
      , '20170201'
      , '20171130'
      , 'Member Eligibility Change'
      )
    , (
      '0010970115'
      , '20171201'
      , '20171231'
      , 'Member Eligibility Change'
      )
    , (
      '0010970115'
      , '20180105'
      , '20180531'
      , 'Member Eligibility Change'
      )
    , (
      '0010970115'
      , '20180601'
      , '29991231'
      , 'Member Eligibility Change'
      );
    WITH [C]
    AS ( SELECT
       [#DLT].[emp]
       , [#DLT].[St]
       , [#DLT].[Et]
       , DATEDIFF(
           D
           , [#DLT].[Et]
           , ( LEAD([#DLT].[St]) OVER ( PARTITION BY
                     [#DLT].[emp]
                     ORDER BY
                     [#DLT].[emp]
                     , [#DLT].[St]
                    )
            )
          )     AS [DIFF]
       , MIN([#DLT].[St]) OVER ( PARTITION BY
                [#DLT].[emp]
                ORDER BY
                [#DLT].[St]
               ) AS [minStart]
      FROM
       [#DLT] )
      , [D]
    AS ( SELECT
       [C].[emp]
       , [C].[St]
       , [C].[Et]
       , [C].[DIFF]
       , [C].[minStart]
       , CASE
        WHEN [C].[St] = [C].[minStart] THEN
         [C].[St]
        WHEN
         (
          [C].[DIFF] = 1
          AND LAG([C].[DIFF]) OVER ( PARTITION BY
                   [C].[emp]
                   ORDER BY
                   [C].[St]
                  ) > 1
         ) THEN
         [C].[Et]
        WHEN [C].[DIFF] = 1 THEN
         NULL
        WHEN [C].[DIFF] IS NULL THEN
         [C].[Et]
        ELSE
         [C].[Et]
        END AS [daterange]
      FROM
       [C] )
      , [E]
    AS ( SELECT
       [D].[emp]
       , [D].[daterange]
       , ROW_NUMBER() OVER ( ORDER BY
               [D].[emp]
              )    AS [rn]
       , LEAD([D].[daterange]) OVER ( ORDER BY
                  [D].[emp]
                 ) AS [nextDate]
      FROM
       [D]
      WHERE
       [D].[daterange] IS NOT NULL )
    SELECT
    [E].[emp]
    , [E].[daterange] + ' - ' + [E].[nextDate]
    FROM
    [E]
    WHERE
    [E].[rn] % 2 != 0;


    A lot of that is copied from the code you provided, I added the CTE D and E (which are both horribly named and I do not recommend using those names).  This feels like a specialized "gap and island" problem, so D is used to determine what is a gap and what is an island while E is used to strip out the "middle" parts of the islands and get the end date (next Date should be the end date).
    Now I am fairly confident that there are better more efficient ways to do this, but this method does work for the given data.  There is some data that will break it (like if a range resides entirely in 1 row), but with the given sample data, it works.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Lynn Pettis - Tuesday, July 31, 2018 3:38 PM

    mw_sql_developer - Tuesday, July 31, 2018 2:37 PM

    My attempts failed. I need a SELECT STMT that will give me 2 rows as output. The 2 rows must have 2 spans as follows

    --Desired Output for member       20160701 - 20171231
    --                                                  20180105 - 29991231

    You know what i am trying here.. I am trying to condense nearby spans that are continuous together.  Hope this explains.. You may assume that the table has data for many members. But for sake of simplicity lets have data for just one buddy.

    If object_id('tempdb..#DLT') IS NOT NULL DROP TABLE #DLT;

    CREATE TABLE #DLT( emp VARCHAR(13), St VARCHAR(8), Et VARCHAR(8), RSN VARCHAR(100) );

    INSERT INTO #DLT( emp, St, Et, RSN )
    VALUES
    ('0010970115','20160701',    '20170131',    'Member Eligibility Change'),
    ('0010970115','20170201',    '20171130',    'Member Eligibility Change'),
    ('0010970115','20171201',    '20171231',    'Member Eligibility Change'),

    ('0010970115','20180105',    '20180531',    'Member Eligibility Change'),
    ('0010970115','20180601',    '29991231',    'Member Eligibility Change')

    ;
    With C as
    (
    Select emp, St, Et ,
    LEAD(St) OVER(PARTITION BY Emp ORDER BY emp, St) AS NextSt
    ,DENSE_RANK() OVER (PARTITION BY emp ORDER BY St) AS Grp2
    , DATEDIFF(D, Et , (LEAD(St) OVER(PARTITION BY Emp ORDER BY emp, St)) ) as DIFF
    FROM
    #DLT
    )
    Select *
    ,CASE WHEN DATEADD(D, 1, CAST( ET as DATETIME)) = CAST( NEXTST as DATETIME) THEN 'OK' ELSE NULL END as DEC
    FROM
    C

    --Desired Output for member  20160701 - 20171231
    --                                20180105 - 29991231

    Does this question have anything to do with this thread: https://www.sqlservercentral.com/Forums/1977582/SQL-Syntax-HELP-on-merging-enrollment-spans#bm1979663

    Lynn: No, THat was actually about creating new spans. Here I am trying to make one span using 3 spans ( similar ) So if you have spans that are end to end then we can condense them to one span.

  • mw_sql_developer - Tuesday, July 31, 2018 6:53 PM

    Lynn: No, THat was actually about creating new spans. Here I am trying to make one span using 3 spans ( similar ) So if you have spans that are end to end then we can condense them to one span.

    I think they are the same.  They both concerned with determining what makes a span a span. Knowing when a single span should be broken into multiple spans or vice versa are just different sides of the same coin.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Wednesday, August 1, 2018 8:23 AM

    mw_sql_developer - Tuesday, July 31, 2018 6:53 PM

    Lynn: No, THat was actually about creating new spans. Here I am trying to make one span using 3 spans ( similar ) So if you have spans that are end to end then we can condense them to one span.

    I think they are the same.  They both concerned with determining what makes a span a span. Knowing when a single span should be broken into multiple spans or vice versa are just different sides of the same coin.

    Drew

    Drew: That is not true. Earlier we were taking multiple spans, looking at how they overlapped each other and came up with a new set of plans. So if you started with 3 spans you  can end up generating six spans. This is different. if you have six spans and if they are end to end , then you  condense it to one span.

    Similar but not the same..

  • mw_sql_developer - Wednesday, August 1, 2018 8:32 AM

    drew.allen - Wednesday, August 1, 2018 8:23 AM

    mw_sql_developer - Tuesday, July 31, 2018 6:53 PM

    Lynn: No, THat was actually about creating new spans. Here I am trying to make one span using 3 spans ( similar ) So if you have spans that are end to end then we can condense them to one span.

    I think they are the same.  They both concerned with determining what makes a span a span. Knowing when a single span should be broken into multiple spans or vice versa are just different sides of the same coin.

    Drew

    Drew: That is not true. Earlier we were taking multiple spans, looking at how they overlapped each other and came up with a new set of plans. So if you started with 3 spans you  can end up generating six spans. This is different. if you have six spans and if they are end to end , then you  condense it to one span.

    Similar but not the same..

    Actually, very similar to each other.  Check the articles on this site regarding gaps and islands.

  • Lynn Pettis - Wednesday, August 1, 2018 8:40 AM

    mw_sql_developer - Wednesday, August 1, 2018 8:32 AM

    drew.allen - Wednesday, August 1, 2018 8:23 AM

    mw_sql_developer - Tuesday, July 31, 2018 6:53 PM

    Lynn: No, THat was actually about creating new spans. Here I am trying to make one span using 3 spans ( similar ) So if you have spans that are end to end then we can condense them to one span.

    I think they are the same.  They both concerned with determining what makes a span a span. Knowing when a single span should be broken into multiple spans or vice versa are just different sides of the same coin.

    Drew

    Drew: That is not true. Earlier we were taking multiple spans, looking at how they overlapped each other and came up with a new set of plans. So if you started with 3 spans you  can end up generating six spans. This is different. if you have six spans and if they are end to end , then you  condense it to one span.

    Similar but not the same..

    Actually, very similar to each other.  Check the articles on this site regarding gaps and islands.

    Ok, I got it very close, I just need one little help writing the last SELECT stmt. I am sure it might just click you like that... Just run the SQL and you will know what I mean ...

    Good Luck ...


    If object_id('tempdb..#DLT') IS NOT NULL DROP TABLE #DLT;

    CREATE TABLE #DLT( emp VARCHAR(13), St VARCHAR(8), Et VARCHAR(8), RSN VARCHAR(100) );

    INSERT INTO #DLT( emp, St, Et, RSN )
    VALUES
    ('0010970115','20160701',    '20170131',    'Member Eligibility Change'),
    ('0010970115','20170201',    '20171130',    'Member Eligibility Change'),
    ('0010970115','20171201',    '20171231',    'Member Eligibility Change'),
    ('0010970115','20180105',    '20180531',    'Member Eligibility Change'),
    ('0010970115','20180601',    '29991231',    'Member Eligibility Change')

    ;
    With Step1 as
    (
    Select emp, St, Et
    ,LEAD(St) OVER(PARTITION BY Emp ORDER BY emp, St) AS NextSt
    ,ROW_NUMBER() OVER (PARTITION BY emp ORDER BY St) AS THE_ORDER
    ,DATEDIFF(D, Et , (LEAD(St) OVER(PARTITION BY Emp ORDER BY emp, St)) ) as DIFF -- Gap in days
    FROM
    #DLT
    )
    ,
    Step2 as
    (
    Select *
    ,CASE WHEN DATEADD(D, 1, CAST( ET as DATETIME)) = CAST( NEXTST as DATETIME) THEN 'OK' ELSE NULL END as DEC
    ,LAG(DIFF) OVER(PARTITION BY Emp ORDER BY emp, St) AS PreDiff
    ,LEAD(DIFF) OVER(PARTITION BY Emp ORDER BY emp, St) AS PostDiff
    FROM
    Step1
    )
    ,
    Step3 as
    (
    Select
    *
    ,
    CASE WHEN (PreDiff is NULL ) OR (PreDiff <> 1 ) THEN
        'START_OF_SPAN'
    ELSE
        CASE WHEN DEC IS NULL THEN 'END_OF_SPAN' ELSE
            'YOU ARE INSIDE A SPAN'
        END
    END as FINAL_DECISION

    FROM Step2
    )
    Select
    *
    FROM Step2

  • Was my solution no good?  It seemed to work with the data you provided...

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • bmg002 - Wednesday, August 1, 2018 11:13 AM

    Was my solution no good?  It seemed to work with the data you provided...

    That was good, but now that I also learnt something I need to finish what i started, so some help will be great

  • mw_sql_developer - Wednesday, August 1, 2018 11:14 AM

    bmg002 - Wednesday, August 1, 2018 11:13 AM

    Was my solution no good?  It seemed to work with the data you provided...

    That was good, but now that I also learnt something I need to finish what i started, so some help will be great

    What was missing in that solution?  I don't believe it was the most optimal solution, but it did provide you the resulting data you were looking for from the provided data set.
    Was there something confusing in it or that you didn't understand why I did a thing?  If so, I can try to explain.  
    There are some things to watch out for with my solution though.  I would recommend building up all the possible edge cases that can come up and put them into your sample data.  For example, if a customer only has 1 row of data, my solution breaks.  Or if the "island" exists entirely in 1 row, the solution breaks.  Or if the start date and end date columns can be NULL it'll create problems.

    EDIT - whoops... just noticed my code had a typo... working on a fix... it should be taking the st value on the start of a new island not the et value.

    EDIT 2 - corrected code (also handles the bug I mentioned above):

    IF OBJECT_ID('tempdb..#DLT') IS NOT NULL
    DROP TABLE [#DLT];

    CREATE TABLE [#DLT]
    (
      [emp] VARCHAR(13)
      , [St] VARCHAR(8)
      , [Et] VARCHAR(8)
      , [RSN] VARCHAR(100)
    );

    INSERT INTO [#DLT]
    (
      [emp]
      , [St]
      , [Et]
      , [RSN]
    )
    VALUES
    (
      '0010970115'
      , '20160701'
      , '20170131'
      , 'Member Eligibility Change'
    )
    , (
      '0010970115'
      , '20170201'
      , '20171130'
      , 'Member Eligibility Change'
      )
    , (
      '0010970115'
      , '20171201'
      , '20171231'
      , 'Member Eligibility Change'
      )
    , (
      '0010970115'
      , '20180105'
      , '20180531'
      , 'Member Eligibility Change'
      )
    , (
      '0010970115'
      , '20180601'
      , '20991231'
      , 'Member Eligibility Change'
      )
    , (
      '0010970115'
      , '22030101'
      , '29991231'
      , 'Member Eligibility Change'
      );
    WITH [C]
    AS ( SELECT
       [#DLT].[emp]
       , [#DLT].[St]
       , [#DLT].[Et]
       , DATEDIFF(
           D
           , [#DLT].[Et]
           , ( LEAD([#DLT].[St]) OVER ( PARTITION BY
                     [#DLT].[emp]
                     ORDER BY
                     [#DLT].[emp]
                     , [#DLT].[St]
                    )
            )
          )     AS [DIFF]
       , MIN([#DLT].[St]) OVER ( PARTITION BY
                [#DLT].[emp]
                ORDER BY
                [#DLT].[St]
               ) AS [minStart]
      FROM
       [#DLT] )
      , [D]
    AS ( SELECT
       [C].[emp]
       , [C].[St]
       , [C].[Et]
       , [C].[DIFF]
       , [C].[minStart]
       , CASE
        WHEN [C].[St] = [C].[minStart] THEN
         [C].[St]
        WHEN
         (
          [C].[DIFF] = 1
          AND LAG([C].[DIFF]) OVER ( PARTITION BY
                   [C].[emp]
                   ORDER BY
                   [C].[St]
                  ) > 1
         ) THEN
         [C].[St]
        WHEN [C].[DIFF] = 1 THEN
         NULL
        WHEN [C].[DIFF] IS NULL THEN
         [C].[Et]
        ELSE
         [C].[Et]
        END AS [daterange]
      FROM
       [C] )
      , [E]
    AS ( SELECT
       [D].[St]
       , [D].[emp]
       , [D].[daterange]
       , ROW_NUMBER() OVER ( ORDER BY
               [D].[emp]
              )    AS [rn]
       , LEAD([D].[daterange]) OVER ( ORDER BY
                  [D].[emp]
                 ) AS [nextDate]
      FROM
       [D]
      WHERE
       [D].[daterange] IS NOT NULL )
    SELECT
    [E].[emp]
    , ISNULL(
        [E].[daterange] + ' - ' + [E].[nextDate]
        , [E].[St] + ' - ' + [E].[daterange]
        )
    FROM
    [E]
    WHERE
    [E].[rn] % 2 != 0;

    I adjusted the data set to show that it can handle a single row island and fixed the bug where it was picking the wrong date for the start of a new island.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Done! Finally!
    I tested against single spans as well... Works beautifully


    If object_id('tempdb..#DLT') IS NOT NULL DROP TABLE #DLT;

    CREATE TABLE #DLT( emp VARCHAR(13), St VARCHAR(8), Et VARCHAR(8), RSN VARCHAR(100) );

    INSERT INTO #DLT( emp, St, Et, RSN )
    VALUES
    ('0010970116','20160701',    '20170131',    'Member Eligibility Change'),
    ('0010970115','20160701',    '20170131',    'Member Eligibility Change'),
    ('0010970115','20170201',    '20171130',    'Member Eligibility Change'),
    ('0010970115','20171201',    '20171231',    'Member Eligibility Change'),
    ('0010970115','20180105',    '20180531',    'Member Eligibility Change'),
    ('0010970115','20180601',    '29991231',    'Member Eligibility Change')

    ;
    With Step1 as
    (
    Select emp, St, Et
    ,LEAD(St) OVER(PARTITION BY Emp ORDER BY emp, St) AS NextSt
    ,ROW_NUMBER() OVER (PARTITION BY emp ORDER BY St) AS THE_ORDER
    ,DATEDIFF(D, Et , (LEAD(St) OVER(PARTITION BY Emp ORDER BY emp, St)) ) as DIFF -- Gap in days
    FROM
    #DLT
    )
    ,
    Step2 as
    (
    Select *
    ,CASE WHEN DATEADD(D, 1, CAST( ET as DATETIME)) = CAST( NEXTST as DATETIME) THEN 'OK' ELSE NULL END as DEC
    ,LAG(DIFF) OVER(PARTITION BY Emp ORDER BY emp, St) AS PreDiff
    ,LEAD(DIFF) OVER(PARTITION BY Emp ORDER BY emp, St) AS PostDiff
    FROM
    Step1
    )
    ,
    Step3 as
    (
            Select
            *
            ,
            CASE WHEN (PreDiff is NULL ) OR (PreDiff <> 1 ) THEN
                'START_OF_SPAN'
            ELSE
                CASE WHEN DEC IS NULL THEN 'END_OF_SPAN' ELSE
                    'YOU ARE INSIDE A SPAN'
                END
            END as FINAL_DECISION

            FROM Step2
    )
    ,
    Step4 as
    (
            Select
            *
            FROM Step3 WHERE FINAL_DECISION = 'START_OF_SPAN'
    )
    ,
    Step5 as
    (
            Select
            *
            FROM Step3 WHERE FINAL_DECISION = 'END_OF_SPAN'
    ),
    Step6 as
    (
            Select A.EMP, A.St,
            CASE WHEN D.Et IS NULL THEN A.ET ELSE D.Et END as ENDDT
            FROM Step4 A
            OUTER APPLY
            (
                Select top 1 * FROM Step5
                WHERE
                emp = A.emp
                AND
                THE_ORDER >= A.THE_ORDER
                ORDER BY THE_ORDER
            )D

    )
    Select * FROM Step6

  • mw_sql_developer - Wednesday, August 1, 2018 11:57 AM

    Done! Finally!
    I tested against single spans as well... Works beautifully


    If object_id('tempdb..#DLT') IS NOT NULL DROP TABLE #DLT;

    CREATE TABLE #DLT( emp VARCHAR(13), St VARCHAR(8), Et VARCHAR(8), RSN VARCHAR(100) );

    INSERT INTO #DLT( emp, St, Et, RSN )
    VALUES
    ('0010970116','20160701',    '20170131',    'Member Eligibility Change'),
    ('0010970115','20160701',    '20170131',    'Member Eligibility Change'),
    ('0010970115','20170201',    '20171130',    'Member Eligibility Change'),
    ('0010970115','20171201',    '20171231',    'Member Eligibility Change'),
    ('0010970115','20180105',    '20180531',    'Member Eligibility Change'),
    ('0010970115','20180601',    '29991231',    'Member Eligibility Change')

    ;
    With Step1 as
    (
    Select emp, St, Et
    ,LEAD(St) OVER(PARTITION BY Emp ORDER BY emp, St) AS NextSt
    ,ROW_NUMBER() OVER (PARTITION BY emp ORDER BY St) AS THE_ORDER
    ,DATEDIFF(D, Et , (LEAD(St) OVER(PARTITION BY Emp ORDER BY emp, St)) ) as DIFF -- Gap in days
    FROM
    #DLT
    )
    ,
    Step2 as
    (
    Select *
    ,CASE WHEN DATEADD(D, 1, CAST( ET as DATETIME)) = CAST( NEXTST as DATETIME) THEN 'OK' ELSE NULL END as DEC
    ,LAG(DIFF) OVER(PARTITION BY Emp ORDER BY emp, St) AS PreDiff
    ,LEAD(DIFF) OVER(PARTITION BY Emp ORDER BY emp, St) AS PostDiff
    FROM
    Step1
    )
    ,
    Step3 as
    (
            Select
            *
            ,
            CASE WHEN (PreDiff is NULL ) OR (PreDiff <> 1 ) THEN
                'START_OF_SPAN'
            ELSE
                CASE WHEN DEC IS NULL THEN 'END_OF_SPAN' ELSE
                    'YOU ARE INSIDE A SPAN'
                END
            END as FINAL_DECISION

            FROM Step2
    )
    ,
    Step4 as
    (
            Select
            *
            FROM Step3 WHERE FINAL_DECISION = 'START_OF_SPAN'
    )
    ,
    Step5 as
    (
            Select
            *
            FROM Step3 WHERE FINAL_DECISION = 'END_OF_SPAN'
    ),
    Step6 as
    (
            Select A.EMP, A.St,
            CASE WHEN D.Et IS NULL THEN A.ET ELSE D.Et END as ENDDT
            FROM Step4 A
            OUTER APPLY
            (
                Select top 1 * FROM Step5
                WHERE
                emp = A.emp
                AND
                THE_ORDER >= A.THE_ORDER
                ORDER BY THE_ORDER
            )D

    )
    Select * FROM Step6

    Discovered a bug in my version of this - a single row customer doesn't work and my output was in the form "<start date> - <end date>" which is what I thought you wanted based on your original post.
    Looking through your code, there is a little bit of tidying up you could do, no?  Like are you using "PostDiff"?  and you could likely remove steps 4, 5 and 6 if you change your final select to:

      Select A.EMP, A.St,
       CASE WHEN D.Et IS NULL THEN A.ET ELSE D.Et END as ENDDT
       FROM Step3 A
            
       OUTER APPLY
       (
        Select top 1 * FROM Step3 B
        WHERE
        emp = A.emp
        AND
        THE_ORDER >= A.THE_ORDER
                AND .[Final_decision] = 'END_OF_SPAN'
        ORDER BY THE_ORDER
       )D
            WHERE [A].[FINAL_DECISION] = 'START_OF_SPAN'

    But now it is starting to feel like semantics without adding much to this conversation.
    To make my code work with your new data set and give the output you have, CTE D needs to be changed to:

    [D]
    AS ( SELECT
       [C].[emp]
       , [C].[St]
       , [C].[Et]
       , [C].[DIFF]
       , [C].[minStart]
       , CASE
                WHEN ([C].[St] = [C].[minStart] AND DIFF IS NULL) THEN
                    [C].[Et]
        WHEN [C].[St] = [C].[minStart] THEN
         [C].[St]
        WHEN
         (
          [C].[DIFF] = 1
          AND LAG([C].[DIFF]) OVER ( PARTITION BY
                   [C].[emp]
                   ORDER BY
                   [C].[St]
                  ) > 1
         ) THEN
         [C].[St]
        WHEN [C].[DIFF] = 1 THEN
         NULL
        WHEN [C].[DIFF] IS NULL THEN
         [C].[Et]
        ELSE
         [C].[Et]
        END AS [daterange]
      FROM
       [C] )

    and the final select needs to be changed to:

    SELECT
    [E].[emp]
    , CASE
        WHEN nextDate IS NULL
        THEN [e].[St]
        ELSE
         [E].[daterange]
         END AS st
    , CASE WHEN nextDate IS NULL
    THEN [E].daterange
    ELSE
    [E].[nextdate]
    END AS enddt
    FROM
    [E]
    WHERE
    [E].[rn] % 2 != 0;

    Not trying to pick apart your code; just seeing ways to make it a bit shorter.  It just felt to me like the problem should be solvable with fewer CTEs.  Even my version with 3 (poorly named) CTE's feels a bit large for this problem...  Plus i have some unused columns in my CTE's.  In CTE D, I only need to pull in St, and emp plus the calculated stuff.  I don't need to pull in the other columns from C; they are just used for calculations.  So my version could be optimized a bit as well.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • bmg002 - Wednesday, August 1, 2018 12:51 PM

    mw_sql_developer - Wednesday, August 1, 2018 11:57 AM

    Done! Finally!
    I tested against single spans as well... Works beautifully


    If object_id('tempdb..#DLT') IS NOT NULL DROP TABLE #DLT;

    CREATE TABLE #DLT( emp VARCHAR(13), St VARCHAR(8), Et VARCHAR(8), RSN VARCHAR(100) );

    INSERT INTO #DLT( emp, St, Et, RSN )
    VALUES
    ('0010970116','20160701',    '20170131',    'Member Eligibility Change'),
    ('0010970115','20160701',    '20170131',    'Member Eligibility Change'),
    ('0010970115','20170201',    '20171130',    'Member Eligibility Change'),
    ('0010970115','20171201',    '20171231',    'Member Eligibility Change'),
    ('0010970115','20180105',    '20180531',    'Member Eligibility Change'),
    ('0010970115','20180601',    '29991231',    'Member Eligibility Change')

    ;
    With Step1 as
    (
    Select emp, St, Et
    ,LEAD(St) OVER(PARTITION BY Emp ORDER BY emp, St) AS NextSt
    ,ROW_NUMBER() OVER (PARTITION BY emp ORDER BY St) AS THE_ORDER
    ,DATEDIFF(D, Et , (LEAD(St) OVER(PARTITION BY Emp ORDER BY emp, St)) ) as DIFF -- Gap in days
    FROM
    #DLT
    )
    ,
    Step2 as
    (
    Select *
    ,CASE WHEN DATEADD(D, 1, CAST( ET as DATETIME)) = CAST( NEXTST as DATETIME) THEN 'OK' ELSE NULL END as DEC
    ,LAG(DIFF) OVER(PARTITION BY Emp ORDER BY emp, St) AS PreDiff
    ,LEAD(DIFF) OVER(PARTITION BY Emp ORDER BY emp, St) AS PostDiff
    FROM
    Step1
    )
    ,
    Step3 as
    (
            Select
            *
            ,
            CASE WHEN (PreDiff is NULL ) OR (PreDiff <> 1 ) THEN
                'START_OF_SPAN'
            ELSE
                CASE WHEN DEC IS NULL THEN 'END_OF_SPAN' ELSE
                    'YOU ARE INSIDE A SPAN'
                END
            END as FINAL_DECISION

            FROM Step2
    )
    ,
    Step4 as
    (
            Select
            *
            FROM Step3 WHERE FINAL_DECISION = 'START_OF_SPAN'
    )
    ,
    Step5 as
    (
            Select
            *
            FROM Step3 WHERE FINAL_DECISION = 'END_OF_SPAN'
    ),
    Step6 as
    (
            Select A.EMP, A.St,
            CASE WHEN D.Et IS NULL THEN A.ET ELSE D.Et END as ENDDT
            FROM Step4 A
            OUTER APPLY
            (
                Select top 1 * FROM Step5
                WHERE
                emp = A.emp
                AND
                THE_ORDER >= A.THE_ORDER
                ORDER BY THE_ORDER
            )D

    )
    Select * FROM Step6

    Discovered a bug in my version of this - a single row customer doesn't work and my output was in the form "<start date> - <end date>" which is what I thought you wanted based on your original post.
    Looking through your code, there is a little bit of tidying up you could do, no?  Like are you using "PostDiff"?  and you could likely remove steps 4, 5 and 6 if you change your final select to:

      Select A.EMP, A.St,
       CASE WHEN D.Et IS NULL THEN A.ET ELSE D.Et END as ENDDT
       FROM Step3 A
            
       OUTER APPLY
       (
        Select top 1 * FROM Step3 B
        WHERE
        emp = A.emp
        AND
        THE_ORDER >= A.THE_ORDER
                AND .[Final_decision] = 'END_OF_SPAN'
        ORDER BY THE_ORDER
       )D
            WHERE [A].[FINAL_DECISION] = 'START_OF_SPAN'

    But now it is starting to feel like semantics without adding much to this conversation.
    To make my code work with your new data set and give the output you have, CTE D needs to be changed to:

    [D]
    AS ( SELECT
       [C].[emp]
       , [C].[St]
       , [C].[Et]
       , [C].[DIFF]
       , [C].[minStart]
       , CASE
                WHEN ([C].[St] = [C].[minStart] AND DIFF IS NULL) THEN
                    [C].[Et]
        WHEN [C].[St] = [C].[minStart] THEN
         [C].[St]
        WHEN
         (
          [C].[DIFF] = 1
          AND LAG([C].[DIFF]) OVER ( PARTITION BY
                   [C].[emp]
                   ORDER BY
                   [C].[St]
                  ) > 1
         ) THEN
         [C].[St]
        WHEN [C].[DIFF] = 1 THEN
         NULL
        WHEN [C].[DIFF] IS NULL THEN
         [C].[Et]
        ELSE
         [C].[Et]
        END AS [daterange]
      FROM
       [C] )

    and the final select needs to be changed to:

    SELECT
    [E].[emp]
    , CASE
        WHEN nextDate IS NULL
        THEN [e].[St]
        ELSE
         [E].[daterange]
         END AS st
    , CASE WHEN nextDate IS NULL
    THEN [E].daterange
    ELSE
    [E].[nextdate]
    END AS enddt
    FROM
    [E]
    WHERE
    [E].[rn] % 2 != 0;

    Not trying to pick apart your code; just seeing ways to make it a bit shorter.  It just felt to me like the problem should be solvable with fewer CTEs.  Even my version with 3 (poorly named) CTE's feels a bit large for this problem...  Plus i have some unused columns in my CTE's.  In CTE D, I only need to pull in St, and emp plus the calculated stuff.  I don't need to pull in the other columns from C; they are just used for calculations.  So my version could be optimized a bit as well.

    I purposely made it to have many CTEs. I could have done it in a few .. Just for clarity  I like to have many CTEs. BTW - It does handle single spans

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply