HELP,TSQL, - How do I write the SELECT stmt to condense multiple time spans

Question

HELP,TSQL, - How do I write the SELECT stmt to condense multiple time spans

mw_sql_developer

SSCoach

Points: 19445
More actions
July 31, 2018 at 2:37 pm

#374021

My attempts failed. I need a SELECT STMT that will give me 2 rows as output. The 2 rows must have 2 spans as follows
--Desired Output for member 20160701 - 20171231
-- 20180105 - 29991231
You know what i am trying here.. I am trying to condense nearby spans that are continuous together. Hope this explains.. You may assume that the table has data for many members. But for sake of simplicity lets have data for just one buddy.
If object_id('tempdb..#DLT') IS NOT NULL DROP TABLE #DLT;
CREATE TABLE #DLT( emp VARCHAR(13), St VARCHAR(8), Et VARCHAR(8), RSN VARCHAR(100) );
INSERT INTO #DLT( emp, St, Et, RSN ) VALUES ('0010970115','20160701',    '20170131',    'Member Eligibility Change'), ('0010970115','20170201',    '20171130',    'Member Eligibility Change'), ('0010970115','20171201',    '20171231',    'Member Eligibility Change'),
('0010970115','20180105',    '20180531',    'Member Eligibility Change'), ('0010970115','20180601',    '29991231',    'Member Eligibility Change')
; With C as ( Select emp, St, Et , LEAD(St) OVER(PARTITION BY Emp ORDER BY emp, St) AS NextSt ,DENSE_RANK() OVER (PARTITION BY emp ORDER BY St) AS Grp2 , DATEDIFF(D, Et , (LEAD(St) OVER(PARTITION BY Emp ORDER BY emp, St)) ) as DIFF FROM #DLT ) Select * ,CASE WHEN DATEADD(D, 1, CAST( ET as DATETIME)) = CAST( NEXTST as DATETIME) THEN 'OK' ELSE NULL END as DEC FROM C
--Desired Output for member 20160701 - 20171231 --                                20180105 - 29991231

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply

drew.allen SSC Guru Points: 76999 More actions · Answer 1

There are 11 possible relationships between two intervals, requiring at least 22 rows to represent completely. You have provided less than a quarter of the minimum to adequately test anything.

Furthermore, you have another thread of this very subject where you posted just 59 minutes ago that "This is perfect". Did you read the article that I linked to in that thread? It walks through exactly what you are trying to do.

This is just another example of you moving the goal posts. Read the article that I mentioned above and post code that reflects the techniques in that article, and then I will help you if you are still having problems.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 2

mw_sql_developer - Tuesday, July 31, 2018 2:37 PM
My attempts failed. I need a SELECT STMT that will give me 2 rows as output. The 2 rows must have 2 spans as follows
--Desired Output for member 20160701 - 20171231
-- 20180105 - 29991231
You know what i am trying here.. I am trying to condense nearby spans that are continuous together. Hope this explains.. You may assume that the table has data for many members. But for sake of simplicity lets have data for just one buddy.
If object_id('tempdb..#DLT') IS NOT NULL DROP TABLE #DLT;
CREATE TABLE #DLT( emp VARCHAR(13), St VARCHAR(8), Et VARCHAR(8), RSN VARCHAR(100) );
INSERT INTO #DLT( emp, St, Et, RSN ) VALUES ('0010970115','20160701',    '20170131',    'Member Eligibility Change'), ('0010970115','20170201',    '20171130',    'Member Eligibility Change'), ('0010970115','20171201',    '20171231',    'Member Eligibility Change'),
('0010970115','20180105',    '20180531',    'Member Eligibility Change'), ('0010970115','20180601',    '29991231',    'Member Eligibility Change')
; With C as ( Select emp, St, Et , LEAD(St) OVER(PARTITION BY Emp ORDER BY emp, St) AS NextSt ,DENSE_RANK() OVER (PARTITION BY emp ORDER BY St) AS Grp2 , DATEDIFF(D, Et , (LEAD(St) OVER(PARTITION BY Emp ORDER BY emp, St)) ) as DIFF FROM #DLT ) Select * ,CASE WHEN DATEADD(D, 1, CAST( ET as DATETIME)) = CAST( NEXTST as DATETIME) THEN 'OK' ELSE NULL END as DEC FROM C
--Desired Output for member 20160701 - 20171231 --                                20180105 - 29991231

Does this question have anything to do with this thread: https://www.sqlservercentral.com/Forums/1977582/SQL-Syntax-HELP-on-merging-enrollment-spans#bm1979663

Mr. Brian Gale SSC-Insane Points: 24800 More actions · Answer 3

A sample solution (not an efficient or "good" one but it works with the sample data you gave, but no promises it would work with "real" data, and I imagine there are better/faster ways to solve this) could be:
IF OBJECT_ID('tempdb..#DLT') IS NOT NULL DROP TABLE [#DLT];

CREATE TABLE [#DLT] ( [emp] VARCHAR(13) , [St] VARCHAR(8) , [Et] VARCHAR(8) , [RSN] VARCHAR(100) );

INSERT INTO [#DLT] ( [emp] , [St] , [Et] , [RSN] ) VALUES ( '0010970115' , '20160701' , '20170131' , 'Member Eligibility Change' ) , ( '0010970115' , '20170201' , '20171130' , 'Member Eligibility Change' ) , ( '0010970115' , '20171201' , '20171231' , 'Member Eligibility Change' ) , ( '0010970115' , '20180105' , '20180531' , 'Member Eligibility Change' ) , ( '0010970115' , '20180601' , '29991231' , 'Member Eligibility Change' ); WITH [C] AS ( SELECT [#DLT].[emp] , [#DLT].[St] , [#DLT].[Et] , DATEDIFF( D , [#DLT].[Et] , ( LEAD([#DLT].[St]) OVER ( PARTITION BY [#DLT].[emp] ORDER BY [#DLT].[emp] , [#DLT].[St] ) ) ) AS [DIFF] , MIN([#DLT].[St]) OVER ( PARTITION BY [#DLT].[emp] ORDER BY [#DLT].[St] ) AS [minStart] FROM [#DLT] ) , [D] AS ( SELECT [C].[emp] , [C].[St] , [C].[Et] , [C].[DIFF] , [C].[minStart] , CASE WHEN [C].[St] = [C].[minStart] THEN [C].[St] WHEN ( [C].[DIFF] = 1 AND LAG([C].[DIFF]) OVER ( PARTITION BY [C].[emp] ORDER BY [C].[St] ) > 1 ) THEN [C].[Et] WHEN [C].[DIFF] = 1 THEN NULL WHEN [C].[DIFF] IS NULL THEN [C].[Et] ELSE [C].[Et] END AS [daterange] FROM [C] ) , [E] AS ( SELECT [D].[emp] , [D].[daterange] , ROW_NUMBER() OVER ( ORDER BY [D].[emp] ) AS [rn] , LEAD([D].[daterange]) OVER ( ORDER BY [D].[emp] ) AS [nextDate] FROM [D] WHERE [D].[daterange] IS NOT NULL ) SELECT [E].[emp] , [E].[daterange] + ' - ' + [E].[nextDate] FROM [E] WHERE [E].[rn] % 2 != 0;

A lot of that is copied from the code you provided, I added the CTE D and E (which are both horribly named and I do not recommend using those names). This feels like a specialized "gap and island" problem, so D is used to determine what is a gap and what is an island while E is used to strip out the "middle" parts of the islands and get the end date (next Date should be the end date).
Now I am fairly confident that there are better more efficient ways to do this, but this method does work for the given data. There is some data that will break it (like if a range resides entirely in 1 row), but with the given sample data, it works.

The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

mw_sql_developer SSCoach Points: 19445 More actions · Answer 4

Lynn Pettis - Tuesday, July 31, 2018 3:38 PM
mw_sql_developer - Tuesday, July 31, 2018 2:37 PM
My attempts failed. I need a SELECT STMT that will give me 2 rows as output. The 2 rows must have 2 spans as follows
--Desired Output for member 20160701 - 20171231
-- 20180105 - 29991231
You know what i am trying here.. I am trying to condense nearby spans that are continuous together. Hope this explains.. You may assume that the table has data for many members. But for sake of simplicity lets have data for just one buddy.
If object_id('tempdb..#DLT') IS NOT NULL DROP TABLE #DLT;
CREATE TABLE #DLT( emp VARCHAR(13), St VARCHAR(8), Et VARCHAR(8), RSN VARCHAR(100) );
INSERT INTO #DLT( emp, St, Et, RSN ) VALUES ('0010970115','20160701',    '20170131',    'Member Eligibility Change'), ('0010970115','20170201',    '20171130',    'Member Eligibility Change'), ('0010970115','20171201',    '20171231',    'Member Eligibility Change'),
('0010970115','20180105',    '20180531',    'Member Eligibility Change'), ('0010970115','20180601',    '29991231',    'Member Eligibility Change')
; With C as ( Select emp, St, Et , LEAD(St) OVER(PARTITION BY Emp ORDER BY emp, St) AS NextSt ,DENSE_RANK() OVER (PARTITION BY emp ORDER BY St) AS Grp2 , DATEDIFF(D, Et , (LEAD(St) OVER(PARTITION BY Emp ORDER BY emp, St)) ) as DIFF FROM #DLT ) Select * ,CASE WHEN DATEADD(D, 1, CAST( ET as DATETIME)) = CAST( NEXTST as DATETIME) THEN 'OK' ELSE NULL END as DEC FROM C
--Desired Output for member 20160701 - 20171231 --                                20180105 - 29991231
Does this question have anything to do with this thread: https://www.sqlservercentral.com/Forums/1977582/SQL-Syntax-HELP-on-merging-enrollment-spans#bm1979663

Lynn: No, THat was actually about creating new spans. Here I am trying to make one span using 3 spans ( similar ) So if you have spans that are end to end then we can condense them to one span.

drew.allen SSC Guru Points: 76999 More actions · Answer 5

mw_sql_developer - Tuesday, July 31, 2018 6:53 PM
Lynn: No, THat was actually about creating new spans. Here I am trying to make one span using 3 spans ( similar ) So if you have spans that are end to end then we can condense them to one span.

I think they are the same. They both concerned with determining what makes a span a span. Knowing when a single span should be broken into multiple spans or vice versa are just different sides of the same coin.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

mw_sql_developer SSCoach Points: 19445 More actions · Answer 6

drew.allen - Wednesday, August 1, 2018 8:23 AM
mw_sql_developer - Tuesday, July 31, 2018 6:53 PM
Lynn: No, THat was actually about creating new spans. Here I am trying to make one span using 3 spans ( similar ) So if you have spans that are end to end then we can condense them to one span.
I think they are the same. They both concerned with determining what makes a span a span. Knowing when a single span should be broken into multiple spans or vice versa are just different sides of the same coin.
Drew

Drew: That is not true. Earlier we were taking multiple spans, looking at how they overlapped each other and came up with a new set of plans. So if you started with 3 spans you can end up generating six spans. This is different. if you have six spans and if they are end to end , then you condense it to one span.

Similar but not the same..

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 7

mw_sql_developer - Wednesday, August 1, 2018 8:32 AM
drew.allen - Wednesday, August 1, 2018 8:23 AM
mw_sql_developer - Tuesday, July 31, 2018 6:53 PM
Lynn: No, THat was actually about creating new spans. Here I am trying to make one span using 3 spans ( similar ) So if you have spans that are end to end then we can condense them to one span.
I think they are the same. They both concerned with determining what makes a span a span. Knowing when a single span should be broken into multiple spans or vice versa are just different sides of the same coin.
Drew
Drew: That is not true. Earlier we were taking multiple spans, looking at how they overlapped each other and came up with a new set of plans. So if you started with 3 spans you can end up generating six spans. This is different. if you have six spans and if they are end to end , then you condense it to one span.
Similar but not the same..

Actually, very similar to each other. Check the articles on this site regarding gaps and islands.

mw_sql_developer SSCoach Points: 19445 More actions · Answer 8

Lynn Pettis - Wednesday, August 1, 2018 8:40 AM
mw_sql_developer - Wednesday, August 1, 2018 8:32 AM
drew.allen - Wednesday, August 1, 2018 8:23 AM
mw_sql_developer - Tuesday, July 31, 2018 6:53 PM
Lynn: No, THat was actually about creating new spans. Here I am trying to make one span using 3 spans ( similar ) So if you have spans that are end to end then we can condense them to one span.
I think they are the same. They both concerned with determining what makes a span a span. Knowing when a single span should be broken into multiple spans or vice versa are just different sides of the same coin.
Drew
Drew: That is not true. Earlier we were taking multiple spans, looking at how they overlapped each other and came up with a new set of plans. So if you started with 3 spans you can end up generating six spans. This is different. if you have six spans and if they are end to end , then you condense it to one span.
Similar but not the same..
Actually, very similar to each other. Check the articles on this site regarding gaps and islands.

Ok, I got it very close, I just need one little help writing the last SELECT stmt. I am sure it might just click you like that... Just run the SQL and you will know what I mean ...

Good Luck ...

If object_id('tempdb..#DLT') IS NOT NULL DROP TABLE #DLT;

CREATE TABLE #DLT( emp VARCHAR(13), St VARCHAR(8), Et VARCHAR(8), RSN VARCHAR(100) );

INSERT INTO #DLT( emp, St, Et, RSN ) VALUES ('0010970115','20160701', '20170131', 'Member Eligibility Change'), ('0010970115','20170201', '20171130', 'Member Eligibility Change'), ('0010970115','20171201', '20171231', 'Member Eligibility Change'), ('0010970115','20180105', '20180531', 'Member Eligibility Change'), ('0010970115','20180601', '29991231', 'Member Eligibility Change')

; With Step1 as ( Select emp, St, Et ,LEAD(St) OVER(PARTITION BY Emp ORDER BY emp, St) AS NextSt ,ROW_NUMBER() OVER (PARTITION BY emp ORDER BY St) AS THE_ORDER ,DATEDIFF(D, Et , (LEAD(St) OVER(PARTITION BY Emp ORDER BY emp, St)) ) as DIFF -- Gap in days FROM #DLT ) , Step2 as ( Select * ,CASE WHEN DATEADD(D, 1, CAST( ET as DATETIME)) = CAST( NEXTST as DATETIME) THEN 'OK' ELSE NULL END as DEC ,LAG(DIFF) OVER(PARTITION BY Emp ORDER BY emp, St) AS PreDiff ,LEAD(DIFF) OVER(PARTITION BY Emp ORDER BY emp, St) AS PostDiff FROM Step1 ) , Step3 as ( Select * , CASE WHEN (PreDiff is NULL ) OR (PreDiff <> 1 ) THEN 'START_OF_SPAN' ELSE CASE WHEN DEC IS NULL THEN 'END_OF_SPAN' ELSE 'YOU ARE INSIDE A SPAN' END END as FINAL_DECISION

FROM Step2 ) Select * FROM Step2

Mr. Brian Gale SSC-Insane Points: 24800 More actions · Answer 9

Was my solution no good? It seemed to work with the data you provided...

The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

mw_sql_developer SSCoach Points: 19445 More actions · Answer 10

bmg002 - Wednesday, August 1, 2018 11:13 AM
Was my solution no good? It seemed to work with the data you provided...

That was good, but now that I also learnt something I need to finish what i started, so some help will be great

Mr. Brian Gale SSC-Insane Points: 24800 More actions · Answer 11

mw_sql_developer - Wednesday, August 1, 2018 11:14 AM
bmg002 - Wednesday, August 1, 2018 11:13 AM
Was my solution no good? It seemed to work with the data you provided...
That was good, but now that I also learnt something I need to finish what i started, so some help will be great

What was missing in that solution? I don't believe it was the most optimal solution, but it did provide you the resulting data you were looking for from the provided data set.
Was there something confusing in it or that you didn't understand why I did a thing? If so, I can try to explain.
There are some things to watch out for with my solution though. I would recommend building up all the possible edge cases that can come up and put them into your sample data. For example, if a customer only has 1 row of data, my solution breaks. Or if the "island" exists entirely in 1 row, the solution breaks. Or if the start date and end date columns can be NULL it'll create problems.

EDIT - whoops... just noticed my code had a typo... working on a fix... it should be taking the st value on the start of a new island not the et value.

EDIT 2 - corrected code (also handles the bug I mentioned above):
IF OBJECT_ID('tempdb..#DLT') IS NOT NULL DROP TABLE [#DLT];

CREATE TABLE [#DLT] ( [emp] VARCHAR(13) , [St] VARCHAR(8) , [Et] VARCHAR(8) , [RSN] VARCHAR(100) );

INSERT INTO [#DLT] ( [emp] , [St] , [Et] , [RSN] ) VALUES ( '0010970115' , '20160701' , '20170131' , 'Member Eligibility Change' ) , ( '0010970115' , '20170201' , '20171130' , 'Member Eligibility Change' ) , ( '0010970115' , '20171201' , '20171231' , 'Member Eligibility Change' ) , ( '0010970115' , '20180105' , '20180531' , 'Member Eligibility Change' ) , ( '0010970115' , '20180601' , '20991231' , 'Member Eligibility Change' ) , ( '0010970115' , '22030101' , '29991231' , 'Member Eligibility Change' ); WITH [C] AS ( SELECT [#DLT].[emp] , [#DLT].[St] , [#DLT].[Et] , DATEDIFF( D , [#DLT].[Et] , ( LEAD([#DLT].[St]) OVER ( PARTITION BY [#DLT].[emp] ORDER BY [#DLT].[emp] , [#DLT].[St] ) ) ) AS [DIFF] , MIN([#DLT].[St]) OVER ( PARTITION BY [#DLT].[emp] ORDER BY [#DLT].[St] ) AS [minStart] FROM [#DLT] ) , [D] AS ( SELECT [C].[emp] , [C].[St] , [C].[Et] , [C].[DIFF] , [C].[minStart] , CASE WHEN [C].[St] = [C].[minStart] THEN [C].[St] WHEN ( [C].[DIFF] = 1 AND LAG([C].[DIFF]) OVER ( PARTITION BY [C].[emp] ORDER BY [C].[St] ) > 1 ) THEN [C].[St] WHEN [C].[DIFF] = 1 THEN NULL WHEN [C].[DIFF] IS NULL THEN [C].[Et] ELSE [C].[Et] END AS [daterange] FROM [C] ) , [E] AS ( SELECT [D].[St] , [D].[emp] , [D].[daterange] , ROW_NUMBER() OVER ( ORDER BY [D].[emp] ) AS [rn] , LEAD([D].[daterange]) OVER ( ORDER BY [D].[emp] ) AS [nextDate] FROM [D] WHERE [D].[daterange] IS NOT NULL ) SELECT [E].[emp] , ISNULL( [E].[daterange] + ' - ' + [E].[nextDate] , [E].[St] + ' - ' + [E].[daterange] ) FROM [E] WHERE [E].[rn] % 2 != 0;

I adjusted the data set to show that it can handle a single row island and fixed the bug where it was picking the wrong date for the start of a new island.

The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

mw_sql_developer SSCoach Points: 19445 More actions · Answer 12

Done! Finally!
I tested against single spans as well... Works beautifully

If object_id('tempdb..#DLT') IS NOT NULL DROP TABLE #DLT;

CREATE TABLE #DLT( emp VARCHAR(13), St VARCHAR(8), Et VARCHAR(8), RSN VARCHAR(100) );

INSERT INTO #DLT( emp, St, Et, RSN ) VALUES ('0010970116','20160701', '20170131', 'Member Eligibility Change'), ('0010970115','20160701', '20170131', 'Member Eligibility Change'), ('0010970115','20170201', '20171130', 'Member Eligibility Change'), ('0010970115','20171201', '20171231', 'Member Eligibility Change'), ('0010970115','20180105', '20180531', 'Member Eligibility Change'), ('0010970115','20180601', '29991231', 'Member Eligibility Change')

; With Step1 as ( Select emp, St, Et ,LEAD(St) OVER(PARTITION BY Emp ORDER BY emp, St) AS NextSt ,ROW_NUMBER() OVER (PARTITION BY emp ORDER BY St) AS THE_ORDER ,DATEDIFF(D, Et , (LEAD(St) OVER(PARTITION BY Emp ORDER BY emp, St)) ) as DIFF -- Gap in days FROM #DLT ) , Step2 as ( Select * ,CASE WHEN DATEADD(D, 1, CAST( ET as DATETIME)) = CAST( NEXTST as DATETIME) THEN 'OK' ELSE NULL END as DEC ,LAG(DIFF) OVER(PARTITION BY Emp ORDER BY emp, St) AS PreDiff ,LEAD(DIFF) OVER(PARTITION BY Emp ORDER BY emp, St) AS PostDiff FROM Step1 ) , Step3 as ( Select * , CASE WHEN (PreDiff is NULL ) OR (PreDiff <> 1 ) THEN 'START_OF_SPAN' ELSE CASE WHEN DEC IS NULL THEN 'END_OF_SPAN' ELSE 'YOU ARE INSIDE A SPAN' END END as FINAL_DECISION

FROM Step2 ) , Step4 as ( Select * FROM Step3 WHERE FINAL_DECISION = 'START_OF_SPAN' ) , Step5 as ( Select * FROM Step3 WHERE FINAL_DECISION = 'END_OF_SPAN' ), Step6 as ( Select A.EMP, A.St, CASE WHEN D.Et IS NULL THEN A.ET ELSE D.Et END as ENDDT FROM Step4 A OUTER APPLY ( Select top 1 * FROM Step5 WHERE emp = A.emp AND THE_ORDER >= A.THE_ORDER ORDER BY THE_ORDER )D

) Select * FROM Step6

Mr. Brian Gale SSC-Insane Points: 24800 More actions · Answer 13

mw_sql_developer - Wednesday, August 1, 2018 11:57 AM
Done! Finally!
I tested against single spans as well... Works beautifully
If object_id('tempdb..#DLT') IS NOT NULL DROP TABLE #DLT;
CREATE TABLE #DLT( emp VARCHAR(13), St VARCHAR(8), Et VARCHAR(8), RSN VARCHAR(100) );
INSERT INTO #DLT( emp, St, Et, RSN ) VALUES ('0010970116','20160701',    '20170131',    'Member Eligibility Change'), ('0010970115','20160701',    '20170131',    'Member Eligibility Change'), ('0010970115','20170201',    '20171130',    'Member Eligibility Change'), ('0010970115','20171201',    '20171231',    'Member Eligibility Change'), ('0010970115','20180105',    '20180531',    'Member Eligibility Change'), ('0010970115','20180601',    '29991231',    'Member Eligibility Change')
; With Step1 as ( Select emp, St, Et ,LEAD(St) OVER(PARTITION BY Emp ORDER BY emp, St) AS NextSt ,ROW_NUMBER() OVER (PARTITION BY emp ORDER BY St) AS THE_ORDER ,DATEDIFF(D, Et , (LEAD(St) OVER(PARTITION BY Emp ORDER BY emp, St)) ) as DIFF -- Gap in days FROM #DLT ) , Step2 as ( Select * ,CASE WHEN DATEADD(D, 1, CAST( ET as DATETIME)) = CAST( NEXTST as DATETIME) THEN 'OK' ELSE NULL END as DEC ,LAG(DIFF) OVER(PARTITION BY Emp ORDER BY emp, St) AS PreDiff ,LEAD(DIFF) OVER(PARTITION BY Emp ORDER BY emp, St) AS PostDiff FROM Step1 ) , Step3 as (         Select         *         ,         CASE WHEN (PreDiff is NULL ) OR (PreDiff <> 1 ) THEN             'START_OF_SPAN'         ELSE             CASE WHEN DEC IS NULL THEN 'END_OF_SPAN' ELSE                 'YOU ARE INSIDE A SPAN'             END         END as FINAL_DECISION
        FROM Step2 ) , Step4 as (         Select         *         FROM Step3 WHERE FINAL_DECISION = 'START_OF_SPAN' ) , Step5 as (         Select         *         FROM Step3 WHERE FINAL_DECISION = 'END_OF_SPAN' ), Step6 as (         Select A.EMP, A.St,         CASE WHEN D.Et IS NULL THEN A.ET ELSE D.Et END as ENDDT         FROM Step4 A         OUTER APPLY         (             Select top 1 * FROM Step5             WHERE             emp = A.emp             AND             THE_ORDER >= A.THE_ORDER             ORDER BY THE_ORDER         )D
) Select * FROM Step6

Discovered a bug in my version of this - a single row customer doesn't work and my output was in the form "<start date> - <end date>" which is what I thought you wanted based on your original post.
Looking through your code, there is a little bit of tidying up you could do, no? Like are you using "PostDiff"? and you could likely remove steps 4, 5 and 6 if you change your final select to:
Select A.EMP, A.St, CASE WHEN D.Et IS NULL THEN A.ET ELSE D.Et END as ENDDT FROM Step3 A OUTER APPLY ( Select top 1 * FROM Step3 B WHERE emp = A.emp AND THE_ORDER >= A.THE_ORDER AND .[Final_decision] = 'END_OF_SPAN' ORDER BY THE_ORDER )D WHERE [A].[FINAL_DECISION] = 'START_OF_SPAN'
But now it is starting to feel like semantics without adding much to this conversation.
To make my code work with your new data set and give the output you have, CTE D needs to be changed to:
[D] AS ( SELECT [C].[emp] , [C].[St] , [C].[Et] , [C].[DIFF] , [C].[minStart] , CASE WHEN ([C].[St] = [C].[minStart] AND DIFF IS NULL) THEN [C].[Et] WHEN [C].[St] = [C].[minStart] THEN [C].[St] WHEN ( [C].[DIFF] = 1 AND LAG([C].[DIFF]) OVER ( PARTITION BY [C].[emp] ORDER BY [C].[St] ) > 1 ) THEN [C].[St] WHEN [C].[DIFF] = 1 THEN NULL WHEN [C].[DIFF] IS NULL THEN [C].[Et] ELSE [C].[Et] END AS [daterange] FROM [C] )

and the final select needs to be changed to:
SELECT [E].[emp] , CASE WHEN nextDate IS NULL THEN [e].[St] ELSE [E].[daterange] END AS st , CASE WHEN nextDate IS NULL THEN [E].daterange ELSE [E].[nextdate] END AS enddt FROM [E] WHERE [E].[rn] % 2 != 0;

Not trying to pick apart your code; just seeing ways to make it a bit shorter. It just felt to me like the problem should be solvable with fewer CTEs. Even my version with 3 (poorly named) CTE's feels a bit large for this problem... Plus i have some unused columns in my CTE's. In CTE D, I only need to pull in St, and emp plus the calculated stuff. I don't need to pull in the other columns from C; they are just used for calculations. So my version could be optimized a bit as well.

The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

mw_sql_developer SSCoach Points: 19445 More actions · Answer 14

bmg002 - Wednesday, August 1, 2018 12:51 PM
mw_sql_developer - Wednesday, August 1, 2018 11:57 AM
Done! Finally!
I tested against single spans as well... Works beautifully
If object_id('tempdb..#DLT') IS NOT NULL DROP TABLE #DLT;
CREATE TABLE #DLT( emp VARCHAR(13), St VARCHAR(8), Et VARCHAR(8), RSN VARCHAR(100) );
INSERT INTO #DLT( emp, St, Et, RSN ) VALUES ('0010970116','20160701',    '20170131',    'Member Eligibility Change'), ('0010970115','20160701',    '20170131',    'Member Eligibility Change'), ('0010970115','20170201',    '20171130',    'Member Eligibility Change'), ('0010970115','20171201',    '20171231',    'Member Eligibility Change'), ('0010970115','20180105',    '20180531',    'Member Eligibility Change'), ('0010970115','20180601',    '29991231',    'Member Eligibility Change')
; With Step1 as ( Select emp, St, Et ,LEAD(St) OVER(PARTITION BY Emp ORDER BY emp, St) AS NextSt ,ROW_NUMBER() OVER (PARTITION BY emp ORDER BY St) AS THE_ORDER ,DATEDIFF(D, Et , (LEAD(St) OVER(PARTITION BY Emp ORDER BY emp, St)) ) as DIFF -- Gap in days FROM #DLT ) , Step2 as ( Select * ,CASE WHEN DATEADD(D, 1, CAST( ET as DATETIME)) = CAST( NEXTST as DATETIME) THEN 'OK' ELSE NULL END as DEC ,LAG(DIFF) OVER(PARTITION BY Emp ORDER BY emp, St) AS PreDiff ,LEAD(DIFF) OVER(PARTITION BY Emp ORDER BY emp, St) AS PostDiff FROM Step1 ) , Step3 as (         Select         *         ,         CASE WHEN (PreDiff is NULL ) OR (PreDiff <> 1 ) THEN             'START_OF_SPAN'         ELSE             CASE WHEN DEC IS NULL THEN 'END_OF_SPAN' ELSE                 'YOU ARE INSIDE A SPAN'             END         END as FINAL_DECISION
        FROM Step2 ) , Step4 as (         Select         *         FROM Step3 WHERE FINAL_DECISION = 'START_OF_SPAN' ) , Step5 as (         Select         *         FROM Step3 WHERE FINAL_DECISION = 'END_OF_SPAN' ), Step6 as (         Select A.EMP, A.St,         CASE WHEN D.Et IS NULL THEN A.ET ELSE D.Et END as ENDDT         FROM Step4 A         OUTER APPLY         (             Select top 1 * FROM Step5             WHERE             emp = A.emp             AND             THE_ORDER >= A.THE_ORDER             ORDER BY THE_ORDER         )D
) Select * FROM Step6
Discovered a bug in my version of this - a single row customer doesn't work and my output was in the form "<start date> - <end date>" which is what I thought you wanted based on your original post.
Looking through your code, there is a little bit of tidying up you could do, no? Like are you using "PostDiff"? and you could likely remove steps 4, 5 and 6 if you change your final select to:
Select A.EMP, A.St,    CASE WHEN D.Et IS NULL THEN A.ET ELSE D.Et END as ENDDT    FROM Step3 A             OUTER APPLY    (     Select top 1 * FROM Step3 B     WHERE     emp = A.emp     AND     THE_ORDER >= A.THE_ORDER             AND .[Final_decision] = 'END_OF_SPAN'     ORDER BY THE_ORDER    )D         WHERE [A].[FINAL_DECISION] = 'START_OF_SPAN'
But now it is starting to feel like semantics without adding much to this conversation.
To make my code work with your new data set and give the output you have, CTE D needs to be changed to:
[D] AS ( SELECT    [C].[emp]    , [C].[St]    , [C].[Et]    , [C].[DIFF]    , [C].[minStart]    , CASE             WHEN ([C].[St] = [C].[minStart] AND DIFF IS NULL) THEN                 [C].[Et]     WHEN [C].[St] = [C].[minStart] THEN      [C].[St]     WHEN      (       [C].[DIFF] = 1       AND LAG([C].[DIFF]) OVER ( PARTITION BY                [C].[emp]                ORDER BY                [C].[St]               ) > 1      ) THEN      [C].[St]     WHEN [C].[DIFF] = 1 THEN      NULL     WHEN [C].[DIFF] IS NULL THEN      [C].[Et]     ELSE      [C].[Et]     END AS [daterange] FROM    [C] )
and the final select needs to be changed to:
SELECT [E].[emp] , CASE     WHEN nextDate IS NULL     THEN [e].[St]     ELSE      [E].[daterange]      END AS st , CASE WHEN nextDate IS NULL THEN [E].daterange ELSE [E].[nextdate] END AS enddt FROM [E] WHERE [E].[rn] % 2 != 0;
Not trying to pick apart your code; just seeing ways to make it a bit shorter. It just felt to me like the problem should be solvable with fewer CTEs. Even my version with 3 (poorly named) CTE's feels a bit large for this problem... Plus i have some unused columns in my CTE's. In CTE D, I only need to pull in St, and emp plus the calculated stuff. I don't need to pull in the other columns from C; they are just used for calculations. So my version could be optimized a bit as well.

I purposely made it to have many CTEs. I could have done it in a few .. Just for clarity I like to have many CTEs. BTW - It does handle single spans