Is there a way to combine/merge multiple continuous rows by date into a single row without using a loop/cursor?

Question

Post reply

Is there a way to combine/merge multiple continuous rows by date into a single row without using a loop/cursor?

dawalker-1068762

Say Hey Kid

Points: 707
More actions
March 25, 2011 at 8:57 am

#99284

I work with a lot of data that is based on membership enrollment segments. Each segment has an effective and term date and in many cases one starts right after the next. One of the things I have to do is put together a table of continuous enrollments by member. I already have code that works fine using a loop, but I was wondering if there was another way to handle it that I'm overlooking. I just have not been able to come up with a way to do it with a single query. I was thinking that it might be possible to use a common table expression with recursion, but have been unsuccessful in my attempts. Below is a sample of the data for one member. If you look closely, you will see there is a gap in enrollment between the 5th and 6th segment so obviously you can't use a simple group by with a min and max to get the job done. If you have a solution for this type of query I would really like to see it. My goal is to create a view that returns what I want.
[font="Courier New"]
Member_ID Eff_Date Term_Date Sequence_ID
--------- ---------- ---------- --------------------
123456789 2005-01-01 2006-01-31 1
123456789 2006-02-01 2007-02-14 2
123456789 2007-02-15 2007-05-11 3
123456789 2007-05-12 2008-02-28 4
123456789 2008-02-29 2008-12-31 5
123456789 2009-11-01 2009-12-31 6
123456789 2010-01-01 2010-08-11 7
123456789 2010-08-12 2017-12-31 8
[/font]
The end result I'm looking for would be two records for this member like this.
[font="Courier New"]
Member_ID Eff_Date Term_Date Sequence_ID
--------- ---------- ---------- --------------------
123456789 2005-01-01 2008-12-31 1
123456789 2009-11-01 2017-12-31 2
[/font]

Viewing 10 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 1

Search this site for "gaps and islands" - that should get you started.

John

dawalker-1068762 Say Hey Kid Points: 707 More actions · Answer 2

Thanks, I'll take a look at that. I seem to have found a solution using a series of CTEs that build upon each other. I did it by identifying the gaps and then figuring out which gap date (if any) is closest to each segment. Basically, once you link each segment with a gap then you can group by that and take the min effective date.

Mark Cowne One Orange Chip Points: 26990 More actions · Answer 3

Try this

SELECT s1.Member_ID,

s1.Eff_Date,

MIN(t1.Term_Date) AS Term_Date,

ROW_NUMBER() OVER(ORDER BY s1.Eff_Date) AS Sequence_ID

FROM MyTable s1

INNER JOIN MyTable t1 ON t1.Member_ID=s1.Member_ID

AND s1.Eff_Date <= t1.Term_Date

AND NOT EXISTS(SELECT * FROM MyTable t2

WHERE t2.Member_ID=t1.Member_ID

AND (t1.Term_Date+1) >= t2.Eff_Date AND t1.Term_Date < t2.Term_Date)

WHERE NOT EXISTS(SELECT * FROM MyTable s2

WHERE s2.Member_ID=s1.Member_ID

AND s1.Eff_Date > s2.Eff_Date AND (s1.Eff_Date-1) <= s2.Term_Date)

GROUP BY s1.Member_ID,s1.Eff_Date

ORDER BY s1.Member_ID,s1.Eff_Date;

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

channa wijesinghe Old Hand Points: 347 More actions · Answer 4

channa wijesinghe

Old Hand

Points: 347

July 26, 2011 at 9:50 pm

#1358935

awesome solution!

dawalker-1068762 Say Hey Kid Points: 707 More actions · Answer 5

I never did post my solution so here it is. Again, this was an attempt to do this with a single select statement. It uses CTEs much like you would with a series of steps and temp tables. You can achieve probably achieve better performance in a procedure or function, but this does work. Another thing I did not mention in my original post is that the same member can have multiple records that overlap with one another. The solution below takes care of all these issues. I did not test the other code that was posted, however I don't see how it could handle all of those scenarios. Thanks to everyone who chimed in.

;WITH cteAnchor AS(

--This provides our anchor per member. We know that any changes

--in enrollment must have occurred between those dates. The anchor

--is important and can be used to easily change the logic in case

--you are only interested in certain date ranges.

SELECT

e.Member_ID

, Eff_Date = min(e.Eff_Date)

, Term_Date = max(e.Term_Date)

FROM dbo.tbl_Member_Enrollment e

GROUP BY e.Member_ID

)

, cteChanges AS(

--This step establishes all the segment change dates per member

--that occurred during the anchor period. The term date being

--returned will alway be from the anchor record. The actual term

--date for each segment will be determined in the next step.

--A change point is defined as any new effective date during the

--anchor period. We use term dates to establish new effective

--dates as well by adding a day to it. This gives us the ability

--to establish all of the segments we need.

SELECT

dt.Member_ID

, dt.Eff_Date

, dt.Term_Date

, Sequence_ID = ROW_NUMBER() OVER(PARTITION BY dt.Member_ID ORDER BY dt.Eff_Date)

FROM

(--derived table

SELECT

a.Member_ID

, e.Eff_Date

, a.Term_Date

FROM

cteAnchor a

INNER JOIN dbo.tbl_Member_Enrollment e

ON e.Member_ID = a.Member_ID

AND e.Eff_Date BETWEEN a.Eff_Date AND a.Term_Date

--==============

UNION

--==============

SELECT

a.Member_ID

, Eff_Date = dateadd(day, 1, e.Term_Date)

, a.Term_Date

FROM

cteAnchor a

INNER JOIN dbo.tbl_Member_Enrollment e

ON e.Member_ID = a.Member_ID

AND e.Term_Date BETWEEN a.Eff_Date AND a.Term_Date

AND e.Term_Date < a.Term_Date

) dt

)

, cteSegments AS(

--This step establishes all of the continuous non-overlapping segments

--of the anchor period. It uses the sequence ID to look at the next record

--to establish the term date by subtracting one day from the effective

--date of the next segment. This ensures there are no overlapping records.

SELECT

c1.Member_ID

, c1.Eff_Date

, Term_Date = (

CASE

WHEN c2.Member_ID IS NULL THEN

c1.Term_Date

ELSE

dateadd(day, -1, c2.Eff_Date)

END

)

, c1.Sequence_ID

, Enrollment_Count = (

SELECT count(*)

FROM dbo.tbl_Member_Enrollment e

WHERE

e.Member_ID = c1.Member_ID

AND c1.Eff_Date BETWEEN e.Eff_Date AND e.Term_Date

)

FROM

cteChanges c1

LEFT OUTER JOIN cteChanges c2

ON c2.Member_ID = c1.Member_ID

AND c2.Sequence_ID = c1.Sequence_ID + 1 --join with the next record

)

, cteContinuousChanges AS(

--This step establishes all the segment gap dates per member

--that occurred during the anchor period. The term date being

--returned will always be from the anchor record. The actual term

--date for each segment will be determined in the next step.

SELECT

dt.Member_ID

, dt.Eff_Date

, dt.Term_Date

, Sequence_ID = ROW_NUMBER() OVER(PARTITION BY dt.Member_ID ORDER BY dt.Eff_Date)

FROM

(--derived table

SELECT

a.Member_ID

, a.Eff_Date

, a.Term_Date

FROM cteAnchor a

--==============

UNION

--==============

SELECT

a.Member_ID

, s.Eff_Date

, a.Term_Date

FROM

cteAnchor a

INNER JOIN cteSegments s

ON s.Member_ID = a.Member_ID

AND s.Enrollment_Count = 0

--==============

UNION

--==============

SELECT

a.Member_ID

, Eff_Date = dateadd(day, 1, s.Term_Date)

, a.Term_Date

FROM

cteAnchor a

INNER JOIN cteSegments s

ON s.Member_ID = a.Member_ID

AND s.Enrollment_Count = 0

AND s.Term_Date BETWEEN a.Eff_Date AND a.Term_Date

AND s.Term_Date < a.Term_Date

) dt

)

, cteContinuousSegments AS(

--This step establishes all of the continuous non-overlapping segments

--of the anchor period using the gaps above. It uses the sequence ID

--to look at the next record to establish the term date by subtracting

--one day from the effective date of the next segment. This ensures there

--are no overlapping records.

SELECT

c1.Member_ID

, c1.Eff_Date

, Term_Date = (

CASE

WHEN c2.Member_ID IS NULL THEN

c1.Term_Date

ELSE

dateadd(day, -1, c2.Eff_Date)

END

)

, c1.Sequence_ID

, s.Enrollment_Count

FROM

cteContinuousChanges c1

LEFT OUTER JOIN cteContinuousChanges c2

ON c2.Member_ID = c1.Member_ID

AND c2.Sequence_ID = c1.Sequence_ID + 1 --join with the next record

INNER JOIN cteSegments s

ON s.Member_ID = c1.Member_ID

AND c1.Eff_Date BETWEEN s.Eff_Date AND s.Term_Date

)

SELECT *

FROM cteContinuousSegments cs

channa wijesinghe Old Hand Points: 347 More actions · Answer 6

In my problem I had a Queue of work to be performed by an SSIS package, we wanted an efficient way to group these request to lessen the burden on the ssis package (basically we didn't want it to work on duplicate requests)

CampaignIDStartDateEndDate

16177/22/2011 00:00:007/24/2011 00:00:00

16177/25/2011 00:00:007/25/2011 00:00:00

16177/22/2011 00:00:007/24/2011 00:00:00

16177/25/2011 00:00:007/25/2011 00:00:00

16177/23/2011 00:00:007/23/2011 00:00:00

16177/25/2011 00:00:007/25/2011 00:00:00

16177/22/2011 00:00:007/24/2011 00:00:00

16177/25/2011 00:00:007/25/2011 00:00:00

16177/24/2011 00:00:007/24/2011 00:00:00

16177/25/2011 00:00:007/25/2011 00:00:00

Using the solution this is what I see:

16172011-07-22 00:00:00.0002011-07-25 00:00:00.000

which is perfect. Given a Group By gets me almost there. @DaWalker thanks for the explanation around your solution.

dawalker-1068762 Say Hey Kid Points: 707 More actions · Answer 7

Glad you could use it. Unfortunately, I had to come up with it on my own because nothing else I found worked the way I needed it to. It was definitely one of the more challenging queries I've had to write. I'm constantly having to deal with date segments in my job so this logic is going to have a lot of uses for us especially in our new data warehouse. I verified it against the craziest set of records I could find in our system and it worked perfectly. If I ever improve upon it I will repost any updates. It is actually a very flexible query because you can easily return only the gaps or the covered periods just by changing the where clause on the final select. Considering what it does, it actually performs pretty well in our system. My stored procedure that uses a loop to do it is still faster though. I think that's mainly because I'm using several temp tables that I index and I only have to loop enough times to cover that max sequence ID.

SQLNewbie2019 Newbie Points: 7 More actions · Answer 8

Mark Cowne - Friday, March 25, 2011 2:03 PM
Try thisSELECT s1.Member_ID, s1.Eff_Date, MIN(t1.Term_Date) AS Term_Date, ROW_NUMBER() OVER(ORDER BY s1.Eff_Date) AS Sequence_IDFROM MyTable s1 INNER JOIN MyTable t1 ON t1.Member_ID=s1.Member_ID AND s1.Eff_Date <= t1.Term_Date AND NOT EXISTS(SELECT * FROM MyTable t2 WHERE t2.Member_ID=t1.Member_ID AND (t1.Term_Date+1) >= t2.Eff_Date AND t1.Term_Date < t2.Term_Date) WHERE NOT EXISTS(SELECT * FROM MyTable s2 WHERE s2.Member_ID=s1.Member_ID AND s1.Eff_Date > s2.Eff_Date AND (s1.Eff_Date-1) <= s2.Term_Date) GROUP BY s1.Member_ID,s1.Eff_DateORDER BY s1.Member_ID,s1.Eff_Date;

Thanks @MarkCowne your solution worked like a charm after 8 years for me 🙂
Would be great if you or someone here could explain this solution as I am not able to get my head around this approach.
Thanks in advance

srinivas.evuri.reddy Grasshopper Points: 20 More actions · Answer 9

Hi,

I have recently come across this problem, but none of the solutions posted here convinced me as I have millions of records to process and I believe the solutions present here couldn't scale (may be I was wrong in my assumption) and also I am looking for a simple code.

I tried different options, but nothing did match my expectation. But, after working on this problem for couple of days, finally found a solution when tried to solve this as a logical puzzle and more than 13 million rows have been processed under less than 40 seconds.

Thought of posting this solutions as it would help someone who is facing this issue.

Share your thoughts.

IF OBJECT_ID('TEMPDB..#TEST') IS NOT NULL
DROP TABLE #TEST
CREATE TABLE #TEST(ID INT, FromDate DATETIME, ToDate DATETIME) 
GO

INSERT #TEST
VALUES
(1, '01-01-2018', '01-10-2018'),
(1, '01-11-2018', '01-25-2018'),
(1, '01-26-2018', '01-28-2018'),
(1, '01-29-2018', '01-30-2018'),
(1, '02-01-2018', '02-10-2018'),
(1, '02-11-2018', '02-20-2018'),
(1, '02-21-2018', '02-24-2018'),
(1, '03-01-2018', '03-10-2018'),
(1, '03-11-2018', '03-20-2018'),
(2, '01-01-2019', '01-28-2019'),
(2, '01-29-2019', '01-31-2019')

GO


SELECT ID, FromDate, ToDate
FROM
(
SELECT ID,
CASE 
WHEN FromDate1 IS NOT NULL THEN FromDate1
ELSE LAG(FromDate1, 1) OVER(PARTITION BY ID ORDER BY RowNum) END FromDate, 
ToDate1 AS ToDate
FROM
(
SELECT  ID, FromDate1, ToDate1, RowNum
FROM
(
SELECT ID, FromDate, ToDate, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY FromDate,ToDate) RowNum,
CASE 
WHEN LAG(ToDate,1) OVER(PARTITION BY ID ORDER BY FromDate,ToDate) IS NOT NULL 
AND DATEDIFF(D, LAG(ToDate, 1) OVER(PARTITION BY ID ORDER BY FromDate,ToDate),FromDate) <= 1 THEN NULL 
ELSE FromDate 
END FromDate1,
CASE 
WHEN LEAD(ToDate,1) OVER(PARTITION BY ID ORDER BY FromDate,ToDate) IS NOT NULL 
AND DATEDIFF(D,ToDate, LEAD(FromDate, 1) OVER(PARTITION BY ID ORDER BY FromDate,ToDate)) <= 1 THEN NULL 
ELSE ToDate 
END ToDate1
FROM #test
) a WHERE FromDate1 IS NOT NULL OR ToDate1 IS NOT NULL
) b 
) c WHERE ToDate IS NOT NULL

GO