March 25, 2011 at 8:57 am
I work with a lot of data that is based on membership enrollment segments. Each segment has an effective and term date and in many cases one starts right after the next. One of the things I have to do is put together a table of continuous enrollments by member. I already have code that works fine using a loop, but I was wondering if there was another way to handle it that I'm overlooking. I just have not been able to come up with a way to do it with a single query. I was thinking that it might be possible to use a common table expression with recursion, but have been unsuccessful in my attempts. Below is a sample of the data for one member. If you look closely, you will see there is a gap in enrollment between the 5th and 6th segment so obviously you can't use a simple group by with a min and max to get the job done. If you have a solution for this type of query I would really like to see it. My goal is to create a view that returns what I want.
[font="Courier New"]
Member_ID Eff_Date Term_Date Sequence_ID
--------- ---------- ---------- --------------------
123456789 2005-01-01 2006-01-31 1
123456789 2006-02-01 2007-02-14 2
123456789 2007-02-15 2007-05-11 3
123456789 2007-05-12 2008-02-28 4
123456789 2008-02-29 2008-12-31 5
123456789 2009-11-01 2009-12-31 6
123456789 2010-01-01 2010-08-11 7
123456789 2010-08-12 2017-12-31 8
[/font]
The end result I'm looking for would be two records for this member like this.
[font="Courier New"]
Member_ID Eff_Date Term_Date Sequence_ID
--------- ---------- ---------- --------------------
123456789 2005-01-01 2008-12-31 1
123456789 2009-11-01 2017-12-31 2
[/font]
March 25, 2011 at 9:02 am
Search this site for "gaps and islands" - that should get you started.
John
March 25, 2011 at 11:56 am
Thanks, I'll take a look at that. I seem to have found a solution using a series of CTEs that build upon each other. I did it by identifying the gaps and then figuring out which gap date (if any) is closest to each segment. Basically, once you link each segment with a gap then you can group by that and take the min effective date.
March 25, 2011 at 2:03 pm
Try this
SELECT s1.Member_ID,
s1.Eff_Date,
MIN(t1.Term_Date) AS Term_Date,
ROW_NUMBER() OVER(ORDER BY s1.Eff_Date) AS Sequence_ID
FROM MyTable s1
INNER JOIN MyTable t1 ON t1.Member_ID=s1.Member_ID
AND s1.Eff_Date <= t1.Term_Date
AND NOT EXISTS(SELECT * FROM MyTable t2
WHERE t2.Member_ID=t1.Member_ID
AND (t1.Term_Date+1) >= t2.Eff_Date AND t1.Term_Date < t2.Term_Date)
WHERE NOT EXISTS(SELECT * FROM MyTable s2
WHERE s2.Member_ID=s1.Member_ID
AND s1.Eff_Date > s2.Eff_Date AND (s1.Eff_Date-1) <= s2.Term_Date)
GROUP BY s1.Member_ID,s1.Eff_Date
ORDER BY s1.Member_ID,s1.Eff_Date;
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537July 26, 2011 at 9:50 pm
awesome solution!
July 27, 2011 at 12:56 pm
I never did post my solution so here it is. Again, this was an attempt to do this with a single select statement. It uses CTEs much like you would with a series of steps and temp tables. You can achieve probably achieve better performance in a procedure or function, but this does work. Another thing I did not mention in my original post is that the same member can have multiple records that overlap with one another. The solution below takes care of all these issues. I did not test the other code that was posted, however I don't see how it could handle all of those scenarios. Thanks to everyone who chimed in.
;WITH cteAnchor AS(
--This provides our anchor per member. We know that any changes
--in enrollment must have occurred between those dates. The anchor
--is important and can be used to easily change the logic in case
--you are only interested in certain date ranges.
SELECT
e.Member_ID
, Eff_Date = min(e.Eff_Date)
, Term_Date = max(e.Term_Date)
FROM dbo.tbl_Member_Enrollment e
GROUP BY e.Member_ID
)
, cteChanges AS(
--This step establishes all the segment change dates per member
--that occurred during the anchor period. The term date being
--returned will alway be from the anchor record. The actual term
--date for each segment will be determined in the next step.
--A change point is defined as any new effective date during the
--anchor period. We use term dates to establish new effective
--dates as well by adding a day to it. This gives us the ability
--to establish all of the segments we need.
SELECT
dt.Member_ID
, dt.Eff_Date
, dt.Term_Date
, Sequence_ID = ROW_NUMBER() OVER(PARTITION BY dt.Member_ID ORDER BY dt.Eff_Date)
FROM
(--derived table
SELECT
a.Member_ID
, e.Eff_Date
, a.Term_Date
FROM
cteAnchor a
INNER JOIN dbo.tbl_Member_Enrollment e
ON e.Member_ID = a.Member_ID
AND e.Eff_Date BETWEEN a.Eff_Date AND a.Term_Date
--==============
UNION
--==============
SELECT
a.Member_ID
, Eff_Date = dateadd(day, 1, e.Term_Date)
, a.Term_Date
FROM
cteAnchor a
INNER JOIN dbo.tbl_Member_Enrollment e
ON e.Member_ID = a.Member_ID
AND e.Term_Date BETWEEN a.Eff_Date AND a.Term_Date
AND e.Term_Date < a.Term_Date
) dt
)
, cteSegments AS(
--This step establishes all of the continuous non-overlapping segments
--of the anchor period. It uses the sequence ID to look at the next record
--to establish the term date by subtracting one day from the effective
--date of the next segment. This ensures there are no overlapping records.
SELECT
c1.Member_ID
, c1.Eff_Date
, Term_Date = (
CASE
WHEN c2.Member_ID IS NULL THEN
c1.Term_Date
ELSE
dateadd(day, -1, c2.Eff_Date)
END
)
, c1.Sequence_ID
, Enrollment_Count = (
SELECT count(*)
FROM dbo.tbl_Member_Enrollment e
WHERE
e.Member_ID = c1.Member_ID
AND c1.Eff_Date BETWEEN e.Eff_Date AND e.Term_Date
)
FROM
cteChanges c1
LEFT OUTER JOIN cteChanges c2
ON c2.Member_ID = c1.Member_ID
AND c2.Sequence_ID = c1.Sequence_ID + 1 --join with the next record
)
, cteContinuousChanges AS(
--This step establishes all the segment gap dates per member
--that occurred during the anchor period. The term date being
--returned will always be from the anchor record. The actual term
--date for each segment will be determined in the next step.
SELECT
dt.Member_ID
, dt.Eff_Date
, dt.Term_Date
, Sequence_ID = ROW_NUMBER() OVER(PARTITION BY dt.Member_ID ORDER BY dt.Eff_Date)
FROM
(--derived table
SELECT
a.Member_ID
, a.Eff_Date
, a.Term_Date
FROM cteAnchor a
--==============
UNION
--==============
SELECT
a.Member_ID
, s.Eff_Date
, a.Term_Date
FROM
cteAnchor a
INNER JOIN cteSegments s
ON s.Member_ID = a.Member_ID
AND s.Enrollment_Count = 0
--==============
UNION
--==============
SELECT
a.Member_ID
, Eff_Date = dateadd(day, 1, s.Term_Date)
, a.Term_Date
FROM
cteAnchor a
INNER JOIN cteSegments s
ON s.Member_ID = a.Member_ID
AND s.Enrollment_Count = 0
AND s.Term_Date BETWEEN a.Eff_Date AND a.Term_Date
AND s.Term_Date < a.Term_Date
) dt
)
, cteContinuousSegments AS(
--This step establishes all of the continuous non-overlapping segments
--of the anchor period using the gaps above. It uses the sequence ID
--to look at the next record to establish the term date by subtracting
--one day from the effective date of the next segment. This ensures there
--are no overlapping records.
SELECT
c1.Member_ID
, c1.Eff_Date
, Term_Date = (
CASE
WHEN c2.Member_ID IS NULL THEN
c1.Term_Date
ELSE
dateadd(day, -1, c2.Eff_Date)
END
)
, c1.Sequence_ID
, s.Enrollment_Count
FROM
cteContinuousChanges c1
LEFT OUTER JOIN cteContinuousChanges c2
ON c2.Member_ID = c1.Member_ID
AND c2.Sequence_ID = c1.Sequence_ID + 1 --join with the next record
INNER JOIN cteSegments s
ON s.Member_ID = c1.Member_ID
AND c1.Eff_Date BETWEEN s.Eff_Date AND s.Term_Date
)
SELECT *
FROM cteContinuousSegments cs
July 28, 2011 at 8:07 am
In my problem I had a Queue of work to be performed by an SSIS package, we wanted an efficient way to group these request to lessen the burden on the ssis package (basically we didn't want it to work on duplicate requests)
CampaignIDStartDateEndDate
16177/22/2011 00:00:007/24/2011 00:00:00
16177/22/2011 00:00:007/24/2011 00:00:00
16177/22/2011 00:00:007/24/2011 00:00:00
16177/22/2011 00:00:007/24/2011 00:00:00
16177/22/2011 00:00:007/24/2011 00:00:00
16177/22/2011 00:00:007/24/2011 00:00:00
16177/25/2011 00:00:007/25/2011 00:00:00
16177/22/2011 00:00:007/24/2011 00:00:00
16177/25/2011 00:00:007/25/2011 00:00:00
16177/25/2011 00:00:007/25/2011 00:00:00
16177/23/2011 00:00:007/23/2011 00:00:00
16177/25/2011 00:00:007/25/2011 00:00:00
16177/22/2011 00:00:007/24/2011 00:00:00
16177/25/2011 00:00:007/25/2011 00:00:00
16177/24/2011 00:00:007/24/2011 00:00:00
16177/25/2011 00:00:007/25/2011 00:00:00
Using the solution this is what I see:
16172011-07-22 00:00:00.0002011-07-25 00:00:00.000
which is perfect. Given a Group By gets me almost there. @dawalker thanks for the explanation around your solution.
July 28, 2011 at 8:46 am
Glad you could use it. Unfortunately, I had to come up with it on my own because nothing else I found worked the way I needed it to. It was definitely one of the more challenging queries I've had to write. I'm constantly having to deal with date segments in my job so this logic is going to have a lot of uses for us especially in our new data warehouse. I verified it against the craziest set of records I could find in our system and it worked perfectly. If I ever improve upon it I will repost any updates. It is actually a very flexible query because you can easily return only the gaps or the covered periods just by changing the where clause on the final select. Considering what it does, it actually performs pretty well in our system. My stored procedure that uses a loop to do it is still faster though. I think that's mainly because I'm using several temp tables that I index and I only have to loop enough times to cover that max sequence ID.
January 10, 2019 at 11:53 am
Mark Cowne - Friday, March 25, 2011 2:03 PMTry thisSELECT s1.Member_ID, s1.Eff_Date, MIN(t1.Term_Date) AS Term_Date, ROW_NUMBER() OVER(ORDER BY s1.Eff_Date) AS Sequence_IDFROM MyTable s1 INNER JOIN MyTable t1 ON t1.Member_ID=s1.Member_ID AND s1.Eff_Date <= t1.Term_Date AND NOT EXISTS(SELECT * FROM MyTable t2 WHERE t2.Member_ID=t1.Member_ID AND (t1.Term_Date+1) >= t2.Eff_Date AND t1.Term_Date < t2.Term_Date) WHERE NOT EXISTS(SELECT * FROM MyTable s2 WHERE s2.Member_ID=s1.Member_ID AND s1.Eff_Date > s2.Eff_Date AND (s1.Eff_Date-1) <= s2.Term_Date) GROUP BY s1.Member_ID,s1.Eff_DateORDER BY s1.Member_ID,s1.Eff_Date;
Thanks @MarkCowne your solution worked like a charm after 8 years for me 🙂
Would be great if you or someone here could explain this solution as I am not able to get my head around this approach.
Thanks in advance
April 15, 2019 at 11:21 am
Hi,
I have recently come across this problem, but none of the solutions posted here convinced me as I have millions of records to process and I believe the solutions present here couldn't scale (may be I was wrong in my assumption) and also I am looking for a simple code.
I tried different options, but nothing did match my expectation. But, after working on this problem for couple of days, finally found a solution when tried to solve this as a logical puzzle and more than 13 million rows have been processed under less than 40 seconds.
Thought of posting this solutions as it would help someone who is facing this issue.
Share your thoughts.
IF OBJECT_ID('TEMPDB..#TEST') IS NOT NULL
DROP TABLE #TEST
CREATE TABLE #TEST(ID INT, FromDate DATETIME, ToDate DATETIME)
GO
INSERT #TEST
VALUES
(1, '01-01-2018', '01-10-2018'),
(1, '01-11-2018', '01-25-2018'),
(1, '01-26-2018', '01-28-2018'),
(1, '01-29-2018', '01-30-2018'),
(1, '02-01-2018', '02-10-2018'),
(1, '02-11-2018', '02-20-2018'),
(1, '02-21-2018', '02-24-2018'),
(1, '03-01-2018', '03-10-2018'),
(1, '03-11-2018', '03-20-2018'),
(2, '01-01-2019', '01-28-2019'),
(2, '01-29-2019', '01-31-2019')
GO
SELECT ID, FromDate, ToDate
FROM
(
SELECT ID,
CASE
WHEN FromDate1 IS NOT NULL THEN FromDate1
ELSE LAG(FromDate1, 1) OVER(PARTITION BY ID ORDER BY RowNum) END FromDate,
ToDate1 AS ToDate
FROM
(
SELECT ID, FromDate1, ToDate1, RowNum
FROM
(
SELECT ID, FromDate, ToDate, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY FromDate,ToDate) RowNum,
CASE
WHEN LAG(ToDate,1) OVER(PARTITION BY ID ORDER BY FromDate,ToDate) IS NOT NULL
AND DATEDIFF(D, LAG(ToDate, 1) OVER(PARTITION BY ID ORDER BY FromDate,ToDate),FromDate) <= 1 THEN NULL
ELSE FromDate
END FromDate1,
CASE
WHEN LEAD(ToDate,1) OVER(PARTITION BY ID ORDER BY FromDate,ToDate) IS NOT NULL
AND DATEDIFF(D,ToDate, LEAD(FromDate, 1) OVER(PARTITION BY ID ORDER BY FromDate,ToDate)) <= 1 THEN NULL
ELSE ToDate
END ToDate1
FROM #test
) a WHERE FromDate1 IS NOT NULL OR ToDate1 IS NOT NULL
) b
) c WHERE ToDate IS NOT NULL
GO
May 14, 2019 at 9:49 am
This was removed by the editor as SPAM
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply