May 26, 2011 at 7:31 am
Most examples of "gaps and islands" queries I've seen focus on an identity column or when tracking a change in a status flag (similar to win/loss streaks for a sports team).
I've run into a situation that is a bit more complex than mentioned above. I'm working with healthcare data where a person could have multiple active insurance coverage.
For example, let's say a patient has been on Insurance "1" for the past 5 months and has also been on Insurance "2" for the past three months. Prior to either of these insurances, the patient had a completely different insurance known as "9". The data in a table appears as follows:
PatID InsID YrMonth
===== ===== =======
1 1 201105
1 2 201105
1 1 201104
1 2 201104
1 1 201103
1 2 201103
1 1 201102
1 1 201101
1 9 201012
The trick is to first answer the question "What is/are the patient's current insurance(s)?"
The second question to answer is "How long has the patient been on the current insurance(s)?"
The desired output would yield something like this:
PatID InsID StartYrMo EndYrMo CntOfYrMonth
===== ===== ========= ======= =====
1 1 201101 201105 5
1 2 201103 201105 3
Here is the SQL to populate the sample dataset:
DECLARE @t TABLE
(
PATID INT
,INSID INT
,YRMONTH INT
)
INSERT @t (PATID, INSID, YRMONTH)
SELECT 1,1,201105
UNION ALL
SELECT 1,2,201105
UNION ALL
SELECT 1,1,201104
UNION ALL
SELECT 1,2,201104
UNION ALL
SELECT 1,1,201103
UNION ALL
SELECT 1,2,201103
UNION ALL
SELECT 1,1,201102
UNION ALL
SELECT 1,1,201101
UNION ALL
SELECT 1,9,201012
SELECT * FROM @t
Below is code from a related discussion I had about tracking a change in status flag -- the code presented by Mark-101232 is in its original form but can be easily adapted to track a patient's insurance coverage if a patient only has one insurance (substitute ProcessID with PatID, Compliant with InsID, and Month with YrMonth ):
WITH CTE AS (
SELECT ProcessID, Month, Compliant,
ROW_NUMBER() OVER(PARTITION BY ProcessID ORDER BY Month) -
ROW_NUMBER() OVER(PARTITION BY ProcessID,Compliant ORDER BY Month) AS rnDiff
FROM @t)
SELECT ProcessID,
MIN(Month) AS StartMonth,
MAX(Month) AS EndMonth,
Compliant
FROM CTE x
GROUP BY ProcessID, Compliant, rnDiff
ORDER BY ProcessID,StartMonth;
For patients with only one insurance, I can pull off the gaps/islands query just fine. But, I have yet to find a good way to modify the code to allow for the tracking of multiple active insurances. I'm wondering if it can be accomplished without using a cursor or similar loop? (I'm not against using cursors or loops, but merely wondering if a query can be written without these techniques.)
Many thanks in advance,
Pete
May 26, 2011 at 7:50 am
Here's my initial solution, although it uses a cursor to loop through the multiple insurances.
DECLARE @t TABLE
(
PATID INT
,INSID INT
,YRMONTH INT
)
INSERT @t (PATID, INSID, YRMONTH)
SELECT 1,1,201105
UNION ALL
SELECT 1,2,201105
UNION ALL
SELECT 1,1,201104
UNION ALL
SELECT 1,2,201104
UNION ALL
SELECT 1,1,201103
UNION ALL
SELECT 1,2,201103
UNION ALL
SELECT 1,1,201102
UNION ALL
SELECT 1,1,201101
UNION ALL
SELECT 1,9,201012
/* SHOW ORIGINAL DATA */
SELECT * FROM @t
/* CREATE OUTPUT TABLE */
DECLARE @RESULTS TABLE
(
PATID INT
,INSID INT
,STARTMONTH INT
,ENDMONTH INT
,CNT INT
)
/* FIND CURRENT INSURANCES */
DECLARE C CURSOR
FOR SELECT DISTINCT
INSID
FROM @T
WHERE YRMONTH = (SELECT MAX(YRMONTH) FROM @T)
DECLARE @ins INT
OPEN C
FETCH NEXT FROM C INTO @ins
WHILE ( @@fetch_status = 0 )
BEGIN
;
WITH CTE
AS ( SELECT PatID
,YRMONTH
,InsID
,ROW_NUMBER() OVER ( PARTITION BY PatID ORDER BY YRMONTH )
- ROW_NUMBER() OVER ( PARTITION BY PatID,
InsID ORDER BY YRMONTH ) AS rnDiff
FROM @t
WHERE INSID = @ins
)
INSERT @RESULTS (PATID, INSID, STARTMONTH, ENDMONTH,CNT)
SELECT PatID
,InsID
,MIN(YRMONTH) AS StartMonth
,MAX(YRMONTH) AS EndMonth
,COUNT(*) AS CNT
FROM CTE x
GROUP BY PatID
,InsID
,rnDiff
ORDER BY PatID
,StartMonth ;
FETCH NEXT FROM C INTO @ins
END
CLOSE C
DEALLOCATE C
/* SHOW RESULTS */
SELECT * FROM @RESULTS
GO
May 26, 2011 at 9:10 am
peterzeke (5/26/2011)
WITH CTE AS (
SELECT ProcessID, Month, Compliant,
ROW_NUMBER() OVER(PARTITION BY ProcessID ORDER BY Month) -
ROW_NUMBER() OVER(PARTITION BY ProcessID,Compliant ORDER BY Month) AS rnDiff
FROM @t)
SELECT ProcessID,
MIN(Month) AS StartMonth,
MAX(Month) AS EndMonth,
Compliant
FROM CTE x
GROUP BY ProcessID, Compliant, rnDiff
ORDER BY ProcessID,StartMonth;
In order for this particular method of gap/island analysis to work, at a minimum, you need all transition points for the attribute you are analyzing--in this case, compliance. Since your sample data only contains compliant months, you're missing the transition points to non-compliance.
It's possible to write the query using the incomplete data, but if your data is actually complete, it would be more efficient to start from that.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
May 26, 2011 at 10:01 am
I came up with the following solution using a calendar table fragment. You can create a calendar table on the fly using a tally table. I also added another data item to provide a gap.
Here is the updated data.
DECLARE @t TABLE
(
PATID INT
,INSID INT
,YRMONTH INT
)
INSERT @t (PATID, INSID, YRMONTH)
SELECT 1,1,201005
UNION ALL
SELECT 1,1,201105
UNION ALL
SELECT 1,2,201105
UNION ALL
SELECT 1,1,201104
UNION ALL
SELECT 1,2,201104
UNION ALL
SELECT 1,1,201103
UNION ALL
SELECT 1,2,201103
UNION ALL
SELECT 1,1,201102
UNION ALL
SELECT 1,1,201101
UNION ALL
SELECT 1,9,201012
Here is the calendar fragment
DECLARE @cal TABLE(
YrMonth int
, StartDate datetime
)
INSERT @cal
SELECT 201005, '2010-05-01'
UNION ALL
SELECT 201005, '2010-05-01'
UNION ALL
SELECT 201006, '2010-06-01'
UNION ALL
SELECT 201007, '2010-07-01'
UNION ALL
SELECT 201008, '2010-08-01'
UNION ALL
SELECT 201009, '2010-09-01'
UNION ALL
SELECT 201010, '2010-10-01'
UNION ALL
SELECT 201011, '2010-11-01'
UNION ALL
SELECT 201012, '2010-12-01'
UNION ALL
SELECT 201101, '2011-01-01'
UNION ALL
SELECT 201102, '2011-02-01'
UNION ALL
SELECT 201103, '2011-03-01'
UNION ALL
SELECT 201104, '2011-04-01'
UNION ALL
SELECT 201105, '2011-05-01'
and here is the query based on your original CTE
; WITH CTE1 AS (
SELECT t.PatID, t.InsID, t.YrMonth, 1 AS Compliance, c.StartDate
FROM @t AS t
INNER JOIN @cal AS c
ON t.YrMonth = c.YrMonth
UNION
SELECT t.PatID, t.InsID, c2.YrMonth, CASE WHEN n.PatID IS NULL THEN 0 ELSE 1 END AS Compliance, c2.StartDate
FROM @t AS t
INNER JOIN @cal AS c
ON t.YrMonth = c.YrMonth
INNER JOIN @cal AS c2
ON c.StartDate = DateAdd(Month, -1, c2.StartDate)
LEFT OUTER JOIN @t AS n
ON t.PatID = n.PatID
AND t.InsID = n.InsID
AND c2.YrMonth = n.YrMonth
)
--;
, CTE AS (
SELECT PatID, YrMonth, InsID, Compliance, StartDate,
ROW_NUMBER() OVER(PARTITION BY PatID, InsID ORDER BY YrMonth) -
ROW_NUMBER() OVER(PARTITION BY PatID, InsID, Compliance ORDER BY YrMonth) AS rnDiff
FROM CTE1
)
SELECT PatID,
MIN(YrMonth) AS StartMonth,
MAX(YrMonth) AS EndMonth,
InsID
FROM CTE
GROUP BY PatID, InsID, Compliance, rnDiff
HAVING Max(StartDate) >= DateAdd(Month, DateDiff(Month, 0, GetDate()), 0)
ORDER BY PatID,InsID, StartMonth;
The gap/island analysis finds changes in an attribute within a key. The key must be used in both Row_Number() partitions and the attribute is only used in the second. In your original, the key was just the individual, but in the updated, the key is the individual/insurance combination.
Drew
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
May 27, 2011 at 2:15 am
Unless I'm misunderstanding this, I'm don't think this is a groups and islands problem, more
a case of just joining to the insurance for the latest month
WITH CTE AS (
SELECT PATID, INSID, YRMONTH,
RANK() OVER(PARTITION BY PATID ORDER BY YRMONTH DESC) AS rk
FROM @t)
SELECT c.PATID,c.INSID,
MIN(t.YRMONTH) AS STARTMONTH,
c.YRMONTH AS ENDMONTH,
COUNT(*) AS CNT
FROM CTE c
INNER JOIN @t t ON t.PATID=c.PATID AND t.INSID=c.INSID
WHERE c.rk=1
GROUP BY c.PATID,c.INSID,c.YRMONTH;
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537May 27, 2011 at 6:49 am
drew.allen (5/26/2011)
The gap/island analysis finds changes in an attribute within a key. The key must be used in both Row_Number() partitions and the attribute is only used in the second. In your original, the key was just the individual, but in the updated, the key is the individual/insurance combination.
Drew - your explanation (and code) provides terrific clarification to me regarding the essential "ingredients" to using/modifying the code for gaps/islands. It's much clearer to me now.
Turns out the requirements for the project I'm working on now include needing to display complete insurance coverage history for a given patient -- your code will do the trick.
Thanks again,
Pete
May 27, 2011 at 7:01 am
Mark-101232 (5/27/2011)
Unless I'm misunderstanding this, I'm don't think this is a groups and islands problem, morea case of just joining to the insurance for the latest month
WITH CTE AS (
SELECT PATID, INSID, YRMONTH,
RANK() OVER(PARTITION BY PATID ORDER BY YRMONTH DESC) AS rk
FROM @t)
SELECT c.PATID,c.INSID,
MIN(t.YRMONTH) AS STARTMONTH,
c.YRMONTH AS ENDMONTH,
COUNT(*) AS CNT
FROM CTE c
INNER JOIN @t t ON t.PATID=c.PATID AND t.INSID=c.INSID
WHERE c.rk=1
GROUP BY c.PATID,c.INSID,c.YRMONTH;
Mark! -- This is great! Indeed, you interpreted my original need, correctly... so I guess my situation wasn't quite a gaps/islands problem.
Nonetheless, your code clarifies a much better technique for getting at a patient's current insurance coverage timespan than anything I'd been trying to develop on my own. (I'd originally started to develop something that looked somewhat like your proposed code, but it just wasn't coming together...)
Thanks again for your help,
Pete
May 27, 2011 at 7:19 am
peterzeke (5/27/2011)
Mark-101232 (5/27/2011)
Unless I'm misunderstanding this, I'm don't think this is a groups and islands problem, morea case of just joining to the insurance for the latest month
WITH CTE AS (
SELECT PATID, INSID, YRMONTH,
RANK() OVER(PARTITION BY PATID ORDER BY YRMONTH DESC) AS rk
FROM @t)
SELECT c.PATID,c.INSID,
MIN(t.YRMONTH) AS STARTMONTH,
c.YRMONTH AS ENDMONTH,
COUNT(*) AS CNT
FROM CTE c
INNER JOIN @t t ON t.PATID=c.PATID AND t.INSID=c.INSID
WHERE c.rk=1
GROUP BY c.PATID,c.INSID,c.YRMONTH;
Mark! -- This is great! Indeed, you interpreted my original need, correctly... so I guess my situation wasn't quite a gaps/islands problem.
Uh oh!.... Mark, I just realized why I initially considered my problem as an actual gaps/islands analysis. Your code won't work if a patient has been enrolled in the same insurance at different time periods.
For example, add one additional row to the sample data table that has PatID 1 with InsID 1 with YrMonth 200001
Union All
Select 1,1,200001
... Using your code, this will yield a StartMonth for InsID 1 equal to 200001 & Endmonth of 201105 for a Cnt of 6 -- which is incorrect, unfortunately.
Therefore, to evaluate the patient's current insurance coverage, one needs to find the first "island" of insurance coverage so that any previous similar coverage is not included.
Does my assessment make sense?
--Pete
May 27, 2011 at 7:38 am
peterzeke (5/27/2011)
peterzeke (5/27/2011)
Mark-101232 (5/27/2011)
Unless I'm misunderstanding this, I'm don't think this is a groups and islands problem, morea case of just joining to the insurance for the latest month
WITH CTE AS (
SELECT PATID, INSID, YRMONTH,
RANK() OVER(PARTITION BY PATID ORDER BY YRMONTH DESC) AS rk
FROM @t)
SELECT c.PATID,c.INSID,
MIN(t.YRMONTH) AS STARTMONTH,
c.YRMONTH AS ENDMONTH,
COUNT(*) AS CNT
FROM CTE c
INNER JOIN @t t ON t.PATID=c.PATID AND t.INSID=c.INSID
WHERE c.rk=1
GROUP BY c.PATID,c.INSID,c.YRMONTH;
Mark! -- This is great! Indeed, you interpreted my original need, correctly... so I guess my situation wasn't quite a gaps/islands problem.
Uh oh!.... Mark, I just realized why I initially considered my problem as an actual gaps/islands analysis. Your code won't work if a patient has been enrolled in the same insurance at different time periods.
For example, add one additional row to the sample data table that has PatID 1 with InsID 1 with YrMonth 200001
Union All
Select 1,1,200001
... Using your code, this will yield a StartMonth for InsID 1 equal to 200001 & Endmonth of 201105 for a Cnt of 6 -- which is incorrect, unfortunately.
Therefore, to evaluate the patient's current insurance coverage, one needs to find the first "island" of insurance coverage so that any previous similar coverage is not included.
Does my assessment make sense?
--Pete
...in which case you'll need to use Drew's approach.
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537May 27, 2011 at 1:11 pm
peterzeke (5/27/2011)
peterzeke (5/27/2011)
Mark-101232 (5/27/2011)
Unless I'm misunderstanding this, I'm don't think this is a groups and islands problem, morea case of just joining to the insurance for the latest month
WITH CTE AS (
SELECT PATID, INSID, YRMONTH,
RANK() OVER(PARTITION BY PATID ORDER BY YRMONTH DESC) AS rk
FROM @t)
SELECT c.PATID,c.INSID,
MIN(t.YRMONTH) AS STARTMONTH,
c.YRMONTH AS ENDMONTH,
COUNT(*) AS CNT
FROM CTE c
INNER JOIN @t t ON t.PATID=c.PATID AND t.INSID=c.INSID
WHERE c.rk=1
GROUP BY c.PATID,c.INSID,c.YRMONTH;
Mark! -- This is great! Indeed, you interpreted my original need, correctly... so I guess my situation wasn't quite a gaps/islands problem.
Uh oh!.... Mark, I just realized why I initially considered my problem as an actual gaps/islands analysis. Your code won't work if a patient has been enrolled in the same insurance at different time periods.
For example, add one additional row to the sample data table that has PatID 1 with InsID 1 with YrMonth 200001
Union All
Select 1,1,200001
... Using your code, this will yield a StartMonth for InsID 1 equal to 200001 & Endmonth of 201105 for a Cnt of 6 -- which is incorrect, unfortunately.
Therefore, to evaluate the patient's current insurance coverage, one needs to find the first "island" of insurance coverage so that any previous similar coverage is not included.
Does my assessment make sense?
--Pete
Pete,
2 questions:
1. Based upon the new row being added to the sample data, what are your expected results?
2. Does Drew's approach solve your issue?
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
May 27, 2011 at 2:25 pm
WayneS (5/27/2011)
Pete,
2 questions:
1. Based upon the new row being added to the sample data, what are your expected results?
2. Does Drew's approach solve your issue?
Starting with #2 -- yes, Drew's approach solves the issue by displaying all "islands" of insurance coverage, which my colleague actually prefers now. Originally, my colleague's request was to show current insurance coverage(s), only.
As for #1 -- my response depends on which SQL code is used (Mark's vs Drew's) and how to interpret your meaning of "expected results." The correct result should treat the new row as a separate row in the final output. Drew's code will produce this correct result; Mark's code will consider YrMonth 200001 as the Min() value for InsID 1, which falsely suggests that the patient has had InsID 1 insurance coverage from 200001 thru 201105. (My initial code before posting to SSC for help looked similar to Mark's.)
Do you have a different solution, per chance?
--Pete
May 27, 2011 at 2:49 pm
peterzeke (5/27/2011)
WayneS (5/27/2011)
Pete,
2 questions:
1. Based upon the new row being added to the sample data, what are your expected results?
2. Does Drew's approach solve your issue?
Starting with #2 -- yes, Drew's approach solves the issue by displaying all "islands" of insurance coverage, which my colleague actually prefers now. Originally, my colleague's request was to show current insurance coverage(s), only.
As for #1 -- my response depends on which SQL code is used (Mark's vs Drew's) and how to interpret your meaning of "expected results." The correct result should treat the new row as a separate row in the final output. Drew's code will produce this correct result; Mark's code will consider YrMonth 200001 as the Min() value for InsID 1, which falsely suggests that the patient has had InsID 1 insurance coverage from 200001 thru 201105. (My initial code before posting to SSC for help looked similar to Mark's.)
Do you have a different solution, per chance?
--Pete
Starting with your last question... I'm willing to try if you want something different.
My definition of "expected results" is ... what results do you desire, based upon the sample data supplied? Show each column, each row, and the values should be there.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
May 31, 2011 at 6:39 am
WayneS (5/27/2011)
peterzeke (5/27/2011)
WayneS (5/27/2011)
Pete,
2 questions:
1. Based upon the new row being added to the sample data, what are your expected results?
2. Does Drew's approach solve your issue?
Starting with #2 -- yes, Drew's approach solves the issue by displaying all "islands" of insurance coverage, which my colleague actually prefers now. Originally, my colleague's request was to show current insurance coverage(s), only.
As for #1 -- my response depends on which SQL code is used (Mark's vs Drew's) and how to interpret your meaning of "expected results." The correct result should treat the new row as a separate row in the final output. Drew's code will produce this correct result; Mark's code will consider YrMonth 200001 as the Min() value for InsID 1, which falsely suggests that the patient has had InsID 1 insurance coverage from 200001 thru 201105. (My initial code before posting to SSC for help looked similar to Mark's.)
Do you have a different solution, per chance?
--Pete
Starting with your last question... I'm willing to try if you want something different.
My definition of "expected results" is ... what results do you desire, based upon the sample data supplied? Show each column, each row, and the values should be there.
OK -- The initial desired result was to show only current insurance coverage(s):
(Figure 1)
PatID InsID StartYrMo EndYrMo CntOfYrMonth
===== ===== ========= ======= =====
1 1 201101 201105 5
1 2 201103 201105 3
Then, my colleague prefers to see all insurance coverage history. So, based on the original sample data, here are the expected results:
(Figure 2)
PatID InsID StartYrMo EndYrMo CntOfYrMonth
===== ===== ========= ======= =====
1 1 201101 201105 5
1 2 201103 201105 3
1 9 201012 201012 1
Given the expected results as show in Figure 2, the addition of another row of (historical) data where the PatID=1, InsID=1 and YrMonth = 200001 , the expected results should be...
(Figure 3)
PatID InsID StartYrMo EndYrMo CntOfYrMonth
===== ===== ========= ======= =====
1 1 201101 201105 5
1 2 201103 201105 3
1 9 201012 201012 1
1 1200001 200001 1
Ultimately, I think Drew's code provides the solution I've been needing to achieve. Although I don't see a need to rework Drew's code, I'm always interested in reviewing other people's SQL that is efficient, correct, and comprehendable (i.e., your signature reflects a mantra that one should understand the code that is copied....) So, if you have another way to "skin the cat", I'd really like to learn from it.
May 31, 2011 at 9:16 am
peterzeke (5/31/2011)
WayneS (5/27/2011)
peterzeke (5/27/2011)
WayneS (5/27/2011)
Pete,
2 questions:
1. Based upon the new row being added to the sample data, what are your expected results?
2. Does Drew's approach solve your issue?
Starting with #2 -- yes, Drew's approach solves the issue by displaying all "islands" of insurance coverage, which my colleague actually prefers now. Originally, my colleague's request was to show current insurance coverage(s), only.
As for #1 -- my response depends on which SQL code is used (Mark's vs Drew's) and how to interpret your meaning of "expected results." The correct result should treat the new row as a separate row in the final output. Drew's code will produce this correct result; Mark's code will consider YrMonth 200001 as the Min() value for InsID 1, which falsely suggests that the patient has had InsID 1 insurance coverage from 200001 thru 201105. (My initial code before posting to SSC for help looked similar to Mark's.)
Do you have a different solution, per chance?
--Pete
Starting with your last question... I'm willing to try if you want something different.
My definition of "expected results" is ... what results do you desire, based upon the sample data supplied? Show each column, each row, and the values should be there.
OK -- The initial desired result was to show only current insurance coverage(s):
(Figure 1)
PatID InsID StartYrMo EndYrMo CntOfYrMonth
===== ===== ========= ======= =====
1 1 201101 201105 5
1 2 201103 201105 3
Then, my colleague prefers to see all insurance coverage history. So, based on the original sample data, here are the expected results:
(Figure 2)
PatID InsID StartYrMo EndYrMo CntOfYrMonth
===== ===== ========= ======= =====
1 1 201101 201105 5
1 2 201103 201105 3
1 9 201012 201012 1
Given the expected results as show in Figure 2, the addition of another row of (historical) data where the PatID=1, InsID=1 and YrMonth = 200001 , the expected results should be...
(Figure 3)
PatID InsID StartYrMo EndYrMo CntOfYrMonth
===== ===== ========= ======= =====
1 1 201101 201105 5
1 2 201103 201105 3
1 9 201012 201012 1
1 1200001 200001 1
Ultimately, I think Drew's code provides the solution I've been needing to achieve. Although I don't see a need to rework Drew's code, I'm always interested in reviewing other people's SQL that is efficient, correct, and comprehendable (i.e., your signature reflects a mantra that one should understand the code that is copied....) So, if you have another way to "skin the cat", I'd really like to learn from it.
I really don't see anything to improve upon with Drew's code. Since it's doing what is needed, let's just use that.
Yeah, I feel pretty strongly about not just using code you found somewhere, but in understanding it. Personally, I try to learn something new every day...
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply