Calculate Groups based on date ranges (aka episodes of care)?

  • Anyone who works with healthcare data often has to identify an "episode" of care: i.e., a span of time over which a person receives a variety of health care services.

    I am attempting to identify rows of data that can be considered part of a group (episode). In the data below, each row has a range of dates, the date diff between FROM_DATE and TO_DATE ("DAYS"), and a flag for whether "DAYS" is less than 60.

    ROW  FROM_DATE   TO_DATE     DAYS  LESS_THAN_60
    1    2005-01-01  2005-02-04    34  1
    2    2005-02-04  2005-02-20    16  1
    3    2005-02-20  2005-03-05    13  1
    4    2005-03-05  2005-06-01    88  0
    5    2005-06-01  2005-07-10    39  1
    6    2005-07-10  2005-08-01    22  1
    7    2005-08-01  2005-12-10   131  0

    Any consecutive range of dates that have "DAYS" less than 60 should belong to the same group; anytime "DAYS" is greater than or equal to 60, a new episode begins.

    Therefore, in the example dataset above,

    ROWs #1 thru #3 should be flagged as an episode (group);

    ROW #4 should be its own group;

    ROWs #5 and #6 become a third group;

    ROW #7 becomes its own group.

    So, a total of four groups should be identified. For whatever reason, I am struggling to create the SQL to flag each row with an appropriate GROUP identifier (group number).

    Putting the data in Excel and whipping together a couple of calculated columns makes it possible to stamp the groups, but I can't seem to translate that into SQL.

    Any suggestions will be greatly appreciated!

    --Pete


  • Just to clarify: shouldn't 4, 5, and 6 be in the same group, because the length of time between 4 and 5 is less than 60 and the length of time between 5 and 6 is also less than 60?

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Hi Peter, nice challenge you've got there.

    Can you provide the data in a readily consumable format along with the table definition? I'll hop on this one, and I bet I'll get beaten by more elite people! 😉

    Waiting on your data.

    Cheers,

    J-F

  • Thanks for taking a look, J-F. Here's the code to create the sample table and populate it:

    IF OBJECT_ID('TEMPDB..#tbl_EpisodeGroup') IS NOT NULL
        DROP TABLE #tbl_EpisodeGroup

    CREATE TABLE #tbl_EpisodeGroup(
        [ROWNUM] int NULL,
        [FROM_DATE] [datetime] NULL,
        [TO_DATE] [datetime] NULL,
        [CNT_DAYS] [int] NULL,
        [LESS_THAN_60] [int] NOT NULL,
        [EpisodeGroup] [varchar](10) NULL
    );

    /* populate the table */
    INSERT #tbl_EpisodeGroup (ROWNUM,FROM_DATE,TO_DATE,CNT_DAYS,LESS_THAN_60,EpisodeGroup)
    SELECT 1,'20050101','20050204', 34,1,'' UNION ALL
    SELECT 2,'20050204','20050220', 16,1,'' UNION ALL
    SELECT 3,'20050220','20050305', 13,1,'' UNION ALL
    SELECT 4,'20050305','20050601', 88,0,'' UNION ALL
    SELECT 5,'20050601','20050710', 39,1,'' UNION ALL
    SELECT 6,'20050710','20050801', 22,1,'' UNION ALL
    SELECT 7,'20050801','20051210',131,0,''

    SELECT * FROM #tbl_EpisodeGroup

    I've been trying to find a way to solve this without using cursors or running into RBAR... perhaps Jeff Moden might show up in this discussion...

    To Drew Allen: if you look carefully at the data, you'll see that the datediff ("DAYS") between FROM_DATE (March 5, 2005) and TO_DATE (June 1, 2005) for row #4 is 88 days. Consequently, it can't be grouped with another row, since it exceeds the 60-day criterion.

    --Pete

  • Try this

    SELECT ROWNUM,FROM_DATE,TO_DATE,CNT_DAYS,LESS_THAN_60,
           ROW_NUMBER() OVER(ORDER BY FROM_DATE)
         - ROW_NUMBER() OVER(PARTITION BY LESS_THAN_60 ORDER BY FROM_DATE) AS EpisodeGroup
    FROM #tbl_EpisodeGroup
    ORDER BY ROWNUM
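
    To see how the subtraction forms groups, it can help to expose both numberings side by side. This is just a throwaway diagnostic against the sample table above, not part of the solution:

    SELECT ROWNUM, LESS_THAN_60,
           ROW_NUMBER() OVER(ORDER BY FROM_DATE) AS rn_all,   -- position in the whole set
           ROW_NUMBER() OVER(PARTITION BY LESS_THAN_60
                             ORDER BY FROM_DATE) AS rn_flag,  -- position within the same flag value
           ROW_NUMBER() OVER(ORDER BY FROM_DATE)
         - ROW_NUMBER() OVER(PARTITION BY LESS_THAN_60
                             ORDER BY FROM_DATE) AS diff      -- constant within each unbroken run
    FROM #tbl_EpisodeGroup
    ORDER BY ROWNUM

    Within an unbroken run of the same LESS_THAN_60 value, both counters advance in step, so their difference is constant; whenever the flag changes, rn_flag lags behind rn_all and the difference shifts to a new value.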

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Speaking of Jeff Moden - the "quirky update" or pseudo-cursor process should definitely help you figure out the episodes of care. After all, that's just a custom "running" operation (i.e., kind of like a running total, just not a total). A minimal sketch follows below.

    There are some requirements missing before we can answer the question fully. I understand that the row with 88 days is a problem, but what to do with it is unclear - does it drop out of the calculation altogether? What if the timeframes overlap? What if the contiguous runs add up to more than 60 days?
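
    Here's that sketch of the quirky-update pattern, with the usual caveats: it relies on undocumented update-order behavior, so it assumes a clustered index on ROWNUM plus TABLOCKX and MAXDOP 1. EpisodeGroup is varchar(10) in the sample DDL, so the integer group number is implicitly converted; widening that column to INT would be cleaner.

    -- Sketch only: the "quirky update" depends on undocumented row-order processing.
    CREATE CLUSTERED INDEX IX_Episode ON #tbl_EpisodeGroup (ROWNUM);

    DECLARE @Grp INT, @PrevFlag INT
    SELECT @Grp = 0, @PrevFlag = 0   -- @PrevFlag = 0 forces row 1 to open group 1

    UPDATE t
    SET    @Grp = EpisodeGroup = CASE WHEN LESS_THAN_60 = 0 OR @PrevFlag = 0
                                      THEN @Grp + 1   -- this row starts a new episode
                                      ELSE @Grp       -- this row continues the current episode
                                 END,
           @PrevFlag = LESS_THAN_60  -- carry this row's flag forward to the next row
    FROM   #tbl_EpisodeGroup t WITH (TABLOCKX)
    OPTION (MAXDOP 1)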

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • peterzeke (9/2/2009)


    To Drew Allen: if you look carefully at the data, you'll see that the datediff ("DAYS") between FROM_DATE (March 5, 2005) and TO_DATE (June 1, 2005) for row #4 is 88 days. Consequently, it can't be grouped with another row, since it exceeds the 60-day criterion.

    It's not clear what the FROM_DATE and TO_DATE represent, because your sample data completely covers the dates. If they represented admission/discharge dates, you would expect gaps between the TO_DATE of one record and the FROM_DATE of the next record. Since there are no gaps, it looks like they are the differences between the admission dates (or discharge dates) of separate visits. Under those circumstances, it shouldn't matter how long the gap between two incidents is in determining whether subsequent visits are part of the same incident.

    So the big question is, what do the FROM_DATE and TO_DATE represent?

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen (9/2/2009)


    So the big question is, what do the FROM_DATE and TO_DATE represent?

    Drew -- I sense you have experience with healthcare data! My use of "episode" isn't quite in line with the typical concept of an episode of care with admission/discharge dates, so I can certainly respect your inquiry for clarity on what the From/To dates represent.

    For my situation, all of the dates represent a diagnosis date. The From/To dates represent the span of time between each date of diagnosis. So, in the sample dataset, 8 distinct dates of diagnosis are presented, which ultimately yields 7 rows of From/To pairings.

    Any pairings of dates that span less than 60 days are to be considered as part of an overall episode. In the sample dataset, the following dates are paired in rows 1 to 3:

    2005-01-01

    2005-02-04

    2005-02-20

    2005-03-05

    The number of days from one diagnosis date to the following diagnosis date is less than 60; consequently, this series of dates is part of one episode for my purposes (as defined by the clinical care team I support).

    Row #4 spans more than 60 days between diagnosis dates; therefore, it becomes a new episode/group.

    I hope I've been able to clarify the information. Again, the term "episode" isn't being applied in the traditional healthcare-data sense, so I can understand any confusion that might come up.

    --Pete

  • Mark (9/2/2009)


    Try this

    SELECT ROWNUM,FROM_DATE,TO_DATE,CNT_DAYS,LESS_THAN_60,
           ROW_NUMBER() OVER(ORDER BY FROM_DATE)
         - ROW_NUMBER() OVER(PARTITION BY LESS_THAN_60 ORDER BY FROM_DATE) AS EpisodeGroup
    FROM #tbl_EpisodeGroup
    ORDER BY ROWNUM

    Mark! I think this might do the trick... I've done a couple of tests on a larger dataset and everything seems to be working.

    My limited use of ROW_NUMBER() and OVER() made it difficult to grasp exactly how to apply them to this problem. I tried various combinations, but just didn't hit upon your key segment, "ROW_NUMBER() OVER(PARTITION BY LESS_THAN_60 ORDER BY FROM_DATE)".

    Even though I'd like to have the EpisodeGroup value increment by one, it isn't necessary. Merely having distinct values that represent each EpisodeGroup should suffice.

    Thank you,

    Pete

  • Argh! Mark's solution almost works....

    I've discovered a scenario where the distinct EpisodeGroup misses the mark. Below is the SQL script to create/populate sample data that ends up with incorrect EpisodeGroup values:

    IF OBJECT_ID('TEMPDB..#tbl_EpisodeGroup') IS NOT NULL
        DROP TABLE #tbl_EpisodeGroup

    CREATE TABLE #tbl_EpisodeGroup(
        [ROWNUM] int NULL,
        [FROM_DATE] VARCHAR(8) NULL,
        [TO_DATE] VARCHAR(8) NULL,
        [DAYS] [int] NULL,
        [LESS_THAN_60] [int] NOT NULL,
        [EpisodeGroup] [varchar](10) NULL
    );

    /* populate the table */
    INSERT #tbl_EpisodeGroup (ROWNUM,FROM_DATE,TO_DATE,DAYS,LESS_THAN_60,EpisodeGroup)
    SELECT 1,'20090103','20090301', 57,1,'' UNION ALL
    SELECT 2,'20090301','20090609',100,0,'' UNION ALL
    SELECT 3,'20090609','20090802', 54,1,'' UNION ALL
    SELECT 4,'20090802','20090902', 31,1,''

    Here's the query to view the dataset along with the EpisodeGroup stamping:

    SELECT ROWNUM,FROM_DATE,TO_DATE,DAYS,LESS_THAN_60,
           ROW_NUMBER() OVER(ORDER BY FROM_DATE)
         - ROW_NUMBER() OVER(PARTITION BY LESS_THAN_60 ORDER BY FROM_DATE) AS EpisodeGroup
    FROM #tbl_EpisodeGroup
    ORDER BY ROWNUM

    The output appears as follows:

    ROWNUM  FROM_DATE  TO_DATE   DAYS  LESS_THAN_60  EpisodeGroup
    ------  ---------  --------  ----  ------------  ------------
    1       20090103   20090301    57  1             0
    2       20090301   20090609   100  0             1
    3       20090609   20090802    54  1             1
    4       20090802   20090902    31  1             1

    Unfortunately, the EpisodeGroup column incorrectly stamps ROWNUM #2 with the same EpisodeGroup value as rows #3 & #4. Since ROWNUM #2 has 100 days, it should have its own EpisodeGroup; likewise, rows #3 & #4 should be stamped with their own EpisodeGroup value. A total of 3 distinct EpisodeGroup values should exist.
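
    Exposing the two numberings separately shows where the collision comes from (a throwaway diagnostic; expected values are in the comments):

    SELECT ROWNUM, LESS_THAN_60,
           ROW_NUMBER() OVER(ORDER BY FROM_DATE) AS rn_all,
           ROW_NUMBER() OVER(PARTITION BY LESS_THAN_60 ORDER BY FROM_DATE) AS rn_flag,
           ROW_NUMBER() OVER(ORDER BY FROM_DATE)
         - ROW_NUMBER() OVER(PARTITION BY LESS_THAN_60 ORDER BY FROM_DATE) AS diff
    FROM #tbl_EpisodeGroup
    ORDER BY ROWNUM
    -- ROWNUM 1 (flag 1): rn_all 1, rn_flag 1, diff 0
    -- ROWNUM 2 (flag 0): rn_all 2, rn_flag 1, diff 1  <-- same diff as rows 3 and 4
    -- ROWNUM 3 (flag 1): rn_all 3, rn_flag 2, diff 1
    -- ROWNUM 4 (flag 1): rn_all 4, rn_flag 3, diff 1

    The difference alone can't tell an island of 0s from an island of 1s that happens to land on the same value.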

    So close, Mark... I'll keep messing with variations on rownumber() and over().

    --Pete

  • peterzeke (9/2/2009)


    Argh! Mark's solution almost works....

    Since ROWNUM #2 has 100 days, it should have its own EpisodeGroup; likewise, rows #3 & #4 should be stamped with their own EpisodeGroup value. A total of 3 distinct EpisodeGroup values should exist.

    I think the combination of LESS_THAN_60 and the row-number difference will give you a unique group, so try combining them

    SELECT ROWNUM,FROM_DATE,TO_DATE,DAYS,LESS_THAN_60,
           LESS_THAN_60 + 2*(ROW_NUMBER() OVER(ORDER BY FROM_DATE)
                           - ROW_NUMBER() OVER(PARTITION BY LESS_THAN_60 ORDER BY FROM_DATE)) AS EpisodeGroup
    FROM #tbl_EpisodeGroup
    ORDER BY ROWNUM
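
    The difference already separates islands within each LESS_THAN_60 value, so doubling it and adding the flag (0 or 1) packs the pair (flag, difference) into a single number that only repeats within a genuine island. A quick sanity check against your failing dataset (expected output in the comments):

    SELECT ROWNUM, DAYS, LESS_THAN_60,
           LESS_THAN_60 + 2*(ROW_NUMBER() OVER(ORDER BY FROM_DATE)
                           - ROW_NUMBER() OVER(PARTITION BY LESS_THAN_60 ORDER BY FROM_DATE)) AS EpisodeGroup
    FROM #tbl_EpisodeGroup
    ORDER BY ROWNUM
    -- ROWNUM 1 -> EpisodeGroup 1   ( 57 days, flag 1)
    -- ROWNUM 2 -> EpisodeGroup 2   (100 days, flag 0)
    -- ROWNUM 3 -> EpisodeGroup 3   ( 54 days, flag 1)
    -- ROWNUM 4 -> EpisodeGroup 3   ( 31 days, flag 1)
    -- three distinct groups, as required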

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • I think the combination of LESS_THAN_60 and the row-number difference will give you a unique group, so try combining them

    Mark, your updated code seems quite promising. On a few other test sample sets, the results turned out perfectly, so I'll apply your tweaked algorithm to a larger dataset.

    Prior to your response above, I tried using a multiplying factor in your original formula, but couldn't generate consistently satisfactory results.

    Looks like you've got the solution covered.
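
    If I ever do need the group numbers to increment by one, I imagine wrapping your group key in DENSE_RANK() over each group's start date would do it (an untested sketch on my end):

    WITH keyed AS (
        SELECT ROWNUM, FROM_DATE, TO_DATE, DAYS, LESS_THAN_60,
               LESS_THAN_60 + 2*(ROW_NUMBER() OVER(ORDER BY FROM_DATE)
                               - ROW_NUMBER() OVER(PARTITION BY LESS_THAN_60 ORDER BY FROM_DATE)) AS GrpKey
        FROM #tbl_EpisodeGroup
    ), starts AS (
        SELECT *, MIN(FROM_DATE) OVER(PARTITION BY GrpKey) AS GrpStart  -- first date in each episode
        FROM keyed
    )
    SELECT ROWNUM, FROM_DATE, TO_DATE, DAYS, LESS_THAN_60,
           DENSE_RANK() OVER(ORDER BY GrpStart) AS EpisodeGroup  -- renumbers as 1, 2, 3, ... in date order
    FROM starts
    ORDER BY ROWNUM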

    I'll know more tomorrow after much more testing.

    Thanks again for your assistance.

    --Pete
