Best approach?

  • Greetings -

    I'm trying to decide the best approach to gathering spans of time based on whether or not a result was YES or NO. My brain hurts from thinking about it so I'm hoping someone smarter than me can help me out. Thanks in advance.

    Here's the table:

    IDNOYES

    1NULL1/1/2011

    2NULL1/2/2011

    3NULL1/2/2011

    4NULL1/3/2011

    5NULL1/4/2011

    6NULL1/5/2011

    7NULL1/6/2011

    8NULL1/7/2011

    9NULL1/7/2011

    101/8/2011NULL

    11NULL1/9/2011

    12NULL1/10/2011

    I need to calculate the number of days (span) between the result of YES and the first time the result was NO. It has to start over every time the response changes.

    For example, between 1/1 and 1/7 the result was YES for a 7 day span. Then the result was NO for a 1 day span, and YES again for a 2 day span.

    In my mind, I see it like this (pseudo code)

    do until NO is not null

    datediff(dd,YES,NO)

    next

    I know I'm going about this the wrong way but I can't seem to see it any other way; hence, the call for help.

    Thanks.

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    The only difference between impossible and possible is time...and a little help from my friends.

  • Just to clarify: You don't want to calculate from the specific date given if it's in the middle of a series of YES values, you want to calculate from the first YES value to whatever NO value comes next.

    That would give 7 days for ID 1, or for ID 7, or anything in between.

    Am I understanding that correctly?

    Or, for ID 7, do you want 1 day (the number of days from the 7th to the 8th), and for ID 1 you want 7 days (days from 1st to 8th)?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Wow, thanks. Yes, you understand it perfectly. It would be the 2nd version (I think) - for IDs 1 through 7, it would be 7 days positive (YES) result. For ID 10, it would be 1 day negative (NO) result, for IDs 11 & 12, 2 days positive.

    My report would eventually look something like this (this is for lab results):

    Patient Name

    1/1 to 1/7

    Positive

    1/8

    Negative

    1/9 to 1/10

    Positive

    Hope the formatting comes across. Thanks so much.

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    The only difference between impossible and possible is time...and a little help from my friends.

  • I'm pretty sure there's a part of the following query that is probably in some way not going to scale up well, although it runs in my SQL 2008 R2 instance on my laptop in very short order, there's so little data that it might be impossible to guage scaled out performance without testing on larger numbers of records. I am so sure that I remember seeing code in a book somewhere that had a very efficient means of selecting out time periods, but I can't find it, so I tried to get the best result I can. It's likely someone here may be able to improve on this, but here goes:

    CREATE TABLE #INPUT_DATA (

    ID int IDENTITY(1, 1) PRIMARY KEY CLUSTERED,

    NO_DATE date,

    YES_DATE date

    )

    INSERT INTO #INPUT_DATA (NO_DATE, YES_DATE)

    SELECT NULL, '01/01/2011' UNION ALL

    SELECT NULL, '01/02/2011' UNION ALL

    SELECT NULL, '01/03/2011' UNION ALL

    SELECT NULL, '01/04/2011' UNION ALL

    SELECT NULL, '01/05/2011' UNION ALL

    SELECT NULL, '01/06/2011' UNION ALL

    SELECT NULL, '01/07/2011' UNION ALL

    SELECT '01/08/2011', NULL UNION ALL

    SELECT NULL, '01/09/2011' UNION ALL

    SELECT NULL, '01/10/2011'

    DECLARE @MAX_ID AS int

    SET @MAX_ID = (SELECT MAX(ID) FROM #INPUT_DATA)

    ;WITH ANSWERS AS (

    SELECT ID,

    CASE

    WHEN NO_DATE IS NOT NULL THEN 'NO'

    ELSE 'YES'

    END AS ANSWER,

    CASE

    WHEN NO_DATE IS NOT NULL THEN NO_DATE

    ELSE YES_DATE

    END AS THE_DATE

    FROM #INPUT_DATA

    ),

    STARTS_AND_STOPS AS (

    SELECT ROW_NUMBER() OVER(ORDER BY ID) AS ROW_ID, X.*

    FROM (

    SELECT ID, ANSWER, THE_DATE,

    CASE ID

    WHEN 1 THEN CAST('START' AS varchar(5))

    WHEN @MAX_ID THEN CAST('END' AS varchar(5))

    END AS DATE_TYPE

    FROM ANSWERS

    WHERE ID IN (1, @MAX_ID)

    UNION

    SELECT A.ID, A.ANSWER, A.THE_DATE,

    CASE

    WHEN (A.ANSWER <> B.ANSWER) AND (A.ANSWER <> C.ANSWER) AND (B.ANSWER = C.ANSWER) THEN 'BOTH'

    WHEN A.ANSWER <> B.ANSWER THEN 'END'

    WHEN A.ANSWER <> C.ANSWER THEN 'START'

    END AS DATE_TYPE

    FROM ANSWERS AS A

    INNER JOIN ANSWERS AS B-- TABLE B HAS RECORDS WITH ID 1 HIGHER THAN TABLE A

    ON A.ID = B.ID - 1

    INNER JOIN ANSWERS AS C-- TABLE C HAS RECORDS WITH ID 1 SMALLER THAN TABLE A

    ON A.ID = C.ID + 1

    WHERE A.ANSWER <> B.ANSWER

    OR B.ANSWER <> C.ANSWER

    ) AS X

    ),

    TOP_ROW AS (

    SELECT TOP 1 ROW_ID, ID, ANSWER, THE_DATE, DATE_TYPE

    FROM STARTS_AND_STOPS

    ORDER BY ROW_ID

    ),

    BOTTOM_ROW AS (

    SELECT TOP 1 ROW_ID, ID, ANSWER, THE_DATE, DATE_TYPE

    FROM STARTS_AND_STOPS

    ORDER BY ROW_ID DESC

    ),

    CLASSIFIED_DATA AS (

    SELECT S1.ROW_ID, S1.ID, S1.ANSWER, S1.THE_DATE, S1.DATE_TYPE

    FROM STARTS_AND_STOPS AS S1

    INNER JOIN STARTS_AND_STOPS AS S2

    ON S1.ROW_ID = S2.ROW_ID - 1

    INNER JOIN STARTS_AND_STOPS AS S3

    ON S1.ROW_ID = S3.ROW_ID + 1

    UNION

    SELECT ROW_ID, ID, ANSWER, THE_DATE, DATE_TYPE

    FROM TOP_ROW

    UNION

    SELECT ROW_ID, ID, ANSWER, THE_DATE, DATE_TYPE

    FROM BOTTOM_ROW

    )

    SELECT A.ANSWER, 'FROM ' +

    CASE A.DATE_TYPE

    WHEN 'START' THEN CAST(A.THE_DATE AS varchar(10)) + ' TO ' + CAST(B.THE_DATE AS varchar(10))

    WHEN 'BOTH' THEN CAST(A.THE_DATE AS varchar(10)) + ' TO ' + CAST(A.THE_DATE AS varchar(10))

    END AS PERIOD

    FROM CLASSIFIED_DATA AS A

    INNER JOIN CLASSIFIED_DATA AS B

    ON A.ROW_ID = B.ROW_ID - 1

    WHERE A.DATE_TYPE IN ('START', 'BOTH')

    ORDER BY A.ROW_ID

    DROP TABLE #INPUT_DATA

    Enjoy...

    Steve

    (aka sgmunson)

    :-):-):-)

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • I suppose that you want to do this calculation for each patient.

    This could be a nice way to do it with a single scan:

    IF OBJECT_ID('tempdb..#INPUT_DATA') IS NOT NULL DROP TABLE #INPUT_DATA

    CREATE TABLE #INPUT_DATA (

    ID int IDENTITY(1, 1) PRIMARY KEY CLUSTERED,

    PATIENT_ID int NOT NULL,

    NO_DATE datetime NULL,

    YES_DATE datetime NULL

    )

    INSERT INTO #INPUT_DATA (PATIENT_ID, NO_DATE, YES_DATE)

    SELECT 1, NULL, '01/01/2011' UNION ALL

    SELECT 1, NULL, '01/02/2011' UNION ALL

    SELECT 1, NULL, '01/03/2011' UNION ALL

    SELECT 1, NULL, '01/04/2011' UNION ALL

    SELECT 1, NULL, '01/05/2011' UNION ALL

    SELECT 1, NULL, '01/06/2011' UNION ALL

    SELECT 1, NULL, '01/07/2011' UNION ALL

    SELECT 1, '01/08/2011', NULL UNION ALL

    SELECT 1, '01/09/2011', NULL UNION ALL

    SELECT 1, '01/10/2011', NULL UNION ALL

    SELECT 1, NULL, '01/11/2011' UNION ALL

    SELECT 1, NULL, '01/12/2011' UNION ALL

    SELECT 1, '01/13/2011', NULL UNION ALL

    SELECT 1, '01/14/2011', NULL UNION ALL

    SELECT 1, '01/15/2011', NULL UNION ALL

    SELECT 2, NULL, '01/04/2011' UNION ALL

    SELECT 2, NULL, '01/05/2011' UNION ALL

    SELECT 2, NULL, '01/06/2011' UNION ALL

    SELECT 2, NULL, '01/07/2011' UNION ALL

    SELECT 2, NULL, '01/08/2011' UNION ALL

    SELECT 2, NULL, '01/09/2011' UNION ALL

    SELECT 2, '01/10/2011', NULL UNION ALL

    SELECT 2, '01/11/2011', NULL UNION ALL

    SELECT 2, '01/12/2011', NULL UNION ALL

    SELECT 2, '01/13/2011', NULL UNION ALL

    SELECT 2, NULL, '01/14/2011' UNION ALL

    SELECT 2, '01/15/2011', NULL UNION ALL

    SELECT 2, '01/16/2011', NULL UNION ALL

    SELECT 2, '01/17/2011', NULL UNION ALL

    SELECT 2, NULL, '01/18/2011'

    SELECT PATIENT_ID, MIN_DATE = MIN(ISNULL(YES_DATE, NO_DATE)), MAX_DATE = MAX(ISNULL(YES_DATE, NO_DATE)), VALUE

    FROM (

    SELECT *, RN3 = RN - RN2

    FROM (

    SELECT *,

    RN = ROW_NUMBER() OVER (PARTITION BY PATIENT_ID ORDER BY ID),

    RN2 = ROW_NUMBER() OVER (PARTITION BY PATIENT_ID, VALUE ORDER BY ID)

    FROM (

    SELECT *, VALUE = CASE WHEN NO_DATE IS NULL THEN 'YES' ELSE 'NO' END

    FROM #INPUT_DATA

    ) AS SRC

    ) AS SRC2

    ) AS SRC3

    GROUP BY PATIENT_ID, RN3, VALUE

    ORDER BY PATIENT_ID, MIN_DATE

    Hope this helps,

    Gianluca

    -- Gianluca Sartori

  • I am so happy right now. Thanks so very much to both/all of you. My brain is no longer stuck in a do loop. I appreciate the help!!

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    The only difference between impossible and possible is time...and a little help from my friends.

  • Glad we could help. 🙂

    -- Gianluca Sartori

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply