How will you write a query for this scenario

  • I have a table called patientDates

    PatientId int,

    DivDate datetime

    There can be multiple dates for a patient for eg

    PatientId DivDate

    1 2/1/2011

    1 2/10/2011

    1 6/5/2011

    1 6/30/2011

    1 7/8/2011

    2 3/1/2011

    2 5/3/2011

    I need only first records that belong within 30 day intervals by patient. For eg. for the above data I need the following rows

    1 2/1/2011 (Its the first record)

    1 6/5/2011 (This is the min date after the first 30 day interval ie min(divDate) which is greater than first minDate + 30 days)

    1 7/8/2011 (This is the min date after the 6/5/2011 + 30 days) and so on and so forth

    How can we write a sql to get only these records. I have been thinking about it for hours now and cannot really figure it out. Appreciate all your help

    PS - Looking for a set based solution and not cursor based.

    Thanks

    RK

  • Since each row is dependent on the value of the prior row, it's not a set-based dataset.

    The easiest way I can think of to do this is a recursive CTE. It's just like a hierarchy-crawl, except it uses a date value instead of a parent ID. I'm not sure recursive CTEs can use the Min() function, but if they can, that'll be easy to code.

    Otherwise, you'll need to write your own loop for it. Should be no big deal to code that if you need to.

    But it's not a true relational set, since rows in it are ordered and dependent on prior rows, so don't worry about that aspect of it.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Here is a technique. It would need to be tested for usuability with a large record set.

    -- Define Table

    DECLARE @Table AS TABLE (PatientID INT, DivDate DATETIME);

    -- Populate Table

    INSERT INTO @Table(PatientID,DivDate)VALUES(1, '2/1/2011');

    INSERT INTO @Table(PatientID,DivDate)VALUES(1, '2/10/2011');

    INSERT INTO @Table(PatientID,DivDate)VALUES(1, '6/5/2011');

    INSERT INTO @Table(PatientID,DivDate)VALUES(1, '6/30/2011');

    INSERT INTO @Table(PatientID,DivDate)VALUES(1, '7/8/2011');

    INSERT INTO @Table(PatientID,DivDate)VALUES(2,'3/1/2011');

    INSERT INTO @Table(PatientID,DivDate)VALUES(2, '5/3/2011');

    --Verify Table Poluation

    SELECT * FROM @Table;

    WITH PatientSchedule AS

    (SELECT PatientID, Min(DivDate) FirstVisit, Min(DivDate) + 30 SecondVisit, Min(DivDate) + 60 ThirdVisit, Min(DivDate) + 90 FourthVisit

    FROM @Table

    GROUP BY PatientID)

    SELECT T.PatientID, Min(T.DivDate)

    FROM @Table as t

    JOIN PatientSchedule as PS ON PS.PatientID = t.PatientID

    WHERE T.DivDate BETWEEN PS.FirstVisit AND PS.SecondVisit

    GROUP BY T.PatientID

    UNION

    SELECT T.PatientID, Min(T.DivDate)

    FROM @Table as t

    JOIN PatientSchedule as PS ON PS.PatientID = t.PatientID

    WHERE T.DivDate BETWEEN PS.SecondVisit AND PS.ThirdVisit

    GROUP BY T.PatientID

    UNION

    SELECT T.PatientID, Min(T.DivDate)

    FROM @Table as t

    JOIN PatientSchedule as PS ON PS.PatientID = t.PatientID

    WHERE T.DivDate BETWEEN PS.ThirdVisit AND PS.FourthVisit

    GROUP BY T.PatientID;

  • Sorry I just went back to review the data and it's not working correctly.

    I'm getting the following for the results:

    PatientID(No column name)

    12011-02-01 00:00:00.000

    22011-03-01 00:00:00.000

    22011-05-03 00:00:00.000

  • Thanks GSquared for confirming what I was thinking.

    I tried recursive CTE and you are right it does not allow aggregate functions

    Based on your tip, I found a semi set based solution in a loop instead of using the cursor.

    EccentricDBA - thanks for your effort to help me on this.

    Thanks

    RK

  • I found a semi set based solution in a loop instead of using the cursor.

    Would you mind posting your solution up here?

    Don't forget the warnings about performance. For high volumes of data, these kinds of problems are much better handled by procedural languages. Why do you reject a cursor solution here? With a supporting covering index ordered by patient number and date, I think you could get very satisfactory performance out of a cursor solution as opposed to a while loop.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Recursive CTE's don't allow aggregate functions but they do allow ROW_NUMBER() which provides you with a means of obtaining MIN() and MAX():

    http://www.sqlservercentral.com/Forums/FindPost1137945.aspx

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (8/3/2011)


    Recursive CTE's don't allow aggregate functions but they do allow ROW_NUMBER() which provides you with a means of obtaining MIN() and MAX():

    http://www.sqlservercentral.com/Forums/FindPost1137945.aspx

    It's probably going to be slower than a hand-crafted loop in this kind of situation. Row_Number and the other ranking functions can often be performance hogs, and recursive CTEs are often low-performance anyway. Mixing the two is usually slow compared to a well-built loop.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (8/3/2011)


    ChrisM@Work (8/3/2011)


    Recursive CTE's don't allow aggregate functions but they do allow ROW_NUMBER() which provides you with a means of obtaining MIN() and MAX():

    http://www.sqlservercentral.com/Forums/FindPost1137945.aspx

    It's probably going to be slower than a hand-crafted loop in this kind of situation. Row_Number and the other ranking functions can often be performance hogs, and recursive CTEs are often low-performance anyway. Mixing the two is usually slow compared to a well-built loop.

    That was my experience too, before PW posted this πŸ˜‰

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (8/3/2011)


    GSquared (8/3/2011)


    ChrisM@Work (8/3/2011)


    Recursive CTE's don't allow aggregate functions but they do allow ROW_NUMBER() which provides you with a means of obtaining MIN() and MAX():

    http://www.sqlservercentral.com/Forums/FindPost1137945.aspx

    It's probably going to be slower than a hand-crafted loop in this kind of situation. Row_Number and the other ranking functions can often be performance hogs, and recursive CTEs are often low-performance anyway. Mixing the two is usually slow compared to a well-built loop.

    That was my experience too, before PW posted this πŸ˜‰

    Yep. But your mileage may vary, so test multiple solutions. That's what I always do.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Another possibility using quirky update.......thoughts anyone please?

    regards gah

    --=====

    USE TempDB

    ;

    --===== conditionally drop the test table

    IF OBJECT_ID('TempDB.dbo.PatientDates','U') IS NOT NULL

    DROP TABLE TempDB.dbo.PatientDates

    ;

    ----===== to test data as provided in OP -- RowId added

    ----===== start from here

    CREATE TABLE [dbo].[PatientDates]

    (

    [PatientId] [int] NOT NULL,

    [DivDate] [datetime] NOT NULL,

    [RowId] [int] IDENTITY(1,1)

    )

    INSERT INTO [dbo].[PatientDates]([PatientId], [DivDate])

    SELECT 1, '20110201 00:00:00.000' UNION ALL

    SELECT 1, '20110210 00:00:00.000' UNION ALL

    SELECT 1, '20110605 00:00:00.000' UNION ALL

    SELECT 1, '20110630 00:00:00.000' UNION ALL

    SELECT 1, '20110708 00:00:00.000' UNION ALL

    SELECT 2, '20110302 00:00:00.000' UNION ALL

    SELECT 2, '20110503 00:00:00.000'

    ;

    --== end here

    --===== to test larger data sets -- create test table with 100000 records and 1000 patients --- dates from 01/01/10 to 31/12/2012 --- patient ID 1 to 1000

    --===== overall runtime is for total process about 4 secs on my PC

    --===== start from here

    --SELECT TOP 100000

    -- PatientID = CAST(RAND(CHECKSUM(NEWID()))*1000 +1 as INT) ,

    -- Divdate = dateadd(dd, datediff(dd, 0, (CAST(RAND(CHECKSUM(NEWID()))*1096+40177 AS DATETIME))), 0),

    -- IDENTITY(int,1,1) AS RowId

    -- INTO PatientDates

    -- FROM sys.All_Columns t1,

    -- sys.All_Columns t2

    --;

    --== end here

    ----===== Conditionally drop the Temp Table

    IF OBJECT_ID('TempDB..#PatientDates','U') IS NOT NULL

    DROP TABLE #PatientDates

    ;

    --===== Create and copy the table on the fly

    SELECT PatientId, DivDate, RowId, Divdate as periodstartdate

    INTO #PatientDates

    FROM PatientDates

    ;

    --===== Check and make sure there's at least 1 row to work with.

    IF @@ROWCOUNT = 0

    BEGIN

    RAISERROR('No rows copied . Exiting early.',10,1) WITH NOWAIT;

    RETURN;

    END

    ;

    --======= Create Clustered Index....required by quirky update

    CREATE UNIQUE CLUSTERED INDEX IX_#PatientDates_CI

    ON #PatientDates (PatientId, DivDate, RowId) WITH FILLFACTOR = 100

    ;

    -----------------------------------------------------------------------------------------

    -- Calculate the running total using "quirky update" ....PLEASE READ FOLLOWING ARTICLE for detailed information.

    -- http://www.sqlservercentral.com/articles/T-SQL/68467/

    --- by Jeff Moden

    DECLARE @SafetyCounter BIGINT,

    @PrevPatientID INT,

    @periodstartdate DATETIME

    ;

    --===== Preset the Safety Counter (used as "Anchor")

    SELECT @SafetyCounter = 1

    ;

    WITH

    cteWork AS

    (

    SELECT SafetyCounter = ROW_NUMBER() OVER (ORDER BY PatientId, DivDate),

    PatientId, DivDate, periodstartdate

    FROM #PatientDates

    )

    UPDATE cteWork

    SET

    @periodstartdate

    = periodstartdate

    = CASE

    WHEN SafetyCounter = @SafetyCounter --Makes sure we're working on the correct row.

    THEN CASE

    WHEN PatientId = @PrevPatientID

    AND datediff(day,@periodstartdate,divdate) < 31 --- for 30 day intervals

    THEN @periodstartdate

    ELSE periodstartdate

    END

    ELSE 1/0 --Forces error if out of sync.

    END,

    @PrevpatientID = PatientId,

    @SafetyCounter = @SafetyCounter + 1

    FROM cteWork WITH (TABLOCKX)

    OPTION (MAXDOP 1)

    --==== RESULTS

    SELECT PatientID, periodstartdate, COUNT(periodstartdate)as No_of_Visits FROM #PatientDates

    GROUP BY PatientID, periodstartdate

    order by PatientID, periodstartdate

    ;

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • The Dixie Flatline (8/2/2011)


    I found a semi set based solution in a loop instead of using the cursor.

    Would you mind posting your solution up here?

    Don't forget the warnings about performance. For high volumes of data, these kinds of problems are much better handled by procedural languages. Why do you reject a cursor solution here? With a supporting covering index ordered by patient number and date, I think you could get very satisfactory performance out of a cursor solution as opposed to a while loop.

    Hi RK

    I am keen to see the solution you have found as well...I have a similar problem that I need to resolve ....pls see my earlier post..would appreciate your feedback

    kind regards gah

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • I had to do something like that for a client a couple years back. The eventual solution was partially set-based, with a while loop tossed in and came out at almost 2 pages of T-SQL by the end.

    It's not an easily-solved problem, especially if the data is 'interesting' ...

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Ben-Gan has some very interesting date-span code that runs like the wind. It would be perfect for this and actually negates the need for a Quirky Update. In fact, it beats the Quirky Update for performance... heh... yeah, I tried. πŸ˜€

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • vermarajiv (8/2/2011)


    Thanks GSquared for confirming what I was thinking.

    I tried recursive CTE and you are right it does not allow aggregate functions

    Based on your tip, I found a semi set based solution in a loop instead of using the cursor.

    EccentricDBA - thanks for your effort to help me on this.

    Thanks

    RK

    Please... it's a 2 way street here. Please post the solution you found so that we may all learn. Thanks. πŸ˜‰

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 32 total)

You must be logged in to reply to this topic. Login to reply