August 2, 2011 at 2:21 pm
I have a table called patientDates
PatientId int,
DivDate datetime
There can be multiple dates for a patient for eg
PatientId DivDate
1 2/1/2011
1 2/10/2011
1 6/5/2011
1 6/30/2011
1 7/8/2011
2 3/1/2011
2 5/3/2011
I need only first records that belong within 30 day intervals by patient. For eg. for the above data I need the following rows
1 2/1/2011 (Its the first record)
1 6/5/2011 (This is the min date after the first 30 day interval ie min(divDate) which is greater than first minDate + 30 days)
1 7/8/2011 (This is the min date after the 6/5/2011 + 30 days) and so on and so forth
How can we write a sql to get only these records. I have been thinking about it for hours now and cannot really figure it out. Appreciate all your help
PS - Looking for a set based solution and not cursor based.
Thanks
RK
August 2, 2011 at 2:31 pm
Since each row is dependent on the value of the prior row, it's not a set-based dataset.
The easiest way I can think of to do this is a recursive CTE. It's just like a hierarchy-crawl, except it uses a date value instead of a parent ID. I'm not sure recursive CTEs can use the Min() function, but if they can, that'll be easy to code.
Otherwise, you'll need to write your own loop for it. Should be no big deal to code that if you need to.
But it's not a true relational set, since rows in it are ordered and dependent on prior rows, so don't worry about that aspect of it.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
August 2, 2011 at 2:44 pm
Here is a technique. It would need to be tested for usuability with a large record set.
-- Define Table
DECLARE @Table AS TABLE (PatientID INT, DivDate DATETIME);
-- Populate Table
INSERT INTO @Table(PatientID,DivDate)VALUES(1, '2/1/2011');
INSERT INTO @Table(PatientID,DivDate)VALUES(1, '2/10/2011');
INSERT INTO @Table(PatientID,DivDate)VALUES(1, '6/5/2011');
INSERT INTO @Table(PatientID,DivDate)VALUES(1, '6/30/2011');
INSERT INTO @Table(PatientID,DivDate)VALUES(1, '7/8/2011');
INSERT INTO @Table(PatientID,DivDate)VALUES(2,'3/1/2011');
INSERT INTO @Table(PatientID,DivDate)VALUES(2, '5/3/2011');
--Verify Table Poluation
SELECT * FROM @Table;
WITH PatientSchedule AS
(SELECT PatientID, Min(DivDate) FirstVisit, Min(DivDate) + 30 SecondVisit, Min(DivDate) + 60 ThirdVisit, Min(DivDate) + 90 FourthVisit
FROM @Table
GROUP BY PatientID)
SELECT T.PatientID, Min(T.DivDate)
FROM @Table as t
JOIN PatientSchedule as PS ON PS.PatientID = t.PatientID
WHERE T.DivDate BETWEEN PS.FirstVisit AND PS.SecondVisit
GROUP BY T.PatientID
UNION
SELECT T.PatientID, Min(T.DivDate)
FROM @Table as t
JOIN PatientSchedule as PS ON PS.PatientID = t.PatientID
WHERE T.DivDate BETWEEN PS.SecondVisit AND PS.ThirdVisit
GROUP BY T.PatientID
UNION
SELECT T.PatientID, Min(T.DivDate)
FROM @Table as t
JOIN PatientSchedule as PS ON PS.PatientID = t.PatientID
WHERE T.DivDate BETWEEN PS.ThirdVisit AND PS.FourthVisit
GROUP BY T.PatientID;
August 2, 2011 at 2:51 pm
Sorry I just went back to review the data and it's not working correctly.
I'm getting the following for the results:
PatientID(No column name)
12011-02-01 00:00:00.000
22011-03-01 00:00:00.000
22011-05-03 00:00:00.000
August 2, 2011 at 4:55 pm
Thanks GSquared for confirming what I was thinking.
I tried recursive CTE and you are right it does not allow aggregate functions
Based on your tip, I found a semi set based solution in a loop instead of using the cursor.
EccentricDBA - thanks for your effort to help me on this.
Thanks
RK
August 2, 2011 at 9:47 pm
I found a semi set based solution in a loop instead of using the cursor.
Would you mind posting your solution up here?
Don't forget the warnings about performance. For high volumes of data, these kinds of problems are much better handled by procedural languages. Why do you reject a cursor solution here? With a supporting covering index ordered by patient number and date, I think you could get very satisfactory performance out of a cursor solution as opposed to a while loop.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
August 3, 2011 at 2:30 am
Recursive CTE's don't allow aggregate functions but they do allow ROW_NUMBER() which provides you with a means of obtaining MIN() and MAX():
http://www.sqlservercentral.com/Forums/FindPost1137945.aspx
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
August 3, 2011 at 7:30 am
ChrisM@Work (8/3/2011)
Recursive CTE's don't allow aggregate functions but they do allow ROW_NUMBER() which provides you with a means of obtaining MIN() and MAX():
It's probably going to be slower than a hand-crafted loop in this kind of situation. Row_Number and the other ranking functions can often be performance hogs, and recursive CTEs are often low-performance anyway. Mixing the two is usually slow compared to a well-built loop.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
August 3, 2011 at 7:40 am
GSquared (8/3/2011)
ChrisM@Work (8/3/2011)
Recursive CTE's don't allow aggregate functions but they do allow ROW_NUMBER() which provides you with a means of obtaining MIN() and MAX():It's probably going to be slower than a hand-crafted loop in this kind of situation. Row_Number and the other ranking functions can often be performance hogs, and recursive CTEs are often low-performance anyway. Mixing the two is usually slow compared to a well-built loop.
That was my experience too, before PW posted this π
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
August 3, 2011 at 7:42 am
ChrisM@Work (8/3/2011)
GSquared (8/3/2011)
ChrisM@Work (8/3/2011)
Recursive CTE's don't allow aggregate functions but they do allow ROW_NUMBER() which provides you with a means of obtaining MIN() and MAX():It's probably going to be slower than a hand-crafted loop in this kind of situation. Row_Number and the other ranking functions can often be performance hogs, and recursive CTEs are often low-performance anyway. Mixing the two is usually slow compared to a well-built loop.
That was my experience too, before PW posted this π
Yep. But your mileage may vary, so test multiple solutions. That's what I always do.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
August 5, 2011 at 3:19 pm
Another possibility using quirky update.......thoughts anyone please?
regards gah
--=====
USE TempDB
;
--===== conditionally drop the test table
IF OBJECT_ID('TempDB.dbo.PatientDates','U') IS NOT NULL
DROP TABLE TempDB.dbo.PatientDates
;
----===== to test data as provided in OP -- RowId added
----===== start from here
CREATE TABLE [dbo].[PatientDates]
(
[PatientId] [int] NOT NULL,
[DivDate] [datetime] NOT NULL,
[RowId] [int] IDENTITY(1,1)
)
INSERT INTO [dbo].[PatientDates]([PatientId], [DivDate])
SELECT 1, '20110201 00:00:00.000' UNION ALL
SELECT 1, '20110210 00:00:00.000' UNION ALL
SELECT 1, '20110605 00:00:00.000' UNION ALL
SELECT 1, '20110630 00:00:00.000' UNION ALL
SELECT 1, '20110708 00:00:00.000' UNION ALL
SELECT 2, '20110302 00:00:00.000' UNION ALL
SELECT 2, '20110503 00:00:00.000'
;
--== end here
--===== to test larger data sets -- create test table with 100000 records and 1000 patients --- dates from 01/01/10 to 31/12/2012 --- patient ID 1 to 1000
--===== overall runtime is for total process about 4 secs on my PC
--===== start from here
--SELECT TOP 100000
-- PatientID = CAST(RAND(CHECKSUM(NEWID()))*1000 +1 as INT) ,
-- Divdate = dateadd(dd, datediff(dd, 0, (CAST(RAND(CHECKSUM(NEWID()))*1096+40177 AS DATETIME))), 0),
-- IDENTITY(int,1,1) AS RowId
-- INTO PatientDates
-- FROM sys.All_Columns t1,
-- sys.All_Columns t2
--;
--== end here
----===== Conditionally drop the Temp Table
IF OBJECT_ID('TempDB..#PatientDates','U') IS NOT NULL
DROP TABLE #PatientDates
;
--===== Create and copy the table on the fly
SELECT PatientId, DivDate, RowId, Divdate as periodstartdate
INTO #PatientDates
FROM PatientDates
;
--===== Check and make sure there's at least 1 row to work with.
IF @@ROWCOUNT = 0
BEGIN
RAISERROR('No rows copied . Exiting early.',10,1) WITH NOWAIT;
RETURN;
END
;
--======= Create Clustered Index....required by quirky update
CREATE UNIQUE CLUSTERED INDEX IX_#PatientDates_CI
ON #PatientDates (PatientId, DivDate, RowId) WITH FILLFACTOR = 100
;
-----------------------------------------------------------------------------------------
-- Calculate the running total using "quirky update" ....PLEASE READ FOLLOWING ARTICLE for detailed information.
-- http://www.sqlservercentral.com/articles/T-SQL/68467/
--- by Jeff Moden
DECLARE @SafetyCounter BIGINT,
@PrevPatientID INT,
@periodstartdate DATETIME
;
--===== Preset the Safety Counter (used as "Anchor")
SELECT @SafetyCounter = 1
;
WITH
cteWork AS
(
SELECT SafetyCounter = ROW_NUMBER() OVER (ORDER BY PatientId, DivDate),
PatientId, DivDate, periodstartdate
FROM #PatientDates
)
UPDATE cteWork
SET
@periodstartdate
= periodstartdate
= CASE
WHEN SafetyCounter = @SafetyCounter --Makes sure we're working on the correct row.
THEN CASE
WHEN PatientId = @PrevPatientID
AND datediff(day,@periodstartdate,divdate) < 31 --- for 30 day intervals
THEN @periodstartdate
ELSE periodstartdate
END
ELSE 1/0 --Forces error if out of sync.
END,
@PrevpatientID = PatientId,
@SafetyCounter = @SafetyCounter + 1
FROM cteWork WITH (TABLOCKX)
OPTION (MAXDOP 1)
--==== RESULTS
SELECT PatientID, periodstartdate, COUNT(periodstartdate)as No_of_Visits FROM #PatientDates
GROUP BY PatientID, periodstartdate
order by PatientID, periodstartdate
;
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
August 7, 2011 at 2:16 pm
The Dixie Flatline (8/2/2011)
I found a semi set based solution in a loop instead of using the cursor.
Would you mind posting your solution up here?
Don't forget the warnings about performance. For high volumes of data, these kinds of problems are much better handled by procedural languages. Why do you reject a cursor solution here? With a supporting covering index ordered by patient number and date, I think you could get very satisfactory performance out of a cursor solution as opposed to a while loop.
Hi RK
I am keen to see the solution you have found as well...I have a similar problem that I need to resolve ....pls see my earlier post..would appreciate your feedback
kind regards gah
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
August 7, 2011 at 2:34 pm
I had to do something like that for a client a couple years back. The eventual solution was partially set-based, with a while loop tossed in and came out at almost 2 pages of T-SQL by the end.
It's not an easily-solved problem, especially if the data is 'interesting' ...
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 7, 2011 at 2:58 pm
Ben-Gan has some very interesting date-span code that runs like the wind. It would be perfect for this and actually negates the need for a Quirky Update. In fact, it beats the Quirky Update for performance... heh... yeah, I tried. π
--Jeff Moden
Change is inevitable... Change for the better is not.
August 7, 2011 at 3:51 pm
vermarajiv (8/2/2011)
Thanks GSquared for confirming what I was thinking.I tried recursive CTE and you are right it does not allow aggregate functions
Based on your tip, I found a semi set based solution in a loop instead of using the cursor.
EccentricDBA - thanks for your effort to help me on this.
Thanks
RK
Please... it's a 2 way street here. Please post the solution you found so that we may all learn. Thanks. π
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 32 total)
You must be logged in to reply to this topic. Login to reply