November 3, 2010 at 9:03 am
Hi! I'm processing patient data on thousands of patients and I'm trying to determine a way of identifying a set of records from a single table of events that meet certain criteria and are within 7 days of each other. Identifying the records is not an issue, but my problem is determining which records are within 7 days of each other. For example, a patient may have the following set of events:
Pat123 Code3 03/05/2010
Pat123 Code1 05/19/2010
Pat123 Code3 05/20/2010
Pat123 Code2 05/21/2010
Pat123 Code4 05/22/2010
Pat123 Code1 06/20/2010
Pat123 Code3 06/30/2010
Of course, this table contains other patient records as well. I'm looking for a solution that doesn't involve loops or cursors. The answer from the above would be:
Pat123 Code1 05/19/2010
Pat123 Code3 05/20/2010
Pat123 Code2 05/21/2010
Pat123 Code4 05/22/2010
Any thoughts?
LinkedIn: https://www.linkedin.com/in/sqlrv
Website: https://www.sqlrv.com
November 3, 2010 at 9:24 am
So you have a specific code, then you want to see which items fall between a 7 day period for that code from the same table?
I'm not 100% on what you are doing, but you could do that with an inner join. Your first select would pull the code and date and then the inner join could be used to pull the records within the 7 day span.
November 3, 2010 at 9:29 am
Try this:
DECLARE @PatientsInfo TABLE
(
PatientID VARCHAR(15),
PatientCode VARCHAR(5),
Activity DATETIME
)
INSERT INTO @PatientsInfo
SELECT 'Pat123', 'Code3', '03/05/2010'
UNION ALL SELECT 'Pat123', 'Code1', '05/19/2010'
UNION ALL SELECT 'Pat123', 'Code3', '05/20/2010'
UNION ALL SELECT 'Pat123', 'Code2', '05/21/2010'
UNION ALL SELECT 'Pat123', 'Code4', '05/22/2010'
UNION ALL SELECT 'Pat123', 'Code1', '06/20/2010'
UNION ALL SELECT 'Pat123', 'Code3', '06/30/2010'
; WITH GiveARowNumber AS
(
SELECT
RowNum = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
,PatientID
,PatientCode
,Activity
FROM
@PatientsInfo
),
GroupedSet AS
(
SELECT
RN = ROW_NUMBER() OVER (PARTITION BY PatientID ORDER BY RowNum)
,RowNum
,PatientID
,PatientCode
,Activity
FROM
GiveARowNumber
)
SELECT *
FROM GroupedSet GS1
INNER JOIN GroupedSet GS2
ON GS1.RN +1 = GS2.RN
AND GS1.PatientID = GS2.PatientID
WHERE
DATEDIFF(dd,GS2.Activity,GS1.Activity) <= 7
Please take a peek at how i am producing my table and how i am setting up sample data. You might want to do the same to attract more people to view your thread; more than attracting, you will get top-brainers to work on your request..
November 3, 2010 at 9:52 am
I think Monsieur ColdCoffee's code can be simplified - I don't think that the first CTE is necessary if in the second one, we order by the Activity in the ROW_NUMBER() function call.
; WITH GroupedSet AS
(
SELECT
RN = ROW_NUMBER() OVER (PARTITION BY PatientID ORDER BY Activity)
,PatientID
,PatientCode
,Activity
FROM
@PatientsInfo
)
SELECT *
FROM GroupedSet GS1
INNER JOIN GroupedSet GS2
ON GS1.RN +1 = GS2.RN
AND GS1.PatientID = GS2.PatientID
WHERE
DATEDIFF(dd,GS2.Activity,GS1.Activity) <= 7
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
November 3, 2010 at 9:57 am
WayneS (11/3/2010)
I think Monsieur ColdCoffee's code can be simplified - I don't think that the first CTE is necessary if in the second one, we order by the Activity in the ROW_NUMBER() function call.
Wayne, specifically, did 2 ROW_NUMBERs just for the fact tht, if the PatientIDs dont follow an order (meaning a mx of PatientIDs in each row) , then that extra ROW_NUMBER will be effective..
Consider this sample data
DECLARE @PatientsInfo TABLE
(
PatientID VARCHAR(15),
PatientCode VARCHAR(5),
Activity DATETIME
)
INSERT INTO @PatientsInfo
SELECT 'Pat123', 'Code3', '03/05/2010'
UNION ALL SELECT 'Pat123', 'Code1', '05/19/2010'
UNION ALL SELECT 'Pat123', 'Code3', '05/20/2010'
UNION ALL SELECT 'Pat123', 'Code2', '05/21/2010'
UNION ALL SELECT 'Pat123', 'Code4', '05/22/2010'
UNION ALL SELECT 'Pat123', 'Code1', '06/20/2010'
UNION ALL SELECT 'Pat123', 'Code3', '06/30/2010'
UNION ALL SELECT 'Pat456', 'Code1', '05/19/2010'
UNION ALL SELECT 'Pat456', 'Code3', '05/20/2010'
UNION ALL SELECT 'Pat456', 'Code2', '05/21/2010'
UNION ALL SELECT 'Pat456', 'Code4', '05/22/2010'
UNION ALL SELECT 'Pat456', 'Code1', '06/20/2010'
UNION ALL SELECT 'Pat456', 'Code3', '06/30/2010'
UNION ALL SELECT 'Pat123', 'Code1', '05/19/2010'
UNION ALL SELECT 'Pat123', 'Code3', '05/20/2010'
UNION ALL SELECT 'Pat123', 'Code2', '05/21/2010'
UNION ALL SELECT 'Pat123', 'Code4', '05/22/2010'
UNION ALL SELECT 'Pat456', 'Code1', '06/20/2010'
UNION ALL SELECT 'Pat456', 'Code3', '06/30/2010'
November 3, 2010 at 10:00 am
WayneS (11/3/2010)
I think Monsieur ColdCoffee's code can be simplified - I don't think that the first CTE is necessary if in the second one, we order by the Activity in the ROW_NUMBER() function call.
; WITH GroupedSet AS
(
SELECT
RN = ROW_NUMBER() OVER (PARTITION BY PatientID ORDER BY Activity)
,PatientID
,PatientCode
,Activity
FROM
@PatientsInfo
)
SELECT *
FROM GroupedSet GS1
INNER JOIN GroupedSet GS2
ON GS1.RN +1 = GS2.RN
AND GS1.PatientID = GS2.PatientID
WHERE
DATEDIFF(dd,GS2.Activity,GS1.Activity) <= 7
YEs, looking closely at the sample data provided in my first post and presuming ActivityDate column follows the order as u said, then your simplified version of the code wins 🙂
November 3, 2010 at 10:19 am
ColdCoffee (11/3/2010)You might want to do the same to attract more people to view your thread; more than attracting, you will get top-brainers to work on your request..
Thanks for your help. I fully intended to work up a better example using some code, but I was late for a meeting that I had to get to.
I'll try out your suggestions and see what develops. Again, thanks a lot!!
LinkedIn: https://www.linkedin.com/in/sqlrv
Website: https://www.sqlrv.com
November 3, 2010 at 10:40 am
ColdCoffee (11/3/2010)
WayneS (11/3/2010)
I think Monsieur ColdCoffee's code can be simplified - I don't think that the first CTE is necessary if in the second one, we order by the Activity in the ROW_NUMBER() function call.Wayne, specifically, did 2 ROW_NUMBERs just for the fact tht, if the PatientIDs dont follow an order (meaning a mx of PatientIDs in each row) , then that extra ROW_NUMBER will be effective..
I could potentially see the need for that... we'll see what the OP says about it.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
November 3, 2010 at 10:41 am
Aaron N. Cutshall (11/3/2010)
ColdCoffee (11/3/2010)You might want to do the same to attract more people to view your thread; more than attracting, you will get top-brainers to work on your request..
Thanks for your help. I fully intended to work up a better example using some code, but I was late for a meeting that I had to get to.
I'll try out your suggestions and see what develops. Again, thanks a lot!!
Good. Let us know how it works out... you've got two curious folks (plus, I'm sure, some lurking).
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
November 3, 2010 at 4:34 pm
Thanks, Wayne. It will be a few days until I get to try it again. I'm off to a wee bit of a vacation to my brother's wedding. I'll be back next Wednesday and I'll give it a shot then.
LinkedIn: https://www.linkedin.com/in/sqlrv
Website: https://www.sqlrv.com
November 11, 2010 at 7:27 am
OK, I finally had a chance to review the suggested code. To get it to work, I had to make a few modifications. I'm not sure that it's the best way, but it does work. Take a look at it and let me know what you think:
DECLARE @PatientsInfo TABLE (
PatientID VARCHAR(15),
PatientCode VARCHAR(5),
Activity DATETIME
);
INSERT INTO @PatientsInfo
VALUES ('Pat123', 'Code3', '03/05/2010'),
('Pat123', 'Code1', '05/19/2010'),
('Pat123', 'Code3', '05/20/2010'),
('Pat123', 'Code2', '05/21/2010'),
('Pat123', 'Code4', '05/22/2010'),
('Pat123', 'Code1', '06/20/2010'),
('Pat123', 'Code3', '06/30/2010'),
('Pat456', 'Code1', '05/19/2010'),
('Pat456', 'Code3', '05/20/2010'),
('Pat456', 'Code2', '05/21/2010'),
('Pat456', 'Code4', '05/22/2010'),
('Pat456', 'Code1', '06/20/2010'),
('Pat456', 'Code3', '06/30/2010');
WITH cteGroupedSet(PatientID, PatientCode, Activity, RowNbr) AS (
SELECT PatientID, PatientCode, Activity,
ROW_NUMBER() OVER (PARTITION BY PatientID ORDER BY Activity)
FROM @PatientsInfo
),
cteMatches(PatientID, PatientCode1, Activity1, PatientCode2, Activity2) AS (
SELECT GS1.PatientID, GS1.PatientCode, GS1.Activity, GS2.PatientCode, GS2.Activity
FROM cteGroupedSet GS1
INNER JOIN cteGroupedSet GS2 ON GS2.PatientID = GS1.PatientID
AND (GS2.RowNbr = GS1.RowNbr-1)
WHERE DATEDIFF(dd,GS2.Activity,GS1.Activity) <= 7
)
SELECT DISTINCT P.PatientID, P.PatientCode, P.Activity
FROM cteMatches M
INNER JOIN @PatientsInfo P ON P.PatientID = M.PatientID
AND (P.PatientCode = M.PatientCode1 OR P.PatientCode = M.PatientCode2)
AND (P.Activity = M.Activity1 OR P.Activity = M.Activity2)
ORDER BY P.PatientID, P.Activity;
I really don't care for using the DISTINCT clause on the last query, but I couldn't come up with a better solution. Any ideas?
LinkedIn: https://www.linkedin.com/in/sqlrv
Website: https://www.sqlrv.com
November 11, 2010 at 2:25 pm
I've had an opportunity to work some more on this and the situation is a bit more complex than I originally thought. Therefore, the following code has been revised:
DECLARE @PatientsInfo TABLE (
PatientID VARCHAR(15),
EventCode VARCHAR(5),
EventDate DATETIME
);
INSERT INTO @PatientsInfo
VALUES('Pat123', 'Code3', '03/05/2010'),
('Pat123', 'Code3', '03/06/2010'),
('Pat123', 'Code1', '05/19/2010'),
('Pat123', 'Code3', '05/20/2010'),
('Pat123', 'Code2', '05/21/2010'),
('Pat123', 'Code4', '05/22/2010'),
('Pat123', 'Code1', '06/20/2010'),
('Pat123', 'Code3', '06/30/2010'),
('Pat456', 'Code3', '05/17/2010'),
('Pat456', 'Code1', '05/19/2010'),
('Pat456', 'Code3', '05/20/2010'),
('Pat456', 'Code2', '05/21/2010'),
('Pat456', 'Code4', '05/24/2010'),
('Pat456', 'Code4', '05/28/2010'),
('Pat456', 'Code1', '06/20/2010'),
('Pat456', 'Code3', '06/30/2010'),
('Pat123', 'Code1', '07/01/2010'),
('Pat123', 'Code2', '06/29/2010'),
('Pat123', 'Code4', '07/01/2010'),
('Pat123', 'Code2', '06/19/2010');
WITH cteGroupedSet(PatientID, EventCode, EventDate, RowNbr) AS (
SELECT PatientID, EventCode, EventDate,
ROW_NUMBER() OVER (PARTITION BY PatientID ORDER BY EventDate)
FROM @PatientsInfo
),
cteMatches(PatientID, EventCode, EventDate, RowNbr, GroupNbr) AS (
SELECT PatientID, EventCode, EventDate, RowNbr, RowNbr
FROM cteGroupedSet
WHERE RowNbr = 1
UNION ALL
SELECT G.PatientID, G.EventCode, G.EventDate, G.RowNbr,
CASE WHEN (ABS(DATEDIFF(dd,M.EventDate,G.EventDate))<=7) THEN M.GroupNbr ELSE M.GroupNbr+1 END
FROM cteMatches M
INNER JOIN cteGroupedSet G ON G.PatientID = M.PatientID AND G.RowNbr-1 = M.RowNbr
WHERE G.RowNbr > 1
),
cteUnique(PatientID, GroupNbr, EventCode, EventDate) AS (
SELECT PatientID, GroupNbr, EventCode, MAX(EventDate)
FROM cteMatches
GROUP BY PatientID, GroupNbr, EventCode
),
cteCompleteGroups(PatientID, GroupNbr, AvgEventDate) AS (
SELECT PatientID, GroupNbr, CAST(AVG(CAST(EventDate as int)) as datetime)
FROM cteUnique
GROUP BY PatientID, GroupNbr
HAVING COUNT(*) = 4
)
SELECT CG.PatientID, P.EventCode, CG.GroupNbr, P.EventDate
FROM cteCompleteGroups CG
INNER JOIN @PatientsInfo P ON P.PatientID = CG.PatientID
WHERE ABS(DATEDIFF(dd,P.EventDate,CAST(AvgEventDate as datetime))) <= 3
ORDER BY CG.PatientID, P.EventDate;
This now accommodates situations wherein we want to ignore partial clusters, multiple codes clustered together; and when the entire cluster goes beyond the 7 day range.
I think that I can be happy with this arrangement! 🙂
LinkedIn: https://www.linkedin.com/in/sqlrv
Website: https://www.sqlrv.com
November 11, 2010 at 9:18 pm
Aaron,
It does look a bit complicated, but it looks efficient also. Thanks for the feedback!
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply