June 28, 2015 at 10:12 pm
Hello-
I've been struggling with this for some time. we have to group data based on Patients admission date and discharge date. If any Patients discharge date + 1 = admission date then we have group both rows into one row and sum costs from both the rows. Please check out the sample input and expected output for details. Any help will be highly appreciated
Sample Input
PatientID AdmissionDate DischargeDate Cost
1009 27-07-2014 31-07-2014 1050
1009 01-08-2014 23-08-2014 1070
1009 31-08-2014 31-08-2014 1900
1009 01-09-2014 14-09-2014 1260
1009 01-12-2014 31-12-2014 2090
1024 07-06-2014 28-06-2014 1900
1024 29-06-2014 31-07-2014 2900
1024 01-08-2014 02-08-2014 1800
Expected Output
PatientId AdminssionDate DischargeDate Cost
1009 27-07-2014 23-08-2014 2120
1009 31-08-2014 14-09-2014 3160
1009 01-12-2014 31-12-2014 2090
1024 07-06-2014 02-08-2014 6600
Please Use the below script to generate the source table and fill them up with the sample data.
--Create Table
CREATE TABLE PatientProblem
(
PatientID INT,
AdmissionDate DATETIME,
DischargeDate DATETIME,
Cost MONEY
)
GO
--Insert Data
INSERT INTO PatientProblem(PatientID,AdmissionDate,DischargeDate
,Cost)
VALUES
(1009,'2014-07-27','2014-07-31',1050.00),
(1009,'2014-08-01','2014-08-23',1070.00),
(1009,'2014-08-31','2014-08-31',1900.00),
(1009,'2014-09-01','2014-09-14',1260.00),
(1009,'2014-12-01','2014-12-31',2090.00),
(1024,'2014-06-07','2014-06-28',1900.00),
(1024,'2014-06-29','2014-07-31',2900.00),
(1024,'2014-08-01','2014-08-02',1800.00)
--Verify Data
SELECT PatientID,AdmissionDate,DischargeDate,Cost
FROM PatientProblem
June 28, 2015 at 11:07 pm
This should get you what you are looking for...
-- test data --
IF OBJECT_ID('tempdb..#PatientProblem') IS NOT NULL
DROP TABLE #PatientProblem;
CREATE TABLE #PatientProblem
(
PatientID INT,
AdmissionDate DATETIME,
DischargeDate DATETIME,
Cost MONEY
)
GO
--Insert Data
INSERT INTO #PatientProblem(PatientID,AdmissionDate,DischargeDate,Cost)
VALUES
(1009,'2014-07-27','2014-07-31',1050.00),
(1009,'2014-08-01','2014-08-23',1070.00),
(1009,'2014-08-31','2014-08-31',1900.00),
(1009,'2014-09-01','2014-09-14',1260.00),
(1009,'2014-12-01','2014-12-31',2090.00),
(1024,'2014-06-07','2014-06-28',1900.00),
(1024,'2014-06-29','2014-07-31',2900.00),
(1024,'2014-08-01','2014-08-02',1800.00);
-- index to eliminate 1 sort operator and lessen the impact of the other. --
CREATE NONCLUSTERED INDEX ix_PatientProblem_PatientID_AdminDate ON #PatientProblem (
PatientID, AdmissionDate)
INCLUDE (DischargeDate, Cost);
WITH AdminDateGroup AS (-- use datediff in combo with lag to determine were the groups are to be split.
SELECT
pp.PatientID,
pp.AdmissionDate,
pp.DischargeDate,
pp.Cost,
CASE WHEN DATEDIFF(dd, LAG(pp.DischargeDate, 1) OVER (PARTITION BY pp.PatientID ORDER BY pp.AdmissionDate), pp.AdmissionDate) > 1 OR LAG(pp.DischargeDate, 1) OVER (PARTITION BY pp.PatientID ORDER BY pp.AdmissionDate) IS NULL THEN pp.AdmissionDate END AS AdminDateGroup
FROM
#PatientProblem pp
), AdminDateGroupFill AS (-- fill in null values with the appropriate group value
SELECT
adg.PatientID,
adg.AdmissionDate,
adg.DischargeDate,
adg.Cost,
MAX(adg.AdminDateGroup) OVER (PARTITION BY adg.PatientID ORDER BY adg.AdmissionDate) AS AdminDateGroup
FROM
AdminDateGroup adg
)
-- at this point it's just a simple query with a couple of aggrigates.
SELECT
adgf.PatientID,
adgf.AdminDateGroup AS AdmissionDate,
MAX(adgf.DischargeDate) AS DischargeDate,
SUM(adgf.Cost) AS Cost
FROM
AdminDateGroupFill adgf
GROUP BY
adgf.PatientID,
adgf.AdminDateGroup
Here are the results...
PatientID AdmissionDate DischargeDate Cost
----------- ----------------------- ----------------------- ---------------------
1024 2014-06-07 00:00:00.000 2014-08-02 00:00:00.000 6600.00
1009 2014-07-27 00:00:00.000 2014-08-23 00:00:00.000 2120.00
1009 2014-08-31 00:00:00.000 2014-09-14 00:00:00.000 3160.00
1009 2014-12-01 00:00:00.000 2014-12-31 00:00:00.000 2090.00
HTH,
Jason
PS.. Thank you for posting DDL & test data... It makes things much easier to work on. 😀
June 28, 2015 at 11:48 pm
Perfect. Thanks for your help, this will solve the problem, will tryout and give you a feedback. I like how you used the LAG function and partition - Brilliant. :w00t:
June 29, 2015 at 3:39 pm
Hi Jason,
I'm using version 2008. Did not really pay attention to it, and it doesn't come with LAG and LEAD functions. I've just re-posted this on the 2008 forum. Any help will be highly appreciated.
Thanks
June 29, 2015 at 4:14 pm
June 29, 2015 at 7:42 pm
sratemo (6/29/2015)
Hi Jason,I'm using version 2008. Did not really pay attention to it, and it doesn't come with LAG and LEAD functions. I've just re-posted this on the 2008 forum. Any help will be highly appreciated.
Thanks
No worries. Just in future don't double post. It only leads to further confusion.
In any case it's not a problem to accomplish the same results without the LAG function. (Not as efficient but still very doable).
I tried to keep the two solutions as similar as possible to make it easier to see the differences.
-- test data --
IF OBJECT_ID('tempdb..#PatientProblem') IS NOT NULL
DROP TABLE #PatientProblem;
CREATE TABLE #PatientProblem
(
PatientID INT,
AdmissionDate DATETIME,
DischargeDate DATETIME,
Cost MONEY
)
GO
-- index to eliminate 1 sort operator and lessen the impact of the other. --
CREATE NONCLUSTERED INDEX ix_PatientProblem_PatientID_AdminDate ON #PatientProblem (
PatientID, AdmissionDate)
INCLUDE (DischargeDate, Cost);
--Insert Data
INSERT INTO #PatientProblem(PatientID,AdmissionDate,DischargeDate,Cost)
VALUES
(1009,'2014-07-27','2014-07-31',1050.00),
(1009,'2014-08-01','2014-08-23',1070.00),
(1009,'2014-08-31','2014-08-31',1900.00),
(1009,'2014-09-01','2014-09-14',1260.00),
(1009,'2014-12-01','2014-12-31',2090.00),
(1024,'2014-06-07','2014-06-28',1900.00),
(1024,'2014-06-29','2014-07-31',2900.00),
(1024,'2014-08-01','2014-08-02',1800.00);
WITH PatientProblemAddRN AS (-- add a row number to the data set.
SELECT
pp.PatientID,
pp.AdmissionDate,
pp.DischargeDate,
pp.Cost,
ROW_NUMBER() OVER (PARTITION BY pp.PatientID ORDER BY pp.AdmissionDate) AS RN
FROM
#PatientProblem pp
), AdminDateGroup AS (-- join the prev CTE to itself... note that the row number is offset by one in the ON clause of the join. this is the mechanism that replaces the LAG function of the previous solution.
SELECT
p1.PatientID,
p1.AdmissionDate,
p1.DischargeDate,
p1.Cost,
CASE WHEN DATEDIFF(dd, p2.DischargeDate, p1.AdmissionDate) > 1 OR p2.AdmissionDate IS NULL THEN p1.AdmissionDate END AS AdminDateGroup
FROM
PatientProblemAddRN p1
LEFT JOIN PatientProblemAddRN p2
ON p1.PatientID = p2.PatientID
AND p1.RN = p2.RN + 1
), AdminDateGroupFill AS (-- fill in null values with the appropriate group value
SELECT
adg.PatientID,
adg.AdmissionDate,
adg.DischargeDate,
adg.Cost,
MAX(adg.AdminDateGroup) OVER (PARTITION BY adg.PatientID ORDER BY adg.AdmissionDate) AS AdminDateGroup
FROM
AdminDateGroup adg
)
-- at this point it's just a simple query with a couple of aggrigates.
SELECT
adgf.PatientID,
adgf.AdminDateGroup AS AdmissionDate,
MAX(adgf.DischargeDate) AS DischargeDate,
SUM(adgf.Cost) AS Cost
FROM
AdminDateGroupFill adgf
GROUP BY
adgf.PatientID,
adgf.AdminDateGroup
Again... The same results as before...
PatientID AdmissionDate DischargeDate Cost
----------- ----------------------- ----------------------- ---------------------
1024 2014-06-07 00:00:00.000 2014-08-02 00:00:00.000 6600.00
1009 2014-07-27 00:00:00.000 2014-08-23 00:00:00.000 2120.00
1009 2014-08-31 00:00:00.000 2014-09-14 00:00:00.000 3160.00
1009 2014-12-01 00:00:00.000 2014-12-31 00:00:00.000 2090.00
Also note that the index I suggested in the 1st post remains unchanged. The same sort operation take place as before... Just with ROW_NUMBER instead of LAG.
HTH,
Jason
July 1, 2015 at 2:06 am
Oops, wrong thread.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
July 1, 2015 at 5:42 pm
Here's another quite clever method originating with Itzik Ben-Gan. Unfortunately, I don't believe that the original link is still active but I used the technique myself in this article (with attribution of course).
Calculating Gaps Between Overlapping Time Intervals in SQL[/url]
Using Jason's test data:
WITH C1 AS
(
-- Since the data table contains rows with a start and end date, we'll first unpivot
-- those using CROSS APPLY VALUES and assign a type to each. The columns e and s will
-- either be NULL (s NULL for an end date, e NULL for a start date) or row numbers
-- sequenced by the time. UserID has been eliminated because we're looking for overlapping
-- intervals across all users.
SELECT PatientID, ts, [Type], Cost
,e=CASE [Type]
WHEN 1 THEN NULL
ELSE ROW_NUMBER() OVER (PARTITION BY PatientID, [Type] ORDER BY DischargeDate) END
,s=CASE [Type]
WHEN -1 THEN NULL
ELSE ROW_NUMBER() OVER (PARTITION BY PatientID, [Type] ORDER BY AdmissionDate) END
FROM #PatientProblem
CROSS APPLY (VALUES (1, AdmissionDate-1), (-1, DischargeDate+1)) a([Type], ts)
),
C2 AS
(
-- Add a row number ordered as shown
SELECT C1.*, se=ROW_NUMBER() OVER (PARTITION BY PatientID ORDER BY ts, [Type] DESC)
FROM C1
),
C3 AS
(
-- Create a grpnm that pairs the rows
SELECT ts, grpnm=FLOOR((ROW_NUMBER() OVER (PARTITION BY PatientID ORDER BY ts)-1) / 2 + 1), PatientID, Cost
FROM C2
-- This filter is the magic that eliminates the overlaps
WHERE COALESCE(s-(se-s)-1, (se-e)-e) = 0
),
C4 AS
(
-- Grouping by grpnm restores the records to only non-overlapped intervals
-- *** Adjust intervals back to original
SELECT AdmissionDate=MIN(ts)+1, DischargeDate=MAX(ts)-1, PatientID, Cost=SUM(Cost)
FROM C3
GROUP BY PatientID, grpnm
)
SELECT PatientID, AdmissionDate, DischargeDate
-- *** Subquery to compute the cost
,Cost =
(
SELECT SUM(Cost)
FROM #PatientProblem b
WHERE a.PatientID = b.PatientID AND
b.AdmissionDate >= a.AdmissionDate AND
b.DischargeDate <= a.DischargeDate
)
FROM C4 a
ORDER BY PatientID, AdmissionDate;
You should find this approach to be pretty fast. I had to make just a couple of modifications to handle the 1 day apart issue between your end/start date overlaps (*** commented).
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
July 2, 2015 at 9:26 am
Amazing. Works perfectly. Thank you:-):-):-)
July 2, 2015 at 9:35 am
Hi Jason,
Your solution looks solid but the query wouldn't execute past:
"MAX(adg.AdminDateGroup) OVER (PARTITION BY adg.PatientID ORDER BY adg.AdmissionDate) AS AdminDateGroup".
error - Incorrect syntax near 'order'.
Tried going around this but then got discrepancies in the results.
July 2, 2015 at 9:57 am
That's because prior to 2012, ORDER BY clauses were not supported in aggregate window functions; they were supported only in ranking window functions.
See https://msdn.microsoft.com/en-us/library/ms189461(v=sql.105).aspx for the appropriate documentation.
Cheers!
July 2, 2015 at 11:43 am
sratemo (7/2/2015)
Hi Jason,Your solution looks solid but the query wouldn't execute past:
"MAX(adg.AdminDateGroup) OVER (PARTITION BY adg.PatientID ORDER BY adg.AdmissionDate) AS AdminDateGroup".
error - Incorrect syntax near 'order'.
Tried going around this but then got discrepancies in the results.
Yea... That's my bad. I knew that 2008 supported windowed aggregates but it looks like it only supports partitioning, not ordering... So completely my fault on that...
Luckily Dwain was able to step up to the plate and come up with a good working solution. I haven't had a chance to dig into his code but I definitely will. He's always good at supplying fantastic new tools for the toolbox. 🙂
July 2, 2015 at 5:24 pm
Jason A. Long (7/2/2015)
sratemo (7/2/2015)
Hi Jason,Your solution looks solid but the query wouldn't execute past:
"MAX(adg.AdminDateGroup) OVER (PARTITION BY adg.PatientID ORDER BY adg.AdmissionDate) AS AdminDateGroup".
error - Incorrect syntax near 'order'.
Tried going around this but then got discrepancies in the results.
Yea... That's my bad. I knew that 2008 supported windowed aggregates but it looks like it only supports partitioning, not ordering... So completely my fault on that...
Luckily Dwain was able to step up to the plate and come up with a good working solution. I haven't had a chance to dig into his code but I definitely will. He's always good at supplying fantastic new tools for the toolbox. 🙂
Well, that's not exactly a new tool (IBG's approach to collapsing overlapping intervals) but I thank you nonetheless.
Note that I tried to avoid the extra sub-query I used to pull the cost but I couldn't see a way to make that work within the interval collapsing algorithm. That "magic filter" that IBG came up with seems to preclude doing it in there. Had I been able to, I believe it would have removed the extra INDEX SCAN (or SEEK).
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply