Admission dates overlap and suming of inpatient costs

Question

Post reply

Admission dates overlap and suming of inpatient costs

sratemo

SSC Enthusiast

Points: 107
More actions
June 28, 2015 at 10:12 pm

#309527

Hello-
I've been struggling with this for some time. we have to group data based on Patients admission date and discharge date. If any Patients discharge date + 1 = admission date then we have group both rows into one row and sum costs from both the rows. Please check out the sample input and expected output for details. Any help will be highly appreciated
Sample Input
PatientID AdmissionDate DischargeDate Cost
1009 27-07-2014 31-07-2014 1050
1009 01-08-2014 23-08-2014 1070
1009 31-08-2014 31-08-2014 1900
1009 01-09-2014 14-09-2014 1260
1009 01-12-2014 31-12-2014 2090
1024 07-06-2014 28-06-2014 1900
1024 29-06-2014 31-07-2014 2900
1024 01-08-2014 02-08-2014 1800
Expected Output
PatientId AdminssionDate DischargeDate Cost
1009 27-07-2014 23-08-2014 2120
1009 31-08-2014 14-09-2014 3160
1009 01-12-2014 31-12-2014 2090
1024 07-06-2014 02-08-2014 6600
Please Use the below script to generate the source table and fill them up with the sample data.
--Create Table
CREATE TABLE PatientProblem
(
PatientID INT,
AdmissionDate DATETIME,
DischargeDate DATETIME,
Cost MONEY
)
GO
--Insert Data
INSERT INTO PatientProblem(PatientID,AdmissionDate,DischargeDate
,Cost)
VALUES
(1009,'2014-07-27','2014-07-31',1050.00),
(1009,'2014-08-01','2014-08-23',1070.00),
(1009,'2014-08-31','2014-08-31',1900.00),
(1009,'2014-09-01','2014-09-14',1260.00),
(1009,'2014-12-01','2014-12-31',2090.00),
(1024,'2014-06-07','2014-06-28',1900.00),
(1024,'2014-06-29','2014-07-31',2900.00),
(1024,'2014-08-01','2014-08-02',1800.00)
--Verify Data
SELECT PatientID,AdmissionDate,DischargeDate,Cost
FROM PatientProblem

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

Jason A. Long SSC-Insane Points: 23786 More actions · Answer 1

This should get you what you are looking for...

-- test data --

IF OBJECT_ID('tempdb..#PatientProblem') IS NOT NULL

DROP TABLE #PatientProblem;

CREATE TABLE #PatientProblem

(

PatientID INT,

AdmissionDate DATETIME,

DischargeDate DATETIME,

Cost MONEY

)

GO

--Insert Data

INSERT INTO #PatientProblem(PatientID,AdmissionDate,DischargeDate,Cost)

VALUES

(1009,'2014-07-27','2014-07-31',1050.00),

(1009,'2014-08-01','2014-08-23',1070.00),

(1009,'2014-08-31','2014-08-31',1900.00),

(1009,'2014-09-01','2014-09-14',1260.00),

(1009,'2014-12-01','2014-12-31',2090.00),

(1024,'2014-06-07','2014-06-28',1900.00),

(1024,'2014-06-29','2014-07-31',2900.00),

(1024,'2014-08-01','2014-08-02',1800.00);

-- index to eliminate 1 sort operator and lessen the impact of the other. --

CREATE NONCLUSTERED INDEX ix_PatientProblem_PatientID_AdminDate ON #PatientProblem (

PatientID, AdmissionDate)

INCLUDE (DischargeDate, Cost);

WITH AdminDateGroup AS (-- use datediff in combo with lag to determine were the groups are to be split.

SELECT

pp.PatientID,

pp.AdmissionDate,

pp.DischargeDate,

pp.Cost,

CASE WHEN DATEDIFF(dd, LAG(pp.DischargeDate, 1) OVER (PARTITION BY pp.PatientID ORDER BY pp.AdmissionDate), pp.AdmissionDate) > 1 OR LAG(pp.DischargeDate, 1) OVER (PARTITION BY pp.PatientID ORDER BY pp.AdmissionDate) IS NULL THEN pp.AdmissionDate END AS AdminDateGroup

FROM

#PatientProblem pp

), AdminDateGroupFill AS (-- fill in null values with the appropriate group value

SELECT

adg.PatientID,

adg.AdmissionDate,

adg.DischargeDate,

adg.Cost,

MAX(adg.AdminDateGroup) OVER (PARTITION BY adg.PatientID ORDER BY adg.AdmissionDate) AS AdminDateGroup

FROM

AdminDateGroup adg

)

-- at this point it's just a simple query with a couple of aggrigates.

SELECT

adgf.PatientID,

adgf.AdminDateGroup AS AdmissionDate,

MAX(adgf.DischargeDate) AS DischargeDate,

SUM(adgf.Cost) AS Cost

FROM

AdminDateGroupFill adgf

GROUP BY

adgf.PatientID,

adgf.AdminDateGroup

Here are the results...

PatientID AdmissionDate DischargeDate Cost

----------- ----------------------- ----------------------- ---------------------

1024 2014-06-07 00:00:00.000 2014-08-02 00:00:00.000 6600.00

1009 2014-07-27 00:00:00.000 2014-08-23 00:00:00.000 2120.00

1009 2014-08-31 00:00:00.000 2014-09-14 00:00:00.000 3160.00

1009 2014-12-01 00:00:00.000 2014-12-31 00:00:00.000 2090.00

HTH,

Jason

PS.. Thank you for posting DDL & test data... It makes things much easier to work on. 😀

sratemo SSC Enthusiast Points: 107 More actions · Answer 2

Perfect. Thanks for your help, this will solve the problem, will tryout and give you a feedback. I like how you used the LAG function and partition - Brilliant. :w00t:

sratemo SSC Enthusiast Points: 107 More actions · Answer 3

Hi Jason,

I'm using version 2008. Did not really pay attention to it, and it doesn't come with LAG and LEAD functions. I've just re-posted this on the 2008 forum. Any help will be highly appreciated.

Thanks

pietlinden SSC Guru Points: 63509 More actions · Answer 4

Maybe you can use this article[/url] to simulate LAG and LEAD.

Jason A. Long SSC-Insane Points: 23786 More actions · Answer 5

sratemo (6/29/2015)
Hi Jason,
I'm using version 2008. Did not really pay attention to it, and it doesn't come with LAG and LEAD functions. I've just re-posted this on the 2008 forum. Any help will be highly appreciated.
Thanks

No worries. Just in future don't double post. It only leads to further confusion.

In any case it's not a problem to accomplish the same results without the LAG function. (Not as efficient but still very doable).

I tried to keep the two solutions as similar as possible to make it easier to see the differences.

-- test data --

IF OBJECT_ID('tempdb..#PatientProblem') IS NOT NULL

DROP TABLE #PatientProblem;

CREATE TABLE #PatientProblem

(

PatientID INT,

AdmissionDate DATETIME,

DischargeDate DATETIME,

Cost MONEY

)

GO

-- index to eliminate 1 sort operator and lessen the impact of the other. --

CREATE NONCLUSTERED INDEX ix_PatientProblem_PatientID_AdminDate ON #PatientProblem (

PatientID, AdmissionDate)

INCLUDE (DischargeDate, Cost);

--Insert Data

INSERT INTO #PatientProblem(PatientID,AdmissionDate,DischargeDate,Cost)

VALUES

(1009,'2014-07-27','2014-07-31',1050.00),

(1009,'2014-08-01','2014-08-23',1070.00),

(1009,'2014-08-31','2014-08-31',1900.00),

(1009,'2014-09-01','2014-09-14',1260.00),

(1009,'2014-12-01','2014-12-31',2090.00),

(1024,'2014-06-07','2014-06-28',1900.00),

(1024,'2014-06-29','2014-07-31',2900.00),

(1024,'2014-08-01','2014-08-02',1800.00);

WITH PatientProblemAddRN AS (-- add a row number to the data set.

SELECT

pp.PatientID,

pp.AdmissionDate,

pp.DischargeDate,

pp.Cost,

ROW_NUMBER() OVER (PARTITION BY pp.PatientID ORDER BY pp.AdmissionDate) AS RN

FROM

#PatientProblem pp

), AdminDateGroup AS (-- join the prev CTE to itself... note that the row number is offset by one in the ON clause of the join. this is the mechanism that replaces the LAG function of the previous solution.

SELECT

p1.PatientID,

p1.AdmissionDate,

p1.DischargeDate,

p1.Cost,

CASE WHEN DATEDIFF(dd, p2.DischargeDate, p1.AdmissionDate) > 1 OR p2.AdmissionDate IS NULL THEN p1.AdmissionDate END AS AdminDateGroup

FROM

PatientProblemAddRN p1

LEFT JOIN PatientProblemAddRN p2

ON p1.PatientID = p2.PatientID

AND p1.RN = p2.RN + 1

), AdminDateGroupFill AS (-- fill in null values with the appropriate group value

SELECT

adg.PatientID,

adg.AdmissionDate,

adg.DischargeDate,

adg.Cost,

MAX(adg.AdminDateGroup) OVER (PARTITION BY adg.PatientID ORDER BY adg.AdmissionDate) AS AdminDateGroup

FROM

AdminDateGroup adg

)

-- at this point it's just a simple query with a couple of aggrigates.

SELECT

adgf.PatientID,

adgf.AdminDateGroup AS AdmissionDate,

MAX(adgf.DischargeDate) AS DischargeDate,

SUM(adgf.Cost) AS Cost

FROM

AdminDateGroupFill adgf

GROUP BY

adgf.PatientID,

adgf.AdminDateGroup

Again... The same results as before...

PatientID AdmissionDate DischargeDate Cost

----------- ----------------------- ----------------------- ---------------------

1024 2014-06-07 00:00:00.000 2014-08-02 00:00:00.000 6600.00

1009 2014-07-27 00:00:00.000 2014-08-23 00:00:00.000 2120.00

1009 2014-08-31 00:00:00.000 2014-09-14 00:00:00.000 3160.00

1009 2014-12-01 00:00:00.000 2014-12-31 00:00:00.000 2090.00

Also note that the index I suggested in the 1st post remains unchanged. The same sort operation take place as before... Just with ROW_NUMBER instead of LAG.

HTH,

Jason

Dwain Camps SSC Guru Points: 86978 More actions · Answer 6

Oops, wrong thread.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Dwain Camps SSC Guru Points: 86978 More actions · Answer 7

Here's another quite clever method originating with Itzik Ben-Gan. Unfortunately, I don't believe that the original link is still active but I used the technique myself in this article (with attribution of course).

Calculating Gaps Between Overlapping Time Intervals in SQL[/url]

Using Jason's test data:

WITH C1 AS

(

-- Since the data table contains rows with a start and end date, we'll first unpivot

-- those using CROSS APPLY VALUES and assign a type to each. The columns e and s will

-- either be NULL (s NULL for an end date, e NULL for a start date) or row numbers

-- sequenced by the time. UserID has been eliminated because we're looking for overlapping

-- intervals across all users.

SELECT PatientID, ts, [Type], Cost

,e=CASE [Type]

WHEN 1 THEN NULL

ELSE ROW_NUMBER() OVER (PARTITION BY PatientID, [Type] ORDER BY DischargeDate) END

,s=CASE [Type]

WHEN -1 THEN NULL

ELSE ROW_NUMBER() OVER (PARTITION BY PatientID, [Type] ORDER BY AdmissionDate) END

FROM #PatientProblem

CROSS APPLY (VALUES (1, AdmissionDate-1), (-1, DischargeDate+1)) a([Type], ts)

),

C2 AS

(

-- Add a row number ordered as shown

SELECT C1.*, se=ROW_NUMBER() OVER (PARTITION BY PatientID ORDER BY ts, [Type] DESC)

FROM C1

),

C3 AS

(

-- Create a grpnm that pairs the rows

SELECT ts, grpnm=FLOOR((ROW_NUMBER() OVER (PARTITION BY PatientID ORDER BY ts)-1) / 2 + 1), PatientID, Cost

FROM C2

-- This filter is the magic that eliminates the overlaps

WHERE COALESCE(s-(se-s)-1, (se-e)-e) = 0

),

C4 AS

(

-- Grouping by grpnm restores the records to only non-overlapped intervals

-- *** Adjust intervals back to original

SELECT AdmissionDate=MIN(ts)+1, DischargeDate=MAX(ts)-1, PatientID, Cost=SUM(Cost)

FROM C3

GROUP BY PatientID, grpnm

)

SELECT PatientID, AdmissionDate, DischargeDate

-- *** Subquery to compute the cost

,Cost =

(

SELECT SUM(Cost)

FROM #PatientProblem b

WHERE a.PatientID = b.PatientID AND

b.AdmissionDate >= a.AdmissionDate AND

b.DischargeDate <= a.DischargeDate

)

FROM C4 a

ORDER BY PatientID, AdmissionDate;

You should find this approach to be pretty fast. I had to make just a couple of modifications to handle the 1 day apart issue between your end/start date overlaps (*** commented).

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

sratemo SSC Enthusiast Points: 107 More actions · Answer 8

sratemo

SSC Enthusiast

Points: 107

July 2, 2015 at 9:26 am

#1809954

Amazing. Works perfectly. Thank you:-):-):-)

sratemo SSC Enthusiast Points: 107 More actions · Answer 9

Hi Jason,

Your solution looks solid but the query wouldn't execute past:

"MAX(adg.AdminDateGroup) OVER (PARTITION BY adg.PatientID ORDER BY adg.AdmissionDate) AS AdminDateGroup".

error - Incorrect syntax near 'order'.

Tried going around this but then got discrepancies in the results.

Jacob Wilkins One Orange Chip Points: 27976 More actions · Answer 10

That's because prior to 2012, ORDER BY clauses were not supported in aggregate window functions; they were supported only in ranking window functions.

See https://msdn.microsoft.com/en-us/library/ms189461(v=sql.105).aspx for the appropriate documentation.

Cheers!

Jason A. Long SSC-Insane Points: 23786 More actions · Answer 11

sratemo (7/2/2015)
Hi Jason,
Your solution looks solid but the query wouldn't execute past:
"MAX(adg.AdminDateGroup) OVER (PARTITION BY adg.PatientID ORDER BY adg.AdmissionDate) AS AdminDateGroup".
error - Incorrect syntax near 'order'.
Tried going around this but then got discrepancies in the results.

Yea... That's my bad. I knew that 2008 supported windowed aggregates but it looks like it only supports partitioning, not ordering... So completely my fault on that...

Luckily Dwain was able to step up to the plate and come up with a good working solution. I haven't had a chance to dig into his code but I definitely will. He's always good at supplying fantastic new tools for the toolbox. 🙂

Dwain Camps SSC Guru Points: 86978 More actions · Answer 12

Jason A. Long (7/2/2015)
sratemo (7/2/2015)
Hi Jason,
Your solution looks solid but the query wouldn't execute past:
"MAX(adg.AdminDateGroup) OVER (PARTITION BY adg.PatientID ORDER BY adg.AdmissionDate) AS AdminDateGroup".
error - Incorrect syntax near 'order'.
Tried going around this but then got discrepancies in the results.
Yea... That's my bad. I knew that 2008 supported windowed aggregates but it looks like it only supports partitioning, not ordering... So completely my fault on that...
Luckily Dwain was able to step up to the plate and come up with a good working solution. I haven't had a chance to dig into his code but I definitely will. He's always good at supplying fantastic new tools for the toolbox. 🙂

Well, that's not exactly a new tool (IBG's approach to collapsing overlapping intervals) but I thank you nonetheless.

Note that I tried to avoid the extra sub-query I used to pull the cost but I couldn't see a way to make that work within the interval collapsing algorithm. That "magic filter" that IBG came up with seems to preclude doing it in there. Had I been able to, I believe it would have removed the extra INDEX SCAN (or SEEK).

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St