March 27, 2011 at 10:09 am
Hi
Having recently set up a Data Warehouse at work, we have been asked to come up with a solution to the following. I was wondering if anyone counld advise on the best way to achieve this.
We need to be able to count how many patients were in the hospital at any given time in the past.
Each patient is given a an admissions and discharge date. if a patient does not have a discharge date then they are considered to still be in the hospital.
I need to be able to enter a start and end date for any given period in the past (upto and including today) and have a count of patients who were admitted and discharged in that period, by dates within that period.
Below is some sample data followed by expected results. Thanks in advance for any advice.
SAMPLE DATA
drop table PS_TestForOnline
CREATE TABLE PS_TestForOnline
(
rowkey int,
crn int,
admdate date ,
disdate date,
)
INSERT INTO PS_TestForOnline
VALUES('1','11111','2011/01/01','2011/01/07' );
INSERT INTO PS_TestForOnline
VALUES('2','22222','2011/01/02','2011/01/07');
INSERT INTO PS_TestForOnline
VALUES('3','33333','2011/01/03','2011/01/07');
INSERT INTO PS_TestForOnline
VALUES('4','44444','2011/01/04','2011/01/07');
INSERT INTO PS_TestForOnline
VALUES('5','55555','2011/01/05','2011/01/07' );
INSERT INTO PS_TestForOnline
VALUES('6','66666','2011/01/06','2011/01/07' );
INSERT INTO PS_TestForOnline
VALUES('7','77777','2011/01/07','2011/01/07');
INSERT INTO PS_TestForOnline
VALUES('8','88888','2011/01/08', (NULL));
select * from PS_TestForOnline
EXPECTED RESULTS for period 1st to 8th Jan
CREATE TABLE PS_TestForOnline_Answer
(
DATE DATE ,
COUNT_OF_PATIENTS INT,
);
INSERT INTO PS_TestForOnline_Answer
VALUES('2011/01/01','1' );
INSERT INTO PS_TestForOnline_Answer
VALUES('2011/01/02','2' );
INSERT INTO PS_TestForOnline_Answer
VALUES('2011/01/03','3' );
INSERT INTO PS_TestForOnline_Answer
VALUES('2011/01/04','4' );
INSERT INTO PS_TestForOnline_Answer
VALUES('2011/01/05','5' );
INSERT INTO PS_TestForOnline_Answer
VALUES('2011/01/06','6' );
INSERT INTO PS_TestForOnline_Answer
VALUES('2011/01/07','7' );
INSERT INTO PS_TestForOnline_Answer
VALUES('2011/01/08','1' );
select * from PS_TestForOnline_Answer
March 27, 2011 at 12:41 pm
[EDIT: Just found this is wrong and gives wrong count on last day; left here as it may give some inspiration. Am working on as output is currently annoying me :)]
This'll get you somewhere near there.
You can either use a loop (instead of params table) to iterate from first date to last date.
Alternatively one of them fancy recursive CTEs might get you there
DROP TABLE params;
CREATE TABLE params (StartDate date, EndDate date);
INSERT INTO params VALUES ('2011-01-01', '2011-01-01');
INSERT INTO params VALUES ('2011-01-01', '2011-01-02');
INSERT INTO params VALUES ('2011-01-01', '2011-01-03');
INSERT INTO params VALUES ('2011-01-01', '2011-01-04');
INSERT INTO params VALUES ('2011-01-01', '2011-01-05');
INSERT INTO params VALUES ('2011-01-01', '2011-01-06');
INSERT INTO params VALUES ('2011-01-01', '2011-01-07');
INSERT INTO params VALUES ('2011-01-01', '2011-01-08');
WITH dateparams AS ( SELECT StartDate, EndDate FROM params )
SELECT COALESCE(t2.EndDate, t1.admdate) AS DateCounted, COUNT(*)
FROM PS_TestForOnline AS t1
LEFT JOIN dateparams AS t2
ON t1.admdate BETWEEN t2.StartDate AND t2.EndDate
OR t1.disdate BETWEEN t2.StartDate AND t2.EndDate
GROUP BY t2.StartDate, COALESCE(t2.EndDate, t1.admdate)
ORDER BY COALESCE(t2.EndDate, t1.admdate)
---------------------------------------------
If data can screw you, it will; never assume the data are correct.
March 27, 2011 at 1:39 pm
A conditional SUM() based on a calendar table should do it.
March 27, 2011 at 4:43 pm
Sorry, the query has stumped me...
However taking a step back from the problem makes me wonder whether it can be resolved with SQL, maybe a snapshot table is required, that can be updated daily.
Even if we want 1st to the 8th, any query will only show patients from that time period, unless it goes back to day one to keep a tally.
In short.
Even on the 1st Jan 2011, there could already be 40 patients from previous days.
Therefore, if on 1st Jan a patient is admitted that takes the tally to 41, not necessarily 1 as shown in the sample data (which is where I went flying off, trying to write SQL earlier :blush: ).
If a nightly job were run to add admissions and discharged each day, a snapshot table could be filled with these values and easily referenced.
Dunno, stumped...
---------------------------------------------
If data can screw you, it will; never assume the data are correct.
March 28, 2011 at 2:53 am
I have the following script which works.
Although when running against the live data in the warehouse it take a long, long time. I am running it against a table of about 800,000 admissions.
I does give me correct answer. i have indexed the date fields but it still takes a good 2 mins to do the last 2 months of activity.
Any Ideas on how to enhance performace??
declare @startDate datetime, @endDate datetime;
set @startDate = '20101230';
set @endDate = '20110112';
with dates as -- this part is just generating a calendar for period
(
select @startDate as [date]
union all
select dateadd(dd,1,[date]) from dates where [date] <= @endDate
)
select
[date],
p.count_of_patients
from
dates d
cross apply
(
select
count(*) as count_of_patients
from
PS_TestForOnline
where
admdate <= d.date
and (disdate >= d.date or disdate is null)
) p
March 28, 2011 at 2:35 pm
Rewriting as a join (using your logic, which I couldn't get my head around yesterday - trying to get back up to speed with SQL) gives:
With (550000 source rows)
DECLARE @startDate datetime, @endDate datetime;
SET @startDate = '20101230';
SET @endDate = '20110112';
WITH dates AS -- this part is just generating a calendar for period
(
SELECT @startDate as [date]
UNION ALL
SELECT DATEADD(dd, 1, [date]) FROM dates WHERE [date] <= @endDate
)
SELECT [date], COUNT(PS_TestForOnline.rowkey)
FROM PS_TestForOnline
RIGHT JOIN dates
ON admdate <= dates.[date]
AND (disdate >= dates.[date] OR disdate IS NULL)
GROUP BY [date]
But it's no faster than yours and consumes more resource.
Looking at statistics the JOIN gives.
Table 'Worktable'. Scan count 4, logical reads 3094315, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'PS_TestForOnline'. Scan count 2, logical reads 3322, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 9922 ms, elapsed time = 5089 ms
CROSS APPLY gives
Table 'PS_TestForOnline'. Scan count 15, logical reads 24915, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 2, logical reads 91, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 2636 ms, elapsed time = 2634 ms.
Conclusion CROSS APPLY quicker and more efficient, time taken JOIN takes ~5s APPLY ~2.5s
Looking at covering indexes gives:
JOIN
CREATE NONCLUSTERED INDEX idx_admdatedisdate
ON [dbo].[PS_TestForOnline] ([admdate],[disdate])
INCLUDE ([rowkey])
Table 'PS_TestForOnline'. Scan count 15, logical reads 24870, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 2, logical reads 91, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 3744 ms, elapsed time = 3780 ms.
CROSS APPLY gives
CREATE NONCLUSTERED INDEX idx_admdatedisdate2
ON [dbo].[PS_TestForOnline] ([admdate],[disdate])
Table 'PS_TestForOnline'. Scan count 15, logical reads 20775, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 2, logical reads 91, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1794 ms, elapsed time = 1782 ms.
Conclusion CROSS APPLY quicker and more efficient, time taken JOIN takes ~4s APPLY ~2s
Interested to see where this ends up.
---------------------------------------------
If data can screw you, it will; never assume the data are correct.
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply