May 5, 2017 at 3:57 pm
Hello,
We have a Staff/Program Assignment that captures to whom and what program a particular patient is assigned. It's a historical table. Sometimes a new staff person is assigned but the program remains the same. I need to extract the start and end date for each episode that a patient was assigned to DG. Below I have provided some sample data. For example, the first 3 rows reflect that patient # 000001 was assigned to 3 different staff all during their first episode in DG.
For Account # 000001, the episodes they were assigned to DG should be: 3/24/17 through 4/2/17; 7/31/17 through 8/9/17
For Account # 000002, the episodes they were assigned to DG should be: 3/21/17 through 3/24/17
CREATE TABLE #TEMP(ACCOUNT VARCHAR(15),STAFF VARCHAR(10),PROGRAM VARCHAR(10),STARTDATE DATETIME,ENDDATE DATETIME)
INSERT INTO #TEMP VALUES ('000001','JOHN','DG','3/24/17','3/26/17')
INSERT INTO #TEMP VALUES ('000001','JILL','DG','3/27/17','3/28/17')
INSERT INTO #TEMP VALUES ('000001','STEVE','DG','3/29/17','4/2/17')
INSERT INTO #TEMP VALUES ('000001','','INACTIVE','4/3/17','4/23/17')
INSERT INTO #TEMP VALUES ('000001','JOHN','AA','6/1/17','6/20/17')
INSERT INTO #TEMP VALUES ('000001','JILL','DH','6/30/17','7/30/17')
INSERT INTO #TEMP VALUES ('000001','CHRIS','DG','7/31/17','8/09/17')
INSERT INTO #TEMP VALUES ('000001','PHILLIP','AD','8/10/17',NULL)
INSERT INTO #TEMP VALUES ('000002','STEVE','DG','3/21/17','3/24/17')
SELECT * FROM #TEMP ORDER BY ACCOUNT,STARTDATE
DROP TABLE #TEMP
Thanks in advance for your help.
May 5, 2017 at 4:45 pm
This is a classic Islands problem - You can Google for "SQL Gaps and Islands" for other examples
Test Data
CREATE TABLE #TEMP(ACCOUNT VARCHAR(15),STAFF VARCHAR(10),PROGRAM VARCHAR(10),STARTDATE DATETIME,ENDDATE DATETIME)
INSERT INTO #TEMP VALUES ('000001','JOHN','DG','3/24/17','3/26/17')
INSERT INTO #TEMP VALUES ('000001','JILL','DG','3/27/17','3/28/17')
INSERT INTO #TEMP VALUES ('000001','STEVE','DG','3/29/17','4/2/17')
INSERT INTO #TEMP VALUES ('000001','','INACTIVE','4/3/17','4/23/17')
INSERT INTO #TEMP VALUES ('000001','JOHN','AA','6/1/17','6/20/17')
INSERT INTO #TEMP VALUES ('000001','JILL','DH','6/30/17','7/30/17')
INSERT INTO #TEMP VALUES ('000001','CHRIS','DG','7/31/17','8/09/17')
INSERT INTO #TEMP VALUES ('000001','PHILLIP','AD','8/10/17',NULL)
INSERT INTO #TEMP VALUES ('000002','STEVE','DG','3/21/17','3/24/17');
The solution
/*********************************************************************************
NOTE: This technique was based on a solution from pg 196 of the book
"Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions"
by Itzik Ben-Gan.
*********************************************************************************/
WITH
--=== Generate a list of numbers to use for padding the dates
T(N) AS (SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
, (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
, (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
, (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
, (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
, (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
, (0),(0),(0),(0)) AS X(N))
, Nums(n) AS (SELECT n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM T T1 -- Max = 64
, T T2 -- Max = 4,096
)
, cteData AS (
--=== Generate a padded list of all the dates between STARTDATE and ENDDATE for each record
SELECT
ACCOUNT
, PROGRAM
, CALCDATE = DATEADD(DD, f.n -1, STARTDATE)
, DATEADD(day, -1 * DENSE_RANK() OVER(ORDER BY DATEADD(DD, f.n -1, STARTDATE)), DATEADD(DD, f.n -1, STARTDATE)) AS diff
FROM #TEMP as d
CROSS APPLY Nums as f
WHERE f.n <= DATEDIFF(DD, STARTDATE, ENDDATE)+1
)
SELECT ACCOUNT
, PROGRAM
, MIN(CALCDATE) AS rangeStart
, MAX(CALCDATE) AS rangeEnd
FROM cteData
-- WHERE PROGRAM = 'dg'
GROUP BY ACCOUNT, PROGRAM, diff
ORDER BY ACCOUNT, PROGRAM, rangeStart;
May 6, 2017 at 12:47 am
After re-reading my post from last night, I notice that the solution will only work if STARTTIME is always unique.
Below is a modified query that will handle multiple ACCOUNT, PROGRAM combinations.
/*********************************************************************************
NOTE: This technique was based on a solution from pg 196 of the book
"Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions"
by Itzik Ben-Gan.
*********************************************************************************/
WITH
--=== Generate a list of numbers to use for padding the dates
T(N) AS (SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
, (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
, (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
, (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
, (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
, (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
, (0),(0),(0),(0)) AS X(N))
, Nums(n) AS (SELECT n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM T T1 -- Max = 64
, T T2 -- Max = 4,096
)
, cteData AS (
--=== Generate a padded list of all the dates between STARTDATE and ENDDATE for each record
SELECT
ACCOUNT
, PROGRAM
, CALCDATE = DATEADD(DD, f.n -1, STARTDATE)
, DATEADD(day, -1 * DENSE_RANK() OVER(PARTITION BY ACCOUNT, PROGRAM
ORDER BY DATEADD(DD, f.n -1, STARTDATE)), DATEADD(DD, f.n -1, STARTDATE)) AS diff
FROM #TEMP as d
CROSS APPLY Nums as f
WHERE f.n <= DATEDIFF(DD, STARTDATE, ENDDATE)+1
)
SELECT ACCOUNT
, PROGRAM
, MIN(CALCDATE) AS rangeStart
, MAX(CALCDATE) AS rangeEnd
FROM cteData
-- WHERE PROGRAM = 'dg'
GROUP BY ACCOUNT, PROGRAM, diff
ORDER BY ACCOUNT, PROGRAM, rangeStart;
May 6, 2017 at 9:42 am
DesNorton - Saturday, May 6, 2017 12:47 AMAfter re-reading my post from last night, I notice that the solution will only work if STARTTIME is always unique.
Below is a modified query that will handle multiple ACCOUNT, PROGRAM combinations.
/*********************************************************************************
NOTE: This technique was based on a solution from pg 196 of the book
"Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions"
by Itzik Ben-Gan.
*********************************************************************************/
WITH
--=== Generate a list of numbers to use for padding the dates
T(N) AS (SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
, (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
, (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
, (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
, (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
, (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
, (0),(0),(0),(0)) AS X(N))
, Nums(n) AS (SELECT n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM T T1 -- Max = 64
, T T2 -- Max = 4,096
)
, cteData AS (
--=== Generate a padded list of all the dates between STARTDATE and ENDDATE for each record
SELECT
ACCOUNT
, PROGRAM
, CALCDATE = DATEADD(DD, f.n -1, STARTDATE)
, DATEADD(day, -1 * DENSE_RANK() OVER(PARTITION BY ACCOUNT, PROGRAM
ORDER BY DATEADD(DD, f.n -1, STARTDATE)), DATEADD(DD, f.n -1, STARTDATE)) AS diff
FROM #TEMP as d
CROSS APPLY Nums as f
WHERE f.n <= DATEDIFF(DD, STARTDATE, ENDDATE)+1
)
SELECT ACCOUNT
, PROGRAM
, MIN(CALCDATE) AS rangeStart
, MAX(CALCDATE) AS rangeEnd
FROM cteData
-- WHERE PROGRAM = 'dg'
GROUP BY ACCOUNT, PROGRAM, diff
ORDER BY ACCOUNT, PROGRAM, rangeStart;
Des - this loses the row for program "AD".
Here's an alternative method adapted from something I found on an Oracle site recently:SELECT ACCOUNT, PROGRAM, STARTDATE = MIN(STARTDATE), ENDDATE = MIN(ENDDATE)
FROM (
SELECT *, Grouper = DATEADD(DAY,
0-SUM(1+DATEDIFF(DAY,STARTDATE,ENDDATE)) OVER(PARTITION BY ACCOUNT,PROGRAM ORDER BY STARTDATE),
ENDDATE)
FROM #TEMP
) d
GROUP BY ACCOUNT, PROGRAM, Grouper
Here's a breakdown of how it works:SELECT *, Grouper = DATEADD(DAY,0-SumDuration,ENDDATE)
FROM (
SELECT *, SumDuration = SUM(Duration) OVER(PARTITION BY ACCOUNT,PROGRAM ORDER BY STARTDATE)
FROM (
SELECT *, Duration = 1+DATEDIFF(DAY,STARTDATE,ENDDATE)
FROM #TEMP
) d1
) d2
ORDER BY ACCOUNT,STARTDATE
This is generally around ten times faster than the IBG method you reference - but I'm fairly sure he's improved on it in the last couple of years.
Edit: typical execution stats
(7 row(s) affected)
Table 'Worktable'. Scan count 15, logical reads 54 ...
Table '#TEMP'. Scan count 1, logical reads 2 ...
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
=========================================================
(6 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0 ...
Table '#TEMP'. Scan count 1, logical reads 8193 ...
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 20 ms.
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
May 7, 2017 at 2:57 am
Nice Chris
Just one small change to your script
Need to change MIN(ENDDATE) to MAX(ENDDATE)
SELECT ACCOUNT, PROGRAM, STARTDATE = MIN(STARTDATE), ENDDATE = MAX(ENDDATE)
FROM (
SELECT *, Grouper = DATEADD(DAY,
0-SUM(1+DATEDIFF(DAY,STARTDATE,ENDDATE)) OVER(PARTITION BY ACCOUNT,PROGRAM ORDER BY STARTDATE),
ENDDATE)
FROM #TEMP
) d
GROUP BY ACCOUNT, PROGRAM, Grouper
May 7, 2017 at 5:27 am
DesNorton - Sunday, May 7, 2017 2:57 AMNice ChrisJust one small change to your script
Need to change MIN(ENDDATE) to MAX(ENDDATE)
SELECT ACCOUNT, PROGRAM, STARTDATE = MIN(STARTDATE), ENDDATE = MAX(ENDDATE)
FROM (
SELECT *, Grouper = DATEADD(DAY,
0-SUM(1+DATEDIFF(DAY,STARTDATE,ENDDATE)) OVER(PARTITION BY ACCOUNT,PROGRAM ORDER BY STARTDATE),
ENDDATE)
FROM #TEMP
) d
GROUP BY ACCOUNT, PROGRAM, Grouper
Many thanks Des.
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
May 8, 2017 at 9:12 am
Excellent, thank you Des and Chris. I will be studying this today! I always love posting here, I learn something new every time.
May 8, 2017 at 9:16 am
Thanks for the feedback, Adam. Holler if you get stuck.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply