December 15, 2017 at 12:15 pm
Hey guys, feeling kinda dumb I can't figure this one out on my own. 🙁 Hopefully you can advise. Here's some sample data:
IF OBJECT_ID('tempdb..#EMP_TEST') IS NOT NULL
DROP TABLE #EMP_TEST
GO
CREATE TABLE #EMP_TEST
(
EID SMALLINT NOT NULL,
Stat CHAR(1) NOT NULL,
StartDate DATE NOT NULL,
EndDate DATE NOT NULL
)
GO
INSERT INTO #EMP_TEST
(
EID,
Stat,
StartDate,
EndDate
)
SELECT 12345, 'A', '20111215', '20120101' UNION ALL
SELECT 12345, 'A', '20120101', '20120118' UNION ALL
SELECT 12345, 'A', '20120118', '20120201' UNION ALL
SELECT 12345, 'A', '20120201', '20120321' UNION ALL
SELECT 12345, 'A', '20120321', '20120401' UNION ALL
SELECT 12345, 'A', '20120401', '20121109' UNION ALL
SELECT 12345, 'L', '20121109', '20130101' UNION ALL
SELECT 12345, 'L', '20130101', '20130103' UNION ALL
SELECT 12345, 'A', '20130103', '20130203' UNION ALL
SELECT 12345, 'R', '20130203', '20141210'
;
This is a table that stores employee status data. A = Active, L = Leave, etc. What I need to do is consolidate these contiguous blocks into single rows. So, expected output would look like this:
EID | Stat | StartDate | EndDate |
12345 | A | 20111215 | 20121109 |
12345 | L | 20121109 | 20130103 |
12345 | A | 20130103 | 20130203 |
12345 | R | 20130203 | 20141210 |
The issue that's throwing me is that the person goes from A to L and back to A. So if I try to partition by EID, STAT in any type of windowing function, I get all screwd up. What is the sexy solution to this problem? Dazzle me please, SQL wizards!
December 15, 2017 at 12:37 pm
It requires two different ROW_NUMBER()s.
;
WITH StatusGroups AS
(
SELECT *
, Grp = ROW_NUMBER() OVER(PARTITION BY EID ORDER BY StartDate, Stat) - ROW_NUMBER() OVER(PARTITION BY EID, Stat ORDER BY StartDate)
FROM #EMP_TEST
)
SELECT EID, Stat, MIN(StartDate) AS StartDate, MAX(EndDate) AS EndDate
FROM StatusGroups
GROUP BY EID, Stat, Grp
ORDER BY EID, StartDate
;
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
December 15, 2017 at 12:46 pm
DREW = TOO SEXY
December 15, 2017 at 1:01 pm
autoexcrement - Friday, December 15, 2017 12:46 PMDREW = TOO SEXY
what results would you expect for this set of data?
INSERT INTO #EMP_TEST
(
EID,
Stat,
StartDate,
EndDate
)
SELECT 12345, 'A', '20111215', '20120101' UNION ALL
SELECT 12345, 'A', '20120101', '20120118' UNION ALL
SELECT 12345, 'A', '20120118', '20120201' UNION ALL
SELECT 12345, 'A', '20120201', '20120321' UNION ALL
SELECT 12345, 'A', '20120321', '20120401' UNION ALL
SELECT 12345, 'A', '20120401', '20121109' UNION ALL
SELECT 12345, 'L', '20121109', '20130101' UNION ALL
SELECT 12345, 'L', '20130101', '20130103' UNION ALL
SELECT 12345, 'A', '20130103', '20130203' UNION ALL
SELECT 12345, 'T', '20120203', '20120208'
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
December 15, 2017 at 1:18 pm
J Livingston SQL - Friday, December 15, 2017 1:01 PMautoexcrement - Friday, December 15, 2017 12:46 PMDREW = TOO SEXYwhat results would you expect for this set of data?
INSERT INTO #EMP_TEST
(
EID,
Stat,
StartDate,
EndDate
)
SELECT 12345, 'A', '20111215', '20120101' UNION ALL
SELECT 12345, 'A', '20120101', '20120118' UNION ALL
SELECT 12345, 'A', '20120118', '20120201' UNION ALL
SELECT 12345, 'A', '20120201', '20120321' UNION ALL
SELECT 12345, 'A', '20120321', '20120401' UNION ALL
SELECT 12345, 'A', '20120401', '20121109' UNION ALL
SELECT 12345, 'L', '20121109', '20130101' UNION ALL
SELECT 12345, 'L', '20130101', '20130103' UNION ALL
SELECT 12345, 'A', '20130103', '20130203' UNION ALL
SELECT 12345, 'T', '20120203', '20120208'
My code was written with the assumption that the ranges do not overlap, since an overlap would indicate contradictory statuses. That is, an employee cannot both be active and terminated at the same time. The fact that this data contains such an overlap indicates that there is a data quality issue, not an issue with the code.
EDIT: I should also not that my code assumes that the statuses are contiguous. If there are gaps between statuses, then my code will not produce the correct results.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
December 15, 2017 at 1:23 pm
drew.allen - Friday, December 15, 2017 1:18 PMJ Livingston SQL - Friday, December 15, 2017 1:01 PMautoexcrement - Friday, December 15, 2017 12:46 PMDREW = TOO SEXYwhat results would you expect for this set of data?
INSERT INTO #EMP_TEST
(
EID,
Stat,
StartDate,
EndDate
)
SELECT 12345, 'A', '20111215', '20120101' UNION ALL
SELECT 12345, 'A', '20120101', '20120118' UNION ALL
SELECT 12345, 'A', '20120118', '20120201' UNION ALL
SELECT 12345, 'A', '20120201', '20120321' UNION ALL
SELECT 12345, 'A', '20120321', '20120401' UNION ALL
SELECT 12345, 'A', '20120401', '20121109' UNION ALL
SELECT 12345, 'L', '20121109', '20130101' UNION ALL
SELECT 12345, 'L', '20130101', '20130103' UNION ALL
SELECT 12345, 'A', '20130103', '20130203' UNION ALL
SELECT 12345, 'T', '20120203', '20120208'My code was written with the assumption that the ranges do not overlap, since an overlap would indicate contradictory statuses. That is, an employee cannot both be active and terminated at the same time. The fact that this data contains such an overlap indicates that there is a data quality issue, not an issue with the code.
Drew
absolutely agree Drew and it certainly was not my intention to question your code...I just wanted to ensure that the OP understood that overlaps (for what ever reason) will cause problems.
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
December 15, 2017 at 1:32 pm
Yes, for the sake of this thread, it's safe to assume no overlaps. But good question. Thank you guys!
December 15, 2017 at 1:40 pm
Joe Torre - Friday, December 15, 2017 1:07 PMtake a look at this
This article depends on data where the records are for one unit (day), but the data above uses ranges instead of a record for every unit. You could convert the data above to one record per unit (day), but it would probably be less efficient than the code I supplied above, although it would account for gaps where my code assumes contiguous statuses. However, if there are not contiguous statuses there is another method that will produce contiguous statuses that would probably still be more efficient.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
December 20, 2017 at 1:21 pm
WITH
emp
(
EID,
Stat,
StartDate,
EndDate
)
AS
(SELECT 12345, 'A', '20111215', '20120101' UNION ALL
SELECT 12345, 'A', '20120101', '20120118' UNION ALL
SELECT 12345, 'A', '20120118', '20120201' UNION ALL
SELECT 12345, 'A', '20120201', '20120321' UNION ALL
SELECT 12345, 'A', '20120321', '20120401' UNION ALL
SELECT 12345, 'A', '20120401', '20121109' UNION ALL
SELECT 12345, 'L', '20121109', '20130101' UNION ALL
SELECT 12345, 'L', '20130101', '20130103' UNION ALL
SELECT 12345, 'A', '20130103', '20130203' UNION ALL
SELECT 12345, 'R', '20130203', '20141210'
),
workCte
( GapStart
, GapEnd
, GapDays
)
AS
(SELECT Lag(emp.StartDate) OVER (PARTITION BY emp.EID ORDER BY emp.EID, emp.StartDate)
, emp.StartDate
, DateDiff(dd, Lag(emp.StartDate) OVER (PARTITION BY emp.EID ORDER BY emp.EID, emp.StartDate), emp.StartDate)-1
FROM emp
) ,
Cal
(
dd
) AS
(
SELECT DateAdd(dd, n.number, (SELECT Min(emp.StartDate) FROM emp))
FROM master.dbo.spt_values n
WHERE n.type='p'
AND n.number<DateDiff(dd,(SELECT Max(emp.EndDate) FROM emp),(SELECT Min(emp.StartDate) FROM emp))
)
SELECT Coalesce(workCte.GapStart, 0) GapStart
, Coalesce(workCte.GapEnd, 0) GapEnd
, Coalesce(workCte.GapDays, 0) GapDays
FROM workCte
ORDER BY workCte.GapStart;
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply