Aggregate date series

Question

Post reply

Aggregate date series

hebjTS

SSC Enthusiast

Points: 114
More actions
January 9, 2024 at 2:33 pm
Go to Answer
#4359158

I have a an EmployeeTable with data for each day. I need to aggregate that into a table with From and Tom date.

This is a simplified view of the source data:
select *
from (values (1, 1, 20240101),
(1, 1, 20240102),
(1, 2, 20240103),
(1, 2, 20240104),
(1, 1, 20240105)) Employee(EmployeeId, EmploymentTypeId, DateId)
This is how I wish the result to look like:
select *
from (values (1, 1, 20240101, 20240102),
(1, 2, 20240103, 20240104),
(1, 1, 20240105, 20240105)) EmployeeAggr(EmployeeId, EmploymentTypeId, DateId_start, DateId_end)
If I try to create this with a group by I end up with this (which is not what I want):
with cte_Employee as (
select *
from (values (1, 1, 20240101),
(1, 1, 20240102),
(1, 2, 20240103),
(1, 2, 20240104),
(1, 1, 20240105)) Employee(EmployeeId, EmploymentTypeId, DateId)
)
select EmployeeId
, EmploymentTypeId
, min( DateId ) DateId_start
, max( DateId ) DateId_end
from cte_Employee
group by EmployeeId
, EmploymentTypeId
order by 3
I need to find a way to separate the 1st and 2nd of january from the 5th but Im not able to find a way to accomplish this.
Thanx!

Ken McKelvey

SSCoach

Points: 18984

January 9, 2024 at 3:51 pm

Answer

#4359197

You would make your life easier if you stored dates as dates. This is an Islands and Gaps problem. One solution:

WITH cte_Employee (EmployeeId, EmploymentTypeId, DateId)
AS
(
    SELECT EmployeeId, EmploymentTypeId, DateId
    FROM
    (
        VALUES (1, 1, 20240101)
            ,(1, 1, 20240102)
            ,(1, 2, 20240103)
            ,(1, 2, 20240104)
            ,(1, 1, 20240105)
    ) E (EmployeeId, EmploymentTypeId, DateId)
)
,Boundaries
AS
(
    SELECT E.EmployeeId, E.EmploymentTypeId, E.DateId
        ,CASE
            WHEN 1 = DATEDIFF(day, LAG(X.DateIdDate) OVER (PARTITION BY E.EmployeeId, E.EmploymentTypeId ORDER BY X.DateIdDate), X.DateIdDate)
            THEN 0
            ELSE 1
        END AS Boundary
    FROM cte_Employee E
        CROSS APPLY
        (
            VALUES
            (
                CAST(CAST(DateId AS char(8)) AS date)
            )
        ) X (DateIdDate)
)
,Grps
AS
(
    SELECT EmployeeId, EmploymentTypeId, DateId
        ,SUM(Boundary) OVER (PARTITION BY EmployeeId, EmploymentTypeId ORDER BY DateId) AS Grp
    FROM Boundaries
)
SELECT EmployeeId, EmploymentTypeId
    ,MIN(DateId) AS DateId_start
    ,MAX(DateId) AS DateId_end
FROM Grps
GROUP BY EmployeeId, EmploymentTypeId, Grp
ORDER BY EmployeeId, DateId_start;

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

hebjTS SSC Enthusiast Points: 114 More actions · Answer 1

Excellent! Ive never seen sum() used with over (... order by ) this way before so I learned something new today! 🙂 Thankyou!

(This was an excerpt from a fact-table in a DW so thats why the "dates" are stored as int.)

Ken McKelvey SSCoach Points: 18984 More actions · Answer 2

Looking at this again the following is probably better:

WITH cte_Employee (EmployeeId, EmploymentTypeId, DateId)
AS
(
    SELECT EmployeeId, EmploymentTypeId, DateId
    FROM
    (
        VALUES (1, 1, 20240101)
            ,(1, 1, 20240102)
            ,(1, 2, 20240103)
            ,(1, 2, 20240104)
            ,(1, 1, 20240105)
    ) E (EmployeeId, EmploymentTypeId, DateId)
)
,Boundaries
AS
(
SELECT EmployeeId, EmploymentTypeId, DateId
    ,CASE
        WHEN EmploymentTypeId = LAG(EmploymentTypeId) OVER (PARTITION BY EmployeeId ORDER BY DateId)
        THEN 0
        ELSE 1
    END AS Boundary
FROM cte_Employee
)
,Grps
AS
(
    SELECT EmployeeId, EmploymentTypeId, DateId
        ,SUM(Boundary) OVER (PARTITION BY EmployeeId, EmploymentTypeId ORDER BY DateId) AS Grp
    FROM Boundaries
)
SELECT EmployeeId, EmploymentTypeId
    ,MIN(DateId) AS DateId_start
    ,MAX(DateId) AS DateId_end
FROM Grps
GROUP BY EmployeeId, EmploymentTypeId, Grp
ORDER BY EmployeeId, DateId_start;

hebjTS SSC Enthusiast Points: 114 More actions · Answer 3

Yes that is an even better solution. I also removed EmploymentTypeId from partition by in the sum clause later to get a more appealing increment in Grp. Thanx for your attention to this!