Aggregate date series

  • I have a an EmployeeTable with data for each day. I need to aggregate that into a table with From and Tom date.

     

    This is a simplified view of the source data:

    select *

    from (values (1, 1, 20240101),

    (1, 1, 20240102),

    (1, 2, 20240103),

    (1, 2, 20240104),

    (1, 1, 20240105)) Employee(EmployeeId, EmploymentTypeId, DateId)

    This is how I wish the result to look like:

    select *

    from (values (1, 1, 20240101, 20240102),

    (1, 2, 20240103, 20240104),

    (1, 1, 20240105, 20240105)) EmployeeAggr(EmployeeId, EmploymentTypeId, DateId_start, DateId_end)

    If I try to create this with a group by I end up with this (which is not what I want):

    with cte_Employee as (

    select *

    from (values (1, 1, 20240101),

    (1, 1, 20240102),

    (1, 2, 20240103),

    (1, 2, 20240104),

    (1, 1, 20240105)) Employee(EmployeeId, EmploymentTypeId, DateId)

    )

    select EmployeeId

    , EmploymentTypeId

    , min( DateId ) DateId_start

    , max( DateId ) DateId_end

    from cte_Employee

    group by EmployeeId

    , EmploymentTypeId

    order by 3

    I need to find a way to separate the 1st and 2nd of january from the 5th but Im not able to find a way to accomplish this.

    Thanx!

  • You would make your life easier if you stored dates as dates. This is an Islands and Gaps problem. One solution:

    WITH cte_Employee (EmployeeId, EmploymentTypeId, DateId)
    AS
    (
    SELECT EmployeeId, EmploymentTypeId, DateId
    FROM
    (
    VALUES (1, 1, 20240101)
    ,(1, 1, 20240102)
    ,(1, 2, 20240103)
    ,(1, 2, 20240104)
    ,(1, 1, 20240105)
    ) E (EmployeeId, EmploymentTypeId, DateId)
    )
    ,Boundaries
    AS
    (
    SELECT E.EmployeeId, E.EmploymentTypeId, E.DateId
    ,CASE
    WHEN 1 = DATEDIFF(day, LAG(X.DateIdDate) OVER (PARTITION BY E.EmployeeId, E.EmploymentTypeId ORDER BY X.DateIdDate), X.DateIdDate)
    THEN 0
    ELSE 1
    END AS Boundary
    FROM cte_Employee E
    CROSS APPLY
    (
    VALUES
    (
    CAST(CAST(DateId AS char(8)) AS date)
    )
    ) X (DateIdDate)
    )
    ,Grps
    AS
    (
    SELECT EmployeeId, EmploymentTypeId, DateId
    ,SUM(Boundary) OVER (PARTITION BY EmployeeId, EmploymentTypeId ORDER BY DateId) AS Grp
    FROM Boundaries
    )
    SELECT EmployeeId, EmploymentTypeId
    ,MIN(DateId) AS DateId_start
    ,MAX(DateId) AS DateId_end
    FROM Grps
    GROUP BY EmployeeId, EmploymentTypeId, Grp
    ORDER BY EmployeeId, DateId_start;
  • Excellent! Ive never seen sum() used with over (... order by ) this way before so I learned something new today! 🙂 Thankyou!

    (This was an excerpt from a fact-table in a DW so thats why the "dates" are stored as int.)

  • Looking at this again the following is probably better:

    WITH cte_Employee (EmployeeId, EmploymentTypeId, DateId)
    AS
    (
    SELECT EmployeeId, EmploymentTypeId, DateId
    FROM
    (
    VALUES (1, 1, 20240101)
    ,(1, 1, 20240102)
    ,(1, 2, 20240103)
    ,(1, 2, 20240104)
    ,(1, 1, 20240105)
    ) E (EmployeeId, EmploymentTypeId, DateId)
    )
    ,Boundaries
    AS
    (
    SELECT EmployeeId, EmploymentTypeId, DateId
    ,CASE
    WHEN EmploymentTypeId = LAG(EmploymentTypeId) OVER (PARTITION BY EmployeeId ORDER BY DateId)
    THEN 0
    ELSE 1
    END AS Boundary
    FROM cte_Employee
    )
    ,Grps
    AS
    (
    SELECT EmployeeId, EmploymentTypeId, DateId
    ,SUM(Boundary) OVER (PARTITION BY EmployeeId, EmploymentTypeId ORDER BY DateId) AS Grp
    FROM Boundaries
    )
    SELECT EmployeeId, EmploymentTypeId
    ,MIN(DateId) AS DateId_start
    ,MAX(DateId) AS DateId_end
    FROM Grps
    GROUP BY EmployeeId, EmploymentTypeId, Grp
    ORDER BY EmployeeId, DateId_start;
  • Yes that is an even better solution. I also removed EmploymentTypeId from partition by in the sum clause later to get a more appealing increment in Grp. Thanx for your attention to this!

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply