I have a an EmployeeTable with data for each day. I need to aggregate that into a table with From and Tom date.
This is a simplified view of the source data:
select *
from (values (1, 1, 20240101),
(1, 1, 20240102),
(1, 2, 20240103),
(1, 2, 20240104),
(1, 1, 20240105)) Employee(EmployeeId, EmploymentTypeId, DateId)
This is how I wish the result to look like:
select *
from (values (1, 1, 20240101, 20240102),
(1, 2, 20240103, 20240104),
(1, 1, 20240105, 20240105)) EmployeeAggr(EmployeeId, EmploymentTypeId, DateId_start, DateId_end)
If I try to create this with a group by I end up with this (which is not what I want):
with cte_Employee as (
select *
from (values (1, 1, 20240101),
(1, 1, 20240102),
(1, 2, 20240103),
(1, 2, 20240104),
(1, 1, 20240105)) Employee(EmployeeId, EmploymentTypeId, DateId)
)
select EmployeeId
, EmploymentTypeId
, min( DateId ) DateId_start
, max( DateId ) DateId_end
from cte_Employee
group by EmployeeId
, EmploymentTypeId
order by 3
I need to find a way to separate the 1st and 2nd of january from the 5th but Im not able to find a way to accomplish this.
Thanx!
You would make your life easier if you stored dates as dates. This is an Islands and Gaps problem. One solution:
WITH cte_Employee (EmployeeId, EmploymentTypeId, DateId)
AS
(
SELECT EmployeeId, EmploymentTypeId, DateId
FROM
(
VALUES (1, 1, 20240101)
,(1, 1, 20240102)
,(1, 2, 20240103)
,(1, 2, 20240104)
,(1, 1, 20240105)
) E (EmployeeId, EmploymentTypeId, DateId)
)
,Boundaries
AS
(
SELECT E.EmployeeId, E.EmploymentTypeId, E.DateId
,CASE
WHEN 1 = DATEDIFF(day, LAG(X.DateIdDate) OVER (PARTITION BY E.EmployeeId, E.EmploymentTypeId ORDER BY X.DateIdDate), X.DateIdDate)
THEN 0
ELSE 1
END AS Boundary
FROM cte_Employee E
CROSS APPLY
(
VALUES
(
CAST(CAST(DateId AS char(8)) AS date)
)
) X (DateIdDate)
)
,Grps
AS
(
SELECT EmployeeId, EmploymentTypeId, DateId
,SUM(Boundary) OVER (PARTITION BY EmployeeId, EmploymentTypeId ORDER BY DateId) AS Grp
FROM Boundaries
)
SELECT EmployeeId, EmploymentTypeId
,MIN(DateId) AS DateId_start
,MAX(DateId) AS DateId_end
FROM Grps
GROUP BY EmployeeId, EmploymentTypeId, Grp
ORDER BY EmployeeId, DateId_start;
January 9, 2024 at 4:31 pm
Excellent! Ive never seen sum() used with over (... order by ) this way before so I learned something new today! 🙂 Thankyou!
(This was an excerpt from a fact-table in a DW so thats why the "dates" are stored as int.)
January 9, 2024 at 5:13 pm
Looking at this again the following is probably better:
WITH cte_Employee (EmployeeId, EmploymentTypeId, DateId)
AS
(
SELECT EmployeeId, EmploymentTypeId, DateId
FROM
(
VALUES (1, 1, 20240101)
,(1, 1, 20240102)
,(1, 2, 20240103)
,(1, 2, 20240104)
,(1, 1, 20240105)
) E (EmployeeId, EmploymentTypeId, DateId)
)
,Boundaries
AS
(
SELECT EmployeeId, EmploymentTypeId, DateId
,CASE
WHEN EmploymentTypeId = LAG(EmploymentTypeId) OVER (PARTITION BY EmployeeId ORDER BY DateId)
THEN 0
ELSE 1
END AS Boundary
FROM cte_Employee
)
,Grps
AS
(
SELECT EmployeeId, EmploymentTypeId, DateId
,SUM(Boundary) OVER (PARTITION BY EmployeeId, EmploymentTypeId ORDER BY DateId) AS Grp
FROM Boundaries
)
SELECT EmployeeId, EmploymentTypeId
,MIN(DateId) AS DateId_start
,MAX(DateId) AS DateId_end
FROM Grps
GROUP BY EmployeeId, EmploymentTypeId, Grp
ORDER BY EmployeeId, DateId_start;
January 10, 2024 at 2:22 pm
Yes that is an even better solution. I also removed EmploymentTypeId from partition by in the sum clause later to get a more appealing increment in Grp. Thanx for your attention to this!
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply