Hi,
Im facing some performance issues with a query im wokring on and really don't know how to work around.
The tabele contains assignments (role) per person within a team. A person can exist in multiple teams but within each team+person+role the assignment rows will never overlap. What I need to accomplish is a way to combine a persons all roles into columns and then contruct a new date interval for all rows valid for a common date range:
/*
From this
TeamId PersonId RoleName StartDate EndDate
1 1 Manager 2020-01-01 null
1 1 Other 2020-06-01 2023-04-30
1 1 Leader 2021-01-01 2023-06-30
2 1 Leader 2021-06-01 2021-12-31
To this
TeamId PersonId StartDate EndDate Manager Leader Other
1 1 2020-01-01 2020-05-31 1 null null
1 1 2020-06-01 2020-12-31 1 null 1
1 1 2021-01-01 2023-04-30 1 1 1
1 1 2023-05-01 2023-06-30 1 1 null
1 1 2023-07-01 2024-12-31 1 null null
2 1 2021-06-01 2021-12-31 null 1 1
*/
Below is the query im using now and it work perfectly, but when I run it for the entire dataset it gets too heavy. The number of roles are fixed to it's no problem to hard code these. It's just that is't about 25 roles in totalt and one person can have up to 10 different roles over time, and the dates span from early 2000 to and forward.
And since i have to expand each person role range into days i get a lot of rows before it can group it in the pivot. I've tried do it in a loop, working with one person at a time but it still takes forever to complete.
Is there a smarter way to achive this than how Im doing now?
DROP TABLE IF EXISTS #AssignmentRange
CREATE TABLE #AssignmentRange(
TeamId int
, PersonId int
, RoleName varchar(10)
, StartDate date
, EndDate date
)
INSERT INTO #AssignmentRange
SELECT 1, 1, 'Manager', '2020-01-01', NULL UNION
SELECT 1, 1, 'Leader', '2021-01-01', '2023-06-30' UNION
SELECT 1, 1, 'Other', '2020-06-01', '2023-04-30' UNION
SELECT 2, 1, 'Leader', '2021-06-01', '2021-12-31'
DROP TABLE IF EXISTS #dates
CREATE TABLE #dates(
[Date] date NOT NULL PRIMARY KEY
)
/*
Generate all dates between 1990-01-01 and end of next year
*/
INSERT INTO #dates
SELECT
DATEADD(DAY, [value], '1990-01-01')
FROM
GENERATE_SERIES(
0
, DATEDIFF(DAY,'1990-01-01', DATEFROMPARTS(YEAR(GETDATE()) +1, 12, 31))
, 1)
SELECT
TeamId
, PersonId
, MIN([Date]) AS StartDate
, MAX([Date]) AS EndDate
, [Manager]
, [Leader]
, [Other]
FROM(
SELECT
*
,COUNT(NextJsonOBJ) OVER(PARTITION BY TeamId, PersonId ORDER BY [Date]) AS Groups
FROM(
SELECT
*
, CASE
WHEN LAG(JsonOBJ, 1, NULL) OVER(PARTITION BY TeamId, PersonId ORDER BY [Date]) IS NULL
THEN 1
WHEN LAG(JsonOBJ, 1, NULL) OVER(PARTITION BY TeamId, PersonId ORDER BY [Date]) != JsonOBJ
THEN 1
ELSE NULL
END AS NextJsonOBJ
FROM(
SELECT
*
,(
SELECT
TeamId
, PersonId
, [Manager]
, [Leader]
, [Other]
FOR JSON PATH
, WITHOUT_ARRAY_WRAPPER
) AS JsonOBJ
FROM(
SELECT
ar.TeamId
, ar.PersonId
, d.[Date]
, ar.RoleName
, 1 AS Dummy
FROM
#AssignmentRange AS ar
INNER JOIN
#dates AS d
ON d.[Date] BETWEEN
ar.StartDate
AND COALESCE(ar.EndDate, DATEFROMPARTS(YEAR(GETDATE()) +1, 12, 31))
) AS n
PIVOT(
MAX(Dummy)
FOR RoleName IN (
[Manager]
, [Leader]
, [Other]
)
) AS pvt
) AS n
) AS x
) AS u
GROUP BY
TeamId
, PersonId
, [Manager]
, [Leader]
, [Other]
, Groups
ORDER BY
PersonId, TeamId, StartDate
Regards,
Alfred
November 17, 2023 at 3:10 pm
Thanks for posting your issue and hopefully someone will answer soon.
This is an automated bump to increase visibility of your question.
November 17, 2023 at 6:21 pm
Afaik this returns the same output. The CTEs expand the date intervals, pivot by role name and summarize by day, identify gaps in GROUP BY columns, and create groups using SUM OVER the gaps. The final SELECT summarizes using the 'grp' column in the GROUP BY. To improve performance an indexing strategy, possibly using temp table(s) could be beneficial. Performance results depend on the unique scenario. Also, it seems possible this code could be further optimized for performance by factoring out certain CTE(s)
declare @end_of_next_yr date=datefromparts(year(getdate())+1, 12, 31);
with
exp_cte as (
select ar.*, dateadd(day, t.n-1, ar.StartDate) exp_dt
from #AssignmentRange ar
cross apply (select top (datediff(day, ar.StartDate, isnull(EndDate, @end_of_next_yr))+1)
row_number() over (order by (select null))
from sys.all_columns) t(n)),
unq_cte as (
select TeamId, PersonId, exp_dt,
max(iif(RoleName='Manager', 1, null)) rn_Manager,
max(iif(RoleName='Leader', 1, null)) rn_Leader,
max(iif(RoleName='Other', 1, null)) rn_Other
from exp_cte
group by TeamId, PersonId, exp_dt),
gap_cte as (
select *, CASE WHEN (LAG(ISNULL(rn_Manager, 0)) OVER (partition by TeamId, PersonId ORDER BY exp_dt) <> ISNULL(rn_Manager, 0))
OR (LAG(ISNULL(rn_Leader, 0)) OVER (partition by TeamId, PersonId ORDER BY exp_dt) <> ISNULL(rn_Leader, 0))
OR (LAG(ISNULL(rn_Other, 0)) OVER (partition by TeamId, PersonId ORDER BY exp_dt) <> ISNULL(rn_Other, 0))
THEN 1 ELSE 0 END AS gap
from unq_cte),
grp_cte as (
select *, sum(gap) over (partition by TeamId, PersonId ORDER BY exp_dt) grp
from gap_cte)
select TeamId, PersonId,min(exp_dt) StartDate, max(exp_dt) EndDate,
rn_Manager, rn_Leader, rn_Other
from grp_cte
group by TeamId, PersonId, rn_Manager, rn_Leader, rn_Other, grp
order by TeamId, PersonId, StartDate;
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
November 17, 2023 at 8:00 pm
The top two CTEs could be replaced with only one. Possibly the bottom two could be combined as well using the "smudge" method?
with
pvt_cte as (
select TeamId, PersonId, dateadd(day, t.n-1, ar.StartDate) exp_dt,
max(iif(RoleName='Manager', 1, null)) rn_Manager,
max(iif(RoleName='Leader', 1, null)) rn_Leader,
max(iif(RoleName='Other', 1, null)) rn_Other
from #AssignmentRange ar
cross apply (select top (datediff(day, ar.StartDate, isnull(EndDate, @end_of_next_yr))+1)
row_number() over (order by (select null))
from sys.all_columns) t(n)
group by TeamId, PersonId, dateadd(day, t.n-1, ar.StartDate)),
gap_cte as (
select *, iif((lag(isnull(rn_manager, 0)) over (partition by teamid, personid order by exp_dt) <> isnull(rn_manager, 0))
or (lag(isnull(rn_leader, 0)) over (partition by teamid, personid order by exp_dt) <> isnull(rn_leader, 0))
or (lag(isnull(rn_other, 0)) over (partition by teamid, personid order by exp_dt) <> isnull(rn_other, 0)), 1, 0) as gap
from pvt_cte),
grp_cte as (
select *, sum(gap) over (partition by TeamId, PersonId ORDER BY exp_dt) grp
from gap_cte)
select TeamId, PersonId,min(exp_dt) StartDate, max(exp_dt) EndDate,
rn_Manager, rn_Leader, rn_Other
from grp_cte
group by TeamId, PersonId, rn_Manager, rn_Leader, rn_Other, grp
order by TeamId, PersonId, StartDate;
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
November 17, 2023 at 10:16 pm
WITH ChangeDates
AS
(
SELECT R.TeamId, R.PersonId, D.ChangeDate
,MAX(IIF(R.RoleName = 'Manager', X.StartEnd, 0)) AS Manager
,MAX(IIF(R.RoleName = 'Leader', X.StartEnd, 0)) AS Leader
,MAX(IIF(R.RoleName = 'Other', X.StartEnd, 0)) AS Other
FROM #AssignmentRange R
CROSS APPLY (VALUES (1),(-1)) X (StartEnd)
CROSS APPLY
(
VALUES
(
CASE
WHEN X.StartEnd = 1
THEN StartDate
ELSE DATEADD(day, 1, ISNULL(EndDate, '9999'))
END
)
) D (ChangeDate)
GROUP BY R.TeamId, R.PersonId, D.ChangeDate
)
,StartEnd
AS
(
SELECT TeamId, PersonId
,ChangeDate AS StartDate
,DATEADD(day, -1, LEAD(ChangeDate) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate)) AS EndDate
,SUM(Manager) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) AS Manager
,SUM(Leader) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) AS Leader
,SUM(Other) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) AS Other
FROM ChangeDates
/* SSQL2022 WINDOW win AS (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) */
)
SELECT TeamId, PersonId, StartDate
,IIF(EndDate = '9999', NULL, EndDate) AS EndDate
,Manager, Leader, Other
FROM StartEnd
WHERE EndDate IS NOT NULL
ORDER BY TeamId, PersonId, StartDate;
November 18, 2023 at 11:14 am
Or dynamically:
/* Should be able to get this from a role table without DISTINCT
SELECT DISTINCT RoleName
INTO #Roles
FROM #AssignmentRange;
*/
SELECT RoleId, RoleName
INTO #Roles
FROM
(
VALUES (1, 'Manager')
,(2, 'Leader')
,(3, 'Other')
) V (RoleId, RoleName)
DECLARE @sql nvarchar(MAX) = N'WITH ChangeDates
AS
(
SELECT R.TeamId, R.PersonId, D.ChangeDate
';
SELECT @sql += N' ,MAX(IIF(R.RoleName = ''' + RoleName + N''', X.StartEnd, 0)) AS ' + RoleName + NCHAR(13) + NCHAR(10)
FROM #Roles
ORDER BY RoleId;
SET @sql += N' FROM #AssignmentRange R
CROSS APPLY (VALUES (1),(-1)) X (StartEnd)
CROSS APPLY
(
VALUES
(
CASE
WHEN X.StartEnd = 1
THEN StartDate
ELSE DATEADD(day, 1, ISNULL(EndDate, ''9999''))
END
)
) D (ChangeDate)
GROUP BY R.TeamId, R.PersonId, D.ChangeDate
)
,StartEnd
AS
(
SELECT TeamId, PersonId
,ChangeDate AS StartDate
,DATEADD(day, -1, LEAD(ChangeDate) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate)) AS EndDate
';
SELECT @sql += N' ,SUM(' + RoleName +N') OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) AS ' + RoleName + NCHAR(13) + NCHAR(10)
FROM #Roles
ORDER BY RoleId;
SET @sql += N' FROM ChangeDates
/* SSQL2022 WINDOW win AS (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) */
)
SELECT TeamId, PersonId, StartDate
,IIF(EndDate = ''9999'', NULL, EndDate) AS EndDate
';
SELECT @sql += N',' + RoleName
FROM #Roles
ORDER BY RoleId;
SET @sql += NCHAR(13) + NCHAR(10) + N'FROM StartEnd
WHERE EndDate IS NOT NULL
ORDER BY TeamId, PersonId, StartDate;
'
--print @sql
EXEC sp_executesql @sql;
November 20, 2023 at 11:02 am
The top two CTEs could be replaced with only one. Possibly the bottom two could be combined as well using the "smudge" method?
Thank you for your input Steve. Your solution is way prettier than mine but the performance issue is the same, and Im guessing it's due to the same technique with exapnding all the intervals.
November 20, 2023 at 11:08 am
Thank you Ken for you input. Your solution doesn't expand the intervals and that makes it really fast. But it doesn't produce correct result. In this scenario the final row has "1" for more than that the one role that is active from 2021-07-01
INSERT INTO #AssignmentRange
SELECT 1, 1, 'Manager', '2006-05-24', '2018-10-23' UNION
SELECT 1, 1, 'Leader', '2018-10-24', '2021-06-30' UNION
SELECT 1, 1, 'Other', '2015-01-12', '2022-06-30' UNION
SELECT 1, 1, 'Role2', '2018-10-24', '2021-06-30' UNION
SELECT 1, 1, 'Role3', '2006-05-24', '2021-06-30'
/*
TeamId PersonId StartDate EndDate Manager Leader Other Role2 Role3
1 1 2006-05-24 2015-01-11 1 0 0 0 1
1 1 2015-01-12 2018-10-23 1 0 1 0 1
1 1 2018-10-24 2021-06-30 1 1 1 1 1
1 1 2021-07-01 2022-06-30 1 1 1 1 1
*/
It should be
/*
StartDate EndDate Manager Leader Other Role2 Role3
2006-05-24 2015-01-11 1 0 0 0 1
2015-01-12 2018-10-23 1 0 1 0 1
2018-10-24 2021-06-30 0 1 1 1 1
2021-07-01 2022-06-30 0 0 1 0 0
*/
WITH ChangeDates
AS
(
SELECT R.TeamId, R.PersonId, D.ChangeDate
,MAX(CASE WHEN R.RoleName = 'Manager' THEN X.StartEnd END) AS Manager
,MAX(CASE WHEN R.RoleName = 'Leader' THEN X.StartEnd END) AS Leader
,MAX(CASE WHEN R.RoleName = 'Other' THEN X.StartEnd END) AS Other
,MAX(CASE WHEN R.RoleName = 'Role2' THEN X.StartEnd END) AS Role2
,MAX(CASE WHEN R.RoleName = 'Role3' THEN X.StartEnd END) AS Role3
FROM #AssignmentRange R
CROSS APPLY (VALUES (1),(-1)) X (StartEnd)
CROSS APPLY
(
VALUES
(
CASE
WHEN X.StartEnd = 1
THEN StartDate
ELSE DATEADD(day, 1, ISNULL(EndDate, '9999'))
END
)
) D (ChangeDate)
GROUP BY R.TeamId, R.PersonId, D.ChangeDate
)
,StartEnd
AS
(
SELECT TeamId, PersonId
,ChangeDate AS StartDate
,DATEADD(day, -1, LEAD(ChangeDate) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate)) AS EndDate
,SUM(Manager) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) AS Manager
,SUM(Leader) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) AS Leader
,SUM(Other) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) AS Other
,SUM(Role2) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) AS Role2
,SUM(Role3) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) AS Role3
FROM ChangeDates
/* SSQL2022 WINDOW win AS (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) */
)
SELECT TeamId, PersonId, StartDate
,IIF(EndDate = '9999', NULL, EndDate) AS EndDate
,ISNULL(Manager, 0) AS Manager
,ISNULL(Leader, 0) AS Leader
,ISNULL(Other, 0) AS Other
,ISNULL(Role2, 0) AS Role2
,ISNULL(Role3, 0) AS Role2
FROM StartEnd
WHERE EndDate IS NOT NULL
ORDER BY TeamId, PersonId, StartDate;
November 20, 2023 at 12:44 pm
Now it looks like it's correct for all rows in my data set. I've been working on a new solution on my own but didn't get correct result when the range was just one day, but your solution handles that perfectly.
Thank you so much Ken!
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply