Pivot with grouping internvals

Question

Post reply

Pivot with grouping internvals

Alfred

SSC-Addicted

Points: 436

November 16, 2023 at 2:45 pm

Go to Answer

#4328926

Hi,

Im facing some performance issues with a query im wokring on and really don't know how to work around.

The tabele contains assignments (role) per person within a team. A person can exist in multiple teams but within each team+person+role the assignment rows will never overlap. What I need to accomplish is a way to combine a persons all roles into columns and then contruct a new date interval for all rows valid for a common date range:

/*
From this
TeamId  PersonId    RoleName    StartDate   EndDate
1       1           Manager     2020-01-01  null
1       1           Other       2020-06-01  2023-04-30
1       1           Leader      2021-01-01  2023-06-30
2       1           Leader      2021-06-01  2021-12-31

To this
TeamId  PersonId    StartDate   EndDate     Manager     Leader      Other
1       1           2020-01-01  2020-05-31  1           null        null
1       1           2020-06-01  2020-12-31  1           null        1
1       1           2021-01-01  2023-04-30  1           1           1
1       1           2023-05-01  2023-06-30  1           1           null
1       1           2023-07-01  2024-12-31  1           null        null
2       1           2021-06-01  2021-12-31  null        1           1
*/

Below is the query im using now and it work perfectly, but when I run it for the entire dataset it gets too heavy. The number of roles are fixed to it's no problem to hard code these. It's just that is't about 25 roles in totalt and one person can have up to 10 different roles over time, and the dates span from early 2000 to and forward.

And since i have to expand each person role range into days i get a lot of rows before it can group it in the pivot. I've tried do it in a loop, working with one person at a time but it still takes forever to complete.

Is there a smarter way to achive this than how Im doing now?

DROP TABLE IF EXISTS #AssignmentRange
CREATE TABLE #AssignmentRange(
    TeamId int
    , PersonId int
    , RoleName varchar(10)
    , StartDate date
    , EndDate date
)


INSERT INTO #AssignmentRange
SELECT 1, 1, 'Manager', '2020-01-01', NULL UNION
SELECT 1, 1, 'Leader', '2021-01-01', '2023-06-30' UNION
SELECT 1, 1, 'Other', '2020-06-01', '2023-04-30' UNION
SELECT 2, 1, 'Leader', '2021-06-01', '2021-12-31'


DROP TABLE IF EXISTS #dates
CREATE TABLE #dates(
    [Date] date NOT NULL PRIMARY KEY
)

/*
    Generate all dates between 1990-01-01 and end of next year
*/
INSERT INTO #dates
SELECT
    DATEADD(DAY, [value], '1990-01-01')
FROM
    GENERATE_SERIES(
        0
        , DATEDIFF(DAY,'1990-01-01', DATEFROMPARTS(YEAR(GETDATE()) +1, 12, 31))
        , 1)


SELECT
    TeamId
    , PersonId
    , MIN([Date]) AS StartDate
    , MAX([Date]) AS EndDate
    , [Manager]
    , [Leader]
    , [Other]
FROM(
    SELECT
        *
        ,COUNT(NextJsonOBJ) OVER(PARTITION BY TeamId, PersonId  ORDER BY [Date]) AS Groups
    FROM(
        SELECT
            *
            , CASE
                WHEN LAG(JsonOBJ, 1, NULL) OVER(PARTITION BY TeamId, PersonId ORDER BY [Date]) IS NULL
                    THEN 1
                WHEN LAG(JsonOBJ, 1, NULL) OVER(PARTITION BY TeamId, PersonId  ORDER BY [Date]) != JsonOBJ
                    THEN 1
                ELSE NULL
            END AS NextJsonOBJ
        FROM(
            SELECT
                *
                ,(
                    SELECT
                        TeamId
                        , PersonId
                        , [Manager]
                        , [Leader]
                        , [Other]
                    FOR JSON PATH
                        , WITHOUT_ARRAY_WRAPPER
                ) AS JsonOBJ
            FROM(
                SELECT
                    ar.TeamId
                    , ar.PersonId
                    , d.[Date]
                    , ar.RoleName
                    , 1 AS Dummy
                FROM
                    #AssignmentRange AS ar
                INNER JOIN
                    #dates AS d
                        ON d.[Date] BETWEEN
                            ar.StartDate
                            AND COALESCE(ar.EndDate, DATEFROMPARTS(YEAR(GETDATE()) +1, 12, 31))
            ) AS n
            PIVOT(
                MAX(Dummy)
                FOR RoleName IN (
                    [Manager]
                    , [Leader]
                    , [Other]
                )
            ) AS pvt
        ) AS n
    ) AS x
) AS u
GROUP BY
    TeamId
    , PersonId
    , [Manager]
    , [Leader]
    , [Other]
    , Groups
ORDER BY
    PersonId, TeamId, StartDate

Regards,

Alfred

Ken McKelvey

SSCoach

Points: 18842

November 20, 2023 at 11:32 am

Answer

#4331036

WITH ChangeDates
AS
(
    SELECT R.TeamId, R.PersonId, D.ChangeDate
,MAX(CASE WHEN R.RoleName = 'Manager' THEN X.StartEnd END) AS Manager
,MAX(CASE WHEN R.RoleName = 'Leader' THEN X.StartEnd END) AS Leader
,MAX(CASE WHEN R.RoleName = 'Other' THEN X.StartEnd END) AS Other
,MAX(CASE WHEN R.RoleName = 'Role2' THEN X.StartEnd END) AS Role2
,MAX(CASE WHEN R.RoleName = 'Role3' THEN X.StartEnd END) AS Role3
    FROM #AssignmentRange R
        CROSS APPLY (VALUES (1),(-1)) X (StartEnd)
        CROSS APPLY
        (
            VALUES
            (
                CASE
                    WHEN X.StartEnd = 1
                    THEN StartDate
                    ELSE DATEADD(day, 1, ISNULL(EndDate, '9999'))
                END
            )
        ) D (ChangeDate)
    GROUP BY R.TeamId, R.PersonId, D.ChangeDate
)
,StartEnd
AS
(
    SELECT TeamId, PersonId
        ,ChangeDate AS StartDate
        ,DATEADD(day, -1, LEAD(ChangeDate) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate)) AS EndDate
        ,SUM(Manager) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) AS Manager
        ,SUM(Leader) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) AS Leader
        ,SUM(Other) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) AS Other
        ,SUM(Role2) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) AS Role2
        ,SUM(Role3) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) AS Role3
    FROM ChangeDates
    /* SSQL2022 WINDOW win AS (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) */
)
SELECT TeamId, PersonId, StartDate
    ,IIF(EndDate = '9999', NULL, EndDate) AS EndDate
    ,ISNULL(Manager, 0) AS Manager
    ,ISNULL(Leader, 0) AS Leader
    ,ISNULL(Other, 0) AS Other
    ,ISNULL(Role2, 0) AS Role2
    ,ISNULL(Role3, 0) AS Role2
FROM StartEnd
WHERE EndDate IS NOT NULL
ORDER BY TeamId, PersonId, StartDate;

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Site Owners SSC Guru Points: 80330 More actions · Answer 1

Thanks for posting your issue and hopefully someone will answer soon.

This is an automated bump to increase visibility of your question.

Steve Collins SSCrazy Points: 2157 More actions · Answer 2

Afaik this returns the same output. The CTEs expand the date intervals, pivot by role name and summarize by day, identify gaps in GROUP BY columns, and create groups using SUM OVER the gaps. The final SELECT summarizes using the 'grp' column in the GROUP BY. To improve performance an indexing strategy, possibly using temp table(s) could be beneficial. Performance results depend on the unique scenario. Also, it seems possible this code could be further optimized for performance by factoring out certain CTE(s)

declare @end_of_next_yr date=datefromparts(year(getdate())+1, 12, 31);

with 
exp_cte as (
    select ar.*, dateadd(day, t.n-1, ar.StartDate) exp_dt
    from #AssignmentRange ar
         cross apply (select top (datediff(day, ar.StartDate, isnull(EndDate, @end_of_next_yr))+1)
                             row_number() over (order by (select null)) 
                      from sys.all_columns) t(n)),
unq_cte as (
    select TeamId, PersonId, exp_dt,
           max(iif(RoleName='Manager', 1, null)) rn_Manager,
           max(iif(RoleName='Leader', 1, null)) rn_Leader,
           max(iif(RoleName='Other', 1, null)) rn_Other
    from exp_cte
    group by TeamId, PersonId, exp_dt),
gap_cte as (
    select *, CASE WHEN (LAG(ISNULL(rn_Manager, 0)) OVER (partition by TeamId, PersonId ORDER BY exp_dt) <> ISNULL(rn_Manager, 0)) 
                     OR (LAG(ISNULL(rn_Leader, 0)) OVER (partition by TeamId, PersonId ORDER BY exp_dt) <> ISNULL(rn_Leader, 0))
                     OR (LAG(ISNULL(rn_Other, 0)) OVER (partition by TeamId, PersonId ORDER BY exp_dt) <> ISNULL(rn_Other, 0))
                   THEN 1 ELSE 0 END AS gap
    from unq_cte),
grp_cte as (
    select *, sum(gap) over (partition by TeamId, PersonId ORDER BY exp_dt) grp
    from gap_cte)
select TeamId, PersonId,min(exp_dt) StartDate, max(exp_dt) EndDate,
       rn_Manager, rn_Leader, rn_Other
from grp_cte
group by TeamId, PersonId, rn_Manager, rn_Leader, rn_Other, grp
order by TeamId, PersonId, StartDate;

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Steve Collins SSCrazy Points: 2157 More actions · Answer 3

The top two CTEs could be replaced with only one. Possibly the bottom two could be combined as well using the "smudge" method?

with 
pvt_cte as (
    select TeamId, PersonId, dateadd(day, t.n-1, ar.StartDate) exp_dt,
           max(iif(RoleName='Manager', 1, null)) rn_Manager,
           max(iif(RoleName='Leader', 1, null)) rn_Leader,
           max(iif(RoleName='Other', 1, null)) rn_Other
    from #AssignmentRange ar
         cross apply (select top (datediff(day, ar.StartDate, isnull(EndDate, @end_of_next_yr))+1)
                             row_number() over (order by (select null)) 
                      from sys.all_columns) t(n)
    group by TeamId, PersonId, dateadd(day, t.n-1, ar.StartDate)),
gap_cte as (
    select *, iif((lag(isnull(rn_manager, 0)) over (partition by teamid, personid order by exp_dt) <> isnull(rn_manager, 0)) 
               or (lag(isnull(rn_leader, 0)) over (partition by teamid, personid order by exp_dt) <> isnull(rn_leader, 0))
               or (lag(isnull(rn_other, 0)) over (partition by teamid, personid order by exp_dt) <> isnull(rn_other, 0)), 1, 0) as gap
    from pvt_cte),
grp_cte as (
    select *, sum(gap) over (partition by TeamId, PersonId ORDER BY exp_dt) grp
    from gap_cte)
select TeamId, PersonId,min(exp_dt) StartDate, max(exp_dt) EndDate,
       rn_Manager, rn_Leader, rn_Other
from grp_cte
group by TeamId, PersonId, rn_Manager, rn_Leader, rn_Other, grp
order by TeamId, PersonId, StartDate;

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Ken McKelvey SSCoach Points: 18842 More actions · Answer 4

WITH ChangeDates
AS
(
    SELECT R.TeamId, R.PersonId, D.ChangeDate
        ,MAX(IIF(R.RoleName = 'Manager', X.StartEnd, 0)) AS Manager
        ,MAX(IIF(R.RoleName = 'Leader', X.StartEnd, 0)) AS Leader
        ,MAX(IIF(R.RoleName = 'Other', X.StartEnd, 0)) AS Other
    FROM #AssignmentRange R
        CROSS APPLY (VALUES (1),(-1)) X (StartEnd)
        CROSS APPLY
        (
            VALUES
            (
                CASE
                    WHEN X.StartEnd = 1
                    THEN StartDate
                    ELSE DATEADD(day, 1, ISNULL(EndDate, '9999'))
                END
            )
        ) D (ChangeDate)
    GROUP BY R.TeamId, R.PersonId, D.ChangeDate
)
,StartEnd
AS
(
    SELECT TeamId, PersonId
        ,ChangeDate AS StartDate
        ,DATEADD(day, -1, LEAD(ChangeDate) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate)) AS EndDate
        ,SUM(Manager) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) AS Manager
        ,SUM(Leader) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) AS Leader
        ,SUM(Other) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) AS Other
    FROM ChangeDates
    /* SSQL2022 WINDOW win AS (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) */
)
SELECT TeamId, PersonId, StartDate
    ,IIF(EndDate = '9999', NULL, EndDate) AS EndDate
    ,Manager, Leader, Other
FROM StartEnd
WHERE EndDate IS NOT NULL
ORDER BY TeamId, PersonId, StartDate;

This reply was modified 1 year ago by Ken McKelvey.
This reply was modified 1 year ago by Ken McKelvey.

Ken McKelvey SSCoach Points: 18842 More actions · Answer 5

Or dynamically:

/* Should be able to get this from a role table without DISTINCT
SELECT DISTINCT RoleName
INTO #Roles
FROM #AssignmentRange;
 */

 SELECT RoleId, RoleName
 INTO #Roles
 FROM
 (
    VALUES (1, 'Manager')
        ,(2, 'Leader')
        ,(3, 'Other')
 ) V (RoleId, RoleName)

DECLARE @sql nvarchar(MAX) = N'WITH ChangeDates
AS
(
    SELECT R.TeamId, R.PersonId, D.ChangeDate
';

SELECT @sql += N'    ,MAX(IIF(R.RoleName = ''' + RoleName + N''', X.StartEnd, 0)) AS ' + RoleName + NCHAR(13) + NCHAR(10)
FROM #Roles
ORDER BY RoleId;

SET @sql += N'    FROM #AssignmentRange R
        CROSS APPLY (VALUES (1),(-1)) X (StartEnd)
        CROSS APPLY
        (
            VALUES
            (
                CASE
                    WHEN X.StartEnd = 1
                    THEN StartDate
                    ELSE DATEADD(day, 1, ISNULL(EndDate, ''9999''))
                END
            )
        ) D (ChangeDate)
    GROUP BY R.TeamId, R.PersonId, D.ChangeDate
)
,StartEnd
AS
(
    SELECT TeamId, PersonId
        ,ChangeDate AS StartDate
        ,DATEADD(day, -1, LEAD(ChangeDate) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate)) AS EndDate
';
SELECT @sql += N'        ,SUM(' + RoleName +N') OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) AS ' + RoleName + NCHAR(13) + NCHAR(10)
FROM #Roles
ORDER BY RoleId;

SET @sql += N'    FROM ChangeDates
    /* SSQL2022 WINDOW win AS (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) */
)
SELECT TeamId, PersonId, StartDate
    ,IIF(EndDate = ''9999'', NULL, EndDate) AS EndDate
    ';

SELECT @sql += N',' + RoleName
FROM #Roles
ORDER BY RoleId;

SET @sql += NCHAR(13) + NCHAR(10) + N'FROM StartEnd
WHERE EndDate IS NOT NULL
ORDER BY TeamId, PersonId, StartDate;
'
--print @sql

EXEC sp_executesql @sql;

Alfred SSC-Addicted Points: 436 More actions · Answer 6

Steve Collins wrote:

The top two CTEs could be replaced with only one. Possibly the bottom two could be combined as well using the "smudge" method?

Thank you for your input Steve. Your solution is way prettier than mine but the performance issue is the same, and Im guessing it's due to the same technique with exapnding all the intervals.

Alfred SSC-Addicted Points: 436 More actions · Answer 7

Thank you Ken for you input. Your solution doesn't expand the intervals and that makes it really fast. But it doesn't produce correct result. In this scenario the final row has "1" for more than that the one role that is active from 2021-07-01

INSERT INTO #AssignmentRange
SELECT 1, 1, 'Manager', '2006-05-24', '2018-10-23' UNION
SELECT 1, 1, 'Leader', '2018-10-24', '2021-06-30' UNION
SELECT 1, 1, 'Other', '2015-01-12', '2022-06-30' UNION
SELECT 1, 1, 'Role2', '2018-10-24', '2021-06-30' UNION
SELECT 1, 1, 'Role3', '2006-05-24', '2021-06-30'

/*
TeamId     PersonId     StartDate     EndDate     Manager     Leader     Other     Role2     Role3
1     1     2006-05-24     2015-01-11     1     0     0     0     1
1     1     2015-01-12     2018-10-23     1     0     1     0     1
1     1     2018-10-24     2021-06-30     1     1     1     1     1
1     1     2021-07-01     2022-06-30     1     1     1     1     1
*/

It should be

/*
StartDate    EndDate       Manager Leader Other Role2 Role3
2006-05-24   2015-01-11    1       0      0     0     1
2015-01-12   2018-10-23    1       0      1     0     1
2018-10-24   2021-06-30    0       1      1     1     1
2021-07-01   2022-06-30    0       0      1     0     0
*/

This reply was modified 1 year ago by Alfred.
This reply was modified 1 year ago by Alfred.

Alfred SSC-Addicted Points: 436 More actions · Answer 8

Now it looks like it's correct for all rows in my data set. I've been working on a new solution on my own but didn't get correct result when the range was just one day, but your solution handles that perfectly.

Thank you so much Ken!