Pivot with grouping internvals

  • Hi,

    Im facing some performance issues with a query im wokring on and really don't know how to work around.

    The tabele contains assignments (role) per person within a team. A person can exist in multiple teams but within each team+person+role the assignment rows will never overlap. What I need to accomplish is a way to combine a persons all roles into columns and then contruct a new date interval for all rows valid for a common date range:

    /*
    From this
    TeamId PersonId RoleName StartDate EndDate
    1 1 Manager 2020-01-01 null
    1 1 Other 2020-06-01 2023-04-30
    1 1 Leader 2021-01-01 2023-06-30
    2 1 Leader 2021-06-01 2021-12-31

    To this
    TeamId PersonId StartDate EndDate Manager Leader Other
    1 1 2020-01-01 2020-05-31 1 null null
    1 1 2020-06-01 2020-12-31 1 null 1
    1 1 2021-01-01 2023-04-30 1 1 1
    1 1 2023-05-01 2023-06-30 1 1 null
    1 1 2023-07-01 2024-12-31 1 null null
    2 1 2021-06-01 2021-12-31 null 1 1
    */

    Below is the query im using now and it work perfectly, but when I run it for the entire dataset it gets too heavy. The number of roles are fixed to it's no problem to hard code these. It's just that is't about 25 roles in totalt and one person can have up to 10 different roles over time, and the dates span from early 2000 to and forward.

    And since i have to expand each person role range into days i get a lot of rows before it can group it in the pivot. I've tried do it in a loop, working with one person at a time but it still takes forever to complete.

    Is there a smarter way to achive this than how Im doing now?

    DROP TABLE IF EXISTS #AssignmentRange
    CREATE TABLE #AssignmentRange(
    TeamId int
    , PersonId int
    , RoleName varchar(10)
    , StartDate date
    , EndDate date
    )


    INSERT INTO #AssignmentRange
    SELECT 1, 1, 'Manager', '2020-01-01', NULL UNION
    SELECT 1, 1, 'Leader', '2021-01-01', '2023-06-30' UNION
    SELECT 1, 1, 'Other', '2020-06-01', '2023-04-30' UNION
    SELECT 2, 1, 'Leader', '2021-06-01', '2021-12-31'


    DROP TABLE IF EXISTS #dates
    CREATE TABLE #dates(
    [Date] date NOT NULL PRIMARY KEY
    )

    /*
    Generate all dates between 1990-01-01 and end of next year
    */
    INSERT INTO #dates
    SELECT
    DATEADD(DAY, [value], '1990-01-01')
    FROM
    GENERATE_SERIES(
    0
    , DATEDIFF(DAY,'1990-01-01', DATEFROMPARTS(YEAR(GETDATE()) +1, 12, 31))
    , 1)


    SELECT
    TeamId
    , PersonId
    , MIN([Date]) AS StartDate
    , MAX([Date]) AS EndDate
    , [Manager]
    , [Leader]
    , [Other]
    FROM(
    SELECT
    *
    ,COUNT(NextJsonOBJ) OVER(PARTITION BY TeamId, PersonId ORDER BY [Date]) AS Groups
    FROM(
    SELECT
    *
    , CASE
    WHEN LAG(JsonOBJ, 1, NULL) OVER(PARTITION BY TeamId, PersonId ORDER BY [Date]) IS NULL
    THEN 1
    WHEN LAG(JsonOBJ, 1, NULL) OVER(PARTITION BY TeamId, PersonId ORDER BY [Date]) != JsonOBJ
    THEN 1
    ELSE NULL
    END AS NextJsonOBJ
    FROM(
    SELECT
    *
    ,(
    SELECT
    TeamId
    , PersonId
    , [Manager]
    , [Leader]
    , [Other]
    FOR JSON PATH
    , WITHOUT_ARRAY_WRAPPER
    ) AS JsonOBJ
    FROM(
    SELECT
    ar.TeamId
    , ar.PersonId
    , d.[Date]
    , ar.RoleName
    , 1 AS Dummy
    FROM
    #AssignmentRange AS ar
    INNER JOIN
    #dates AS d
    ON d.[Date] BETWEEN
    ar.StartDate
    AND COALESCE(ar.EndDate, DATEFROMPARTS(YEAR(GETDATE()) +1, 12, 31))
    ) AS n
    PIVOT(
    MAX(Dummy)
    FOR RoleName IN (
    [Manager]
    , [Leader]
    , [Other]
    )
    ) AS pvt
    ) AS n
    ) AS x
    ) AS u
    GROUP BY
    TeamId
    , PersonId
    , [Manager]
    , [Leader]
    , [Other]
    , Groups
    ORDER BY
    PersonId, TeamId, StartDate

    Regards,

    Alfred

  • Thanks for posting your issue and hopefully someone will answer soon.

    This is an automated bump to increase visibility of your question.

  • Afaik this returns the same output.  The CTEs expand the date intervals, pivot by role name and summarize by day, identify gaps in GROUP BY columns, and create groups using SUM OVER the gaps.  The final SELECT summarizes using the 'grp' column in the GROUP BY.  To improve performance an indexing strategy, possibly using temp table(s) could be beneficial.    Performance results depend on the unique scenario.  Also, it seems possible this code could be further optimized for performance by factoring out certain CTE(s)

    declare @end_of_next_yr date=datefromparts(year(getdate())+1, 12, 31);

    with
    exp_cte as (
    select ar.*, dateadd(day, t.n-1, ar.StartDate) exp_dt
    from #AssignmentRange ar
    cross apply (select top (datediff(day, ar.StartDate, isnull(EndDate, @end_of_next_yr))+1)
    row_number() over (order by (select null))
    from sys.all_columns) t(n)),
    unq_cte as (
    select TeamId, PersonId, exp_dt,
    max(iif(RoleName='Manager', 1, null)) rn_Manager,
    max(iif(RoleName='Leader', 1, null)) rn_Leader,
    max(iif(RoleName='Other', 1, null)) rn_Other
    from exp_cte
    group by TeamId, PersonId, exp_dt),
    gap_cte as (
    select *, CASE WHEN (LAG(ISNULL(rn_Manager, 0)) OVER (partition by TeamId, PersonId ORDER BY exp_dt) <> ISNULL(rn_Manager, 0))
    OR (LAG(ISNULL(rn_Leader, 0)) OVER (partition by TeamId, PersonId ORDER BY exp_dt) <> ISNULL(rn_Leader, 0))
    OR (LAG(ISNULL(rn_Other, 0)) OVER (partition by TeamId, PersonId ORDER BY exp_dt) <> ISNULL(rn_Other, 0))
    THEN 1 ELSE 0 END AS gap
    from unq_cte),
    grp_cte as (
    select *, sum(gap) over (partition by TeamId, PersonId ORDER BY exp_dt) grp
    from gap_cte)
    select TeamId, PersonId,min(exp_dt) StartDate, max(exp_dt) EndDate,
    rn_Manager, rn_Leader, rn_Other
    from grp_cte
    group by TeamId, PersonId, rn_Manager, rn_Leader, rn_Other, grp
    order by TeamId, PersonId, StartDate;

     

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • The top two CTEs could be replaced with only one.  Possibly the bottom two could be combined as well using the "smudge" method?

    with 
    pvt_cte as (
    select TeamId, PersonId, dateadd(day, t.n-1, ar.StartDate) exp_dt,
    max(iif(RoleName='Manager', 1, null)) rn_Manager,
    max(iif(RoleName='Leader', 1, null)) rn_Leader,
    max(iif(RoleName='Other', 1, null)) rn_Other
    from #AssignmentRange ar
    cross apply (select top (datediff(day, ar.StartDate, isnull(EndDate, @end_of_next_yr))+1)
    row_number() over (order by (select null))
    from sys.all_columns) t(n)
    group by TeamId, PersonId, dateadd(day, t.n-1, ar.StartDate)),
    gap_cte as (
    select *, iif((lag(isnull(rn_manager, 0)) over (partition by teamid, personid order by exp_dt) <> isnull(rn_manager, 0))
    or (lag(isnull(rn_leader, 0)) over (partition by teamid, personid order by exp_dt) <> isnull(rn_leader, 0))
    or (lag(isnull(rn_other, 0)) over (partition by teamid, personid order by exp_dt) <> isnull(rn_other, 0)), 1, 0) as gap
    from pvt_cte),
    grp_cte as (
    select *, sum(gap) over (partition by TeamId, PersonId ORDER BY exp_dt) grp
    from gap_cte)
    select TeamId, PersonId,min(exp_dt) StartDate, max(exp_dt) EndDate,
    rn_Manager, rn_Leader, rn_Other
    from grp_cte
    group by TeamId, PersonId, rn_Manager, rn_Leader, rn_Other, grp
    order by TeamId, PersonId, StartDate;

     

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • WITH ChangeDates
    AS
    (
    SELECT R.TeamId, R.PersonId, D.ChangeDate
    ,MAX(IIF(R.RoleName = 'Manager', X.StartEnd, 0)) AS Manager
    ,MAX(IIF(R.RoleName = 'Leader', X.StartEnd, 0)) AS Leader
    ,MAX(IIF(R.RoleName = 'Other', X.StartEnd, 0)) AS Other
    FROM #AssignmentRange R
    CROSS APPLY (VALUES (1),(-1)) X (StartEnd)
    CROSS APPLY
    (
    VALUES
    (
    CASE
    WHEN X.StartEnd = 1
    THEN StartDate
    ELSE DATEADD(day, 1, ISNULL(EndDate, '9999'))
    END
    )
    ) D (ChangeDate)
    GROUP BY R.TeamId, R.PersonId, D.ChangeDate
    )
    ,StartEnd
    AS
    (
    SELECT TeamId, PersonId
    ,ChangeDate AS StartDate
    ,DATEADD(day, -1, LEAD(ChangeDate) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate)) AS EndDate
    ,SUM(Manager) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) AS Manager
    ,SUM(Leader) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) AS Leader
    ,SUM(Other) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) AS Other
    FROM ChangeDates
    /* SSQL2022 WINDOW win AS (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) */
    )
    SELECT TeamId, PersonId, StartDate
    ,IIF(EndDate = '9999', NULL, EndDate) AS EndDate
    ,Manager, Leader, Other
    FROM StartEnd
    WHERE EndDate IS NOT NULL
    ORDER BY TeamId, PersonId, StartDate;
  • Or dynamically:

    /* Should be able to get this from a role table without DISTINCT
    SELECT DISTINCT RoleName
    INTO #Roles
    FROM #AssignmentRange;
    */

    SELECT RoleId, RoleName
    INTO #Roles
    FROM
    (
    VALUES (1, 'Manager')
    ,(2, 'Leader')
    ,(3, 'Other')
    ) V (RoleId, RoleName)

    DECLARE @sql nvarchar(MAX) = N'WITH ChangeDates
    AS
    (
    SELECT R.TeamId, R.PersonId, D.ChangeDate
    ';

    SELECT @sql += N' ,MAX(IIF(R.RoleName = ''' + RoleName + N''', X.StartEnd, 0)) AS ' + RoleName + NCHAR(13) + NCHAR(10)
    FROM #Roles
    ORDER BY RoleId;

    SET @sql += N' FROM #AssignmentRange R
    CROSS APPLY (VALUES (1),(-1)) X (StartEnd)
    CROSS APPLY
    (
    VALUES
    (
    CASE
    WHEN X.StartEnd = 1
    THEN StartDate
    ELSE DATEADD(day, 1, ISNULL(EndDate, ''9999''))
    END
    )
    ) D (ChangeDate)
    GROUP BY R.TeamId, R.PersonId, D.ChangeDate
    )
    ,StartEnd
    AS
    (
    SELECT TeamId, PersonId
    ,ChangeDate AS StartDate
    ,DATEADD(day, -1, LEAD(ChangeDate) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate)) AS EndDate
    ';
    SELECT @sql += N' ,SUM(' + RoleName +N') OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) AS ' + RoleName + NCHAR(13) + NCHAR(10)
    FROM #Roles
    ORDER BY RoleId;

    SET @sql += N' FROM ChangeDates
    /* SSQL2022 WINDOW win AS (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) */
    )
    SELECT TeamId, PersonId, StartDate
    ,IIF(EndDate = ''9999'', NULL, EndDate) AS EndDate
    ';

    SELECT @sql += N',' + RoleName
    FROM #Roles
    ORDER BY RoleId;

    SET @sql += NCHAR(13) + NCHAR(10) + N'FROM StartEnd
    WHERE EndDate IS NOT NULL
    ORDER BY TeamId, PersonId, StartDate;
    '
    --print @sql

    EXEC sp_executesql @sql;
  • Steve Collins wrote:

    The top two CTEs could be replaced with only one.  Possibly the bottom two could be combined as well using the "smudge" method?

    Thank you for your input Steve. Your solution is way prettier than mine but the performance issue is the same, and Im guessing it's due to the same technique with exapnding all the intervals.

  • Thank you Ken for you input. Your solution doesn't expand the intervals and that makes it really fast. But it doesn't produce correct result. In this scenario the final row has "1" for more than that the one role that is active from 2021-07-01

     

    INSERT INTO #AssignmentRange
    SELECT 1, 1, 'Manager', '2006-05-24', '2018-10-23' UNION
    SELECT 1, 1, 'Leader', '2018-10-24', '2021-06-30' UNION
    SELECT 1, 1, 'Other', '2015-01-12', '2022-06-30' UNION
    SELECT 1, 1, 'Role2', '2018-10-24', '2021-06-30' UNION
    SELECT 1, 1, 'Role3', '2006-05-24', '2021-06-30'

    /*
    TeamId PersonId StartDate EndDate Manager Leader Other Role2 Role3
    1 1 2006-05-24 2015-01-11 1 0 0 0 1
    1 1 2015-01-12 2018-10-23 1 0 1 0 1
    1 1 2018-10-24 2021-06-30 1 1 1 1 1
    1 1 2021-07-01 2022-06-30 1 1 1 1 1
    */

    It should be

    /*
    StartDate EndDate Manager Leader Other Role2 Role3
    2006-05-24 2015-01-11 1 0 0 0 1
    2015-01-12 2018-10-23 1 0 1 0 1
    2018-10-24 2021-06-30 0 1 1 1 1
    2021-07-01 2022-06-30 0 0 1 0 0
    */

    • This reply was modified 1 year ago by  Alfred.
    • This reply was modified 1 year ago by  Alfred.
  • WITH ChangeDates
    AS
    (
    SELECT R.TeamId, R.PersonId, D.ChangeDate
    ,MAX(CASE WHEN R.RoleName = 'Manager' THEN X.StartEnd END) AS Manager
    ,MAX(CASE WHEN R.RoleName = 'Leader' THEN X.StartEnd END) AS Leader
    ,MAX(CASE WHEN R.RoleName = 'Other' THEN X.StartEnd END) AS Other
    ,MAX(CASE WHEN R.RoleName = 'Role2' THEN X.StartEnd END) AS Role2
    ,MAX(CASE WHEN R.RoleName = 'Role3' THEN X.StartEnd END) AS Role3
    FROM #AssignmentRange R
    CROSS APPLY (VALUES (1),(-1)) X (StartEnd)
    CROSS APPLY
    (
    VALUES
    (
    CASE
    WHEN X.StartEnd = 1
    THEN StartDate
    ELSE DATEADD(day, 1, ISNULL(EndDate, '9999'))
    END
    )
    ) D (ChangeDate)
    GROUP BY R.TeamId, R.PersonId, D.ChangeDate
    )
    ,StartEnd
    AS
    (
    SELECT TeamId, PersonId
    ,ChangeDate AS StartDate
    ,DATEADD(day, -1, LEAD(ChangeDate) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate)) AS EndDate
    ,SUM(Manager) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) AS Manager
    ,SUM(Leader) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) AS Leader
    ,SUM(Other) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) AS Other
    ,SUM(Role2) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) AS Role2
    ,SUM(Role3) OVER (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) AS Role3
    FROM ChangeDates
    /* SSQL2022 WINDOW win AS (PARTITION BY TeamId, PersonId ORDER BY ChangeDate) */
    )
    SELECT TeamId, PersonId, StartDate
    ,IIF(EndDate = '9999', NULL, EndDate) AS EndDate
    ,ISNULL(Manager, 0) AS Manager
    ,ISNULL(Leader, 0) AS Leader
    ,ISNULL(Other, 0) AS Other
    ,ISNULL(Role2, 0) AS Role2
    ,ISNULL(Role3, 0) AS Role2
    FROM StartEnd
    WHERE EndDate IS NOT NULL
    ORDER BY TeamId, PersonId, StartDate;
  • Now it looks like it's correct for all rows in my data set. I've been working on a new solution on my own but didn't get correct result when the range was just one day, but your solution handles that perfectly.

    Thank you so much Ken!

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply