Islands (no gaps) - Problem solved but could this be done better?

  • I've obfuscated my data, but this is legitimately the way the data in my table is distributed.  A "User" can have different tiers.  They may belong to one for a while, get bumped up, then get bumped down.  I need the start and end for each tier the user has been in.  They might go in and out of the same tier over different periods of time.  I need those reflected - not lumped together.  I've consulted two different articles and they did not solve my problem.  I managed to get this to work (after more time today than I care to admit), but would like to know if there is a better way.  I've attached an image of the data and then the range start and end dates to be derived for each group.

    NOTE: There are other columns in this slowly changing dimension which when changed, results in a new row for a user - that's why you see the same tier repeated over a period of time in some cases.

    -- Create sample data
    ----------------------------------------------
    DROP TABLE IF EXISTS #UserInfo
    CREATE TABLE #UserInfo
    (UserID INT,
    UserTier VARCHAR(10),
    StartDate DATE,
    EndDate DATE)

    INSERT INTO #UserInfo (UserID, UserTier, StartDate, EndDate)
    VALUES
    (5000, 'Basic', '2014-11-22', '2015-02-16'),
    (5000, 'Basic', '2015-02-16', '2015-04-13'),
    (5000, 'Basic', '2015-04-13', '2015-10-09'),
    (5000, 'Basic', '2015-10-09', '2016-07-20'),
    (5000, 'Basic', '2016-07-20', '2016-09-19'),
    (5000, 'Basic', '2016-09-19', '2016-12-03'),
    (5000, 'Basic', '2016-12-03', '2017-02-16'),
    (5000, 'Basic', '2017-02-16', '2018-11-10'),
    (5000, 'Basic', '2018-11-10', '2019-05-30'),
    (5000, 'Basic', '2019-05-30', '2019-06-29'),
    (5000, 'Manager', '2019-06-29', '2019-07-19'),
    (5000, 'Manager', '2019-07-19', '2019-07-25'),
    (5000, 'Manager', '2019-07-25', '2019-12-06'),
    (5000, 'Manager', '2019-12-06', '2020-03-03'),
    (5000, 'Super User', '2020-03-03', '2020-03-07'),
    (5000, 'Super User', '2020-03-07', '2020-10-14'),
    (5000, 'Manager', '2020-10-14', '2021-02-11'),
    (5000, 'Super User', '2021-02-11', '2021-09-02'),
    (5000, 'Manager', '2021-09-02', '2021-11-04'),
    (5000, 'Manager', '2021-11-04', '2022-01-11'),
    (5000, 'Manager', '2022-01-11', '2022-07-01'),
    (5000, 'Basic', '2022-07-01', '2022-07-02'),
    (5000, 'Manager', '2022-07-02', '9999-12-31')

    -- Review what we have
    ----------------------------------------------
    SELECT *
    FROM #UserInfo
    ORDER BY UserID, StartDate

    -- Final query which works
    ----------------------------------------------
    ; WITH UserTiers AS
    (SELECT UserID, UserTier, StartDate, EndDate,
    NextUserTier = LEAD(UserTier) OVER (PARTITION BY UserID ORDER BY StartDate),
    RowNum = ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY StartDate)
    FROM #UserInfo
    )

    , PreliminaryInfo AS
    (SELECT UserID, UserTier, StartDate, EndDate,
    NextUserTier,
    RowNum,
    FinalUserTier = CASE WHEN UserTier = NextUserTier THEN UserTier
    WHEN UserTier <> NextUserTier THEN NextUserTier
    WHEN NextUserTier IS NULL THEN UserTier
    ELSE NULL
    END,
    FinalStartDate = CASE WHEN RowNum = 1 THEN StartDate
    WHEN RowNum <> 1 AND UserTier <> NextUserTier THEN EndDate
    ELSE NULL
    END
    FROM UserTiers
    WHERE RowNum = 1
    OR UserTier <> NextUserTier
    )

    SELECT UserID,
    FinalUserTier,
    FinalStartDate,
    FinalEndDate = ISNULL(LEAD(FinalStartDate, 1, NULL) OVER (PARTITION BY UserID ORDER BY FinalStartDate), '12/31/9999')
    FROM PreliminaryInfo

    Thank you!

     

    • This topic was modified 1 year, 8 months ago by  SoCal_DBD.
    • This topic was modified 1 year, 8 months ago by  SoCal_DBD.
    • This topic was modified 1 year, 8 months ago by  SoCal_DBD.
    Attachments:
    You must be logged in to view attached files.
  • Here's an alternative.  Is it better, idk

    with 
    gap_cte as (
    select *, case when UserTier<>lag(UserTier) over (partition by UserID order by StartDate) then 1 else 0 end gap
    from #UserInfo),
    grp_cte as (
    select *, sum(gap) over (partition by UserID order by StartDate) grp
    from gap_cte)
    select *,
    first_value(StartDate) over (partition by UserID, UserTier, grp order by StartDate) RangeStart,
    last_value(EndDate) over (partition by UserID, UserTier, grp order by EndDate
    rows between unbounded preceding and unbounded following) RangeEnd
    from grp_cte
    order by StartDate;

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Argh!  Yes!  I knew I need to group (grp_cte) but couldn't figure it out for the life of me.

    Thanks!

  • This produces the exact same results as your working query, but I think that the logic is simpler to follow.

    WITH TierChanges AS
    (
    SELECT ui.UserID
    , ui.UserTier
    , ui.StartDate
    , ui.EndDate
    , CASE WHEN ui.UserTier = LAG(ui.UserTier, 1, '') OVER(PARTITION BY ui.UserID ORDER BY ui.StartDate) THEN CAST(0 AS BIT) ELSE 1 END AS TierChange
    FROM #UserInfo AS ui
    )
    , TierGroups AS
    (
    SELECT tc.UserID
    , tc.UserTier
    , tc.StartDate
    , tc.EndDate
    , SUM(tc.TierChange) OVER(PARTITION BY tc.UserID ORDER BY tc.StartDate ROWS UNBOUNDED PRECEDING) AS TierGroup
    FROM TierChanges AS tc
    )
    SELECT tg.UserID
    , MIN(tg.UserTier) AS UserTier
    , MIN(tg.StartDate) AS StartDate
    , MAX(tg.EndDate) AS EndDate
    FROM TierGroups AS tg
    GROUP BY tg.UserID, tg.TierGroup;

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Yes, I like that Drew!  Thank you!

  • This was removed by the editor as SPAM

  • Here is another way!!

    -- Create a grouping column based on the difference between ID and row number within each usertier
    ;WITH CTE AS (
    SELECT *,
    ROW_NUMBER() OVER (ORDER BY StartDate) AS ID,
    ROW_NUMBER() OVER (PARTITION BY UserTier ORDER BY StartDate) AS RNow
    FROM #UserInfo
    ),
    GroupRows AS (
    SELECT *,
    ID - RNow AS Grp
    FROM CTE
    )
    -- Group by usertier and grouping column, and assign a group number using dense rank
    SELECT
    UserTier,
    MIN(StartDate) AS StartDate,
    MAX(EndDate) AS EndDate,
    DENSE_RANK() OVER (ORDER BY MIN(ID)) AS GroupNo
    FROM Grouprows
    GROUP BY UserTier, Grp;

    =======================================================================

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply