Islands (no gaps) - Problem solved but could this be done better?

Question

Post reply

Islands (no gaps) - Problem solved but could this be done better?

SoCal_DBD

Hall of Fame

Points: 3390

March 8, 2023 at 10:37 pm

Go to Answer

#4158354

I've obfuscated my data, but this is legitimately the way the data in my table is distributed. A "User" can have different tiers. They may belong to one for a while, get bumped up, then get bumped down. I need the start and end for each tier the user has been in. They might go in and out of the same tier over different periods of time. I need those reflected - not lumped together. I've consulted two different articles and they did not solve my problem. I managed to get this to work (after more time today than I care to admit), but would like to know if there is a better way. I've attached an image of the data and then the range start and end dates to be derived for each group.

NOTE: There are other columns in this slowly changing dimension which when changed, results in a new row for a user - that's why you see the same tier repeated over a period of time in some cases.

-- Create sample data
----------------------------------------------
DROP TABLE IF EXISTS #UserInfo
CREATE TABLE #UserInfo
    (UserID INT, 
     UserTier VARCHAR(10), 
     StartDate DATE, 
     EndDate DATE)

INSERT INTO #UserInfo (UserID, UserTier, StartDate, EndDate)
VALUES 
(5000, 'Basic', '2014-11-22', '2015-02-16'),
(5000, 'Basic', '2015-02-16', '2015-04-13'),
(5000, 'Basic', '2015-04-13', '2015-10-09'),
(5000, 'Basic', '2015-10-09', '2016-07-20'),
(5000, 'Basic', '2016-07-20', '2016-09-19'),
(5000, 'Basic', '2016-09-19', '2016-12-03'),
(5000, 'Basic', '2016-12-03', '2017-02-16'),
(5000, 'Basic', '2017-02-16', '2018-11-10'),
(5000, 'Basic', '2018-11-10', '2019-05-30'),
(5000, 'Basic', '2019-05-30', '2019-06-29'),
(5000, 'Manager', '2019-06-29', '2019-07-19'),
(5000, 'Manager', '2019-07-19', '2019-07-25'),
(5000, 'Manager', '2019-07-25', '2019-12-06'),
(5000, 'Manager', '2019-12-06', '2020-03-03'),
(5000, 'Super User', '2020-03-03', '2020-03-07'),
(5000, 'Super User', '2020-03-07', '2020-10-14'),
(5000, 'Manager', '2020-10-14', '2021-02-11'),
(5000, 'Super User', '2021-02-11', '2021-09-02'),
(5000, 'Manager', '2021-09-02', '2021-11-04'),
(5000, 'Manager', '2021-11-04', '2022-01-11'),
(5000, 'Manager', '2022-01-11', '2022-07-01'),
(5000, 'Basic', '2022-07-01', '2022-07-02'),
(5000, 'Manager', '2022-07-02', '9999-12-31')

-- Review what we have
----------------------------------------------
SELECT *
FROM #UserInfo
ORDER BY UserID, StartDate

-- Final query which works
----------------------------------------------
; WITH UserTiers AS
(SELECT UserID, UserTier, StartDate, EndDate,
        NextUserTier = LEAD(UserTier) OVER (PARTITION BY UserID ORDER BY StartDate),
        RowNum = ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY StartDate)
 FROM #UserInfo
)

, PreliminaryInfo AS
(SELECT UserID, UserTier, StartDate, EndDate, 
        NextUserTier,
        RowNum,
        FinalUserTier = CASE WHEN UserTier = NextUserTier THEN UserTier
                             WHEN UserTier <> NextUserTier THEN NextUserTier
                             WHEN NextUserTier IS NULL THEN UserTier
                             ELSE NULL 
                        END,
        FinalStartDate = CASE WHEN RowNum = 1 THEN StartDate 
                              WHEN RowNum <> 1 AND UserTier <> NextUserTier THEN EndDate 
                              ELSE NULL
                         END
 FROM UserTiers
 WHERE RowNum = 1
 OR UserTier <> NextUserTier
)

SELECT UserID,
       FinalUserTier,
       FinalStartDate,
       FinalEndDate = ISNULL(LEAD(FinalStartDate, 1, NULL) OVER (PARTITION BY UserID ORDER BY FinalStartDate), '12/31/9999')
FROM PreliminaryInfo

Thank you!

This topic was modified 3 years ago by SoCal_DBD.
This topic was modified 3 years ago by SoCal_DBD.
This topic was modified 3 years ago by SoCal_DBD.

Attachments:

You must be logged in to view attached files.

Steve Collins

SSCrazy

Points: 2252

March 8, 2023 at 11:21 pm

Answer

#4158361

Here's an alternative. Is it better, idk

with 
gap_cte as (
    select *, case when UserTier<>lag(UserTier) over (partition by UserID order by StartDate) then 1 else 0 end gap
    from #UserInfo),
grp_cte as (
    select *, sum(gap) over (partition by  UserID order by StartDate) grp
    from gap_cte)
select *,
       first_value(StartDate) over (partition by UserID, UserTier, grp order by StartDate) RangeStart,
       last_value(EndDate) over (partition by UserID, UserTier, grp order by EndDate
                                 rows between unbounded preceding and unbounded following) RangeEnd
from grp_cte
order by StartDate;

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

SoCal_DBD Hall of Fame Points: 3390 More actions · Answer 1

Argh! Yes! I knew I need to group (grp_cte) but couldn't figure it out for the life of me.

Thanks!

drew.allen SSC Guru Points: 77014 More actions · Answer 2

This produces the exact same results as your working query, but I think that the logic is simpler to follow.

WITH TierChanges AS
(
    SELECT ui.UserID
 , ui.UserTier
 , ui.StartDate
 , ui.EndDate
         , CASE WHEN ui.UserTier = LAG(ui.UserTier, 1, '') OVER(PARTITION BY ui.UserID ORDER BY ui.StartDate) THEN CAST(0 AS BIT) ELSE 1 END AS TierChange
    FROM #UserInfo AS ui
)
, TierGroups AS
(
    SELECT tc.UserID
     , tc.UserTier
     , tc.StartDate
     , tc.EndDate
     , SUM(tc.TierChange) OVER(PARTITION BY tc.UserID ORDER BY tc.StartDate ROWS UNBOUNDED PRECEDING) AS TierGroup
    FROM TierChanges AS tc
)
SELECT tg.UserID
 , MIN(tg.UserTier) AS UserTier
 , MIN(tg.StartDate) AS StartDate
 , MAX(tg.EndDate) AS EndDate
FROM TierGroups AS tg
GROUP BY tg.UserID, tg.TierGroup;

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

SoCal_DBD Hall of Fame Points: 3390 More actions · Answer 3

SoCal_DBD

Hall of Fame

Points: 3390

March 9, 2023 at 3:47 pm

#4158890

Yes, I like that Drew! Thank you!

Emperor100 SSCrazy Points: 2222 More actions · Answer 4

Here is another way!!

-- Create a grouping column based on the difference between ID and row number within each usertier
;WITH CTE AS (
  SELECT *, 
    ROW_NUMBER() OVER (ORDER BY StartDate) AS ID,
    ROW_NUMBER() OVER (PARTITION BY UserTier ORDER BY StartDate) AS RNow
  FROM #UserInfo
),
GroupRows AS (
  SELECT *,
    ID - RNow AS Grp
  FROM CTE
)
-- Group by usertier and grouping column, and assign a group number using dense rank
SELECT 
  UserTier,
  MIN(StartDate) AS StartDate,
  MAX(EndDate) AS EndDate,
  DENSE_RANK() OVER (ORDER BY MIN(ID)) AS GroupNo
FROM Grouprows
GROUP BY UserTier, Grp;

=======================================================================