I've obfuscated my data, but this is legitimately the way the data in my table is distributed. A "User" can have different tiers. They may belong to one for a while, get bumped up, then get bumped down. I need the start and end for each tier the user has been in. They might go in and out of the same tier over different periods of time. I need those reflected - not lumped together. I've consulted two different articles and they did not solve my problem. I managed to get this to work (after more time today than I care to admit), but would like to know if there is a better way. I've attached an image of the data and then the range start and end dates to be derived for each group.
NOTE: There are other columns in this slowly changing dimension which when changed, results in a new row for a user - that's why you see the same tier repeated over a period of time in some cases.
-- Create sample data
----------------------------------------------
DROP TABLE IF EXISTS #UserInfo
CREATE TABLE #UserInfo
(UserID INT,
UserTier VARCHAR(10),
StartDate DATE,
EndDate DATE)
INSERT INTO #UserInfo (UserID, UserTier, StartDate, EndDate)
VALUES
(5000, 'Basic', '2014-11-22', '2015-02-16'),
(5000, 'Basic', '2015-02-16', '2015-04-13'),
(5000, 'Basic', '2015-04-13', '2015-10-09'),
(5000, 'Basic', '2015-10-09', '2016-07-20'),
(5000, 'Basic', '2016-07-20', '2016-09-19'),
(5000, 'Basic', '2016-09-19', '2016-12-03'),
(5000, 'Basic', '2016-12-03', '2017-02-16'),
(5000, 'Basic', '2017-02-16', '2018-11-10'),
(5000, 'Basic', '2018-11-10', '2019-05-30'),
(5000, 'Basic', '2019-05-30', '2019-06-29'),
(5000, 'Manager', '2019-06-29', '2019-07-19'),
(5000, 'Manager', '2019-07-19', '2019-07-25'),
(5000, 'Manager', '2019-07-25', '2019-12-06'),
(5000, 'Manager', '2019-12-06', '2020-03-03'),
(5000, 'Super User', '2020-03-03', '2020-03-07'),
(5000, 'Super User', '2020-03-07', '2020-10-14'),
(5000, 'Manager', '2020-10-14', '2021-02-11'),
(5000, 'Super User', '2021-02-11', '2021-09-02'),
(5000, 'Manager', '2021-09-02', '2021-11-04'),
(5000, 'Manager', '2021-11-04', '2022-01-11'),
(5000, 'Manager', '2022-01-11', '2022-07-01'),
(5000, 'Basic', '2022-07-01', '2022-07-02'),
(5000, 'Manager', '2022-07-02', '9999-12-31')
-- Review what we have
----------------------------------------------
SELECT *
FROM #UserInfo
ORDER BY UserID, StartDate
-- Final query which works
----------------------------------------------
; WITH UserTiers AS
(SELECT UserID, UserTier, StartDate, EndDate,
NextUserTier = LEAD(UserTier) OVER (PARTITION BY UserID ORDER BY StartDate),
RowNum = ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY StartDate)
FROM #UserInfo
)
, PreliminaryInfo AS
(SELECT UserID, UserTier, StartDate, EndDate,
NextUserTier,
RowNum,
FinalUserTier = CASE WHEN UserTier = NextUserTier THEN UserTier
WHEN UserTier <> NextUserTier THEN NextUserTier
WHEN NextUserTier IS NULL THEN UserTier
ELSE NULL
END,
FinalStartDate = CASE WHEN RowNum = 1 THEN StartDate
WHEN RowNum <> 1 AND UserTier <> NextUserTier THEN EndDate
ELSE NULL
END
FROM UserTiers
WHERE RowNum = 1
OR UserTier <> NextUserTier
)
SELECT UserID,
FinalUserTier,
FinalStartDate,
FinalEndDate = ISNULL(LEAD(FinalStartDate, 1, NULL) OVER (PARTITION BY UserID ORDER BY FinalStartDate), '12/31/9999')
FROM PreliminaryInfo
Thank you!
Here's an alternative. Is it better, idk
with
gap_cte as (
select *, case when UserTier<>lag(UserTier) over (partition by UserID order by StartDate) then 1 else 0 end gap
from #UserInfo),
grp_cte as (
select *, sum(gap) over (partition by UserID order by StartDate) grp
from gap_cte)
select *,
first_value(StartDate) over (partition by UserID, UserTier, grp order by StartDate) RangeStart,
last_value(EndDate) over (partition by UserID, UserTier, grp order by EndDate
rows between unbounded preceding and unbounded following) RangeEnd
from grp_cte
order by StartDate;
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
March 8, 2023 at 11:47 pm
Argh! Yes! I knew I need to group (grp_cte) but couldn't figure it out for the life of me.
Thanks!
March 9, 2023 at 2:28 pm
This produces the exact same results as your working query, but I think that the logic is simpler to follow.
WITH TierChanges AS
(
SELECT ui.UserID
, ui.UserTier
, ui.StartDate
, ui.EndDate
, CASE WHEN ui.UserTier = LAG(ui.UserTier, 1, '') OVER(PARTITION BY ui.UserID ORDER BY ui.StartDate) THEN CAST(0 AS BIT) ELSE 1 END AS TierChange
FROM #UserInfo AS ui
)
, TierGroups AS
(
SELECT tc.UserID
, tc.UserTier
, tc.StartDate
, tc.EndDate
, SUM(tc.TierChange) OVER(PARTITION BY tc.UserID ORDER BY tc.StartDate ROWS UNBOUNDED PRECEDING) AS TierGroup
FROM TierChanges AS tc
)
SELECT tg.UserID
, MIN(tg.UserTier) AS UserTier
, MIN(tg.StartDate) AS StartDate
, MAX(tg.EndDate) AS EndDate
FROM TierGroups AS tg
GROUP BY tg.UserID, tg.TierGroup;
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
March 9, 2023 at 3:47 pm
Yes, I like that Drew! Thank you!
March 10, 2023 at 4:25 am
This was removed by the editor as SPAM
March 10, 2023 at 8:29 pm
Here is another way!!
-- Create a grouping column based on the difference between ID and row number within each usertier
;WITH CTE AS (
SELECT *,
ROW_NUMBER() OVER (ORDER BY StartDate) AS ID,
ROW_NUMBER() OVER (PARTITION BY UserTier ORDER BY StartDate) AS RNow
FROM #UserInfo
),
GroupRows AS (
SELECT *,
ID - RNow AS Grp
FROM CTE
)
-- Group by usertier and grouping column, and assign a group number using dense rank
SELECT
UserTier,
MIN(StartDate) AS StartDate,
MAX(EndDate) AS EndDate,
DENSE_RANK() OVER (ORDER BY MIN(ID)) AS GroupNo
FROM Grouprows
GROUP BY UserTier, Grp;
=======================================================================
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply