April 26, 2021 at 5:30 pm
Hello,
I have the following table, where ID is the unique identifier. An can move from category to category, both up and down. My table records each day an ID stays in a given category. I am trying to identify the start date and the end date of an ID in a given category. The problem is that an ID can move up a category, and move back down to its original category after a certain number of days. Here is my table as an example with only 1 ID:
ID Category Date
1 1 2021-01-01
1 1 2021-01-02
...
1 1 2021-01-24
1 2 2021-01-25
...
1 2 2021-02-15
1 1 2021-02-16
...
1 1 2021-04-20
1 2 2021-04-21
When I try to get the MIN(DATE) and MAX(DATE) and group by the category and ID, it shows me that the account was in Category 1 from 2021-01-01 to 2021-04-20, and in Category 2 from 02-25 to 04-21. I am trying to track the movements of the file in each bucket step by step, meaning in my ideal result, the movements of the account will be tracked as:
ID Category StartDate EndDate
1 1 2021-01-01 2021-01-24
1 2 2021-01-25 2021-02-15
1 1 2021-02-16 2021-04-20
1 2 2021-04-21 NULL (or GETDATE())
How can I achieve this result? Any help would be appreciated. I tried using the RANK() function but because the table records every single day, it seems useless.
April 26, 2021 at 6:31 pm
This is not particularly elegant, but it seems to get the job done.
Next time, please remember to post your sample data in consumable format, so that others can simply paste into SSMS and start coding.
DROP TABLE IF EXISTS #SomeId;
CREATE TABLE #SomeId
(
Id INT NOT NULL
,Category INT NOT NULL
,StartDate DATE NOT NULL
,
PRIMARY KEY CLUSTERED (
Id
,Category
,StartDate
)
);
INSERT #SomeId
(
Id
,Category
,StartDate
)
VALUES
(1, 1, '20210101')
,(1, 1, '20210102')
,(1, 1, '20210124')
,(1, 2, '20210125')
,(1, 2, '20210215')
,(1, 1, '20210216')
,(1, 1, '20210420')
,(1, 2, '20210521');
WITH NextDates
AS (SELECT si.Id
,si.Category
,si.StartDate
,PrevCategory = LAG(si.Category, 1) OVER (PARTITION BY si.Id ORDER BY si.StartDate)
,NextCategory = LEAD(si.Category, 1) OVER (PARTITION BY si.Id ORDER BY si.StartDate)
,NextDate = LEAD(si.StartDate, 1, GETDATE()) OVER (PARTITION BY si.Id ORDER BY si.StartDate)
FROM #SomeId si)
SELECT NextDates.Id
,NextDates.Category
,NextDates.StartDate
,EndDate = DATEADD(
DAY
,-1
,LEAD(NextDates.NextDate, 1) OVER (PARTITION BY NextDates.Id ORDER BY NextDates.StartDate)
)
FROM NextDates
WHERE (
NextDates.PrevCategory <> NextDates.NextCategory
AND NextDates.NextCategory <> NextDates.Category
)
OR NextDates.PrevCategory IS NULL
ORDER BY NextDates.Id
,NextDates.StartDate;
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
April 26, 2021 at 6:59 pm
Here is another take:
--==== Create some test data
Drop Table If Exists #testData;
Declare @startDate date = '2021-01-01'
, @endDate date = '2021-06-30';
With t(n)
As (
Select t.n
From (
Values (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)
, (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) As t(n)
)
, iTally (num)
As (
Select Top (datediff(day, @startDate, @endDate) + 1)
checksum(row_number() over(Order By @@spid))
From t t1, t t2, t t3
)
Select ID = 1
, Category = Case When dt.StartDate Between '2021-01-25' And '2021-02-15' Then 2
When dt.StartDate Between '2021-04-21' And '2021-06-15' Then 2
Else 1
End
, dt.StartDate
Into #testData
From iTally it
Cross Apply (Values (dateadd(day, it.num - 1, @startDate))) dt(StartDate);
--==== Solution using above test data
With dateGroups
As (
Select *
, StartGroup = iif(lag(td.Category, 1, 0) over(Partition By td.ID Order By td.StartDate) <> td.Category, 1, 0)
From #testData td
)
Select dg.ID
, dg.Category
, dg.StartDate
, EndDate = dateadd(day, -1, lead(dg.StartDate) over(Partition By dg.ID Order By dg.StartDate))
From dateGroups dg
Where dg.StartGroup = 1;
The first part is to define the row that starts each group - then we simply filter by that row and calculate the end date as 1 day less than the next rows start date. To define a start row - we look at the 'next' rows Category - if the next rows category changes then a new group is started.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
April 26, 2021 at 8:04 pm
WITH SeqBreaks
AS
(
SELECT ID, Category, [Date]
,CASE
WHEN Category = LAG(Category) OVER (PARTITION BY ID ORDER BY [Date])
THEN 0
ELSE 1
END AS SeqBreak
FROM YourTable
)
,Grps
AS
(
SELECT ID, Category, [Date]
,SUM(SeqBreak) OVER (PARTITION BY ID ORDER BY [Date]) AS Grp
,LEAD(Category) OVER (PARTITION BY ID ORDER BY [Date]) AS NextCategory
FROM SeqBreaks
)
SELECT ID, Category
,MIN([Date]) AS StartDate
,MAX(CASE WHEN NextCategory IS NULL THEN '9999-12-31' ELSE [Date] END) AS EndDate
FROM Grps
GROUP BY ID, Category, Grp
ORDER BY ID, StartDate
April 26, 2021 at 8:36 pm
I think this is a form of Gaps and Islands (term coined by Itzik Ben-Gan? maybe?). It's rather traditional in G&I to use the name "grp", so I stuck with that:
;WITH grps AS (
SELECT id, category, StartDate,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY StartDate) -
ROW_NUMBER() OVER(PARTITION BY id ORDER BY category, StartDate) AS grp
FROM #SomeId
)
SELECT id, category, MIN(StartDate) AS StartDate, MAX(StartDate) AS EndDate
FROM grps
GROUP BY id, category, grp
ORDER BY id, StartDate
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
April 27, 2021 at 3:38 pm
Here's another way to do gaps and islands
WITH C1 AS (
SELECT id, category, StartDate,
CASE WHEN LAG(category) OVER(PARTITION BY id ORDER BY StartDate) = category THEN 0 ELSE 1 END AS IsStart
FROM #SomeId
),
C2 AS (
SELECT id, category, StartDate,
SUM(IsStart) OVER(PARTITION BY id ORDER BY StartDate ROWS UNBOUNDED PRECEDING) AS grp
FROM C1
)
SELECT id, category, MIN(StartDate) AS StartDate, MAX(StartDate) AS EndDate
FROM C2
GROUP BY id, category, grp
ORDER BY id, StartDate;
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply