Category Entry and Exit Dates per ID

  • Hello,

    I have the following table, where ID is the unique identifier. An can move from category to category, both up and down. My table records each day an ID stays in a given category. I am trying to identify the start date and the end date of an ID in a given category. The problem is that an ID can move up a category, and move back down to its original category after a certain number of days. Here is my table as an example with only 1 ID:

    ID    Category     Date

    1            1              2021-01-01

    1            1              2021-01-02

    ...

    1           1               2021-01-24

    1           2              2021-01-25

    ...

    1          2               2021-02-15

    1          1               2021-02-16

    ...

    1         1                 2021-04-20

    1          2               2021-04-21

    When I try to get the MIN(DATE) and MAX(DATE) and group by the category and ID, it shows me that the account was in Category 1 from 2021-01-01 to 2021-04-20, and in Category 2 from 02-25 to 04-21. I am trying to track the movements of the file in each bucket step by step, meaning in my ideal result, the movements of the account will be tracked as:

    ID        Category      StartDate        EndDate

    1                1              2021-01-01        2021-01-24

    1               2             2021-01-25          2021-02-15

    1               1             2021-02-16          2021-04-20

    1               2            2021-04-21                NULL (or GETDATE())

     

    How can I achieve this result? Any help would be appreciated. I tried using the RANK() function but because the table records every single day, it seems useless.

  • This is not particularly elegant, but it seems to get the job done.

    Next time, please remember to post your sample data in consumable format, so that others can simply paste into SSMS and start coding.

    DROP TABLE IF EXISTS #SomeId;

    CREATE TABLE #SomeId
    (
    Id INT NOT NULL
    ,Category INT NOT NULL
    ,StartDate DATE NOT NULL
    ,
    PRIMARY KEY CLUSTERED (
    Id
    ,Category
    ,StartDate
    )
    );

    INSERT #SomeId
    (
    Id
    ,Category
    ,StartDate
    )
    VALUES
    (1, 1, '20210101')
    ,(1, 1, '20210102')
    ,(1, 1, '20210124')
    ,(1, 2, '20210125')
    ,(1, 2, '20210215')
    ,(1, 1, '20210216')
    ,(1, 1, '20210420')
    ,(1, 2, '20210521');

    WITH NextDates
    AS (SELECT si.Id
    ,si.Category
    ,si.StartDate
    ,PrevCategory = LAG(si.Category, 1) OVER (PARTITION BY si.Id ORDER BY si.StartDate)
    ,NextCategory = LEAD(si.Category, 1) OVER (PARTITION BY si.Id ORDER BY si.StartDate)
    ,NextDate = LEAD(si.StartDate, 1, GETDATE()) OVER (PARTITION BY si.Id ORDER BY si.StartDate)
    FROM #SomeId si)
    SELECT NextDates.Id
    ,NextDates.Category
    ,NextDates.StartDate
    ,EndDate = DATEADD(
    DAY
    ,-1
    ,LEAD(NextDates.NextDate, 1) OVER (PARTITION BY NextDates.Id ORDER BY NextDates.StartDate)
    )
    FROM NextDates
    WHERE (
    NextDates.PrevCategory <> NextDates.NextCategory
    AND NextDates.NextCategory <> NextDates.Category
    )
    OR NextDates.PrevCategory IS NULL
    ORDER BY NextDates.Id
    ,NextDates.StartDate;

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Here is another take:

     --==== Create some test data
    Drop Table If Exists #testData;

    Declare @startDate date = '2021-01-01'
    , @endDate date = '2021-06-30';

    With t(n)
    As (
    Select t.n
    From (
    Values (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)
    , (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) As t(n)
    )
    , iTally (num)
    As (
    Select Top (datediff(day, @startDate, @endDate) + 1)
    checksum(row_number() over(Order By @@spid))
    From t t1, t t2, t t3
    )
    Select ID = 1
    , Category = Case When dt.StartDate Between '2021-01-25' And '2021-02-15' Then 2
    When dt.StartDate Between '2021-04-21' And '2021-06-15' Then 2
    Else 1
    End
    , dt.StartDate
    Into #testData
    From iTally it
    Cross Apply (Values (dateadd(day, it.num - 1, @startDate))) dt(StartDate);

    --==== Solution using above test data
    With dateGroups
    As (
    Select *
    , StartGroup = iif(lag(td.Category, 1, 0) over(Partition By td.ID Order By td.StartDate) <> td.Category, 1, 0)
    From #testData td
    )
    Select dg.ID
    , dg.Category
    , dg.StartDate
    , EndDate = dateadd(day, -1, lead(dg.StartDate) over(Partition By dg.ID Order By dg.StartDate))
    From dateGroups dg
    Where dg.StartGroup = 1;

    The first part is to define the row that starts each group - then we simply filter by that row and calculate the end date as 1 day less than the next rows start date.  To define a start row - we look at the 'next' rows Category - if the next rows category changes then a new group is started.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • WITH SeqBreaks
    AS
    (
    SELECT ID, Category, [Date]
    ,CASE
    WHEN Category = LAG(Category) OVER (PARTITION BY ID ORDER BY [Date])
    THEN 0
    ELSE 1
    END AS SeqBreak
    FROM YourTable
    )
    ,Grps
    AS
    (
    SELECT ID, Category, [Date]
    ,SUM(SeqBreak) OVER (PARTITION BY ID ORDER BY [Date]) AS Grp
    ,LEAD(Category) OVER (PARTITION BY ID ORDER BY [Date]) AS NextCategory
    FROM SeqBreaks
    )
    SELECT ID, Category
    ,MIN([Date]) AS StartDate
    ,MAX(CASE WHEN NextCategory IS NULL THEN '9999-12-31' ELSE [Date] END) AS EndDate
    FROM Grps
    GROUP BY ID, Category, Grp
    ORDER BY ID, StartDate

    • This reply was modified 3 years, 7 months ago by  Ken McKelvey.
  • I think this is a form of Gaps and Islands (term coined by Itzik Ben-Gan?  maybe?).  It's rather traditional in G&I to use the name "grp", so I stuck with that:

    ;WITH grps AS (
    SELECT id, category, StartDate,
    ROW_NUMBER() OVER(PARTITION BY id ORDER BY StartDate) -
    ROW_NUMBER() OVER(PARTITION BY id ORDER BY category, StartDate) AS grp
    FROM #SomeId
    )
    SELECT id, category, MIN(StartDate) AS StartDate, MAX(StartDate) AS EndDate
    FROM grps
    GROUP BY id, category, grp
    ORDER BY id, StartDate

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Here's another way to do gaps and islands

    WITH C1 AS (
    SELECT id, category, StartDate,
    CASE WHEN LAG(category) OVER(PARTITION BY id ORDER BY StartDate) = category THEN 0 ELSE 1 END AS IsStart
    FROM #SomeId
    ),
    C2 AS (
    SELECT id, category, StartDate,
    SUM(IsStart) OVER(PARTITION BY id ORDER BY StartDate ROWS UNBOUNDED PRECEDING) AS grp
    FROM C1
    )
    SELECT id, category, MIN(StartDate) AS StartDate, MAX(StartDate) AS EndDate
    FROM C2
    GROUP BY id, category, grp
    ORDER BY id, StartDate;

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply