Combine Overlapping Date Ranges Per User

  • I feel like this is a common issue but I'm having problems finding good examples on how to accomplish this task.

    This is cases where date ranges should be combined into a single start date and a single end date when date ranges overlap.

    -- Example Data
    declare @t table (Name varchar(100), Datetime_Start datetime, Datetime_End datetime);
    insert into @t values
    ('A' , '2023-01-02 00:00' , '2023-03-28 00:10'),
    ('A' , '2023-05-14 23:50' , '2023-05-29 23:50'),
    ('B' , '2023-05-18 00:00' , '2023-05-18 04:00'),
    ('B' , '2023-05-18 02:00' , '2023-05-18 03:00'),
    ('C' , '2023-01-02 00:00' , '2023-01-17 15:50'),
    ('C' , '2023-01-14 03:50' , '2023-01-28 15:50'),
    ('C' , '2023-01-25 03:50' , '2023-02-05 15:50');

    What I want in the results is:

     'A' '2023-01-02 00:00' : '2023-03-28 00:10'
    'A' '2023-05-14 23:50' : '2023-05-29 23:50'
    'B' '2023-05-18 00:00' : '2023-05-18 03:00'
    'C' '2023-01-02 00:00' : '2023-02-05 15:50'

    Not really sure how to accomplish this one.

  • This is based on code by Itzik Ben-Gan.  It uses a cross apply and a sum windowed function to increment the open interval count for start times and decrement it for end times.  It then looks for records where the previous open interval count was zero to determine interval groups, and then finds the min and max (start and end) times for those interval groups.

     WITH Interval_Counts AS
    (
    SELECT *, SUM(d.open_interval_increment) OVER(PARTITION BY [t].[Name] ORDER BY d.dt_val ROWS UNBOUNDED PRECEDING) AS open_interval_cnt
    FROM @t AS t
    CROSS APPLY ( VALUES(t.Datetime_Start, 1), (t.Datetime_End, -1) ) d(dt_val, open_interval_increment)
    )
    , Interval_Groups AS
    (
    SELECT *
    , COUNT(CASE WHEN ic.open_interval_cnt = 0 THEN 1 END) OVER(PARTITION BY [ic].[Name] ORDER BY ic.dt_val ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS Interval_Grp
    FROM Interval_Counts AS ic
    )
    SELECT ig.[Name], MIN(ig.dt_val) AS Datetime_Start, MAX(ig.dt_val) AS Datetime_End
    FROM Interval_Groups AS ig
    GROUP BY ig.[Name], ig.Interval_Grp
    ORDER BY ig.[Name], ig.Interval_Grp

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • This was removed by the editor as SPAM

  • This was removed by the editor as SPAM

  • Many thanks, that helps!

  • I have a question, and it comes up where I work a lot and finding a way to process through like the above feels like a challenge for me.

    It's the same as above, but instead of the overlap, can a solution be presented where the dates are next to each other.  By that I mean if I would want to join a record ending on the 2nd to join with one starting on the 3rd say or earlier if it overlaps.

     

    declare @t table (Name varchar(100),  Datetime_Start  date,  Datetime_End date);
    insert into @t values
    ('A' , '2023-01-02' , '2023-03-28'),
    ('A' , '2023-03-29' , '2023-05-29'), -- See this as a joined up record for A
    ('B' , '2023-05-18' , '2023-05-18'),
    ('B' , '2023-05-17' , '2023-05-20'), -- Joined up also
    ('C' , '2023-01-02' , '2023-01-13'),
    ('C' , '2023-01-14' , '2023-01-28'),-- Joined
    ('C' , '2023-01-30' , '2023-02-05');--New
  • Double posted

    • This reply was modified 2 weeks, 5 days ago by  paul27uk. Reason: Dupe
  • I would look at "forcing" the overlap.  This can be achieved by applying an extra CTE at the start of the above code

     WITH ForcedOverlap AS (
    SELECT t.Name
    /* If the start_date is one more than the prev end_date, then reduce the start_date by 2 to "force" the overlap */ , Datetime_Start = CASE WHEN DATEDIFF(dd, LAG(t.Datetime_End) OVER (PARTITION BY t.Name ORDER BY t.Datetime_Start, t.Datetime_End), t.Datetime_Start) = 1
    THEN DATEADD(dd, -2, t.Datetime_Start)
    ELSE t.Datetime_Start END
    , t.Datetime_End
    FROM @t AS t
    )
    , Interval_Counts AS (
    SELECT t.Name
    , t.Datetime_Start
    , t.Datetime_End
    , d.dt_val
    , d.open_interval_increment
    , open_interval_cnt = SUM(d.open_interval_increment) OVER(PARTITION BY [t].[Name] ORDER BY d.dt_val, t.Datetime_Start
    ROWS UNBOUNDED PRECEDING)
    FROM ForcedOverlap AS t
    CROSS APPLY ( VALUES(t.Datetime_Start, 1), (t.Datetime_End, -1) ) d(dt_val, open_interval_increment)
    )
    , Interval_Groups AS (
    SELECT *
    , Interval_Grp = COUNT(CASE WHEN ic.open_interval_cnt = 0 THEN 1 END) OVER(PARTITION BY [ic].[Name] ORDER BY ic.dt_val, ic.Datetime_Start
    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
    FROM Interval_Counts AS ic
    )
    SELECT ig.[Name], Datetime_Start = MIN(ig.dt_val), Datetime_End = MAX(ig.dt_val)
    FROM Interval_Groups AS ig
    GROUP BY ig.[Name], ig.Interval_Grp
    ORDER BY ig.[Name], ig.Interval_Grp

    • This reply was modified 2 weeks, 5 days ago by  DesNorton.
  • Duplicate post

  • The problem is that your data is stored as closed intervals (both end-points are included in the interval) when this issue requires half-closed intervals (only one end-point is included in the interval).  The following code changes the closed intervals to half-closed, updates the order by clauses to account for that change, and then converts the final results back to closed intervals.

     WITH Interval_Counts AS
    (
    -- Update ORDER BY clause to reflect change in interval type in the CROSS APPLY. The DESC is to ensure that start dates are evaluated before end dates.
    SELECT *, SUM(d.open_interval_increment) OVER(PARTITION BY [t].[Name] ORDER BY d.dt_val, d.open_interval_increment DESC ROWS UNBOUNDED PRECEDING) AS open_interval_cnt
    FROM @t AS t
    -- Change closed interval to half-closed interval by substituting DATEADD(DAY, 1, t.Datetime_End) for t.Datetime_End.
    CROSS APPLY ( VALUES(t.Datetime_Start, 1), (DATEADD(DAY, 1, t.Datetime_End), -1) ) d(dt_val, open_interval_increment)
    )
    , Interval_Groups AS
    (
    SELECT *
    -- Update ORDER BY clause to reflect change in interval type.
    , COUNT(CASE WHEN ic.open_interval_cnt = 0 THEN 1 END) OVER(PARTITION BY [ic].[Name] ORDER BY ic.dt_val, ic.open_interval_increment DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS Interval_Grp
    FROM Interval_Counts AS ic
    )
    -- Convert half-closed interval back to closed.
    SELECT ig.[Name], MIN(ig.dt_val) AS Datetime_Start, DATEADD(DAY, -1, MAX(ig.dt_val)) AS Datetime_End
    FROM Interval_Groups AS ig
    GROUP BY ig.[Name], ig.Interval_Grp
    ORDER BY ig.[Name], ig.Interval_Grp

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Another thing.  In cases such as this, it's usually best to start a new post and reference the old post instead of hijacking the original post.  Otherwise it can get very confusing whether a post is referencing the old problem or the new problem.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Ok thanks.  I will do that now.

  • Just wanted to say a heartfelt thanks for this support and solution, and to apologise again for the mistake with the hijack on the post.

    Your solution has solved the issue we have, and the speeds for processing are night and day apart (in a good way).

     

    Thanks again.

  • This was removed by the editor as SPAM

  • This was removed by the editor as SPAM

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply