I feel like this is a common issue but I'm having problems finding good examples on how to accomplish this task.
This is cases where date ranges should be combined into a single start date and a single end date when date ranges overlap.
-- Example Data
declare @t table (Name varchar(100), Datetime_Start datetime, Datetime_End datetime);
insert into @t values
('A' , '2023-01-02 00:00' , '2023-03-28 00:10'),
('A' , '2023-05-14 23:50' , '2023-05-29 23:50'),
('B' , '2023-05-18 00:00' , '2023-05-18 04:00'),
('B' , '2023-05-18 02:00' , '2023-05-18 03:00'),
('C' , '2023-01-02 00:00' , '2023-01-17 15:50'),
('C' , '2023-01-14 03:50' , '2023-01-28 15:50'),
('C' , '2023-01-25 03:50' , '2023-02-05 15:50');
What I want in the results is:
'A' '2023-01-02 00:00' : '2023-03-28 00:10'
'A' '2023-05-14 23:50' : '2023-05-29 23:50'
'B' '2023-05-18 00:00' : '2023-05-18 03:00'
'C' '2023-01-02 00:00' : '2023-02-05 15:50'
Not really sure how to accomplish this one.
This is based on code by Itzik Ben-Gan. It uses a cross apply and a sum windowed function to increment the open interval count for start times and decrement it for end times. It then looks for records where the previous open interval count was zero to determine interval groups, and then finds the min and max (start and end) times for those interval groups.
WITH Interval_Counts AS
(
SELECT *, SUM(d.open_interval_increment) OVER(PARTITION BY [t].[Name] ORDER BY d.dt_val ROWS UNBOUNDED PRECEDING) AS open_interval_cnt
FROM @t AS t
CROSS APPLY ( VALUES(t.Datetime_Start, 1), (t.Datetime_End, -1) ) d(dt_val, open_interval_increment)
)
, Interval_Groups AS
(
SELECT *
, COUNT(CASE WHEN ic.open_interval_cnt = 0 THEN 1 END) OVER(PARTITION BY [ic].[Name] ORDER BY ic.dt_val ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS Interval_Grp
FROM Interval_Counts AS ic
)
SELECT ig.[Name], MIN(ig.dt_val) AS Datetime_Start, MAX(ig.dt_val) AS Datetime_End
FROM Interval_Groups AS ig
GROUP BY ig.[Name], ig.Interval_Grp
ORDER BY ig.[Name], ig.Interval_Grp
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
August 1, 2023 at 7:22 am
This was removed by the editor as SPAM
August 1, 2023 at 12:24 pm
This was removed by the editor as SPAM
August 1, 2023 at 6:45 pm
Many thanks, that helps!
December 6, 2024 at 9:41 am
I have a question, and it comes up where I work a lot and finding a way to process through like the above feels like a challenge for me.
It's the same as above, but instead of the overlap, can a solution be presented where the dates are next to each other. By that I mean if I would want to join a record ending on the 2nd to join with one starting on the 3rd say or earlier if it overlaps.
declare @t table (Name varchar(100), Datetime_Start date, Datetime_End date);
insert into @t values
('A' , '2023-01-02' , '2023-03-28'),
('A' , '2023-03-29' , '2023-05-29'), -- See this as a joined up record for A
('B' , '2023-05-18' , '2023-05-18'),
('B' , '2023-05-17' , '2023-05-20'), -- Joined up also
('C' , '2023-01-02' , '2023-01-13'),
('C' , '2023-01-14' , '2023-01-28'),-- Joined
('C' , '2023-01-30' , '2023-02-05');--New
December 6, 2024 at 12:46 pm
I would look at "forcing" the overlap. This can be achieved by applying an extra CTE at the start of the above code
WITH ForcedOverlap AS (
SELECT t.Name
/* If the start_date is one more than the prev end_date, then reduce the start_date by 2 to "force" the overlap */ , Datetime_Start = CASE WHEN DATEDIFF(dd, LAG(t.Datetime_End) OVER (PARTITION BY t.Name ORDER BY t.Datetime_Start, t.Datetime_End), t.Datetime_Start) = 1
THEN DATEADD(dd, -2, t.Datetime_Start)
ELSE t.Datetime_Start END
, t.Datetime_End
FROM @t AS t
)
, Interval_Counts AS (
SELECT t.Name
, t.Datetime_Start
, t.Datetime_End
, d.dt_val
, d.open_interval_increment
, open_interval_cnt = SUM(d.open_interval_increment) OVER(PARTITION BY [t].[Name] ORDER BY d.dt_val, t.Datetime_Start
ROWS UNBOUNDED PRECEDING)
FROM ForcedOverlap AS t
CROSS APPLY ( VALUES(t.Datetime_Start, 1), (t.Datetime_End, -1) ) d(dt_val, open_interval_increment)
)
, Interval_Groups AS (
SELECT *
, Interval_Grp = COUNT(CASE WHEN ic.open_interval_cnt = 0 THEN 1 END) OVER(PARTITION BY [ic].[Name] ORDER BY ic.dt_val, ic.Datetime_Start
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
FROM Interval_Counts AS ic
)
SELECT ig.[Name], Datetime_Start = MIN(ig.dt_val), Datetime_End = MAX(ig.dt_val)
FROM Interval_Groups AS ig
GROUP BY ig.[Name], ig.Interval_Grp
ORDER BY ig.[Name], ig.Interval_Grp
December 6, 2024 at 12:52 pm
Duplicate post
December 6, 2024 at 4:41 pm
The problem is that your data is stored as closed intervals (both end-points are included in the interval) when this issue requires half-closed intervals (only one end-point is included in the interval). The following code changes the closed intervals to half-closed, updates the order by clauses to account for that change, and then converts the final results back to closed intervals.
WITH Interval_Counts AS
(
-- Update ORDER BY clause to reflect change in interval type in the CROSS APPLY. The DESC is to ensure that start dates are evaluated before end dates.
SELECT *, SUM(d.open_interval_increment) OVER(PARTITION BY [t].[Name] ORDER BY d.dt_val, d.open_interval_increment DESC ROWS UNBOUNDED PRECEDING) AS open_interval_cnt
FROM @t AS t
-- Change closed interval to half-closed interval by substituting DATEADD(DAY, 1, t.Datetime_End) for t.Datetime_End.
CROSS APPLY ( VALUES(t.Datetime_Start, 1), (DATEADD(DAY, 1, t.Datetime_End), -1) ) d(dt_val, open_interval_increment)
)
, Interval_Groups AS
(
SELECT *
-- Update ORDER BY clause to reflect change in interval type.
, COUNT(CASE WHEN ic.open_interval_cnt = 0 THEN 1 END) OVER(PARTITION BY [ic].[Name] ORDER BY ic.dt_val, ic.open_interval_increment DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS Interval_Grp
FROM Interval_Counts AS ic
)
-- Convert half-closed interval back to closed.
SELECT ig.[Name], MIN(ig.dt_val) AS Datetime_Start, DATEADD(DAY, -1, MAX(ig.dt_val)) AS Datetime_End
FROM Interval_Groups AS ig
GROUP BY ig.[Name], ig.Interval_Grp
ORDER BY ig.[Name], ig.Interval_Grp
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
December 6, 2024 at 4:44 pm
Another thing. In cases such as this, it's usually best to start a new post and reference the old post instead of hijacking the original post. Otherwise it can get very confusing whether a post is referencing the old problem or the new problem.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
December 11, 2024 at 10:17 am
Ok thanks. I will do that now.
December 11, 2024 at 5:05 pm
Just wanted to say a heartfelt thanks for this support and solution, and to apologise again for the mistake with the hijack on the post.
Your solution has solved the issue we have, and the speeds for processing are night and day apart (in a good way).
Thanks again.
December 17, 2024 at 6:48 pm
This was removed by the editor as SPAM
December 17, 2024 at 6:49 pm
This was removed by the editor as SPAM
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply