merge/flatten overlapping Time ranges. In order.

Question

merge/flatten overlapping Time ranges. In order.

tomek tomek

Mr or Mrs. 500

Points: 597
More actions
February 10, 2015 at 5:07 am

#397999

Hi,
I have few intervals with status.
They change in Time (order indicated by OrderId).
OrderId Status DateStart DateEnd
----------- ------ ----------------------- -----------------------
1 A 2015-01-01 00:00:00.000 2015-03-31 00:00:00.000
2 B 2015-01-15 00:00:00.000 2015-02-15 00:00:00.000
3 C 2015-02-15 00:00:00.000 2015-03-15 00:00:00.000
4 D 2015-01-15 00:00:00.000 2015-03-15 00:00:00.000
5 E 2015-04-01 00:00:00.000 2015-04-15 00:00:00.000
In the End I would Like to have flattened (in order of appearance, next OrderId has priority) result like this :
OrderId Status DateStart DateEnd
----------- ------ ----------------------- -----------------------
1 A 2015-01-01 00:00:00.000 2015-01-14 00:00:00.000
4 D 2015-01-15 00:00:00.000 2015-03-15 00:00:00.000
1 A 2015-03-16 00:00:00.000 2015-03-31 00:00:00.000
5 E 2015-04-01 00:00:00.000 2015-04-15 00:00:00.000
as you can see:
1) OrderId=1 was split into 2 rows, as it was partially overlapped by next ranges
2) orderdId 2 and 3 don't exist anymore as they've been overlapped by OrderId=5.
code for source table:
select *
FROM
(VALUES
(1, 'A',convert(datetime,'20150101'),convert(datetime,'20150331')),
(2, 'B','20150115','20150215'),
(3, 'C','20150215','20150315'),
(4, 'D','20150115','20150315'),
(5, 'E','20150401','20150415')
) AS TimeIntervals(OrderId, status, DateStart, DateEnd)
Hope you can help.
Tomek

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply

venoym SSCarpal Tunnel Points: 4161 More actions · Answer 1

You may want to take a look at the below. This is a "Gaps and Islands" problem. The below is designed around a numeric gap, but can be worked around to handle dates/times. I've done this a while back (broke it down into finding 1 minute gaps), but this is a good starting point.

http://www.sqlservercentral.com/articles/Advanced+Querying/anefficientsetbasedsolutionforislandsandgaps/1619/

pietlinden SSC Guru Points: 63633 More actions · Answer 2

There's a solution to the gaps and islands problem in Itzik Ben-Gan's book on windowing functions, too. I know I've seen at least one article on the web. Make sure you are searching for the right version of SQL Server, because some solutions are specific to later versions (2012, 2014).

Mark Cowne One Orange Chip Points: 26990 More actions · Answer 3

This works with your data, not very efficient though

WITH Source AS (

SELECT *

FROM

(VALUES

(1, 'A',convert(datetime,'20150101'),convert(datetime,'20150331')),

(2, 'B','20150115','20150215'),

(3, 'C','20150215','20150315'),

(4, 'D','20150115','20150315'),

(5, 'E','20150401','20150415')

) AS TimeIntervals(OrderId, status, DateStart, DateEnd)

),

Starts AS (

SELECT a.DateStart AS dt

FROM Source a

WHERE NOT EXISTS(SELECT * FROM Source b

WHERE b.OrderId > a.OrderId

AND a.DateStart BETWEEN b.DateStart AND b.DateEnd)),

Ends AS (

SELECT a.DateEnd AS dt

FROM Source a

WHERE NOT EXISTS(SELECT * FROM Source b

WHERE b.OrderId > a.OrderId

AND a.DateEnd BETWEEN b.DateStart AND b.DateEnd)),

Combined AS (

SELECT dt,'S' AS StartEnd

FROM Starts

UNION ALL

SELECT dt,'E' AS StartEnd

FROM Ends

),

Ordered AS (

SELECT dt,StartEnd,

ROW_NUMBER() OVER(ORDER BY dt) AS rn

FROM Combined)

SELECT ca.OrderId,

ca.status,

CASE WHEN s.StartEnd = 'E' THEN DATEADD(day,1,s.dt) ELSE s.dt END AS DateStart,

CASE WHEN e.StartEnd = 'S' THEN DATEADD(day,-1,e.dt) ELSE e.dt END AS DateEnd

FROM Ordered s

INNER JOIN Ordered e ON e.rn = s.rn + 1

CROSS APPLY (SELECT TOP 1 ca.OrderId,ca.status

FROM Source ca

WHERE s.dt >= ca.DateStart

AND e.dt <= ca.DateEnd

ORDER BY ca.OrderId DESC) ca

ORDER BY s.dt;

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

Dwain Camps SSC Guru Points: 86978 More actions · Answer 4

venoym (2/10/2015)
You may want to take a look at the below. This is a "Gaps and Islands" problem. The below is designed around a numeric gap, but can be worked around to handle dates/times. I've done this a while back (broke it down into finding 1 minute gaps), but this is a good starting point.
http://www.sqlservercentral.com/articles/Advanced+Querying/anefficientsetbasedsolutionforislandsandgaps/1619/

I differ with your opinion here. This is similar to a gaps and islands problem but is not solved the same way. This is the overlapping intervals problem.

The original link to Itzik Ben-Gan's article on SolidQ had gone dead the last time I checked, but I used his methods in discussing various efficient approaches to this problem here:

Calculating Gaps Between Overlapping Time Intervals in SQL [/url]

The link to IBG's article is in that one, and it is too bad his article is gone because it explains it quite well. Note that this article takes the normalized overlapping intervals and converts them to the gaps (so the second step you don't need).

If you find that a bit daunting to understand (I confess that I did at first), let me know and I'll make a solution happen based on your specific test data.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

tomek tomek Mr or Mrs. 500 Points: 597 More actions · Answer 5

Hi,

First of all, thank you all for your input. It is invaluable.

@Venoym

your solutions with numbers looks very interesting, but as Dwain suggested I had problems finding out how it would be relevant to my case

@pietlinden

I started reading Itzik Ben Gan's 'Window Functions in SQL Server 2012", but I'm not as far yet.

@mark-3 Cowne - Your solution seems to work fine. Thanks! I need to do some testing. As you mentioned - not very perfomarnt though...

@Dwain - Thanks for commenting on my problems again (I received some useful hints from your sometime ago - also on Time Intervals).

I would be very, very greteful if you would find some time, to show me how to implement your logic to my case.

OH, and important info.I didn't know it before, but solution will be meant for SQL 2012.

I would rather know a way for both 2008r2 and 2012, but I don't want to waste your time...

Many Thanks.

Tomek

Mark Cowne One Orange Chip Points: 26990 More actions · Answer 6

tomek tomek (2/11/2015)
Hi,
First of all, thank you all for your input. It is invaluable.
@Venoym
a) you solutions with numbers looks very interesting, but as Dwain suggested I had problems finding out how it would be relevant to my case
b) I started reading Itzik Ben Gan's 'Window Functions in SQL Server 2012", but I'm not as far yet.
@mark-3 Cowne - Your solution seems to work fine. Thanks! I need to do some testing. As you mentioned - not very perfomarnt though...
@Dawin - Thanks for commenting on my problems again (I received some useful hints from your sometime ago - also on Time Intervals).
I would be very, very greteful if you would find some time, to show me how to implement your logic to my case.
Many Thanks.
Tomek

Just be aware that I've changed the query I originally posted, it should work correctly now although sadly no quicker.

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

tomek tomek Mr or Mrs. 500 Points: 597 More actions · Answer 7

@mark-3 Cowne - Care to explain, what was wrong with the first approach?

Regards

tomek

Mark Cowne One Orange Chip Points: 26990 More actions · Answer 8

It was getting the wrong values for ordereid and status in some circumstances.

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

tomek tomek Mr or Mrs. 500 Points: 597 More actions · Answer 9

@mark-3

it seems there's something wrong with your new code (I don't see same problem with previous solution).

for following dataset:

SELECT *

FROM

(VALUES

(1, '1',convert(datetime,'20121001'),convert(datetime,'20141031')),

(2, '7','20140701','20140731'),

(3, '10','20140801','20141031')

) AS TimeIntervals(OrderId, status, DateStart, DateEnd)

I receive this :

OrderId status DateStart DateEnd

----------- ------ ----------------------- -----------------------

1 1 2012-10-01 00:00:00.000 2014-06-30 00:00:00.000

2 7 2014-07-01 00:00:00.000 2014-07-31 00:00:00.000

1 1 2014-08-01 00:00:00.000 2014-07-31 00:00:00.000

3 10 2014-08-01 00:00:00.000 2014-10-31 00:00:00.000

Correct Result would be:

OrderId status DateStart DateEnd

----------- ------ ----------------------- -----------------------

1 1 2012-10-01 00:00:00.000 2014-06-30 00:00:00.000

2 7 2014-07-01 00:00:00.000 2014-07-31 00:00:00.000

3 10 2014-08-01 00:00:00.000 2014-10-31 00:00:00.000

1. Can you check this?

2. Can you show me example where your second approach fixes problems found with the first one?

For me, first solution seems to be working fine.

regards

Tomek

Mark Cowne One Orange Chip Points: 26990 More actions · Answer 10

The problem is due to two intervals having a one day gap between the end of one and the start of the next, I think these can simply be excluded.

Change

CROSS APPLY (SELECT TOP 1 ca.OrderId,ca.status

FROM Source ca

WHERE s.dt >= ca.DateStart

AND e.dt <= ca.DateEnd

ORDER BY ca.OrderId DESC) ca

ORDER BY s.dt;

to

CROSS APPLY (SELECT TOP 1 ca.OrderId,ca.status

FROM Source ca

WHERE s.dt >= ca.DateStart

AND e.dt <= ca.DateEnd

ORDER BY ca.OrderId DESC) ca

-- Exclude the case where two consecutive intervals have a gap of one day - this will generate a negative interval

WHERE CASE WHEN s.StartEnd = 'E' THEN DATEADD(day,1,s.dt) ELSE s.dt END <= CASE WHEN e.StartEnd = 'S' THEN DATEADD(day,-1,e.dt) ELSE e.dt END

ORDER BY s.dt;

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

Nevyn SSChampion Points: 14210 More actions · Answer 11

Well, this was a fun one to do.

I decided to try it without reading Dwain's article from earlier so this may not be the best way.

Basically I am joining to a calendar table and grouping to resolve the overlap, which then gives me an islands problem, which I solve in IBG fashion

For the calendar table, I borrowed Dwain's inline function from here[/url]

CREATE TABLE #SourceData

(

OrderID int,

OrderStatus char(1),

StartDatedatetime,

EndDate datetime

)

Insert #SourceData (OrderID,OrderStatus,StartDate,EndDate)

select *

FROM

(VALUES

(1, 'A',convert(datetime,'20150101'),convert(datetime,'20150331')),

(2, 'B','20150115','20150215'),

(3, 'C','20150215','20150315'),

(4, 'D','20150115','20150315'),

(5, 'E','20150401','20150415')

) AS TimeIntervals(OrderId, status, DateStart, DateEnd)

;

WITH CrossToCalendar AS (

SELECT c.[Date], MAX(OrderID) OrderId FROM #SourceData s

CROSS APPLY (SELECT g.[Date] FROM GenerateCalendar('2015-01-01',365) g

WHERE g.[Date] BETWEEN s.StartDate AND s.EndDate) c

GROUP BY c.[Date]

),

IslandBuilder AS (

SELECT DATEADD(day,(-1*DENSE_RANK() OVER (PARTITION BY OrderID ORDER BY [date])),[date]) dr,date,OrderId FROM CrossToCalendar

),

Islands AS (

SELECT MIN(date) StartDate,MAX(date) EndDate,dr,OrderId

FROM IslandBuilder

GROUP BY dr,OrderId

)

SELECT i.OrderId,s.orderstatus,i.StartDate,i.EndDate

FROM Islands i

JOIN #SourceData s

ON i.OrderId = s.OrderID

ORDER BY StartDate

DROP TABLE #SourceData

Not sure if it would be better to use ROW_NUMBER() in cross to calendar instead of grouping so I could carry the status down instead of rejoining at the end, but this is the path I had started down so I stuck with it.

EDIT: Just skimmed through Dwain's article, and nope, he doesn't bother with a calendar table. But he is also packing overlapping intervals, and this problem splits them with Order ID as a tiebreaker instead.

Mark Cowne One Orange Chip Points: 26990 More actions · Answer 12

Here's another version, this one using SQL Server 2012 LEAD

WITH Source AS (

SELECT *

FROM

(VALUES

(1, 'A',convert(datetime,'20150101'),convert(datetime,'20150331')),

(2, 'B','20150115','20150215'),

(3, 'C','20150215','20150315'),

(4, 'D','20150115','20150315'),

(5, 'E','20150401','20150415')

) AS TimeIntervals(OrderId, status, DateStart, DateEnd)

),

Starts AS (

SELECT a.DateStart AS dt

FROM Source a

WHERE NOT EXISTS(SELECT * FROM Source b

WHERE b.OrderId > a.OrderId

AND a.DateStart BETWEEN b.DateStart AND b.DateEnd)),

Ends AS (

SELECT a.DateEnd AS dt

FROM Source a

WHERE NOT EXISTS(SELECT * FROM Source b

WHERE b.OrderId > a.OrderId

AND a.DateEnd BETWEEN b.DateStart AND b.DateEnd)),

Combined AS (

SELECT dt,'S' AS StartEnd

FROM Starts

UNION ALL

SELECT dt,'E' AS StartEnd

FROM Ends

),

Ordered AS (

SELECT dt AS DateStart,

StartEnd AS S_StartEnd,

LEAD(dt) OVER(ORDER BY dt) AS DateEnd,

LEAD(StartEnd) OVER(ORDER BY dt) AS E_StartEnd

FROM Combined)

SELECT ca.OrderId,

ca.status,

CASE WHEN s.S_StartEnd = 'E' THEN DATEADD(day,1,s.DateStart) ELSE s.DateStart END AS DateStart,

CASE WHEN s.E_StartEnd = 'S' THEN DATEADD(day,-1,s.DateEnd) ELSE s.DateEnd END AS DateEnd

FROM Ordered s

CROSS APPLY (SELECT TOP 1 ca.OrderId,ca.status

FROM Source ca

WHERE s.DateStart >= ca.DateStart

AND s.DateEnd <= ca.DateEnd

ORDER BY ca.OrderId DESC) ca

-- Exclude the case where two consecutive intervals have a gap of one day - this will generate a negative interval

WHERE CASE WHEN s.S_StartEnd = 'E' THEN DATEADD(day,1,s.DateStart) ELSE s.DateStart END <= CASE WHEN s.E_StartEnd = 'S' THEN DATEADD(day,-1,s.DateEnd) ELSE s.DateEnd END

ORDER BY s.DateStart;

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

Dwain Camps SSC Guru Points: 86978 More actions · Answer 13

tomek tomek (2/11/2015)
@Dwain - Thanks for commenting on my problems again (I received some useful hints from your sometime ago - also on Time Intervals).
I would be very, very greteful if you would find some time, to show me how to implement your logic to my case.

Hmmm. Upon taking a closer look at your expected results, I'm not sure if I understood the problem correctly, so let me ask a couple of questions here (refer to comments).

OrderID status DateStart DateEnd

1 A 2015-01-01 00:00:00.000 2015-03-31 00:00:00.000 -- Starts with a 3 month interval

2 B 2015-01-15 00:00:00.000 2015-02-15 00:00:00.000 -- Starts/ends within OrderID=1

3 C 2015-02-15 00:00:00.000 2015-03-15 00:00:00.000 -- Starts/ends within OrderID=1

4 D 2015-01-15 00:00:00.000 2015-03-15 00:00:00.000 -- Starts/ends within OrderID=1

5 E 2015-04-01 00:00:00.000 2015-04-15 00:00:00.000 -- Starts and ends after OrderID=1

Since OrderIDs 2, 3 and 4 all start and end within OrderID=1, why are they not eliminated such that your final result is (these would be the non-overlapping time periods during which you have an order):

OrderID status DateStart DateEnd

1 A 2015-01-01 00:00:00.000 2015-03-31 00:00:00.000

5 E 2015-04-01 00:00:00.000 2015-04-15 00:00:00.000

Perhaps my morning coffee has not yet kicked in... Either that or it is not an overlapping intervals problem as I first thought it was.

You also said originally:

tomek tomek (2/10/2015)
as you can see:
1) OrderId=1 was split into 2 rows, as it was partially overlapped by next ranges
2) orderdId 2 and 3 don't exist anymore as they've been overlapped by OrderId=5.

1. What is the relationship between OrderID=1 and OrderIDs IN (2,3) such that they would cause OrderID=1 to be split?

2. Since OrderID=4 is also within OrderID=1, why doesn't it cause some effect to OrderID=1?

3. On your 2), shouldn't that be OrderID=1 (instead of OrderID=5 because 5 overlaps with neither)?

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Nevyn SSChampion Points: 14210 More actions · Answer 14

Basically if two intervals would be active he considers the one with the highest order id active. 2 never shows because it perfectly overlaps with 4, which is higher.