merge/flatten overlapping Time ranges. In order.

  • Hi,

    I have few intervals with status.

    They change in Time (order indicated by OrderId).

    OrderId Status DateStart DateEnd

    ----------- ------ ----------------------- -----------------------

    1 A 2015-01-01 00:00:00.000 2015-03-31 00:00:00.000

    2 B 2015-01-15 00:00:00.000 2015-02-15 00:00:00.000

    3 C 2015-02-15 00:00:00.000 2015-03-15 00:00:00.000

    4 D 2015-01-15 00:00:00.000 2015-03-15 00:00:00.000

    5 E 2015-04-01 00:00:00.000 2015-04-15 00:00:00.000

    In the End I would Like to have flattened (in order of appearance, next OrderId has priority) result like this :

    OrderId Status DateStart DateEnd

    ----------- ------ ----------------------- -----------------------

    1 A 2015-01-01 00:00:00.000 2015-01-14 00:00:00.000

    4 D 2015-01-15 00:00:00.000 2015-03-15 00:00:00.000

    1 A 2015-03-16 00:00:00.000 2015-03-31 00:00:00.000

    5 E 2015-04-01 00:00:00.000 2015-04-15 00:00:00.000

    as you can see:

    1) OrderId=1 was split into 2 rows, as it was partially overlapped by next ranges

    2) orderdId 2 and 3 don't exist anymore as they've been overlapped by OrderId=5.

    code for source table:

    select *

    FROM

    (VALUES

    (1, 'A',convert(datetime,'20150101'),convert(datetime,'20150331')),

    (2, 'B','20150115','20150215'),

    (3, 'C','20150215','20150315'),

    (4, 'D','20150115','20150315'),

    (5, 'E','20150401','20150415')

    ) AS TimeIntervals(OrderId, status, DateStart, DateEnd)

    Hope you can help.

    Tomek

  • You may want to take a look at the below. This is a "Gaps and Islands" problem. The below is designed around a numeric gap, but can be worked around to handle dates/times. I've done this a while back (broke it down into finding 1 minute gaps), but this is a good starting point.

    http://www.sqlservercentral.com/articles/Advanced+Querying/anefficientsetbasedsolutionforislandsandgaps/1619/

  • There's a solution to the gaps and islands problem in Itzik Ben-Gan's book on windowing functions, too. I know I've seen at least one article on the web. Make sure you are searching for the right version of SQL Server, because some solutions are specific to later versions (2012, 2014).

  • This works with your data, not very efficient though

    WITH Source AS (

    SELECT *

    FROM

    (VALUES

    (1, 'A',convert(datetime,'20150101'),convert(datetime,'20150331')),

    (2, 'B','20150115','20150215'),

    (3, 'C','20150215','20150315'),

    (4, 'D','20150115','20150315'),

    (5, 'E','20150401','20150415')

    ) AS TimeIntervals(OrderId, status, DateStart, DateEnd)

    ),

    Starts AS (

    SELECT a.DateStart AS dt

    FROM Source a

    WHERE NOT EXISTS(SELECT * FROM Source b

    WHERE b.OrderId > a.OrderId

    AND a.DateStart BETWEEN b.DateStart AND b.DateEnd)),

    Ends AS (

    SELECT a.DateEnd AS dt

    FROM Source a

    WHERE NOT EXISTS(SELECT * FROM Source b

    WHERE b.OrderId > a.OrderId

    AND a.DateEnd BETWEEN b.DateStart AND b.DateEnd)),

    Combined AS (

    SELECT dt,'S' AS StartEnd

    FROM Starts

    UNION ALL

    SELECT dt,'E' AS StartEnd

    FROM Ends

    ),

    Ordered AS (

    SELECT dt,StartEnd,

    ROW_NUMBER() OVER(ORDER BY dt) AS rn

    FROM Combined)

    SELECT ca.OrderId,

    ca.status,

    CASE WHEN s.StartEnd = 'E' THEN DATEADD(day,1,s.dt) ELSE s.dt END AS DateStart,

    CASE WHEN e.StartEnd = 'S' THEN DATEADD(day,-1,e.dt) ELSE e.dt END AS DateEnd

    FROM Ordered s

    INNER JOIN Ordered e ON e.rn = s.rn + 1

    CROSS APPLY (SELECT TOP 1 ca.OrderId,ca.status

    FROM Source ca

    WHERE s.dt >= ca.DateStart

    AND e.dt <= ca.DateEnd

    ORDER BY ca.OrderId DESC) ca

    ORDER BY s.dt;

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • venoym (2/10/2015)


    You may want to take a look at the below. This is a "Gaps and Islands" problem. The below is designed around a numeric gap, but can be worked around to handle dates/times. I've done this a while back (broke it down into finding 1 minute gaps), but this is a good starting point.

    http://www.sqlservercentral.com/articles/Advanced+Querying/anefficientsetbasedsolutionforislandsandgaps/1619/

    I differ with your opinion here. This is similar to a gaps and islands problem but is not solved the same way. This is the overlapping intervals problem.

    The original link to Itzik Ben-Gan's article on SolidQ had gone dead the last time I checked, but I used his methods in discussing various efficient approaches to this problem here:

    Calculating Gaps Between Overlapping Time Intervals in SQL [/url]

    The link to IBG's article is in that one, and it is too bad his article is gone because it explains it quite well. Note that this article takes the normalized overlapping intervals and converts them to the gaps (so the second step you don't need).

    If you find that a bit daunting to understand (I confess that I did at first), let me know and I'll make a solution happen based on your specific test data.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Hi,

    First of all, thank you all for your input. It is invaluable.

    @venoym

    your solutions with numbers looks very interesting, but as Dwain suggested I had problems finding out how it would be relevant to my case

    @pietlinden

    I started reading Itzik Ben Gan's 'Window Functions in SQL Server 2012", but I'm not as far yet.

    @mark-3 Cowne - Your solution seems to work fine. Thanks! I need to do some testing. As you mentioned - not very perfomarnt though...

    @Dwain - Thanks for commenting on my problems again (I received some useful hints from your sometime ago - also on Time Intervals).

    I would be very, very greteful if you would find some time, to show me how to implement your logic to my case.

    OH, and important info.I didn't know it before, but solution will be meant for SQL 2012.

    I would rather know a way for both 2008r2 and 2012, but I don't want to waste your time...

    Many Thanks.

    Tomek

  • tomek tomek (2/11/2015)


    Hi,

    First of all, thank you all for your input. It is invaluable.

    @venoym

    a) you solutions with numbers looks very interesting, but as Dwain suggested I had problems finding out how it would be relevant to my case

    b) I started reading Itzik Ben Gan's 'Window Functions in SQL Server 2012", but I'm not as far yet.

    @mark-3 Cowne - Your solution seems to work fine. Thanks! I need to do some testing. As you mentioned - not very perfomarnt though...

    @Dawin - Thanks for commenting on my problems again (I received some useful hints from your sometime ago - also on Time Intervals).

    I would be very, very greteful if you would find some time, to show me how to implement your logic to my case.

    Many Thanks.

    Tomek

    Just be aware that I've changed the query I originally posted, it should work correctly now although sadly no quicker.

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • @mark-3 Cowne - Care to explain, what was wrong with the first approach?

    Regards

    tomek

  • It was getting the wrong values for ordereid and status in some circumstances.

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • @mark-3

    it seems there's something wrong with your new code (I don't see same problem with previous solution).

    for following dataset:

    SELECT *

    FROM

    (VALUES

    (1, '1',convert(datetime,'20121001'),convert(datetime,'20141031')),

    (2, '7','20140701','20140731'),

    (3, '10','20140801','20141031')

    ) AS TimeIntervals(OrderId, status, DateStart, DateEnd)

    I receive this :

    OrderId status DateStart DateEnd

    ----------- ------ ----------------------- -----------------------

    1 1 2012-10-01 00:00:00.000 2014-06-30 00:00:00.000

    2 7 2014-07-01 00:00:00.000 2014-07-31 00:00:00.000

    1 1 2014-08-01 00:00:00.000 2014-07-31 00:00:00.000

    3 10 2014-08-01 00:00:00.000 2014-10-31 00:00:00.000

    Correct Result would be:

    OrderId status DateStart DateEnd

    ----------- ------ ----------------------- -----------------------

    1 1 2012-10-01 00:00:00.000 2014-06-30 00:00:00.000

    2 7 2014-07-01 00:00:00.000 2014-07-31 00:00:00.000

    3 10 2014-08-01 00:00:00.000 2014-10-31 00:00:00.000

    1. Can you check this?

    2. Can you show me example where your second approach fixes problems found with the first one?

    For me, first solution seems to be working fine.

    regards

    Tomek

  • The problem is due to two intervals having a one day gap between the end of one and the start of the next, I think these can simply be excluded.

    Change

    CROSS APPLY (SELECT TOP 1 ca.OrderId,ca.status

    FROM Source ca

    WHERE s.dt >= ca.DateStart

    AND e.dt <= ca.DateEnd

    ORDER BY ca.OrderId DESC) ca

    ORDER BY s.dt;

    to

    CROSS APPLY (SELECT TOP 1 ca.OrderId,ca.status

    FROM Source ca

    WHERE s.dt >= ca.DateStart

    AND e.dt <= ca.DateEnd

    ORDER BY ca.OrderId DESC) ca

    -- Exclude the case where two consecutive intervals have a gap of one day - this will generate a negative interval

    WHERE CASE WHEN s.StartEnd = 'E' THEN DATEADD(day,1,s.dt) ELSE s.dt END <= CASE WHEN e.StartEnd = 'S' THEN DATEADD(day,-1,e.dt) ELSE e.dt END

    ORDER BY s.dt;

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Well, this was a fun one to do.

    I decided to try it without reading Dwain's article from earlier so this may not be the best way.

    Basically I am joining to a calendar table and grouping to resolve the overlap, which then gives me an islands problem, which I solve in IBG fashion

    For the calendar table, I borrowed Dwain's inline function from here[/url]

    CREATE TABLE #SourceData

    (

    OrderID int,

    OrderStatus char(1),

    StartDatedatetime,

    EndDate datetime

    )

    Insert #SourceData (OrderID,OrderStatus,StartDate,EndDate)

    select *

    FROM

    (VALUES

    (1, 'A',convert(datetime,'20150101'),convert(datetime,'20150331')),

    (2, 'B','20150115','20150215'),

    (3, 'C','20150215','20150315'),

    (4, 'D','20150115','20150315'),

    (5, 'E','20150401','20150415')

    ) AS TimeIntervals(OrderId, status, DateStart, DateEnd)

    ;

    WITH CrossToCalendar AS (

    SELECT c.[Date], MAX(OrderID) OrderId FROM #SourceData s

    CROSS APPLY (SELECT g.[Date] FROM GenerateCalendar('2015-01-01',365) g

    WHERE g.[Date] BETWEEN s.StartDate AND s.EndDate) c

    GROUP BY c.[Date]

    ),

    IslandBuilder AS (

    SELECT DATEADD(day,(-1*DENSE_RANK() OVER (PARTITION BY OrderID ORDER BY [date])),[date]) dr,date,OrderId FROM CrossToCalendar

    ),

    Islands AS (

    SELECT MIN(date) StartDate,MAX(date) EndDate,dr,OrderId

    FROM IslandBuilder

    GROUP BY dr,OrderId

    )

    SELECT i.OrderId,s.orderstatus,i.StartDate,i.EndDate

    FROM Islands i

    JOIN #SourceData s

    ON i.OrderId = s.OrderID

    ORDER BY StartDate

    DROP TABLE #SourceData

    Not sure if it would be better to use ROW_NUMBER() in cross to calendar instead of grouping so I could carry the status down instead of rejoining at the end, but this is the path I had started down so I stuck with it.

    EDIT: Just skimmed through Dwain's article, and nope, he doesn't bother with a calendar table. But he is also packing overlapping intervals, and this problem splits them with Order ID as a tiebreaker instead.

  • Here's another version, this one using SQL Server 2012 LEAD

    WITH Source AS (

    SELECT *

    FROM

    (VALUES

    (1, 'A',convert(datetime,'20150101'),convert(datetime,'20150331')),

    (2, 'B','20150115','20150215'),

    (3, 'C','20150215','20150315'),

    (4, 'D','20150115','20150315'),

    (5, 'E','20150401','20150415')

    ) AS TimeIntervals(OrderId, status, DateStart, DateEnd)

    ),

    Starts AS (

    SELECT a.DateStart AS dt

    FROM Source a

    WHERE NOT EXISTS(SELECT * FROM Source b

    WHERE b.OrderId > a.OrderId

    AND a.DateStart BETWEEN b.DateStart AND b.DateEnd)),

    Ends AS (

    SELECT a.DateEnd AS dt

    FROM Source a

    WHERE NOT EXISTS(SELECT * FROM Source b

    WHERE b.OrderId > a.OrderId

    AND a.DateEnd BETWEEN b.DateStart AND b.DateEnd)),

    Combined AS (

    SELECT dt,'S' AS StartEnd

    FROM Starts

    UNION ALL

    SELECT dt,'E' AS StartEnd

    FROM Ends

    ),

    Ordered AS (

    SELECT dt AS DateStart,

    StartEnd AS S_StartEnd,

    LEAD(dt) OVER(ORDER BY dt) AS DateEnd,

    LEAD(StartEnd) OVER(ORDER BY dt) AS E_StartEnd

    FROM Combined)

    SELECT ca.OrderId,

    ca.status,

    CASE WHEN s.S_StartEnd = 'E' THEN DATEADD(day,1,s.DateStart) ELSE s.DateStart END AS DateStart,

    CASE WHEN s.E_StartEnd = 'S' THEN DATEADD(day,-1,s.DateEnd) ELSE s.DateEnd END AS DateEnd

    FROM Ordered s

    CROSS APPLY (SELECT TOP 1 ca.OrderId,ca.status

    FROM Source ca

    WHERE s.DateStart >= ca.DateStart

    AND s.DateEnd <= ca.DateEnd

    ORDER BY ca.OrderId DESC) ca

    -- Exclude the case where two consecutive intervals have a gap of one day - this will generate a negative interval

    WHERE CASE WHEN s.S_StartEnd = 'E' THEN DATEADD(day,1,s.DateStart) ELSE s.DateStart END <= CASE WHEN s.E_StartEnd = 'S' THEN DATEADD(day,-1,s.DateEnd) ELSE s.DateEnd END

    ORDER BY s.DateStart;

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • tomek tomek (2/11/2015)


    @Dwain - Thanks for commenting on my problems again (I received some useful hints from your sometime ago - also on Time Intervals).

    I would be very, very greteful if you would find some time, to show me how to implement your logic to my case.

    Hmmm. Upon taking a closer look at your expected results, I'm not sure if I understood the problem correctly, so let me ask a couple of questions here (refer to comments).

    OrderID status DateStart DateEnd

    1 A 2015-01-01 00:00:00.000 2015-03-31 00:00:00.000 -- Starts with a 3 month interval

    2 B 2015-01-15 00:00:00.000 2015-02-15 00:00:00.000 -- Starts/ends within OrderID=1

    3 C 2015-02-15 00:00:00.000 2015-03-15 00:00:00.000 -- Starts/ends within OrderID=1

    4 D 2015-01-15 00:00:00.000 2015-03-15 00:00:00.000 -- Starts/ends within OrderID=1

    5 E 2015-04-01 00:00:00.000 2015-04-15 00:00:00.000 -- Starts and ends after OrderID=1

    Since OrderIDs 2, 3 and 4 all start and end within OrderID=1, why are they not eliminated such that your final result is (these would be the non-overlapping time periods during which you have an order):

    OrderID status DateStart DateEnd

    1 A 2015-01-01 00:00:00.000 2015-03-31 00:00:00.000

    5 E 2015-04-01 00:00:00.000 2015-04-15 00:00:00.000

    Perhaps my morning coffee has not yet kicked in... Either that or it is not an overlapping intervals problem as I first thought it was.

    You also said originally:

    tomek tomek (2/10/2015)


    as you can see:

    1) OrderId=1 was split into 2 rows, as it was partially overlapped by next ranges

    2) orderdId 2 and 3 don't exist anymore as they've been overlapped by OrderId=5.

    1. What is the relationship between OrderID=1 and OrderIDs IN (2,3) such that they would cause OrderID=1 to be split?

    2. Since OrderID=4 is also within OrderID=1, why doesn't it cause some effect to OrderID=1?

    3. On your 2), shouldn't that be OrderID=1 (instead of OrderID=5 because 5 overlaps with neither)?


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Basically if two intervals would be active he considers the one with the highest order id active. 2 never shows because it perfectly overlaps with 4, which is higher.

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply