merge/flatten overlapping Time ranges. In order.

  • Hi Dwain,

    dwain.c (2/11/2015)


    You also said originally:

    tomek tomek (2/10/2015)


    as you can see:

    1) OrderId=1 was split into 2 rows, as it was partially overlapped by next ranges

    2) orderdId 2 and 3 don't exist anymore as they've been overlapped by OrderId=5.

    let me answer your questions beginning with the last one.

    First of all here, as a reminder is the desired result

    OrderId Status DateStart DateEnd

    ----------- ------ ----------------------- -----------------------

    1 A 2015-01-01 00:00:00.000 2015-01-14 00:00:00.000

    4 D 2015-01-15 00:00:00.000 2015-03-15 00:00:00.000

    1 A 2015-03-16 00:00:00.000 2015-03-31 00:00:00.000

    5 E 2015-04-01 00:00:00.000 2015-04-15 00:00:00.000

    3. On your 2), shouldn't that be OrderID=1 (instead of OrderID=5 because 5 overlaps with neither)?

    you're absolutely right - I meant orderid=4 overlaps OrderId 2 and 3

    2. Since OrderID=4 is also within OrderID=1, why doesn't it cause some effect to OrderID=1?

    yes - it does absolutely, as you can see in the desired result - OrderId covers partially OrderId=1 and leaves 2 peaces of it (01-01 -- 01-14 and 03-16 -- 03-31)

    1. What is the relationship between OrderID=1 and OrderIDs IN (2,3) such that they would cause OrderID=1 to be split?

    each next OrderId "covers" the previous one. Imagine patchwork, or gluing peaces of colored papers on each other. I want to describe the final situation, I don't care if there were tons of papers underneath, which I don't see anymore. Only what's on display is important.

    I hope this will clearify the matter.

    Tomek

  • Given that explanation, this seems to work.

    select OrderID, [status], DateStart, DateEnd

    INTO #OverlappingIntervals

    FROM

    (VALUES

    (1, 'A',convert(datetime,'20150101'),convert(datetime,'20150331')),

    (2, 'B','20150115','20150215'),

    (3, 'C','20150215','20150315'),

    (4, 'D','20150115','20150315'),

    (5, 'E','20150401','20150415')

    ) AS TimeIntervals(OrderId, status, DateStart, DateEnd);

    select OrderID, [status], DateStart, DateEnd

    FROM #OverlappingIntervals;

    WITH Tally (n) AS

    (

    SELECT TOP ((SELECT 1+MAX(DATEDIFF(day, DateStart, DateEnd)) FROM #OverlappingIntervals))

    -- 0 based Tally table

    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1

    FROM sys.all_columns

    )

    SELECT OrderID, [status], DateStart=MIN(d), DateEnd=MAX(d)

    FROM

    (

    SELECT OrderID, [status], DateStart, DateEnd, d

    ,n=d-ROW_NUMBER() OVER (PARTITION BY OrderID, [status] ORDER BY d)

    FROM

    (

    SELECT OrderID, [status], DateStart, DateEnd, d, n

    ,rn=ROW_NUMBER() OVER (PARTITION BY d ORDER by OrderID DESC)

    FROM #OverlappingIntervals a

    CROSS APPLY

    (

    SELECT n=n, d=DATEADD(day, n, DateStart)

    FROM Tally

    WHERE DATEADD(day, n, DateStart) <= DateEnd

    ) b

    ) a

    WHERE rn=1

    ) a

    GROUP BY OrderID, [status], n

    ORDER BY DateStart;

    GO

    DROP TABLE #OverlappingIntervals;

    Two notations:

    - My Tally CTE can be replaced by proper usage of a Calendar table: Calendar Tables in T-SQL[/url]

    - I don't really like the solution because of the way it explodes each row into one for each day in the date range. This could lead to a performance problem if you're processing a large set.

    Offhand I don't see anything better, but there might be. I'll play around a little more.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Here's another way that I'm a little happier with:

    WITH IntervalDates AS

    (

    SELECT mind=MIN(DateStart), maxd=MAX(DateEnd)

    FROM #OverlappingIntervals

    ),

    Calendar (d) AS

    (

    SELECT TOP ((SELECT 1+DATEDIFF(day, mind, maxd) FROM IntervalDates))

    DATEADD(day, ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1, mind)

    FROM IntervalDates

    CROSS JOIN sys.all_columns

    )

    SELECT OrderID, [status], StartDate=MIN(d), EndDate=MAX(d)

    FROM

    (

    SELECT b.OrderID, [status], d

    ,n=d-ROW_NUMBER() OVER (PARTITION BY b.OrderID ORDER BY d)

    FROM Calendar a

    CROSS APPLY

    (

    SELECT OrderID, [status], DateStart, DateEnd

    FROM

    (

    SELECT OrderID, [status], DateStart, DateEnd

    ,rn=ROW_NUMBER() OVER (ORDER BY OrderID DESC)

    FROM #OverlappingIntervals

    WHERE d BETWEEN DateStart AND DateEnd

    ) a

    WHERE rn=1

    ) b

    ) a

    GROUP BY OrderID, [status], n

    ORDER BY n;

    In this case, I only explode the overall interval to one row per date once. Depending on your indexes, this might be a little faster.

    I'd be curious to know what indexing you have on your tables and how each of the solutions posted here (not just mine) do with your real data. Since this may have helped you, I think it is only fair for you to reciprocate.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • This has the cheapest cost according to the Execution Plan but that doesn't always mean it's faster:

    WITH IntervalDates AS

    (

    SELECT mind=MIN(DateStart), maxd=MAX(DateEnd)

    FROM #OverlappingIntervals

    ),

    Calendar (d) AS

    (

    SELECT TOP ((SELECT 1+DATEDIFF(day, mind, maxd) FROM IntervalDates))

    DATEADD(day, ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1, mind)

    FROM IntervalDates

    CROSS JOIN sys.all_columns

    )

    SELECT OrderID, [status], StartDate=MIN(d), EndDate=MAX(d)

    FROM

    (

    SELECT b.OrderID, [status], d

    ,n=d-ROW_NUMBER() OVER (PARTITION BY b.OrderID ORDER BY d)

    FROM Calendar a

    CROSS APPLY

    (

    SELECT OrderID=MAX(OrderID)

    FROM #OverlappingIntervals

    WHERE d BETWEEN DateStart AND DateEnd

    ) b

    JOIN #OverlappingIntervals c ON b.OrderID = c.OrderID

    ) a

    GROUP BY OrderID, [status], n

    ORDER BY n;


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 4 posts - 16 through 18 (of 18 total)

You must be logged in to reply to this topic. Login to reply