Sum Overlapping DateTime Ranges

  • I need to generate a report from a dataset with overlapping start and end times. I am tracking utilization of machines. Utilization is categorized as "Active", "Idle", and "Offline". Here is my mockup scenario:

    DECLARE @TempSource TABLE(

    MachineID INT,

    TimeStart DATETIME,

    TimeStop DATETIME,

    Tag VARCHAR(6)

    );

    DECLARE @ReportStartTime DATETIME,

    @ReportStopTime DATETIME

    -- define the reporting period

    SET @ReportStopTime = '2012-03-14 00:00:00.000'

    SET @ReportStartTime = '2012-03-13 00:00:00.000'

    -- report start and end times are used to grab data

    -- from two tables and union it together.

    -- the resulting data is modeled here

    INSERT @TempSource (MachineID, TimeStart, TimeStop, Tag)

    SELECT 1, '2012-03-12 12:00:00.000', '2012-03-13 04:00:00.000', 'Active' UNION ALL

    SELECT 2, '2012-03-12 20:00:00.000', '2012-03-13 10:00:00.000', 'Active' UNION ALL

    SELECT 3, '2012-03-13 05:00:00.000', '2012-03-13 12:00:00.000', 'Idle' UNION ALL

    SELECT 1, '2012-03-13 03:30:00.000', '2012-03-13 12:00:00.000', 'Active' UNION ALL

    SELECT 3, '2012-03-13 10:00:00.000', '2012-03-13 17:00:00.000', 'Active' UNION ALL

    SELECT 1, '2012-03-13 13:00:00.000', '2012-03-13 16:00:00.000', 'Idle' UNION ALL

    SELECT 2, '2012-03-13 09:00:00.000', '2012-03-13 15:00:00.000', 'Idle' UNION ALL

    SELECT 1, '2012-03-13 16:00:00.000', '2012-03-14 02:00:00.000', 'Active' UNION ALL

    SELECT 3, '2012-03-13 17:00:00.000', '2012-03-13 18:00:00.000', 'Idle' UNION ALL

    SELECT 3, '2012-03-13 18:00:00.000', NULL, 'Active'

    -- clip records which started before the reporting period

    UPDATE @TempSource

    SET TimeStart = @ReportStartTime

    WHERE TimeStart < @ReportStartTime

    -- clip records which ended after the reporting period or which have not ended yet

    UPDATE @TempSource

    SET TimeStop = @ReportStopTime

    WHERE TimeStop IS NULL OR TimeStop > @ReportStopTime

    This produces a working set which looks like:

    --MachineID TimeStart TimeStop Tag

    ----------- ----------------------- ----------------------- ------

    --1 2012-03-13 00:00:00.000 2012-03-13 04:00:00.000 Active

    --2 2012-03-13 00:00:00.000 2012-03-13 10:00:00.000 Active

    --3 2012-03-13 05:00:00.000 2012-03-13 12:00:00.000 Idle

    --1 2012-03-13 03:30:00.000 2012-03-13 12:00:00.000 Active

    --3 2012-03-13 10:00:00.000 2012-03-13 17:00:00.000 Active

    --1 2012-03-13 13:00:00.000 2012-03-13 16:00:00.000 Idle

    --2 2012-03-13 09:00:00.000 2012-03-13 15:00:00.000 Idle

    --1 2012-03-13 16:00:00.000 2012-03-14 00:00:00.000 Active

    --3 2012-03-13 17:00:00.000 2012-03-13 18:00:00.000 Idle

    --3 2012-03-13 18:00:00.000 2012-03-14 00:00:00.000 Active

    The problem I'm having at this point is working out how to calculate the overlapping ranges for each individual machine. If an "Active" time overlaps an "Idle" time, then the "Idle" time is clipped to the boundary of the "Active" time. "Offline" time would be a calculated value as: ReportTime - (IdleTime + ActiveTime).

    The final result, including format that I need is this:

    --MachineID TimeActive TimeIdle TimeOffline

    ----------- ---------- -------- -----------

    --1 20.00.00 03.00.00 01.00.00

    --2 10.00.00 05.00.00 09.00.00

    --3 13.00.00 06.00.00 05.00.00

    Any help or guidance on how to get from my working set to the final version is greatly appreciated.

  • LOVE the way you put down the requirement... As Barney Stinson(from How I Met Your Mother serial) says, Legennnnn.... wait for it... daryy..

    Sure others might have also started working on this, but i am on it too.. will post back in sometime...

  • this is a nice problem! unforunately for me, I'm immediately drawn to an iterative approach, successively replacing overlapping pairs with nonoverlapping pairs with adjusted start or end times according to the specs given, once no more overlapping pairs of actives are found, repeat for first one active second one idle. This is probably because I'm not set oriented, but in thinking this through, since its indeterminate how many pairs would be overlapping, wouldn't the replacement approach be limitted to an iterative approach anyways?

    So next up how about simply specifying what we want? How about just for the general overlapping case? Maybe this would be selecting the minimum start time, and a maximum endtime who's row containing this maximum endtime didn't also contain a starttime greater than the row with the minimum starttime's endtime? But, this only produces a new minimum starttime with a new maximum endtime that still defines a contiguous timespan? Don't you then have to repeat again using this new time pair? Iterative again!

    I would have bailed to perl about an hour ago LOL

  • Hey Justin!

    I think the following works:

    ;WITH cteClip AS (

    SELECT MachineID

    ,CASE WHEN Tag = 'Active' THEN TimeStart

    WHEN TimeStart < ISNULL((SELECT MAX(t2.TimeStop) FROM @TempSource t2

    WHERE t1.MachineID = t2.MachineID and t2.Tag = 'Active' and

    t2.TimeStart < t1.TimeStart), t1.TimeStart)

    THEN ISNULL((SELECT MIN(t2.TimeStop) FROM @TempSource t2

    WHERE t1.MachineID = t2.MachineID and t2.Tag = 'Active' and

    t2.TimeStart < t1.TimeStart), t1.TimeStart)

    ELSE TimeStart END As TimeStart

    ,CASE WHEN Tag = 'Active' THEN TimeStop

    WHEN TimeStop > ISNULL((SELECT MIN(t2.TimeStart) FROM @TempSource t2

    WHERE t1.MachineID = t2.MachineID and t2.Tag = 'Active' and

    t2.TimeStop > t1.TimeStop), t1.TimeStop)

    THEN ISNULL((SELECT MIN(t2.TimeStart) FROM @TempSource t2

    WHERE t1.MachineID = t2.MachineID and t2.Tag = 'Active' and

    t2.TimeStop > t1.TimeStop), t1.TimeStop)

    ELSE TimeStop END As TimeStop

    , Tag

    FROM @TempSource t1

    ),

    cteElapsed as (

    SELECT MachineID, Tag, SUM(DATEDIFF(second, TimeStart, TimeStop)) As TotalTime

    FROM cteClip

    GROUP BY MachineID, Tag

    ),

    cteCalcOL AS (

    SELECT MachineID

    ,MAX(CASE WHEN Tag = 'Active' THEN TotalTime ELSE 0 END) As TimeActive

    ,MAX(CASE WHEN Tag = 'Idle' THEN TotalTime ELSE 0 END) As TimeIdle

    FROM cteElapsed

    GROUP BY MachineID

    ),

    cteReformat AS (

    SELECT MachineID, TimeActive, TimeIdle

    ,(SELECT DATEDIFF(second, MIN(TimeStart), MAX(TimeStop))

    FROM @Tempsource) - (TimeActive + TimeIdle) As TimeOffline

    FROM cteCalcOL

    )

    SELECT MachineID

    ,RIGHT('00'+CAST(TimeActive/3600 AS VARCHAR), 2) + '.' +

    RIGHT('00'+CAST((TimeActive/60)%60 AS VARCHAR), 2) + '.' +

    RIGHT('00'+CAST(TimeActive%3600 AS VARCHAR), 2)

    AS TimeActive

    ,RIGHT('00'+CAST(TimeIdle/3600 AS VARCHAR), 2) + '.' +

    RIGHT('00'+CAST((TimeIdle/60)%60 AS VARCHAR), 2) + '.' +

    RIGHT('00'+CAST(TimeIdle%3600 AS VARCHAR), 2)

    AS TimeIdle

    ,RIGHT('00'+CAST(TimeOffline/3600 AS VARCHAR), 2) + '.' +

    RIGHT('00'+CAST((TimeOffline/60)%60 AS VARCHAR), 2) + '.' +

    RIGHT('00'+CAST(TimeOffline%3600 AS VARCHAR), 2)

    AS TimeOffline

    FROM cteReformat

    However I don't agree with your results set. Mine is:

    MachineIDTimeActiveTimeIdleTimeOffline

    120.30.0003.00.0000.30.00

    210.00.0005.00.0009.00.00

    313.00.0006.00.0005.00.00

    Notice how on the second record for MachineID = 1 the time starts at 03:30? That's an active record so I get 4+8.5+8 hours Active. The 3rd (Idle) record for machine 1 gets clipped to 03:30.

    Please check me and let me know if this works for you.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • This gives you just the active and idle time in minutes. You'll have to format this to give HH:MM:SS.

    WITH AllTimes AS (

    SELECT MachineID, TimeStart AS tm

    FROM @TempSource

    UNION

    SELECT MachineID, TimeStop

    FROM @TempSource

    WHERE TimeStop IS NOT NULL),

    Grped AS (

    SELECT a.MachineID,

    a.tm,

    MIN(t.Tag) AS Tag, -- If Active and Idle if present, use Active

    ROW_NUMBER() OVER(PARTITION BY a.MachineID ORDER BY a.tm) AS rn

    FROM AllTimes a

    LEFT OUTER JOIN @TempSource t ON t.MachineID=a.MachineID

    AND a.tm >= t.TimeStart

    AND (a.tm < t.TimeStop OR t.TimeStop IS NULL)

    GROUP BY a.MachineID,a.tm)

    SELECT s.MachineID,

    SUM(CASE WHEN s.Tag='Active' THEN DATEDIFF(minute,s.tm,e.tm) ELSE 0 END) AS ActiveMins,

    SUM(CASE WHEN s.Tag='Idle' THEN DATEDIFF(minute,s.tm,e.tm) ELSE 0 END) AS IdleMins

    FROM Grped s

    INNER JOIN Grped e ON e.MachineID=s.MachineID

    AND e.rn=s.rn+1

    WHERE s.Tag IS NOT NULL

    GROUP BY s.MachineID

    ORDER BY s.MachineID;

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Thanks!! I'd love to see what you can come up with

  • I've done this before in c#, but the current requirement is that it be in sql. I would have much preferred to do this in client code.

  • Hi Dwayne

    Thanks for the post. Your solution looks really good. Your calculation for machineID=1 is a little off though. The first record for machineID=1 and the second record for that same machine overlap by 30 mins. The first one ends at 4:00 and the second one starts at 3:30. So when those to ranges are unioned it would look like this:

    Range(00:00, 04:00) UNION Range(03:30, 12:00) = Range(00:00, 12:00)

    That half hour overlap would be clipped.

  • Hi Mark!!

    Thanks so much for this solution!! The formatting is no issue, and the OfflineTime is easily calculated. I'll be trying this out with some live data and see how it performs. My tables consist of just shy of 8 million rows and an average reporting period would fluctuate between 60K - 90K rows.

  • jones.justinw (3/21/2012)


    Thanks!! I'd love to see what you can come up with

    Im still working on.. in the meantime, im in the process of learning C#.. it would be great if you could share your C# solution..

  • Another question to you, the start and end of the report times (the 2 vairables u have declared), what will the range be? is it a day or a week or 2 days? how long the range can be?

  • ColdCoffee (3/21/2012)


    Another question to you, the start and end of the report times (the 2 vairables u have declared), what will the range be? is it a day or a week or 2 days? how long the range can be?

    It can be whatever my program manger wants. Typically, this is a weekly report, and the start and end report dates represent a two week rolling window. I left them as variables so that they could be easily modified without having to muck with the rest of the sql.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply