March 20, 2012 at 4:40 pm
I need to generate a report from a dataset with overlapping start and end times. I am tracking utilization of machines. Utilization is categorized as "Active", "Idle", and "Offline". Here is my mockup scenario:
DECLARE @TempSource TABLE(
MachineID INT,
TimeStart DATETIME,
TimeStop DATETIME,
Tag VARCHAR(6)
);
DECLARE @ReportStartTime DATETIME,
@ReportStopTime DATETIME
-- define the reporting period
SET @ReportStopTime = '2012-03-14 00:00:00.000'
SET @ReportStartTime = '2012-03-13 00:00:00.000'
-- report start and end times are used to grab data
-- from two tables and union it together.
-- the resulting data is modeled here
INSERT @TempSource (MachineID, TimeStart, TimeStop, Tag)
SELECT 1, '2012-03-12 12:00:00.000', '2012-03-13 04:00:00.000', 'Active' UNION ALL
SELECT 2, '2012-03-12 20:00:00.000', '2012-03-13 10:00:00.000', 'Active' UNION ALL
SELECT 3, '2012-03-13 05:00:00.000', '2012-03-13 12:00:00.000', 'Idle' UNION ALL
SELECT 1, '2012-03-13 03:30:00.000', '2012-03-13 12:00:00.000', 'Active' UNION ALL
SELECT 3, '2012-03-13 10:00:00.000', '2012-03-13 17:00:00.000', 'Active' UNION ALL
SELECT 1, '2012-03-13 13:00:00.000', '2012-03-13 16:00:00.000', 'Idle' UNION ALL
SELECT 2, '2012-03-13 09:00:00.000', '2012-03-13 15:00:00.000', 'Idle' UNION ALL
SELECT 1, '2012-03-13 16:00:00.000', '2012-03-14 02:00:00.000', 'Active' UNION ALL
SELECT 3, '2012-03-13 17:00:00.000', '2012-03-13 18:00:00.000', 'Idle' UNION ALL
SELECT 3, '2012-03-13 18:00:00.000', NULL, 'Active'
-- clip records which started before the reporting period
UPDATE @TempSource
SET TimeStart = @ReportStartTime
WHERE TimeStart < @ReportStartTime
-- clip records which ended after the reporting period or which have not ended yet
UPDATE @TempSource
SET TimeStop = @ReportStopTime
WHERE TimeStop IS NULL OR TimeStop > @ReportStopTime
This produces a working set which looks like:
--MachineID TimeStart TimeStop Tag
----------- ----------------------- ----------------------- ------
--1 2012-03-13 00:00:00.000 2012-03-13 04:00:00.000 Active
--2 2012-03-13 00:00:00.000 2012-03-13 10:00:00.000 Active
--3 2012-03-13 05:00:00.000 2012-03-13 12:00:00.000 Idle
--1 2012-03-13 03:30:00.000 2012-03-13 12:00:00.000 Active
--3 2012-03-13 10:00:00.000 2012-03-13 17:00:00.000 Active
--1 2012-03-13 13:00:00.000 2012-03-13 16:00:00.000 Idle
--2 2012-03-13 09:00:00.000 2012-03-13 15:00:00.000 Idle
--1 2012-03-13 16:00:00.000 2012-03-14 00:00:00.000 Active
--3 2012-03-13 17:00:00.000 2012-03-13 18:00:00.000 Idle
--3 2012-03-13 18:00:00.000 2012-03-14 00:00:00.000 Active
The problem I'm having at this point is working out how to calculate the overlapping ranges for each individual machine. If an "Active" time overlaps an "Idle" time, then the "Idle" time is clipped to the boundary of the "Active" time. "Offline" time would be a calculated value as: ReportTime - (IdleTime + ActiveTime).
The final result, including format that I need is this:
--MachineID TimeActive TimeIdle TimeOffline
----------- ---------- -------- -----------
--1 20.00.00 03.00.00 01.00.00
--2 10.00.00 05.00.00 09.00.00
--3 13.00.00 06.00.00 05.00.00
Any help or guidance on how to get from my working set to the final version is greatly appreciated.
March 20, 2012 at 5:27 pm
LOVE the way you put down the requirement... As Barney Stinson(from How I Met Your Mother serial) says, Legennnnn.... wait for it... daryy..
Sure others might have also started working on this, but i am on it too.. will post back in sometime...
March 20, 2012 at 5:51 pm
this is a nice problem! unforunately for me, I'm immediately drawn to an iterative approach, successively replacing overlapping pairs with nonoverlapping pairs with adjusted start or end times according to the specs given, once no more overlapping pairs of actives are found, repeat for first one active second one idle. This is probably because I'm not set oriented, but in thinking this through, since its indeterminate how many pairs would be overlapping, wouldn't the replacement approach be limitted to an iterative approach anyways?
So next up how about simply specifying what we want? How about just for the general overlapping case? Maybe this would be selecting the minimum start time, and a maximum endtime who's row containing this maximum endtime didn't also contain a starttime greater than the row with the minimum starttime's endtime? But, this only produces a new minimum starttime with a new maximum endtime that still defines a contiguous timespan? Don't you then have to repeat again using this new time pair? Iterative again!
I would have bailed to perl about an hour ago LOL
March 21, 2012 at 1:45 am
Hey Justin!
I think the following works:
;WITH cteClip AS (
SELECT MachineID
,CASE WHEN Tag = 'Active' THEN TimeStart
WHEN TimeStart < ISNULL((SELECT MAX(t2.TimeStop) FROM @TempSource t2
WHERE t1.MachineID = t2.MachineID and t2.Tag = 'Active' and
t2.TimeStart < t1.TimeStart), t1.TimeStart)
THEN ISNULL((SELECT MIN(t2.TimeStop) FROM @TempSource t2
WHERE t1.MachineID = t2.MachineID and t2.Tag = 'Active' and
t2.TimeStart < t1.TimeStart), t1.TimeStart)
ELSE TimeStart END As TimeStart
,CASE WHEN Tag = 'Active' THEN TimeStop
WHEN TimeStop > ISNULL((SELECT MIN(t2.TimeStart) FROM @TempSource t2
WHERE t1.MachineID = t2.MachineID and t2.Tag = 'Active' and
t2.TimeStop > t1.TimeStop), t1.TimeStop)
THEN ISNULL((SELECT MIN(t2.TimeStart) FROM @TempSource t2
WHERE t1.MachineID = t2.MachineID and t2.Tag = 'Active' and
t2.TimeStop > t1.TimeStop), t1.TimeStop)
ELSE TimeStop END As TimeStop
, Tag
FROM @TempSource t1
),
cteElapsed as (
SELECT MachineID, Tag, SUM(DATEDIFF(second, TimeStart, TimeStop)) As TotalTime
FROM cteClip
GROUP BY MachineID, Tag
),
cteCalcOL AS (
SELECT MachineID
,MAX(CASE WHEN Tag = 'Active' THEN TotalTime ELSE 0 END) As TimeActive
,MAX(CASE WHEN Tag = 'Idle' THEN TotalTime ELSE 0 END) As TimeIdle
FROM cteElapsed
GROUP BY MachineID
),
cteReformat AS (
SELECT MachineID, TimeActive, TimeIdle
,(SELECT DATEDIFF(second, MIN(TimeStart), MAX(TimeStop))
FROM @Tempsource) - (TimeActive + TimeIdle) As TimeOffline
FROM cteCalcOL
)
SELECT MachineID
,RIGHT('00'+CAST(TimeActive/3600 AS VARCHAR), 2) + '.' +
RIGHT('00'+CAST((TimeActive/60)%60 AS VARCHAR), 2) + '.' +
RIGHT('00'+CAST(TimeActive%3600 AS VARCHAR), 2)
AS TimeActive
,RIGHT('00'+CAST(TimeIdle/3600 AS VARCHAR), 2) + '.' +
RIGHT('00'+CAST((TimeIdle/60)%60 AS VARCHAR), 2) + '.' +
RIGHT('00'+CAST(TimeIdle%3600 AS VARCHAR), 2)
AS TimeIdle
,RIGHT('00'+CAST(TimeOffline/3600 AS VARCHAR), 2) + '.' +
RIGHT('00'+CAST((TimeOffline/60)%60 AS VARCHAR), 2) + '.' +
RIGHT('00'+CAST(TimeOffline%3600 AS VARCHAR), 2)
AS TimeOffline
FROM cteReformat
However I don't agree with your results set. Mine is:
MachineIDTimeActiveTimeIdleTimeOffline
120.30.0003.00.0000.30.00
210.00.0005.00.0009.00.00
313.00.0006.00.0005.00.00
Notice how on the second record for MachineID = 1 the time starts at 03:30? That's an active record so I get 4+8.5+8 hours Active. The 3rd (Idle) record for machine 1 gets clipped to 03:30.
Please check me and let me know if this works for you.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
March 21, 2012 at 7:37 am
This gives you just the active and idle time in minutes. You'll have to format this to give HH:MM:SS.
WITH AllTimes AS (
SELECT MachineID, TimeStart AS tm
FROM @TempSource
UNION
SELECT MachineID, TimeStop
FROM @TempSource
WHERE TimeStop IS NOT NULL),
Grped AS (
SELECT a.MachineID,
a.tm,
MIN(t.Tag) AS Tag, -- If Active and Idle if present, use Active
ROW_NUMBER() OVER(PARTITION BY a.MachineID ORDER BY a.tm) AS rn
FROM AllTimes a
LEFT OUTER JOIN @TempSource t ON t.MachineID=a.MachineID
AND a.tm >= t.TimeStart
AND (a.tm < t.TimeStop OR t.TimeStop IS NULL)
GROUP BY a.MachineID,a.tm)
SELECT s.MachineID,
SUM(CASE WHEN s.Tag='Active' THEN DATEDIFF(minute,s.tm,e.tm) ELSE 0 END) AS ActiveMins,
SUM(CASE WHEN s.Tag='Idle' THEN DATEDIFF(minute,s.tm,e.tm) ELSE 0 END) AS IdleMins
FROM Grped s
INNER JOIN Grped e ON e.MachineID=s.MachineID
AND e.rn=s.rn+1
WHERE s.Tag IS NOT NULL
GROUP BY s.MachineID
ORDER BY s.MachineID;
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537March 21, 2012 at 9:53 am
Thanks!! I'd love to see what you can come up with
March 21, 2012 at 9:57 am
I've done this before in c#, but the current requirement is that it be in sql. I would have much preferred to do this in client code.
March 21, 2012 at 10:04 am
Hi Dwayne
Thanks for the post. Your solution looks really good. Your calculation for machineID=1 is a little off though. The first record for machineID=1 and the second record for that same machine overlap by 30 mins. The first one ends at 4:00 and the second one starts at 3:30. So when those to ranges are unioned it would look like this:
Range(00:00, 04:00) UNION Range(03:30, 12:00) = Range(00:00, 12:00)
That half hour overlap would be clipped.
March 21, 2012 at 10:08 am
Hi Mark!!
Thanks so much for this solution!! The formatting is no issue, and the OfflineTime is easily calculated. I'll be trying this out with some live data and see how it performs. My tables consist of just shy of 8 million rows and an average reporting period would fluctuate between 60K - 90K rows.
March 21, 2012 at 10:09 am
jones.justinw (3/21/2012)
Thanks!! I'd love to see what you can come up with
Im still working on.. in the meantime, im in the process of learning C#.. it would be great if you could share your C# solution..
March 21, 2012 at 10:13 am
Another question to you, the start and end of the report times (the 2 vairables u have declared), what will the range be? is it a day or a week or 2 days? how long the range can be?
March 21, 2012 at 10:18 am
ColdCoffee (3/21/2012)
Another question to you, the start and end of the report times (the 2 vairables u have declared), what will the range be? is it a day or a week or 2 days? how long the range can be?
It can be whatever my program manger wants. Typically, this is a weekly report, and the start and end report dates represent a two week rolling window. I left them as variables so that they could be easily modified without having to muck with the rest of the sql.
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply