Overlapping Date Ranges with Gaps Possible

Question

Overlapping Date Ranges with Gaps Possible

datagoat

Ten Centuries

Points: 1264
More actions
April 9, 2010 at 12:02 pm

#221128

Here is a head scratcher...
Back Story:
Operations has come and informed me that they now allow employees to log in and work in two different systems simultaneously. The problem this causes is that our payroll system needs to calculate hourly wages, but we need to be able to identify any overlapping shifts because the employee does not receive a wage for each system they are logged into. They also do not need to be logged into to both systems, they can also log in and out of either multiple times throughout a give day.
Here is some sample data of my problem and the results I need in the end. I've been plugging away at this but can't seem to generate the results I need. Any help at all is greatly appreciated.
/* Example Data */
CREATE TABLE #ShiftData
(
emp_id INT,
shift_id INT IDENTITY(1,1),
shift_type VARCHAR(20),
start_time DATETIME,
end_time DATETIME,
msgs_handled INT
)
INSERT INTO #ShiftData
(emp_id, shift_type, start_time, end_time, msgs_handled)
VALUES
(111,'A',CAST('2010-04-08 12:00:00' AS DATETIME), CAST('2010-04-08 14:00:00' AS DATETIME), 45)
INSERT INTO #ShiftData
(emp_id,shift_type,start_time,end_time,msgs_handled)
VALUES
(111,'B',CAST('2010-04-08 13:00:00' AS DATETIME), CAST('2010-04-08 14:00:00' AS DATETIME), 20)
INSERT INTO #ShiftData
(emp_id,shift_type,start_time,end_time,msgs_handled)
VALUES
(111,'A',CAST('2010-04-08 15:00:00' AS DATETIME), CAST('2010-04-08 16:00:00' AS DATETIME), 25)
INSERT INTO #ShiftData
(emp_id,shift_type,start_time,end_time,msgs_handled)
VALUES
(111,'B',CAST('2010-04-08 15:00:00' AS DATETIME), CAST('2010-04-08 18:00:00' AS DATETIME), 100)
INSERT INTO #ShiftData
(emp_id,shift_type,start_time,end_time,msgs_handled)
VALUES
(111,'A',CAST('2010-04-08 16:30:00' AS DATETIME), CAST('2010-04-08 17:30:00' AS DATETIME), 30)
INSERT INTO #ShiftData
(emp_id,shift_type,start_time,end_time,msgs_handled)
VALUES
(222,'B',CAST('2010-04-08 15:00:00' AS DATETIME), CAST('2010-04-08 18:00:00' AS DATETIME), 101)
INSERT INTO #ShiftData
(emp_id,shift_type,start_time,end_time,msgs_handled)
VALUES
(222,'A',CAST('2010-04-08 17:00:00' AS DATETIME), CAST('2010-04-08 19:00:00' AS DATETIME), 104)
INSERT INTO #ShiftData
(emp_id,shift_type,start_time,end_time,msgs_handled)
VALUES
(333,'A',CAST('2010-04-08 17:00:00' AS DATETIME), CAST('2010-04-08 19:00:00' AS DATETIME), 50)
Desired Results:
emp_idhrs_worked
------------------
1115
2224
3332

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

LutzM SSC Guru Points: 107049 More actions · Answer 1

I would use a calendar table (holding the day values), expand it to cover 24hrs per day (or the possible working hours) and join that to the ShiftData table.

Something like this:

;WITH cte AS

(

SELECT DATEADD(hh,number,DATE) datehr

FROM calendar

CROSS APPLY

(

SELECT TOP 24 number FROM master..spt_values

WHERE TYPE ='P'

) hr

)

SELECT

emp_id,

COUNT(DISTINCT datehr) AS hrs_worked

FROM cte

INNER JOIN #ShiftData

ON datehr>=start_time

AND datehr<end_time

GROUP BY emp_id

Side note: you need to define how you'll handle working periods that do not begin or end at the hour and will be less than one hour. With the code as above a period from 13:00 to 14:59 will be considered as one hr the same like working from 13:59 to 14:01. Please clarify.

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

datagoat Ten Centuries Points: 1264 More actions · Answer 2

Thanks for that. I actually came across a post on an Oracle forum that detailed what you mention above. I got up, took a walk, wished I could down a beer or two and returned to my desk and came up with this:

SELECT emp_id, CAST(SUM(DATEDIFF(ss, start_time, end_time)/3600.00) AS DECIMAL(5,2))

FROM

(

SELECT s1.emp_id, MIN(s1.start_time) AS start_time, MAX(s1.end_time) AS end_time

FROM

#ShiftData s1 LEFT OUTER JOIN #ShiftData t1

ON s1.start_time <= t1.end_time

AND NOT EXISTS

(

SELECT * FROM #ShiftData t2

WHERE t1.end_time >= t2.start_time AND t1.end_time < t2.end_time

)

WHERE NOT EXISTS

(

SELECT * FROM #ShiftData s2

WHERE s1.start_time > s2.start_time AND s1.start_time <= s2.end_time

)

GROUP BY s1.start_time, s1.emp_id

) AS T

GROUP BY emp_id

It gives me the crucial part of my desired results. I just need to connect it with the shift data for employees without overlapping or multiple shifts (employee 333 in my example data), but that should be easy.

I am not sure how efficient my code is yet, but at least it is not a cursor like so much of the legacy code I have seen in our systems. I have a to run it through some more testing to ensure it captures all the different scenarios that could occur.

LutzM SSC Guru Points: 107049 More actions · Answer 3

Your reply doesn't really answer the question on how to actually calculate the working hours.

Your sample data do make room for some interpretation... Also, your desired result is an integer value (which would require some rounding routines), whereas the second method calculates the working hours with rounding to two decimal places based on the seconds worked. Again, please clarify.

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

datagoat Ten Centuries Points: 1264 More actions · Answer 4

Sorry for any confusion. What I am inquiring about is a subset of a larger process and my example data is simplified. I need to calculate hours worked to the second, but then round to a precision of two decimal places when converting to hours.

CAST(SUM(DATEDIFF(ss, start_time, end_time)/3600.00) AS DECIMAL(5,2))

My query above actually failed to accurately calculate the hours worked for emp_id 222, it was off by one hour. Just got back from a meeting and hope to dig in and find out why.

My expected results should be:

emp_id hrs_worked

------------------

111 5.00

222 4.00

333 2.00

LutzM SSC Guru Points: 107049 More actions · Answer 5

Here's another version.

Side note: instead of expanding it to 24hrs per day I just used the hours between noon and 10pm. If you need all 24hrs, you'd need to change the hr subquery.

I'm not sure how it's going to perform compared to the other solution you posted (once it returns the expected result).

;WITH cte AS

(

SELECT DATEADD(hh,number,DATE) datehr_from,

DATEADD(hh,number+1,DATE) datehr_to

FROM calendar

CROSS APPLY

(

SELECT TOP 10 number FROM master..spt_values

WHERE TYPE ='P'

and number >11

) hr

where mnth=4

),

cte2 AS

(

SELECT

MAX(

CASE WHEN start_time>datehr_from AND start_time<datehr_to

THEN DATEDIFF(ss,start_time,datehr_to)

WHEN end_time>datehr_from AND end_time<datehr_to

THEN DATEDIFF(ss,datehr_from,end_time)

ELSE 3600 END

) AS dur,

datehr_from,

emp_id

FROM cte

INNER JOIN #ShiftData

ON datehr_to>start_time

AND datehr_from< end_time

GROUP BY emp_id,datehr_from

)

SELECT emp_id,cast(SUM(dur)/3600.00 as decimal(5,2))

FROM cte2

GROUP BY emp_id

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Ron McCullough SSC Guru Points: 63877 More actions · Answer 6

Sorry did not read your last post (about using seconds) but this will compute total hours worked. I also edited / added some additional data to test for start / end time not ending on an hour, but as some minutes after the hour.

Disregard this solution as pointed out by imu

lmu92 (4/9/2010)
Unfortunately, your solution does lead to pay twice for one hour.
For example emp_id 222:
The hour between 17:00 18:00 is included twice in the sample data (for shift A and for shift B) but only needs to be counted once. Therefore, the result should be 4.25, not 5.25.

SELECT emp_id,shift_type,start_time,end_time FROM #ShiftData ORDER BY emp_id

--==My input

emp_idshift_typestart_timeend_time

111A2010-04-08 12:00:00.0002010-04-08 14:00:00.000

111B2010-04-08 13:00:00.0002010-04-08 14:00:00.000

111A2010-04-08 15:00:00.0002010-04-08 16:00:00.000

111B2010-04-08 15:00:00.0002010-04-08 18:00:00.000

111A2010-04-08 16:30:00.0002010-04-08 17:30:00.000

222B2010-04-08 15:00:00.0002010-04-08 18:00:00.000

222A2010-04-08 17:00:00.0002010-04-08 19:00:00.000

222A2010-04-08 19:05:00.0002010-04-08 19:20:00.000

333A2010-04-08 17:00:00.0002010-04-08 19:00:00.000

444A2010-04-08 17:00:00.0002010-04-08 17:59:00.000

--===

SELECT emp_id,CAST(SUM(({fn HOUR(end_time)}-{fn HOUR(start_time)})) AS DECIMAL(4,2)) +

CAST(SUM({fn MINUTE( end_time )}- {fn MINUTE( start_time )})AS DECIMAL(4,2))/60 AS 'Tottal Hrs Worked'

FROM #ShiftData GROUP BY emp_id

/* results

emp_idTottal Hrs Worked

1118.000000

2225.250000

3332.000000

4440.983333

*/

Note that this will not handle starting and ending on consecutive days

If everything seems to be going well, you have obviously overlooked something.

Ron

Please help us, help you -before posting a question please read[/url]
Before posting a performance problem please read[/url]

LutzM SSC Guru Points: 107049 More actions · Answer 7

@Ron:

Unfortunately, your solution does lead to pay twice for one hour.

For example emp_id 222:

The hour between 17:00 18:00 is included twice in the sample data (for shift A and for shift B) but only needs to be counted once. Therefore, the result should be 4.25, not 5.25.

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

datagoat Ten Centuries Points: 1264 More actions · Answer 8

bitbucket,

Your solution is not correct as far as what I'm attempting to accomplish.

Shift_Type A and B denote the two different systems an employee can be logged into (which can overlap, but is not always the case). I need to pay the total hours worked, but that needs to take into consideration these overlaps. When an overlap occurs I do not want to pay the agent the hours accrued in both systems, but simply the hours accrued as if it were one system.

My example data is simplified and this is what the results should be for employee 111:

5 hours worked

Following the shift data I created in my original post for emp 111 this is how the logic should flow.

agent logged into system A (shift_type A) at 12:00:00 and logged off at 14:00:00, however during that time he also logged into system B from 13:00:00 to 14:00:00, therefore he should be paid for two hours.

Then there is a gap of 1 hour and the agent logs back into both system A and B at 15:00:00, logs off A at 16:00:00, but remains on B until 18:00:00, however logs back into system A at 16:30:00 and works until 17:30:00. Therefore, we must pay this agent for another 3 hours.

Totaling 5 hours worked for employee 111 for the day.

Ron McCullough SSC Guru Points: 63877 More actions · Answer 9

lmu92 (4/9/2010)
@Ron:
Unfortunately, your solution does lead to pay twice for one hour.
For example emp_id 222:
The hour between 17:00 18:00 is included twice in the sample data (for shift A and for shift B) but only needs to be counted once. Therefore, the result should be 4.25, not 5.25.

Whoops and I think that is data I fat fingered.

So back to work for me .... Imu thanks for catching the error

If everything seems to be going well, you have obviously overlooked something.

Ron

Please help us, help you -before posting a question please read[/url]
Before posting a performance problem please read[/url]

Paul White SSC Guru Points: 150468 More actions · Answer 10

Assuming that records with later shift_ids for the same employee never start earlier that earlier records (if you see what I mean)...this is a running total problem, solvable using the quirky update:

CREATE TABLE #ShiftData

(

emp_id INT,

shift_id INT IDENTITY(1,1),

shift_type VARCHAR(20),

start_time DATETIME,

end_time DATETIME,

msgs_handled INT

);

INSERT #ShiftData