April 9, 2010 at 12:02 pm
Here is a head scratcher...
Back Story:
Operations has come and informed me that they now allow employees to log in and work in two different systems simultaneously. The problem this causes is that our payroll system needs to calculate hourly wages, but we need to be able to identify any overlapping shifts because the employee does not receive a wage for each system they are logged into. They also do not need to be logged into to both systems, they can also log in and out of either multiple times throughout a give day.
Here is some sample data of my problem and the results I need in the end. I've been plugging away at this but can't seem to generate the results I need. Any help at all is greatly appreciated.
/* Example Data */
CREATE TABLE #ShiftData
(
emp_id INT,
shift_id INT IDENTITY(1,1),
shift_type VARCHAR(20),
start_time DATETIME,
end_time DATETIME,
msgs_handled INT
)
INSERT INTO #ShiftData
(emp_id, shift_type, start_time, end_time, msgs_handled)
VALUES
(111,'A',CAST('2010-04-08 12:00:00' AS DATETIME), CAST('2010-04-08 14:00:00' AS DATETIME), 45)
INSERT INTO #ShiftData
(emp_id,shift_type,start_time,end_time,msgs_handled)
VALUES
(111,'B',CAST('2010-04-08 13:00:00' AS DATETIME), CAST('2010-04-08 14:00:00' AS DATETIME), 20)
INSERT INTO #ShiftData
(emp_id,shift_type,start_time,end_time,msgs_handled)
VALUES
(111,'A',CAST('2010-04-08 15:00:00' AS DATETIME), CAST('2010-04-08 16:00:00' AS DATETIME), 25)
INSERT INTO #ShiftData
(emp_id,shift_type,start_time,end_time,msgs_handled)
VALUES
(111,'B',CAST('2010-04-08 15:00:00' AS DATETIME), CAST('2010-04-08 18:00:00' AS DATETIME), 100)
INSERT INTO #ShiftData
(emp_id,shift_type,start_time,end_time,msgs_handled)
VALUES
(111,'A',CAST('2010-04-08 16:30:00' AS DATETIME), CAST('2010-04-08 17:30:00' AS DATETIME), 30)
INSERT INTO #ShiftData
(emp_id,shift_type,start_time,end_time,msgs_handled)
VALUES
(222,'B',CAST('2010-04-08 15:00:00' AS DATETIME), CAST('2010-04-08 18:00:00' AS DATETIME), 101)
INSERT INTO #ShiftData
(emp_id,shift_type,start_time,end_time,msgs_handled)
VALUES
(222,'A',CAST('2010-04-08 17:00:00' AS DATETIME), CAST('2010-04-08 19:00:00' AS DATETIME), 104)
INSERT INTO #ShiftData
(emp_id,shift_type,start_time,end_time,msgs_handled)
VALUES
(333,'A',CAST('2010-04-08 17:00:00' AS DATETIME), CAST('2010-04-08 19:00:00' AS DATETIME), 50)
Desired Results:
emp_idhrs_worked
------------------
1115
2224
3332
April 9, 2010 at 12:42 pm
I would use a calendar table (holding the day values), expand it to cover 24hrs per day (or the possible working hours) and join that to the ShiftData table.
Something like this:
;WITH cte AS
(
SELECT DATEADD(hh,number,DATE) datehr
FROM calendar
CROSS APPLY
(
SELECT TOP 24 number FROM master..spt_values
WHERE TYPE ='P'
) hr
)
SELECT
emp_id,
COUNT(DISTINCT datehr) AS hrs_worked
FROM cte
INNER JOIN #ShiftData
ON datehr>=start_time
AND datehr<end_time
GROUP BY emp_id
Side note: you need to define how you'll handle working periods that do not begin or end at the hour and will be less than one hour. With the code as above a period from 13:00 to 14:59 will be considered as one hr the same like working from 13:59 to 14:01. Please clarify.
April 9, 2010 at 12:57 pm
Thanks for that. I actually came across a post on an Oracle forum that detailed what you mention above. I got up, took a walk, wished I could down a beer or two and returned to my desk and came up with this:
SELECT emp_id, CAST(SUM(DATEDIFF(ss, start_time, end_time)/3600.00) AS DECIMAL(5,2))
FROM
(
SELECT s1.emp_id, MIN(s1.start_time) AS start_time, MAX(s1.end_time) AS end_time
FROM
#ShiftData s1 LEFT OUTER JOIN #ShiftData t1
ON s1.start_time <= t1.end_time
AND NOT EXISTS
(
SELECT * FROM #ShiftData t2
WHERE t1.end_time >= t2.start_time AND t1.end_time < t2.end_time
)
WHERE NOT EXISTS
(
SELECT * FROM #ShiftData s2
WHERE s1.start_time > s2.start_time AND s1.start_time <= s2.end_time
)
GROUP BY s1.start_time, s1.emp_id
) AS T
GROUP BY emp_id
It gives me the crucial part of my desired results. I just need to connect it with the shift data for employees without overlapping or multiple shifts (employee 333 in my example data), but that should be easy.
I am not sure how efficient my code is yet, but at least it is not a cursor like so much of the legacy code I have seen in our systems. I have a to run it through some more testing to ensure it captures all the different scenarios that could occur.
April 9, 2010 at 1:08 pm
Your reply doesn't really answer the question on how to actually calculate the working hours.
Your sample data do make room for some interpretation... Also, your desired result is an integer value (which would require some rounding routines), whereas the second method calculates the working hours with rounding to two decimal places based on the seconds worked. Again, please clarify.
April 9, 2010 at 1:56 pm
Sorry for any confusion. What I am inquiring about is a subset of a larger process and my example data is simplified. I need to calculate hours worked to the second, but then round to a precision of two decimal places when converting to hours.
CAST(SUM(DATEDIFF(ss, start_time, end_time)/3600.00) AS DECIMAL(5,2))
My query above actually failed to accurately calculate the hours worked for emp_id 222, it was off by one hour. Just got back from a meeting and hope to dig in and find out why.
My expected results should be:
emp_id hrs_worked
------------------
111 5.00
222 4.00
333 2.00
April 9, 2010 at 2:13 pm
Here's another version.
Side note: instead of expanding it to 24hrs per day I just used the hours between noon and 10pm. If you need all 24hrs, you'd need to change the hr subquery.
I'm not sure how it's going to perform compared to the other solution you posted (once it returns the expected result).
;WITH cte AS
(
SELECT DATEADD(hh,number,DATE) datehr_from,
DATEADD(hh,number+1,DATE) datehr_to
FROM calendar
CROSS APPLY
(
SELECT TOP 10 number FROM master..spt_values
WHERE TYPE ='P'
and number >11
) hr
where mnth=4
),
cte2 AS
(
SELECT
MAX(
CASE WHEN start_time>datehr_from AND start_time<datehr_to
THEN DATEDIFF(ss,start_time,datehr_to)
WHEN end_time>datehr_from AND end_time<datehr_to
THEN DATEDIFF(ss,datehr_from,end_time)
ELSE 3600 END
) AS dur,
datehr_from,
emp_id
FROM cte
INNER JOIN #ShiftData
ON datehr_to>start_time
AND datehr_from< end_time
GROUP BY emp_id,datehr_from
)
SELECT emp_id,cast(SUM(dur)/3600.00 as decimal(5,2))
FROM cte2
GROUP BY emp_id
April 9, 2010 at 2:26 pm
Sorry did not read your last post (about using seconds) but this will compute total hours worked. I also edited / added some additional data to test for start / end time not ending on an hour, but as some minutes after the hour.
Disregard this solution as pointed out by imu
lmu92 (4/9/2010)
Unfortunately, your solution does lead to pay twice for one hour.
For example emp_id 222:
The hour between 17:00 18:00 is included twice in the sample data (for shift A and for shift B) but only needs to be counted once. Therefore, the result should be 4.25, not 5.25.
SELECT emp_id,shift_type,start_time,end_time FROM #ShiftData ORDER BY emp_id
--==My input
emp_idshift_typestart_timeend_time
111A2010-04-08 12:00:00.0002010-04-08 14:00:00.000
111B2010-04-08 13:00:00.0002010-04-08 14:00:00.000
111A2010-04-08 15:00:00.0002010-04-08 16:00:00.000
111B2010-04-08 15:00:00.0002010-04-08 18:00:00.000
111A2010-04-08 16:30:00.0002010-04-08 17:30:00.000
222B2010-04-08 15:00:00.0002010-04-08 18:00:00.000
222A2010-04-08 17:00:00.0002010-04-08 19:00:00.000
222A2010-04-08 19:05:00.0002010-04-08 19:20:00.000
333A2010-04-08 17:00:00.0002010-04-08 19:00:00.000
444A2010-04-08 17:00:00.0002010-04-08 17:59:00.000
--===
SELECT emp_id,CAST(SUM(({fn HOUR(end_time)}-{fn HOUR(start_time)})) AS DECIMAL(4,2)) +
CAST(SUM({fn MINUTE( end_time )}- {fn MINUTE( start_time )})AS DECIMAL(4,2))/60 AS 'Tottal Hrs Worked'
FROM #ShiftData GROUP BY emp_id
/* results
emp_idTottal Hrs Worked
1118.000000
2225.250000
3332.000000
4440.983333
*/
Note that this will not handle starting and ending on consecutive days
April 9, 2010 at 2:39 pm
@Ron:
Unfortunately, your solution does lead to pay twice for one hour.
For example emp_id 222:
The hour between 17:00 18:00 is included twice in the sample data (for shift A and for shift B) but only needs to be counted once. Therefore, the result should be 4.25, not 5.25.
April 9, 2010 at 2:53 pm
bitbucket,
Your solution is not correct as far as what I'm attempting to accomplish.
Shift_Type A and B denote the two different systems an employee can be logged into (which can overlap, but is not always the case). I need to pay the total hours worked, but that needs to take into consideration these overlaps. When an overlap occurs I do not want to pay the agent the hours accrued in both systems, but simply the hours accrued as if it were one system.
My example data is simplified and this is what the results should be for employee 111:
5 hours worked
Following the shift data I created in my original post for emp 111 this is how the logic should flow.
agent logged into system A (shift_type A) at 12:00:00 and logged off at 14:00:00, however during that time he also logged into system B from 13:00:00 to 14:00:00, therefore he should be paid for two hours.
Then there is a gap of 1 hour and the agent logs back into both system A and B at 15:00:00, logs off A at 16:00:00, but remains on B until 18:00:00, however logs back into system A at 16:30:00 and works until 17:30:00. Therefore, we must pay this agent for another 3 hours.
Totaling 5 hours worked for employee 111 for the day.
April 9, 2010 at 2:54 pm
lmu92 (4/9/2010)
@Ron:Unfortunately, your solution does lead to pay twice for one hour.
For example emp_id 222:
The hour between 17:00 18:00 is included twice in the sample data (for shift A and for shift B) but only needs to be counted once. Therefore, the result should be 4.25, not 5.25.
Whoops and I think that is data I fat fingered.
So back to work for me .... Imu thanks for catching the error
April 11, 2010 at 3:20 am
Assuming that records with later shift_ids for the same employee never start earlier that earlier records (if you see what I mean)...this is a running total problem, solvable using the quirky update:
CREATE TABLE #ShiftData
(
emp_id INT,
shift_id INT IDENTITY(1,1),
shift_type VARCHAR(20),
start_time DATETIME,
end_time DATETIME,
msgs_handled INT
);
INSERT #ShiftData
(emp_id,shift_type,start_time,end_time,msgs_handled)
VALUES (111,'A',CAST('2010-04-08 11:30:00' AS DATETIME), CAST('2010-04-08 12:30:00' AS DATETIME), 50)
INSERT #ShiftData
(emp_id, shift_type, start_time, end_time, msgs_handled)
VALUES (111,'A',CAST('2010-04-08 12:00:00' AS DATETIME), CAST('2010-04-08 14:00:00' AS DATETIME), 45)
INSERT #ShiftData
(emp_id,shift_type,start_time,end_time,msgs_handled)
VALUES (111,'B',CAST('2010-04-08 13:00:00' AS DATETIME), CAST('2010-04-08 14:00:00' AS DATETIME), 20)
INSERT #ShiftData
(emp_id,shift_type,start_time,end_time,msgs_handled)
VALUES (111,'A',CAST('2010-04-08 15:00:00' AS DATETIME), CAST('2010-04-08 16:00:00' AS DATETIME), 25)
INSERT #ShiftData
(emp_id,shift_type,start_time,end_time,msgs_handled)
VALUES (111,'B',CAST('2010-04-08 15:00:00' AS DATETIME), CAST('2010-04-08 18:00:00' AS DATETIME), 100)
INSERT #ShiftData
(emp_id,shift_type,start_time,end_time,msgs_handled)
VALUES (111,'A',CAST('2010-04-08 16:30:00' AS DATETIME), CAST('2010-04-08 18:30:00' AS DATETIME), 30)
INSERT #ShiftData
(emp_id,shift_type,start_time,end_time,msgs_handled)
VALUES (222,'B',CAST('2010-04-08 15:00:00' AS DATETIME), CAST('2010-04-08 18:00:00' AS DATETIME), 101)
INSERT #ShiftData
(emp_id,shift_type,start_time,end_time,msgs_handled)
VALUES (222,'A',CAST('2010-04-08 17:00:00' AS DATETIME), CAST('2010-04-08 19:00:00' AS DATETIME), 104)
INSERT #ShiftData
(emp_id,shift_type,start_time,end_time,msgs_handled)
VALUES (333,'A',CAST('2010-04-08 17:00:00' AS DATETIME), CAST('2010-04-08 19:00:00' AS DATETIME), 50)
-- Add a running total column (or transfer the data to a new table with SELECT...INTO)
ALTER TABLE #ShiftData ADD total_minutes INTEGER NOT NULL DEFAULT 0 WITH VALUES;
-- Clustered index required for running total update
CREATE UNIQUE CLUSTERED INDEX c ON #ShiftData (emp_id, shift_id);
-- Variables used by running total update
DECLARE @emp_id INTEGER,
@current_end DATETIME,
@total_minutes INTEGER;
-- Initialise
SET @current_end = '1900-01-01T00:00:00';
SET @emp_id = -1;
SET @total_minutes = 0;
-- Running total update
UPDATE #ShiftData WITH (TABLOCKX)
SET @total_minutes =
total_minutes =
CASE
-- New employee: running total of minutes = length of first shift
WHEN @emp_id <> emp_id THEN DATEDIFF(MINUTE, start_time, end_time)
ELSE
-- Same employee
CASE
-- Add contribution from this shift
-- New segment
WHEN start_time > @current_end THEN @total_minutes + DATEDIFF(MINUTE, start_time, end_time)
-- Extension of current segment
WHEN end_time > @current_end THEN @total_minutes + DATEDIFF(MINUTE, @current_end, end_time)
-- Ignore record completely covered already
ELSE @total_minutes
END
END,
-- Current end of shift
@current_end =
CASE
-- New employee = new end of shift
WHEN @emp_id <> emp_id THEN end_time
-- Later end time
WHEN end_time > @current_end THEN end_time
-- Not later = ignore
ELSE @current_end
END,
-- Remember the current employee
@emp_id = emp_id
OUTPUT inserted.*
OPTION (MAXDOP 1);
-- Final results
SELECT emp_id,
minutes_worked = MAX(total_minutes),
hours_worked = CONVERT(DECIMAL(5,2), MAX(total_minutes) / 60.0)
FROM #ShiftData
GROUP BY
emp_id
ORDER BY
emp_id;
GO
-- Tidy up
DROP TABLE #ShiftData
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
April 12, 2010 at 7:31 am
Thank you, Paul and everyone for your help. I can ensure that the shift start times are sorted properly so this will work for my needs.
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply