December 14, 2012 at 12:50 am
Morning,
I have a data set that contains employee shift data, when they logged in and logged out of a certain media type.
The problem is that the employee could be logged into multiple media types at any given time.
I need to report the employees total login duration for the day but obviously I cant just sum all the login time due to the concurrent activities it may be 12 hours when in actually real time it was only 7 hours.
For example
DKeyEmployeeIdLoginTimeLogOutTime
850107:13:1307:18:04
950107:13:1307:13:26
1050108:35:5608:36:20
1150108:35:5609:00:00
1250114:08:0818:30:26
1250115:45:2518:30:32
In realtime would actually be this
EmployeeIdLogInTimeLogOutTime
50107:13:1307:18:04
50108:35:5609:00:00
50114:08:0818:30:32
I've done a bit of digging on the internet forums etc and cant seem to crack it.
I've added the DDl below which will help explain it a bit better, the first 2 result sets show it works if the source table only has 1 employee in it but when there is more than 1 it doesn't return the correct results.
Please help
Regards
Steve
CREATE TABLE [Overlapping](
[DKey] [int] IDENTITY(1,1) NOT NULL,
[EmployeeId] [int] NULL,
[LoginTime] [varchar](8) NULL,
[LogOutTime] [varchar](8) NULL
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [Overlapping] ON
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (8, 501 , '07:13:13','07:18:04')
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (9, 501 , '07:13:13','07:13:26')
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (10, 501 , '08:35:56','08:36:20')
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (11, 501 , '08:35:56','09:00:00')
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (12, 501 , '14:08:08','18:30:26')
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (12, 501 , '15:45:25','18:30:32')
SET IDENTITY_INSERT [Overlapping] OFF
Select * from Overlapping
SELECT
s1.EmployeeId,
s1.LogInTime,
MIN(t1.LogOutTime) AS LogOutTime
FROM Overlapping s1
INNER JOIN Overlapping t1 ON s1.LogInTime <= t1.LogOutTime
AND NOT EXISTS(SELECT * FROM Overlapping t2
WHERE t1.LogOutTime >= t2.LogInTime AND t1.LogOutTime < t2.LogOutTime)
WHERE NOT EXISTS(SELECT * FROM Overlapping s2
WHERE s1.LogInTime > s2.LogInTime AND s1.LogInTime <= s2.LogOutTime)
GROUP BY s1.EmployeeId,s1.LogInTime
ORDER BY s1.LogInTime
Drop table overlapping
CREATE TABLE [Overlapping](
[DKey] [int] IDENTITY(1,1) NOT NULL,
[EmployeeId] [int] NULL,
[LoginTime] [varchar](8) NULL,
[LogOutTime] [varchar](8) NULL
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [Overlapping] ON
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (1, 500 ,'07:47:19', '08:03:05')
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (2, 500 ,'07:47:19','09:30:06')
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (3, 500 ,'07:47:19','19:55:32')
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (4, 500 ,'09:47:08', '11:30:13')
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (5, 500 ,'11:45:56','13:14:51')
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (6, 500 ,'13:47:38', '14:33:32')
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (7, 500 ,'15:45:49', '19:55:27')
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (8, 501 , '07:13:13','07:18:04')
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (9, 501 , '07:13:13','07:13:26')
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (10, 501 , '08:35:56','08:36:20')
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (11, 501 , '08:35:56','09:00:00')
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (12, 501 , '14:08:08','18:30:26')
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (12, 501 , '15:45:25','18:30:32')
SET IDENTITY_INSERT [Overlapping] OFF
Select * from Overlapping
SELECT
s1.EmployeeId,
s1.LogInTime,
MIN(t1.LogOutTime) AS LogOutTime
FROM
Overlapping s1
INNER JOIN Overlapping t1 ON s1.LogInTime <= t1.LogOutTime
And s1.EmployeeId = t1.EmployeeId
AND NOT EXISTS(SELECT * FROM Overlapping t2
WHERE t1.LogOutTime >= t2.LogInTime AND t1.LogOutTime < t2.LogOutTime
Group By EmployeeId)
WHERE NOT EXISTS(SELECT * FROM Overlapping s2
WHERE s1.LogInTime > s2.LogInTime AND s1.LogInTime <= s2.LogOutTime
Group By EmployeeId)
GROUP BY s1.EmployeeId,s1.LogInTime
ORDER BY s1.LogInTime
Drop table overlapping
December 14, 2012 at 2:59 am
Are you familiar with this article by Jeff Moden on grouping islands of contiguous dates?
http://www.sqlservercentral.com/articles/T-SQL/71550/
I believe you can probably use this technique to solve your problem. I confess, I found it somewhat complicated to grasp at first and if I have some available time I may try to apply it directly to your problem.
But I suggest you read it anyway so that even if someone does post a solution that uses it (likely), you understand how it works.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
December 14, 2012 at 3:27 am
Sorry there my friend. I gave you the wrong link. Fortunately I saved a script with this incredibly elegant technique by SQL guru/MVP Itzik Ben-Gan.
DECLARE @T TABLE
(DKey INT, EmployeeId INT, LoginTime TIME, LogOutTime TIME)
INSERT INTO @T
SELECT 8,501,'07:13:13','07:18:04'
UNION ALL SELECT 9,501,'07:13:13','07:13:26'
UNION ALL SELECT 10,501,'08:35:56','08:36:20'
UNION ALL SELECT 11,501,'08:35:56','09:00:00'
UNION ALL SELECT 12,501,'14:08:08','18:30:26'
UNION ALL SELECT 12,501,'15:45:25','18:30:32'
SELECT *
FROM @T
-- Method by Itzik Ben-Gan
-- http://www.solidq.com/sqj/Pages/2011-March-Issue/Packing-Intervals.aspx
;WITH C1 AS (
SELECT EmployeeId, ts, Type
,e=CASE Type WHEN 1 THEN NULL ELSE ROW_NUMBER() OVER (PARTITION BY EmployeeId, Type ORDER BY LogoutTime) END
,s=CASE Type WHEN -1 THEN NULL ELSE ROW_NUMBER() OVER (PARTITION BY EmployeeId, Type ORDER BY LoginTime) END
FROM @T
CROSS APPLY (
VALUES (1, LoginTime), (-1, LogoutTime)) a(Type, ts)
),
C2 AS (
SELECT C1.*
,se=ROW_NUMBER() OVER (PARTITION BY EmployeeId ORDER BY ts, Type DESC)
FROM C1),
C3 AS (
SELECT EmployeeId, ts
,grpnm=FLOOR((ROW_NUMBER() OVER (PARTITION BY EmployeeId ORDER BY ts)-1) / 2 + 1)
FROM C2
WHERE COALESCE(s-(se-s)-1, (se-e)-e) = 0)
SELECT EmployeeId, StartDate=MIN(ts), EndDate=MAX(ts)
FROM C3
GROUP BY EmployeeId, grpnm
The link to his article is in the code.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
December 14, 2012 at 3:40 am
Wow thanks Dwain (and Itzik Ben-Gan) worked an absoulate treat 🙂
Just to demontrate that it worked with multiple employees I've slightly tweaked it but all in all fantastic, I shall save that forever.
Thanks
Steve
CREATE TABLE [Overlapping](
[DKey] [int] IDENTITY(1,1) NOT NULL,
[EmployeeId] [int] NULL,
[LoginTime] [varchar](8) NULL,
[LogOutTime] [varchar](8) NULL
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [Overlapping] ON
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (1, 500 ,'07:47:19', '08:03:05')
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (2, 500 ,'07:47:19','09:30:06')
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (3, 500 ,'07:47:19','19:55:32')
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (4, 500 ,'09:47:08', '11:30:13')
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (5, 500 ,'11:45:56','13:14:51')
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (6, 500 ,'13:47:38', '14:33:32')
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (7, 500 ,'15:45:49', '19:55:27')
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (8, 501 , '07:13:13','07:18:04')
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (9, 501 , '07:13:13','07:13:26')
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (10, 501 , '08:35:56','08:36:20')
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (11, 501 , '08:35:56','09:00:00')
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (12, 501 , '14:08:08','18:30:26')
INSERT [Overlapping] ([DKey], [EmployeeId], [LoginTime], [LogOutTime]) VALUES (12, 501 , '15:45:25','18:30:32')
SET IDENTITY_INSERT [Overlapping] OFF
Select * from overlapping
-- Method by Itzik Ben-Gan
-- http://www.solidq.com/sqj/Pages/2011-March-Issue/Packing-Intervals.aspx
;WITH C1 AS (
SELECT EmployeeId, ts, Type
,e=CASE Type WHEN 1 THEN NULL ELSE ROW_NUMBER() OVER (PARTITION BY EmployeeId, Type ORDER BY LogoutTime) END
,s=CASE Type WHEN -1 THEN NULL ELSE ROW_NUMBER() OVER (PARTITION BY EmployeeId, Type ORDER BY LoginTime) END
FROM overlapping
CROSS APPLY (
VALUES (1, LoginTime), (-1, LogoutTime)) a(Type, ts)
),
C2 AS (
SELECT C1.*
,se=ROW_NUMBER() OVER (PARTITION BY EmployeeId ORDER BY ts, Type DESC)
FROM C1),
C3 AS (
SELECT EmployeeId, ts
,grpnm=FLOOR((ROW_NUMBER() OVER (PARTITION BY EmployeeId ORDER BY ts)-1) / 2 + 1)
FROM C2
WHERE COALESCE(s-(se-s)-1, (se-e)-e) = 0)
SELECT EmployeeId, StartDate=MIN(ts), EndDate=MAX(ts)
FROM C3
GROUP BY EmployeeId, grpnm
Drop table Overlapping
December 14, 2012 at 3:57 am
I must admit to only having had a quick glance at this, but couldn't you do something like this instead?
SELECT EmployeeId, MIN(LoginTime) AS LoginTime, MAX(LogOutTime) AS LogOutTime
FROM (SELECT DENSE_RANK() OVER(PARTITION BY base.EmployeeId ORDER BY base.LoginTime),
base.EmployeeId, base.LoginTime, overLap.LogOutTime
FROM Overlapping base
INNER JOIN Overlapping overLap ON base.EmployeeId = overLap.EmployeeId
WHERE (overLap.LoginTime > Base.LoginTime AND overLap.LoginTime < base.LogOutTime)
OR (overLap.LoginTime = base.LoginTime AND overLap.LogOutTime > base.LogOutTime)
AND base.DKey <> overLap.DKey
)a(pos,EmployeeId,LoginTime,LogOutTime)
GROUP BY pos, EmployeeId;
Don't know whether or not it'd be any faster. . . could probably do with a performance check but I haven't really got the time to knock up a million row sample test to check.
December 14, 2012 at 4:37 am
Steven - You're welcome and sorry again for sending you the initial red herring, although Jeff's article is also one to read and save.
Cadavre - I tested the Ben-Gan approach against several others (although not one that looked like yours) and it was lightening fast.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
December 14, 2012 at 6:41 am
dwain.c (12/14/2012)
Cadavre - I tested the Ben-Gan approach against several others (although not one that looked like yours) and it was lightening fast.
Don't suppose you still have the test scripts? Just curious what other implementations there are 🙂
December 14, 2012 at 4:50 pm
Cadavre (12/14/2012)
dwain.c (12/14/2012)
Cadavre - I tested the Ben-Gan approach against several others (although not one that looked like yours) and it was lightening fast.Don't suppose you still have the test scripts? Just curious what other implementations there are 🙂
PM'd you with an answer.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
December 17, 2012 at 2:43 am
WOW!
Ok so a little more background, the DDL was just a made up example that I thought if cracked would provide the solution to my real dataset.
Thought being the definative word.
The solution above does indeed work for multiple employees but only on a single date.
I began a backdate on Friday and when I looked at the results it all went Pete Tong.
So i've read over your explanation a good 5-6 times and still cant get my head round it (tbh I fell at the first hurdle when you mentioned HULU (wtf is hulu lol) - then I realised as I'm british it's acceptable to not know what hulu is 😉
Anyway to the point, I've implemented your solution this morning and... Bang, the dirt is gone 🙂
Worked a treat.
I had to strip out the media type as I can report on them seperately, my solution was just to get the logged hours for the agent for the day.
Thanks Celko 🙂
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply