October 19, 2011 at 2:35 am
Dear Friends,
I have to write a T-SQL query to display each employee's name and login duration.
Explanation with Example:
If the time difference between two consecutive logins is more than 3 minutes, we treat them as separate groups.
For example, suppose an employee abc logged in at the following times in one day:
10:01 : group 1 by itself, because the gap between 10:01 and 10:07 is more than 3 minutes; since each report gets a 3-minute view time, this group counts as 3m
10:07
10:09
10:11 : end of group 2; duration = (10:11 + 3) - 10:07 = 7m
10:30
10:32
10:34
10:35 : end of group 3; duration = (10:35 + 3) - 10:30 = 8m
10:50 : group 4 by itself, 3m
Finally, we add up all the groups: 3m + 7m + 8m + 3m = 21m, which is the total login time for the day.
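In other words: logins no more than 3 minutes apart belong to the same group, each group runs from its first login to 3 minutes after its last login, and the daily total is the sum of the group lengths. A quick sketch of that rule (illustrative only, not part of the T-SQL solution; times given as integer minutes past 10:00, and `session_minutes` is a made-up helper name):

```python
def session_minutes(logins, view=3, gap=3):
    """Group sorted login times into sessions (a gap > 3 min starts a new
    one) and total each session's length: last login + view time - first."""
    total = 0
    start = prev = logins[0]
    for t in logins[1:]:
        if t - prev > gap:            # gap too large: close the session
            total += prev + view - start
            start = t
        prev = t
    total += prev + view - start      # close the final session
    return total

# Minutes past 10:00 for the example above
logins = [1, 7, 9, 11, 30, 32, 34, 35, 50]
print(session_minutes(logins))        # 21, i.e. 3 + 7 + 8 + 3
```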
Script to create and populate the sample table:
CREATE TABLE [dbo].[LoginTime](
[ID] [int] NOT NULL,
[EmpName] [varchar](50) NULL,
[LoginTime] [datetime] NULL
)
GO
INSERT [dbo].[LoginTime] ([ID], [EmpName], [LoginTime]) VALUES (1, N'Steve', CAST(0x00009CF100A511D0 AS DateTime))
INSERT [dbo].[LoginTime] ([ID], [EmpName], [LoginTime]) VALUES (2, N'Steve', CAST(0x00009CF100A6B7B0 AS DateTime))
INSERT [dbo].[LoginTime] ([ID], [EmpName], [LoginTime]) VALUES (3, N'Steve', CAST(0x00009CF100A74450 AS DateTime))
INSERT [dbo].[LoginTime] ([ID], [EmpName], [LoginTime]) VALUES (4, N'Steve', CAST(0x00009CF100A7D0F0 AS DateTime))
INSERT [dbo].[LoginTime] ([ID], [EmpName], [LoginTime]) VALUES (5, N'Steve', CAST(0x00009CF100AD08E0 AS DateTime))
INSERT [dbo].[LoginTime] ([ID], [EmpName], [LoginTime]) VALUES (6, N'Steve', CAST(0x00009CF100AD9580 AS DateTime))
INSERT [dbo].[LoginTime] ([ID], [EmpName], [LoginTime]) VALUES (7, N'Steve', CAST(0x00009CF100AE2220 AS DateTime))
INSERT [dbo].[LoginTime] ([ID], [EmpName], [LoginTime]) VALUES (8, N'Steve', CAST(0x00009CF100AE6870 AS DateTime))
INSERT [dbo].[LoginTime] ([ID], [EmpName], [LoginTime]) VALUES (9, N'Steve', CAST(0x00009CF100B28720 AS DateTime))
Any input on this would be highly appreciated.
Thanks in advance,
Sunny
October 19, 2011 at 6:26 am
You will need to watch boundary conditions, but try something like the following:
WITH EmpLoginTime
AS
(
-- number each employee's logins in time order
SELECT EmpName
,ROW_NUMBER() OVER (PARTITION BY EmpName ORDER BY LoginTime) AS EmpOrder
,LoginTime
FROM dbo.LoginTime
)
,EmpTimes
AS
(
-- pair each login with the next login for the same employee
SELECT T1.EmpName, T1.EmpOrder, T1.LoginTime AS StartTime
,CASE
WHEN DATEDIFF(minute, T1.LoginTime, T2.LoginTime) <= 3
THEN T2.LoginTime -- within 3 minutes: count the full gap
ELSE DATEADD(minute, 3, T1.LoginTime) -- otherwise (or for the last login): count 3 minutes
END AS EndTime
FROM EmpLoginTime T1
LEFT JOIN EmpLoginTime T2
ON T1.EmpName = T2.EmpName
AND T1.EmpOrder = T2.EmpOrder - 1
)
SELECT EmpName
,SUM(DATEDIFF(minute, StartTime, EndTime)) AS LoginMinutes
FROM EmpTimes
GROUP BY EmpName
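Note that this sums per consecutive pair rather than per group: each login contributes the gap to the next login when that gap is <= 3 minutes, and a flat 3 minutes otherwise (the last login, having no successor, also contributes 3). The pairwise total comes out the same as the grouped total in the example. A sketch of that equivalence (illustrative only; `pairwise_minutes` is a made-up helper name, times as integer minutes past 10:00):

```python
def pairwise_minutes(logins, view=3):
    """Each login contributes the gap to the next login if <= view minutes,
    else a flat view-time; the final login always contributes view minutes."""
    total = 0
    for cur, nxt in zip(logins, logins[1:]):
        gap = nxt - cur
        total += gap if gap <= view else view
    return total + view               # last row: no next login, add 3

logins = [1, 7, 9, 11, 30, 32, 34, 35, 50]
print(pairwise_minutes(logins))       # 21, matching the grouped sum
```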
October 19, 2011 at 7:11 am
Hi Sunny,
This will return a list of periods plus the total (Ken, your query just returns a total). It is broken into very discrete sections at the moment, primarily to show the logic involved. I'm sure this could be condensed somewhat to make a more efficient query.
Note that the WITH ROLLUP syntax is deprecated, so if you're planning to run this as a production job, you might want to use the newer GROUP BY ROLLUP (<field list>) syntax instead.
-- level one - add a row number to the source data
; with cte as (
select *
, row_number() over (partition by EmpName order by EmpName, LoginTime) as JoinKey
from #LoginTime
)
-- level two - offset join to identify start points
-- note coalesce to avoid dropping the unmatched row from the offset group
, cte2 as (
select cte.JoinKey
, coalesce(cte.EmpName, offset.EmpName) as EmpName
, coalesce(cte.LoginTime, offset.LoginTime) as LoginTime
, isnull(
case
when datediff(mi,isnull(cte.LoginTime,'1 January 2050'),offset.LoginTime) > 3
then 1
else 0
end
, 1) as IsBlockEnd
from cte
left join cte offset
on cte.EmpName = offset.EmpName
and cte.JoinKey = offset.JoinKey - 1
)
--select * from cte2
-- level three - get the corresponding block start
-- need to make sure that the first row is set to 1 also
-- hence use of isnull()
, cte3 as (
select cte2.*
, case
when isnull(rejoin.IsBlockEnd,1) = 1
then 1
else 0
end as IsBlockStart
from cte2
left join cte2 rejoin
on cte2.EmpName = rejoin.EmpName
and cte2.JoinKey = rejoin.JoinKey + 1
)
-- level four - return only the start and end times
, cte4 as (
select cte3.*
, row_number() over (partition by EmpName order by JoinKey) as NewJoinKey
from cte3
where IsBlockEnd = 1
or IsBlockStart = 1
)
-- level five - do the maths
-- again we offset to find the next value
, cte5 as (
select cte4.*
, offset.LoginTime as oLoginTime
, isnull(offset.IsBlockEnd,2) as oIsBlockEnd
, case
when cte4.IsBlockStart = 1 and cte4.IsBlockEnd = 1
or (cte4.IsBlockStart = 1 and isnull(offset.IsBlockEnd,2) = 2) -- catch the last element if it is an isolated time
then 3
when (cte4.IsBlockStart = 1 and cte4.IsBlockEnd = 0)
then datediff(mi,cte4.LoginTime, offset.LoginTime) + 3
else null
end as Interval
from cte4
left join cte4 offset
on cte4.EmpName = offset.EmpName
and cte4.NewJoinKey = offset.NewJoinKey - 1
where cte4.IsBlockStart = 1
or (cte4.IsBlockStart = 0 and isnull(offset.IsBlockEnd,2) = 2)
)
-- query with rollup to get the totals
, cte6 as (
select EmpName
, isnull(convert(varchar(20),LoginTime,120),'Total') as LoginTime
, sum(Interval) as Interval
from cte5
group by EmpName, LoginTime with rollup
)
select * from cte6
where EmpName is not null
order by EmpName, LoginTime
Not sure how efficient this is likely to be. I'd recommend a clustered index on EmpName & LoginTime to speed it up a bit. I reckon there is probably some sort of cool 'quirky update' based solution that would be a better option too; maybe someone else can write one...
Regards, Iain
Edit: DDL for # table used in example + add EmpName to joins (thanks Ken!)
CREATE TABLE #LoginTime(
[ID] [int] NOT NULL,
[EmpName] [varchar](50) NULL,
[LoginTime] [datetime] NULL
)
GO
INSERT #LoginTime ([ID], [EmpName], [LoginTime]) VALUES (1, N'Steve', CAST(0x00009CF100A511D0 AS DateTime))
INSERT #LoginTime ([ID], [EmpName], [LoginTime]) VALUES (2, N'Steve', CAST(0x00009CF100A6B7B0 AS DateTime))
INSERT #LoginTime ([ID], [EmpName], [LoginTime]) VALUES (3, N'Steve', CAST(0x00009CF100A74450 AS DateTime))
INSERT #LoginTime ([ID], [EmpName], [LoginTime]) VALUES (4, N'Steve', CAST(0x00009CF100A7D0F0 AS DateTime))
INSERT #LoginTime ([ID], [EmpName], [LoginTime]) VALUES (5, N'Steve', CAST(0x00009CF100AD08E0 AS DateTime))
INSERT #LoginTime ([ID], [EmpName], [LoginTime]) VALUES (6, N'Steve', CAST(0x00009CF100AD9580 AS DateTime))
INSERT #LoginTime ([ID], [EmpName], [LoginTime]) VALUES (7, N'Steve', CAST(0x00009CF100AE2220 AS DateTime))
INSERT #LoginTime ([ID], [EmpName], [LoginTime]) VALUES (8, N'Steve', CAST(0x00009CF100AE6870 AS DateTime))
INSERT #LoginTime ([ID], [EmpName], [LoginTime]) VALUES (9, N'Steve', CAST(0x00009CF100B28720 AS DateTime))
create clustered index ix_tmp on #LoginTime(EmpName,LoginTime)
October 19, 2011 at 10:10 am
Sunny,
If you want the groups to show then:
1. like Iain, I suspect the quirky update will be quickest.
The following article, by Jeff Moden, gives the details; you should read the discussion as well.
http://www.sqlservercentral.com/Authors/Articles/Jeff_Moden/80567/
2. if you want a set based solution, I would also try using a numbers/tally table.
The following example assumes that a login does not occur over more than one day and uses spt_values for convenience.
WITH EmpDateMinutes
AS
(
SELECT EmpName
,DATEADD(day, DATEDIFF(day, 0, LoginTime), 0) AS LoginDate
,DATEDIFF(minute, DATEADD(day, DATEDIFF(day, 0, LoginTime), 0), LoginTime) AS LoginMinute
FROM #logintime
)
,EmpDateMinuteOrder
AS
(
SELECT EmpName, LoginDate, LoginMinute
,ROW_NUMBER() OVER(PARTITION BY EmpName, LoginDate ORDER BY LoginMinute) AS EmpOrder
FROM EmpDateMinutes
)
,EmpDateLogins
AS
(
SELECT T1.EmpName, T1.LoginDate, T1.LoginMinute AS StartMinute
,CASE
WHEN T2.LoginMinute - T1.LoginMinute <= 3
THEN T2.LoginMinute
ELSE T1.LoginMinute + 3
END AS EndMinute
FROM EmpDateMinuteOrder T1
LEFT JOIN EmpDateMinuteOrder T2
ON T1.EmpName = T2.EmpName
AND T1.LoginDate = T2.LoginDate
AND T1.EmpOrder = T2.EmpOrder - 1
)
,EmpDateGrps
AS
(
SELECT E.EmpName, E.LoginDate, N.number AS LoginMinute
,N.number - ROW_NUMBER() OVER (PARTITION BY E.EmpName, E.LoginDate ORDER BY N.number) AS grp
FROM EmpDateLogins E
JOIN master.dbo.spt_values N
ON N.[type] = 'P'
AND N.number >= E.StartMinute
AND N.number < E.EndMinute
)
,EmpDateGrpRollup
AS
(
SELECT EmpName, LoginDate, grp
,MIN(LoginMinute) AS StartMinute
,MAX(LoginMinute) AS EndMinute
FROM EmpDateGrps
GROUP BY EmpName, LoginDate, grp
)
SELECT EmpName
,DATEADD(minute, StartMinute, LoginDate) AS LoginTime
,EndMinute - StartMinute + 1 AS MinsDuration
,SUM(EndMinute - StartMinute + 1) OVER (PARTITION BY EmpName, LoginDate) AS MinsDayDuration
FROM EmpDateGrpRollup
ORDER BY EmpName, LoginTime;
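The EmpDateGrps step relies on the classic row-difference trick: once each login's [StartMinute, EndMinute) window is expanded into individual minutes, `minute - ROW_NUMBER()` is constant within any run of consecutive minutes, so that difference can serve as a group key. A sketch of the trick (illustrative only; the window values below are the ones the example data produces):

```python
# Row-difference (gaps-and-islands) trick: expand each [StartMinute,
# EndMinute) window into individual minutes; within a run of consecutive
# minutes, minute - row_number is a constant, usable as a group key.
windows = [(1, 4), (7, 9), (9, 11), (11, 14), (30, 32), (32, 34),
           (34, 35), (35, 38), (50, 53)]  # per-login windows from the example
minutes = sorted({m for lo, hi in windows for m in range(lo, hi)})

groups = {}
for rn, m in enumerate(minutes, start=1):
    groups.setdefault(m - rn, []).append(m)   # constant key per island

# Each island is one login group; duration = max minute - min minute + 1
spans = [(g[0], g[-1] - g[0] + 1) for g in groups.values()]
print(spans)  # [(1, 3), (7, 7), (30, 8), (50, 3)] -> 3 + 7 + 8 + 3 = 21
```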
October 19, 2011 at 11:06 am
Interesting. Never seen spt_values used like that before.
Figured I'd give it a side by side comparison, hope you don't mind 🙂
My revised version (with a few errors revised out and unnecessary stuff removed):
-- level one - add a row number to the source data
; with cte as (
select *
, row_number() over (partition by EmpName order by EmpName, LoginTime) as JoinKey
from #LoginTime
)
-- level two - offset join to identify start points
, cte2 as (
select cte.*
, isnull(
case
when datediff(mi,cte.LoginTime,isnull(offset.LoginTime,'1 January 2050')) > 3
then 1
else 0
end
, 1) as IsBlockEnd
from cte
left join cte offset
on cte.EmpName = offset.EmpName
and cte.JoinKey = offset.JoinKey - 1
)
--select * from cte2 order by EmpName, LoginTime
-- level three - get the corresponding block start
-- need to make sure that the first row is set to 1 also
-- hence use of isnull()
, cte3 as (
select cte2.*
, case
when isnull(rejoin.IsBlockEnd,1) = 1
then 1
else 0
end as IsBlockStart
from cte2
left join cte2 rejoin
on cte2.EmpName = rejoin.EmpName
and cte2.JoinKey = rejoin.JoinKey + 1
)
-- level four - return only the start and end times
-- add a new join key to allow us to offset again in the following cte
, cte4 as (
select cte3.*
, row_number() over (partition by EmpName order by JoinKey) as NewJoinKey
from cte3
where IsBlockEnd = 1
or IsBlockStart = 1
)
-- level five - do the maths
-- again we offset to put start and end values on the same row
, cte5 as (
select cte4.*
, case
when cte4.IsBlockStart = 1 and cte4.IsBlockEnd = 1
then 3
when (cte4.IsBlockStart = 1 and cte4.IsBlockEnd = 0)
then datediff(mi,cte4.LoginTime, offset.LoginTime) + 3
else null
end as Interval
from cte4
left join cte4 offset
on cte4.EmpName = offset.EmpName
and cte4.NewJoinKey = offset.NewJoinKey - 1
where cte4.IsBlockStart = 1
)
-- level six - query with rollup to get the totals
, cte6 as (
select EmpName
, isnull(convert(varchar(20),LoginTime,120),'Total') as LoginTime
, sum(Interval) as Interval
from cte5
group by EmpName, LoginTime with rollup
)
select * from cte6
where EmpName is not null
order by EmpName, LoginTime
The results are as follows for a #LoginTime row count of 1728 (note that I just duplicated the existing data in the table, hence the returned row count of 15/18 stays the same for each test):
========== cte only ============
(18 row(s) affected)
Table '#LoginTime'. Scan count 3468, logical reads 92592, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 4717 ms, elapsed time = 1348 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
========================================
========== cte w tally ============
(15 row(s) affected)
Table '#LoginTime'. Scan count 10, logical reads 92, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'spt_values'. Scan count 1728, logical reads 13120, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 10, logical reads 44, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 5814 ms, elapsed time = 6484 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
========================================
The tally approach is more expensive at low row counts, as a result of all those scans on spt_values. So I bumped the row count up to 13824:
========== cte only ============
(18 row(s) affected)
Table '#LoginTime'. Scan count 27660, logical reads 3749212, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 294111 ms, elapsed time = 81638 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
========================================
========== cte w tally ============
(15 row(s) affected)
Table '#LoginTime'. Scan count 10, logical reads 178, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'spt_values'. Scan count 13824, logical reads 104960, physical reads 0, read-ahead reads 5, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 13831, logical reads 37381738, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 125981 ms, elapsed time = 78011 ms.
========================================
Here the tally approach has already overtaken the cte-only method, as a result of the lower number of rows in spt_values and the 14 clustered index scans in the cte-only method.
So, in conclusion, Ken's tally method performs better at larger row counts and so you should use that.
Regards, Iain
October 19, 2011 at 11:42 am
Hi Iain,
Thanks for the performance comparison.
We both seem to expect the quirky update to run faster. As it is non-relational I prefer to let people use their own judgement on whether to use it; I have only ever used it for data loads.
I gave the row difference method of obtaining groups, which I think is attributable to Steve Kass of Drew University, as one option the OP may wish to consider. (In reality one would use a number table with a clustered index, instead of spt_values, so even low row counts should be OK.)
If the data set is small, then even a cursor may be adequate.
Regards,
Ken