October 19, 2011 at 2:35 am
Dear Friends,
I have to write a T-SQL query to display each employee's name and login duration.
Explanation with Example:
If the time difference between two consecutive logins is more than 3 minutes, we treat them as separate groups.
For example, suppose an employee abc logged in at the following times in one day:
10:01 : group 1 by itself, because the gap between 10:01 and 10:07 is more than 3 minutes; since each report gets a 3-minute view time, this group counts as 3m
10:07
10:09
10:11 : end of group 2; duration = (10:11 + 3) - 10:07 = 7m
10:30
10:32
10:34
10:35 : end of group 3; duration = (10:35 + 3) - 10:30 = 8m
10:50 : group 4 by itself, 3m
Finally, we add up all the groups: 3m + 7m + 8m + 3m = 21m, which is the total login time for the day.
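In other words: logins no more than 3 minutes apart belong to the same group, each group runs from its first login to 3 minutes after its last login, and the daily total is the sum of the group lengths. A quick sketch of that rule (illustrative only, not part of the T-SQL solution; times given as integer minutes past 10:00, and `session_minutes` is a made-up helper name):

```python
def session_minutes(logins, view=3, gap=3):
    """Group sorted login times into sessions (a gap > 3 min starts a new
    one) and total each session's length: last login + view time - first."""
    total = 0
    start = prev = logins[0]
    for t in logins[1:]:
        if t - prev > gap:            # gap too large: close the session
            total += prev + view - start
            start = t
        prev = t
    total += prev + view - start      # close the final session
    return total

# Minutes past 10:00 for the example above
logins = [1, 7, 9, 11, 30, 32, 34, 35, 50]
print(session_minutes(logins))        # 21, i.e. 3 + 7 + 8 + 3
```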
Script to create and populate the sample table:
CREATE TABLE [dbo].[LoginTime](
[ID] [int] NOT NULL,
[EmpName] [varchar](50) NULL,
[LoginTime] [datetime] NULL
)
GO
INSERT [dbo].[LoginTime] ([ID], [EmpName], [LoginTime]) VALUES (1, N'Steve', CAST(0x00009CF100A511D0 AS DateTime))
INSERT [dbo].[LoginTime] ([ID], [EmpName], [LoginTime]) VALUES (2, N'Steve', CAST(0x00009CF100A6B7B0 AS DateTime))
INSERT [dbo].[LoginTime] ([ID], [EmpName], [LoginTime]) VALUES (3, N'Steve', CAST(0x00009CF100A74450 AS DateTime))
INSERT [dbo].[LoginTime] ([ID], [EmpName], [LoginTime]) VALUES (4, N'Steve', CAST(0x00009CF100A7D0F0 AS DateTime))
INSERT [dbo].[LoginTime] ([ID], [EmpName], [LoginTime]) VALUES (5, N'Steve', CAST(0x00009CF100AD08E0 AS DateTime))
INSERT [dbo].[LoginTime] ([ID], [EmpName], [LoginTime]) VALUES (6, N'Steve', CAST(0x00009CF100AD9580 AS DateTime))
INSERT [dbo].[LoginTime] ([ID], [EmpName], [LoginTime]) VALUES (7, N'Steve', CAST(0x00009CF100AE2220 AS DateTime))
INSERT [dbo].[LoginTime] ([ID], [EmpName], [LoginTime]) VALUES (8, N'Steve', CAST(0x00009CF100AE6870 AS DateTime))
INSERT [dbo].[LoginTime] ([ID], [EmpName], [LoginTime]) VALUES (9, N'Steve', CAST(0x00009CF100B28720 AS DateTime))
Any input on this would be highly appreciated.
Thanks in advance,
Sunny
October 19, 2011 at 6:26 am
You will need to watch boundary conditions, but try something like the following:
WITH EmpLoginTime
AS
(
-- number each employee's logins in time order
SELECT EmpName
,ROW_NUMBER() OVER (PARTITION BY EmpName ORDER BY LoginTime) AS EmpOrder
,LoginTime
FROM dbo.LoginTime
)
,EmpTimes
AS
(
-- pair each login with the next login for the same employee
SELECT T1.EmpName, T1.EmpOrder, T1.LoginTime AS StartTime
,CASE
WHEN DATEDIFF(minute, T1.LoginTime, T2.LoginTime) <= 3
THEN T2.LoginTime -- within 3 minutes: count the full gap
ELSE DATEADD(minute, 3, T1.LoginTime) -- otherwise (or for the last login): count 3 minutes
END AS EndTime
FROM EmpLoginTime T1
LEFT JOIN EmpLoginTime T2
ON T1.EmpName = T2.EmpName
AND T1.EmpOrder = T2.EmpOrder - 1
)
SELECT EmpName
,SUM(DATEDIFF(minute, StartTime, EndTime)) AS LoginMinutes
FROM EmpTimes
GROUP BY EmpName
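Note that this sums per consecutive pair rather than per group: each login contributes the gap to the next login when that gap is <= 3 minutes, and a flat 3 minutes otherwise (the last login, having no successor, also contributes 3). The pairwise total comes out the same as the grouped total in the example. A sketch of that equivalence (illustrative only; `pairwise_minutes` is a made-up helper name, times as integer minutes past 10:00):

```python
def pairwise_minutes(logins, view=3):
    """Each login contributes the gap to the next login if <= view minutes,
    else a flat view-time; the final login always contributes view minutes."""
    total = 0
    for cur, nxt in zip(logins, logins[1:]):
        gap = nxt - cur
        total += gap if gap <= view else view
    return total + view               # last row: no next login, add 3

logins = [1, 7, 9, 11, 30, 32, 34, 35, 50]
print(pairwise_minutes(logins))       # 21, matching the grouped sum
```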
October 19, 2011 at 7:11 am
Hi Sunny,
This will return a list of periods plus the total (Ken, your query just returns a total). It is broken into very discrete sections at the moment, primarily to show the logic involved. I'm sure this could be condensed somewhat to make a more efficient query.
Note that the WITH ROLLUP syntax is deprecated, so if you're planning to run this as a production job, you might want to use the newer GROUP BY ROLLUP (<field list>) syntax instead.
-- level one - add a row number to the source data
; with cte as (
select *
, row_number() over (partition by EmpName order by EmpName, LoginTime) as JoinKey
from #LoginTime
)
-- level two - offset join to identify start points
-- note coalesce to avoid dropping the unmatched row from the offset group
, cte2 as (
select cte.JoinKey
, coalesce(cte.EmpName, offset.EmpName) as EmpName
, coalesce(cte.LoginTime, offset.LoginTime) as LoginTime
, isnull(
case
when datediff(mi,isnull(cte.LoginTime,'1 January 2050'),offset.LoginTime) > 3
then 1
else 0
end
, 1) as IsBlockEnd
from cte
left join cte offset
on cte.EmpName = offset.EmpName
and cte.JoinKey = offset.JoinKey - 1
)
--select * from cte2
-- level three - get the corresponding block start
-- need to make sure that the first row is set to 1 also
-- hence use of isnull()
, cte3 as (
select cte2.*
, case
when isnull(rejoin.IsBlockEnd,1) = 1
then 1
else 0
end as IsBlockStart
from cte2
left join cte2 rejoin
on cte2.EmpName = rejoin.EmpName
and cte2.JoinKey = rejoin.JoinKey + 1
)
-- level four - return only the start and end times
, cte4 as (
select cte3.*
, row_number() over (partition by EmpName order by JoinKey) as NewJoinKey
from cte3
where IsBlockEnd = 1
or IsBlockStart = 1
)
-- level five - do the maths
-- again we offset to find the next value
, cte5 as (
select cte4.*
, offset.LoginTime as oLoginTime
, isnull(offset.IsBlockEnd,2) as oIsBlockEnd
, case
when cte4.IsBlockStart = 1 and cte4.IsBlockEnd = 1
or (cte4.IsBlockStart = 1 and isnull(offset.IsBlockEnd,2) = 2) -- catch the last element if it is an isolated time
then 3
when (cte4.IsBlockStart = 1 and cte4.IsBlockEnd = 0)
then datediff(mi,cte4.LoginTime, offset.LoginTime) + 3
else null
end as Interval
from cte4
left join cte4 offset
on cte4.EmpName = offset.EmpName
and cte4.NewJoinKey = offset.NewJoinKey - 1
where cte4.IsBlockStart = 1
or (cte4.IsBlockStart = 0 and isnull(offset.IsBlockEnd,2) = 2)
)
-- query with rollup to get the totals
, cte6 as (
select EmpName
, isnull(convert(varchar(20),LoginTime,120),'Total') as LoginTime
, sum(Interval) as Interval
from cte5
group by EmpName, LoginTime with rollup
)
select * from cte6
where EmpName is not null
order by EmpName, LoginTime
Not sure how efficient this is likely to be. I'd recommend a clustered index on EmpName & LoginTime to speed it up a bit. I reckon there is probably some sort of cool 'quirky update' based solution that would be a better option too; maybe someone else can write one...
Regards, Iain
Edit: DDL for # table used in example + add EmpName to joins (thanks Ken!)
CREATE TABLE #LoginTime(
[ID] [int] NOT NULL,
[EmpName] [varchar](50) NULL,
[LoginTime] [datetime] NULL
)
GO
INSERT #LoginTime ([ID], [EmpName], [LoginTime]) VALUES (1, N'Steve', CAST(0x00009CF100A511D0 AS DateTime))
INSERT #LoginTime ([ID], [EmpName], [LoginTime]) VALUES (2, N'Steve', CAST(0x00009CF100A6B7B0 AS DateTime))
INSERT #LoginTime ([ID], [EmpName], [LoginTime]) VALUES (3, N'Steve', CAST(0x00009CF100A74450 AS DateTime))
INSERT #LoginTime ([ID], [EmpName], [LoginTime]) VALUES (4, N'Steve', CAST(0x00009CF100A7D0F0 AS DateTime))
INSERT #LoginTime ([ID], [EmpName], [LoginTime]) VALUES (5, N'Steve', CAST(0x00009CF100AD08E0 AS DateTime))
INSERT #LoginTime ([ID], [EmpName], [LoginTime]) VALUES (6, N'Steve', CAST(0x00009CF100AD9580 AS DateTime))
INSERT #LoginTime ([ID], [EmpName], [LoginTime]) VALUES (7, N'Steve', CAST(0x00009CF100AE2220 AS DateTime))
INSERT #LoginTime ([ID], [EmpName], [LoginTime]) VALUES (8, N'Steve', CAST(0x00009CF100AE6870 AS DateTime))
INSERT #LoginTime ([ID], [EmpName], [LoginTime]) VALUES (9, N'Steve', CAST(0x00009CF100B28720 AS DateTime))
create clustered index ix_tmp on #LoginTime(EmpName,LoginTime)
October 19, 2011 at 10:10 am
Sunny,
If you want the groups to show then:
1. like Iain, I suspect the quirky update will be quickest.
The following article, by Jeff Moden, gives the details; you should read the discussion as well.
http://www.sqlservercentral.com/Authors/Articles/Jeff_Moden/80567/
2. if you want a set based solution, I would also try using a numbers/tally table.
The following example assumes that a login does not occur over more than one day and uses spt_values for convenience.
WITH EmpDateMinutes
AS
(
SELECT EmpName
,DATEADD(day, DATEDIFF(day, 0, LoginTime), 0) AS LoginDate
,DATEDIFF(minute, DATEADD(day, DATEDIFF(day, 0, LoginTime), 0), LoginTime) AS LoginMinute
FROM #logintime
)
,EmpDateMinuteOrder
AS
(
SELECT EmpName, LoginDate, LoginMinute
,ROW_NUMBER() OVER(PARTITION BY EmpName, LoginDate ORDER BY LoginMinute) AS EmpOrder
FROM EmpDateMinutes
)
,EmpDateLogins
AS
(
SELECT T1.EmpName, T1.LoginDate, T1.LoginMinute AS StartMinute
,CASE
WHEN T2.LoginMinute - T1.LoginMinute <= 3
THEN T2.LoginMinute
ELSE T1.LoginMinute + 3
END AS EndMinute
FROM EmpDateMinuteOrder T1
LEFT JOIN EmpDateMinuteOrder T2
ON T1.EmpName = T2.EmpName
AND T1.LoginDate = T2.LoginDate
AND T1.EmpOrder = T2.EmpOrder - 1
)
,EmpDateGrps
AS
(
SELECT E.EmpName, E.LoginDate, N.number AS LoginMinute
,N.number - ROW_NUMBER() OVER (PARTITION BY E.EmpName, E.LoginDate ORDER BY N.number) AS grp
FROM EmpDateLogins E
JOIN master.dbo.spt_values N
ON N.[type] = 'P'
AND N.number >= E.StartMinute
AND N.number < E.EndMinute
)
,EmpDateGrpRollup
AS
(
SELECT EmpName, LoginDate, grp
,MIN(LoginMinute) AS StartMinute
,MAX(LoginMinute) AS EndMinute
FROM EmpDateGrps
GROUP BY EmpName, LoginDate, grp
)
SELECT EmpName
,DATEADD(minute, StartMinute, LoginDate) AS LoginTime
,EndMinute - StartMinute + 1 AS MinsDuration
,SUM(EndMinute - StartMinute + 1) OVER (PARTITION BY EmpName, LoginDate) AS MinsDayDuration
FROM EmpDateGrpRollup
ORDER BY EmpName, LoginTime;
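The EmpDateGrps step relies on the classic row-difference trick: once each login's [StartMinute, EndMinute) window is expanded into individual minutes, `minute - ROW_NUMBER()` is constant within any run of consecutive minutes, so that difference can serve as a group key. A sketch of the trick (illustrative only; the window values below are the ones the example data produces):

```python
# Row-difference (gaps-and-islands) trick: expand each [StartMinute,
# EndMinute) window into individual minutes; within a run of consecutive
# minutes, minute - row_number is a constant, usable as a group key.
windows = [(1, 4), (7, 9), (9, 11), (11, 14), (30, 32), (32, 34),
           (34, 35), (35, 38), (50, 53)]  # per-login windows from the example
minutes = sorted({m for lo, hi in windows for m in range(lo, hi)})

groups = {}
for rn, m in enumerate(minutes, start=1):
    groups.setdefault(m - rn, []).append(m)   # constant key per island

# Each island is one login group; duration = max minute - min minute + 1
spans = [(g[0], g[-1] - g[0] + 1) for g in groups.values()]
print(spans)  # [(1, 3), (7, 7), (30, 8), (50, 3)] -> 3 + 7 + 8 + 3 = 21
```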
October 19, 2011 at 11:06 am
Interesting. Never seen spt_values used like that before.
Figured I'd give it a side by side comparison, hope you don't mind 🙂
My revised version (with a few errors revised out and unnecessary stuff removed):
-- level one - add a row number to the source data
; with cte as (
select *
, row_number() over (partition by EmpName order by EmpName, LoginTime) as JoinKey
from #LoginTime
)
-- level two - offset join to identify start points
, cte2 as (
select cte.*
, isnull(
case
when datediff(mi,cte.LoginTime,isnull(offset.LoginTime,'1 January 2050')) > 3
then 1
else 0
end
, 1) as IsBlockEnd
from cte
left join cte offset
on cte.EmpName = offset.EmpName
and cte.JoinKey = offset.JoinKey - 1
)
--select * from cte2 order by EmpName, LoginTime
-- level three - get the corresponding block start
-- need to make sure that the first row is set to 1 also
-- hence use of isnull()
, cte3 as (
select cte2.*
, case
when isnull(rejoin.IsBlockEnd,1) = 1
then 1
else 0
end as IsBlockStart
from cte2
left join cte2 rejoin
on cte2.EmpName = rejoin.EmpName
and cte2.JoinKey = rejoin.JoinKey + 1
)
-- level four - return only the start and end times
-- add a new join key to allow us to offset again in the following cte
, cte4 as (
select cte3.*
, row_number() over (partition by EmpName order by JoinKey) as NewJoinKey
from cte3
where IsBlockEnd = 1
or IsBlockStart = 1
)
-- level five - do the maths
-- again we offset to put start and end values on the same row
, cte5 as (
select cte4.*
, case
when cte4.IsBlockStart = 1 and cte4.IsBlockEnd = 1
then 3
when (cte4.IsBlockStart = 1 and cte4.IsBlockEnd = 0)
then datediff(mi,cte4.LoginTime, offset.LoginTime) + 3
else null
end as Interval
from cte4
left join cte4 offset
on cte4.EmpName = offset.EmpName
and cte4.NewJoinKey = offset.NewJoinKey - 1
where cte4.IsBlockStart = 1
)
-- level six - query with rollup to get the totals
, cte6 as (
select EmpName
, isnull(convert(varchar(20),LoginTime,120),'Total') as LoginTime
, sum(Interval) as Interval
from cte5
group by EmpName, LoginTime with rollup
)
select * from cte6
where EmpName is not null
order by EmpName, LoginTime
The results are as follows for a #LoginTime row count of 1728 (note that I just duplicated the existing data in the table, hence the returned row count of 15/18 stays the same for each test):
========== cte only ============
(18 row(s) affected)
Table '#LoginTime'. Scan count 3468, logical reads 92592, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 4717 ms, elapsed time = 1348 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
========================================
========== cte w tally ============
(15 row(s) affected)
Table '#LoginTime'. Scan count 10, logical reads 92, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'spt_values'. Scan count 1728, logical reads 13120, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 10, logical reads 44, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 5814 ms, elapsed time = 6484 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
========================================
The tally approach is more expensive at low row counts, as a result of all those scans on spt_values. So I bumped the row count up to 13824:
========== cte only ============
(18 row(s) affected)
Table '#LoginTime'. Scan count 27660, logical reads 3749212, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 294111 ms, elapsed time = 81638 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
========================================
========== cte w tally ============
(15 row(s) affected)
Table '#LoginTime'. Scan count 10, logical reads 178, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'spt_values'. Scan count 13824, logical reads 104960, physical reads 0, read-ahead reads 5, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 13831, logical reads 37381738, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 125981 ms, elapsed time = 78011 ms.
========================================
Here the tally approach has already overtaken the cte-only method, as a result of the lower number of rows in spt_values and the 14 clustered index scans in the cte-only method.
So, in conclusion, Ken's tally method performs better at larger row counts and so you should use that.
Regards, Iain
October 19, 2011 at 11:42 am
Hi Iain,
Thanks for the performance comparison.
We both seem to expect the quirky update to run faster. As it is non-relational I prefer to let people use their own judgement on whether to use it; I have only ever used it for data loads.
I gave the row difference method of obtaining groups, which I think is attributable to Steve Kass of Drew University, as one option the OP may wish to consider. (In reality one would use a number table with a clustered index, instead of spt_values, so even low row counts should be OK.)
If the data set is small, then even a cursor may be adequate.
Regards,
Ken