T-SQL Query

  • Dear Friends

    I need to write a T-SQL query to display the employee name and login duration.

    Explanation with Example:

    If the time difference between two logins is greater than 3 minutes, we treat the later login as the start of a separate group.

    For example, an employee abc has logged in nine times in a day, at the following times:

    10:01: this is group 1, because the time difference between 10:01 and 10:07 is > 3 minutes;

    its duration is 3 minutes, because each report gets 3 minutes of view time

    10:07

    10:09

    10:11: group 2 ends here; its duration is (10:11 + 3) - 10:07 = 7m

    10:30

    10:32

    10:34

    10:35: group 3 ends here; its duration is (10:35 + 3) - 10:30 = 8m

    10:50: group 4, with a duration of 3m

    Finally, we add up all these groups: 3m + 7m + 8m + 3m = 21m, which is the total login time for the day.
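    For readers on SQL Server 2012 or later, the grouping rule above can be sketched with LAG and a running sum (a sketch only; the replies below target earlier versions, where self-joins are needed instead):

    ```sql
    -- Sketch, assuming SQL Server 2012+ (LAG and windowed SUM with framing).
    -- A new group starts when the gap from the previous login exceeds 3 minutes
    -- (the first login per employee has a NULL gap, so it also starts a group).
    WITH Flagged AS (
        SELECT EmpName, LoginTime,
               CASE WHEN DATEDIFF(minute,
                         LAG(LoginTime) OVER (PARTITION BY EmpName ORDER BY LoginTime),
                         LoginTime) <= 3
                    THEN 0 ELSE 1 END AS IsGroupStart
        FROM dbo.LoginTime
    ), Grouped AS (
        SELECT EmpName, LoginTime,
               SUM(IsGroupStart) OVER (PARTITION BY EmpName ORDER BY LoginTime
                                       ROWS UNBOUNDED PRECEDING) AS Grp
        FROM Flagged
    )
    -- each group contributes (last login - first login) + 3 minutes
    SELECT EmpName, SUM(GroupMinutes) AS TotalMinutes
    FROM (SELECT EmpName, Grp,
                 DATEDIFF(minute, MIN(LoginTime), MAX(LoginTime)) + 3 AS GroupMinutes
          FROM Grouped
          GROUP BY EmpName, Grp) AS G
    GROUP BY EmpName;
    -- For the sample data below this gives 3 + 7 + 8 + 3 = 21 minutes for Steve.
    ```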

    Scripts to create the sample table:

    CREATE TABLE [dbo].[LoginTime](
        [ID] [int] NOT NULL,
        [EmpName] [varchar](50) NULL,
        [LoginTime] [datetime] NULL
    )
    GO
    INSERT [dbo].[LoginTime] ([ID], [EmpName], [LoginTime]) VALUES (1, N'Steve', CAST(0x00009CF100A511D0 AS DateTime))
    INSERT [dbo].[LoginTime] ([ID], [EmpName], [LoginTime]) VALUES (2, N'Steve', CAST(0x00009CF100A6B7B0 AS DateTime))
    INSERT [dbo].[LoginTime] ([ID], [EmpName], [LoginTime]) VALUES (3, N'Steve', CAST(0x00009CF100A74450 AS DateTime))
    INSERT [dbo].[LoginTime] ([ID], [EmpName], [LoginTime]) VALUES (4, N'Steve', CAST(0x00009CF100A7D0F0 AS DateTime))
    INSERT [dbo].[LoginTime] ([ID], [EmpName], [LoginTime]) VALUES (5, N'Steve', CAST(0x00009CF100AD08E0 AS DateTime))
    INSERT [dbo].[LoginTime] ([ID], [EmpName], [LoginTime]) VALUES (6, N'Steve', CAST(0x00009CF100AD9580 AS DateTime))
    INSERT [dbo].[LoginTime] ([ID], [EmpName], [LoginTime]) VALUES (7, N'Steve', CAST(0x00009CF100AE2220 AS DateTime))
    INSERT [dbo].[LoginTime] ([ID], [EmpName], [LoginTime]) VALUES (8, N'Steve', CAST(0x00009CF100AE6870 AS DateTime))
    INSERT [dbo].[LoginTime] ([ID], [EmpName], [LoginTime]) VALUES (9, N'Steve', CAST(0x00009CF100B28720 AS DateTime))

    Any input on this issue is highly appreciated.

    Thanks in advance

    Sunny

  • You will need to watch boundary conditions, but try something like the following:

    WITH EmpLoginTime
    AS
    (
        SELECT EmpName
            ,ROW_NUMBER() OVER (PARTITION BY EmpName ORDER BY LoginTime) AS EmpOrder
            ,LoginTime
        FROM dbo.logintime
    )
    ,EmpTimes
    AS
    (
        SELECT T1.EmpName, T1.EmpOrder, T1.LoginTime AS StartTime
            ,CASE
                WHEN DATEDIFF(minute, T1.LoginTime, T2.LoginTime) <= 3
                THEN T2.LoginTime
                ELSE DATEADD(minute, 3, T1.LoginTime)
             END AS EndTime
        FROM EmpLoginTime T1
            LEFT JOIN EmpLoginTime T2
                ON T1.EmpName = T2.EmpName
                    AND T1.EmpOrder = T2.EmpOrder - 1
    )
    SELECT EmpName
        ,SUM(DATEDIFF(minute, StartTime, EndTime)) AS LoginMinutes
    FROM EmpTimes
    GROUP BY EmpName

  • Hi Sunny,

    This will return a list of periods plus the total (Ken, your query just returns a total). It is broken into very discrete sections at the moment, primarily to show the logic involved. I'm sure this could be condensed somewhat to make a more efficient query.

    Note that the WITH ROLLUP syntax is deprecated, so if you're planning to run this as a production job, you might want to use the newer ROLLUP (<field list>) syntax.
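    For example, the final grouping step in the query below, `group by EmpName, LoginTime with rollup`, can be written with the non-deprecated syntax like this (a sketch of the same cte6 step):

    ```sql
    -- Equivalent rewrite of the cte6 grouping using ROLLUP(<field list>):
    , cte6 as (
        select EmpName
            , isnull(convert(varchar(20),LoginTime,120),'Total') as LoginTime
            , sum(Interval) as Interval
        from cte5
        group by rollup (EmpName, LoginTime)
    )
    ```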

    -- level one - add a row number to the source data
    ; with cte as (
        select *
            , row_number() over (partition by EmpName order by EmpName, LoginTime) as JoinKey
        from #LoginTime
    )
    -- level two - offset join to identify start points
    -- note coalesce to avoid dropping the unmatched row from the offset group
    , cte2 as (
        select cte.JoinKey
            , coalesce(cte.EmpName, offset.EmpName) as EmpName
            , coalesce(cte.LoginTime, offset.LoginTime) as LoginTime
            , isnull(
                case
                    when datediff(mi,isnull(cte.LoginTime,'1 January 2050'),offset.LoginTime) > 3
                    then 1
                    else 0
                end
            , 1) as IsBlockEnd
        from cte
            left join cte offset
                on cte.EmpName = offset.EmpName
                and cte.JoinKey = offset.JoinKey - 1
    )
    --select * from cte2
    -- level three - get the corresponding block start
    -- need to make sure that the first row is set to 1 also
    -- hence use of isnull()
    , cte3 as (
        select cte2.*
            , case
                when isnull(rejoin.IsBlockEnd,1) = 1
                then 1
                else 0
            end as IsBlockStart
        from cte2
            left join cte2 rejoin
                on cte2.EmpName = rejoin.EmpName
                and cte2.JoinKey = rejoin.JoinKey + 1
    )
    -- level four - return only the start and end times
    , cte4 as (
        select cte3.*
            , row_number() over (partition by EmpName order by JoinKey) as NewJoinKey
        from cte3
        where IsBlockEnd = 1
            or IsBlockStart = 1
    )
    -- level five - do the maths
    -- again we offset to find the next value
    , cte5 as (
        select cte4.*
            , offset.LoginTime as oLoginTime
            , isnull(offset.IsBlockEnd,2) as oIsBlockEnd
            , case
                when cte4.IsBlockStart = 1 and cte4.IsBlockEnd = 1
                    or (cte4.IsBlockStart = 1 and isnull(offset.IsBlockEnd,2) = 2) -- catch the last element if it is an isolated time
                then 3
                when (cte4.IsBlockStart = 1 and cte4.IsBlockEnd = 0)
                then datediff(mi,cte4.LoginTime, offset.LoginTime) + 3
                else null
            end as Interval
        from cte4
            left join cte4 offset
                on cte4.EmpName = offset.EmpName
                and cte4.NewJoinKey = offset.NewJoinKey - 1
        where cte4.IsBlockStart = 1
            or (cte4.IsBlockStart = 0 and isnull(offset.IsBlockEnd,2) = 2)
    )
    -- query with rollup to get the totals
    , cte6 as (
        select EmpName
            , isnull(convert(varchar(20),LoginTime,120),'Total') as LoginTime
            , sum(Interval) as Interval
        from cte5
        group by EmpName, LoginTime with rollup
    )
    select * from cte6
    where EmpName is not null
    order by EmpName, LoginTime

    Not sure how efficient this is likely to be. I'd recommend a clustered index on EmpName & LoginTime to speed this up a bit. I reckon there is probably some sort of cool 'quirky update' based solution that might be a better option also, maybe someone else might be able to write one...

    Regards, Iain

    Edit: DDL for # table used in example + add EmpName to joins (thanks Ken!)

    CREATE TABLE #LoginTime(
        [ID] [int] NOT NULL,
        [EmpName] [varchar](50) NULL,
        [LoginTime] [datetime] NULL
    )
    GO
    INSERT #LoginTime ([ID], [EmpName], [LoginTime]) VALUES (1, N'Steve', CAST(0x00009CF100A511D0 AS DateTime))
    INSERT #LoginTime ([ID], [EmpName], [LoginTime]) VALUES (2, N'Steve', CAST(0x00009CF100A6B7B0 AS DateTime))
    INSERT #LoginTime ([ID], [EmpName], [LoginTime]) VALUES (3, N'Steve', CAST(0x00009CF100A74450 AS DateTime))
    INSERT #LoginTime ([ID], [EmpName], [LoginTime]) VALUES (4, N'Steve', CAST(0x00009CF100A7D0F0 AS DateTime))
    INSERT #LoginTime ([ID], [EmpName], [LoginTime]) VALUES (5, N'Steve', CAST(0x00009CF100AD08E0 AS DateTime))
    INSERT #LoginTime ([ID], [EmpName], [LoginTime]) VALUES (6, N'Steve', CAST(0x00009CF100AD9580 AS DateTime))
    INSERT #LoginTime ([ID], [EmpName], [LoginTime]) VALUES (7, N'Steve', CAST(0x00009CF100AE2220 AS DateTime))
    INSERT #LoginTime ([ID], [EmpName], [LoginTime]) VALUES (8, N'Steve', CAST(0x00009CF100AE6870 AS DateTime))
    INSERT #LoginTime ([ID], [EmpName], [LoginTime]) VALUES (9, N'Steve', CAST(0x00009CF100B28720 AS DateTime))
    create clustered index ix_tmp on #LoginTime(EmpName,LoginTime)

  • Sunny,

    If you want the groups to show then:

    1. Like Iain, I suspect the quirky update will be quickest.

    The following article, by Jeff Moden, gives the details; you should read the discussion as well.

    http://www.sqlservercentral.com/Authors/Articles/Jeff_Moden/80567/

    2. If you want a set-based solution, I would also try using a numbers/tally table.

    The following example assumes that a login does not span more than one day, and uses spt_values for convenience.

    WITH EmpDateMinutes
    AS
    (
        SELECT EmpName
            ,DATEADD(day, DATEDIFF(day, 0, LoginTime), 0) AS LoginDate
            ,DATEDIFF(minute, DATEADD(day, DATEDIFF(day, 0, LoginTime), 0), LoginTime) AS LoginMinute
        FROM #logintime
    )
    ,EmpDateMinuteOrder
    AS
    (
        SELECT EmpName, LoginDate, LoginMinute
            ,ROW_NUMBER() OVER(PARTITION BY EmpName, LoginDate ORDER BY LoginMinute) AS EmpOrder
        FROM EmpDateMinutes
    )
    ,EmpDateLogins
    AS
    (
        SELECT T1.EmpName, T1.LoginDate, T1.LoginMinute AS StartMinute
            ,CASE
                WHEN T2.LoginMinute - T1.LoginMinute <= 3
                THEN T2.LoginMinute
                ELSE T1.LoginMinute + 3
             END AS EndMinute
        FROM EmpDateMinuteOrder T1
            LEFT JOIN EmpDateMinuteOrder T2
                ON T1.EmpName = T2.EmpName
                    AND T1.LoginDate = T2.LoginDate
                    AND T1.EmpOrder = T2.EmpOrder - 1
    )
    ,EmpDateGrps
    AS
    (
        SELECT E.EmpName, E.LoginDate, N.number AS LoginMinute
            ,N.number - ROW_NUMBER() OVER (PARTITION BY E.EmpName, E.LoginDate ORDER BY N.number) AS grp
        FROM EmpDateLogins E
            JOIN master.dbo.spt_values N
                ON N.[type] = 'P'
                    AND N.number >= E.StartMinute
                    AND N.number < E.EndMinute
    )
    ,EmpDateGrpRollup
    AS
    (
        SELECT EmpName, LoginDate, grp
            ,MIN(LoginMinute) AS StartMinute
            ,MAX(LoginMinute) AS EndMinute
        FROM EmpDateGrps
        GROUP BY EmpName, LoginDate, grp
    )
    SELECT EmpName
        ,DATEADD(minute, StartMinute, LoginDate) AS LoginTime
        ,EndMinute - StartMinute + 1 AS MinsDuration
        ,SUM(EndMinute - StartMinute + 1) OVER (PARTITION BY EmpName, LoginDate) AS MinsDayDuration
    FROM EmpDateGrpRollup
    ORDER BY EmpName, LoginTime;
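    For reference, the quirky-update approach mentioned in point 1 can be sketched roughly as follows. This is a sketch only, using a hypothetical #Work staging table; the technique relies on undocumented ordered-update behaviour, so read Jeff Moden's article and its discussion before using it.

    ```sql
    -- Hedged sketch of a "quirky update" grouping pass; #Work is a hypothetical
    -- staging table. Relies on the clustered index driving the update order,
    -- which is undocumented behaviour (see the article's caveats).
    DECLARE @PrevEmp varchar(50), @PrevTime datetime, @Grp int
    SET @Grp = 0

    SELECT EmpName, LoginTime, CAST(NULL AS int) AS Grp
    INTO #Work
    FROM #LoginTime

    CREATE CLUSTERED INDEX ix_work ON #Work (EmpName, LoginTime)

    UPDATE #Work
    SET @Grp = Grp = CASE
                        WHEN EmpName = @PrevEmp
                             AND DATEDIFF(minute, @PrevTime, LoginTime) <= 3
                        THEN @Grp
                        ELSE @Grp + 1
                     END
        ,@PrevEmp = EmpName
        ,@PrevTime = LoginTime
    FROM #Work WITH (TABLOCKX)
    OPTION (MAXDOP 1)

    -- each group contributes (last login - first login) + 3 minutes
    SELECT EmpName, SUM(GroupMinutes) AS TotalMinutes
    FROM (SELECT EmpName, Grp
                ,DATEDIFF(minute, MIN(LoginTime), MAX(LoginTime)) + 3 AS GroupMinutes
          FROM #Work
          GROUP BY EmpName, Grp) g
    GROUP BY EmpName
    ```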

  • Interesting. Never seen spt_values used like that before.

    Figured I'd give it a side by side comparison, hope you don't mind 🙂

    Here is my revised version (with a few errors fixed and unnecessary stuff removed):

    -- level one - add a row number to the source data
    ; with cte as (
        select *
            , row_number() over (partition by EmpName order by EmpName, LoginTime) as JoinKey
        from #LoginTime
    )
    -- level two - offset join to identify start points
    , cte2 as (
        select cte.*
            , isnull(
                case
                    when datediff(mi,cte.LoginTime,isnull(offset.LoginTime,'1 January 2050')) > 3
                    then 1
                    else 0
                end
            , 1) as IsBlockEnd
        from cte
            left join cte offset
                on cte.EmpName = offset.EmpName
                and cte.JoinKey = offset.JoinKey - 1
    )
    --select * from cte2 order by EmpName, LoginTime
    -- level three - get the corresponding block start
    -- need to make sure that the first row is set to 1 also
    -- hence use of isnull()
    , cte3 as (
        select cte2.*
            , case
                when isnull(rejoin.IsBlockEnd,1) = 1
                then 1
                else 0
            end as IsBlockStart
        from cte2
            left join cte2 rejoin
                on cte2.EmpName = rejoin.EmpName
                and cte2.JoinKey = rejoin.JoinKey + 1
    )
    -- level four - return only the start and end times
    -- add a new join key to allow us to offset again in the following cte
    , cte4 as (
        select cte3.*
            , row_number() over (partition by EmpName order by JoinKey) as NewJoinKey
        from cte3
        where IsBlockEnd = 1
            or IsBlockStart = 1
    )
    -- level five - do the maths
    -- again we offset to put start and end values on the same row
    , cte5 as (
        select cte4.*
            , case
                when cte4.IsBlockStart = 1 and cte4.IsBlockEnd = 1
                then 3
                when (cte4.IsBlockStart = 1 and cte4.IsBlockEnd = 0)
                then datediff(mi,cte4.LoginTime, offset.LoginTime) + 3
                else null
            end as Interval
        from cte4
            left join cte4 offset
                on cte4.EmpName = offset.EmpName
                and cte4.NewJoinKey = offset.NewJoinKey - 1
        where cte4.IsBlockStart = 1
    )
    -- level six - query with rollup to get the totals
    , cte6 as (
        select EmpName
            , isnull(convert(varchar(20),LoginTime,120),'Total') as LoginTime
            , sum(Interval) as Interval
        from cte5
        group by EmpName, LoginTime with rollup
    )
    select * from cte6
    where EmpName is not null
    order by EmpName, LoginTime

    The results are as follows, for a #LoginTime row count of 1728 (note that I just copied the existing data in the table repeatedly, hence the returned row count of 15/18 stays the same for each test):

    ========== cte only ============
    (18 row(s) affected)
    Table '#LoginTime'. Scan count 3468, logical reads 92592, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
    CPU time = 4717 ms, elapsed time = 1348 ms.
    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.
    ========================================

    ========== cte w tally ============
    (15 row(s) affected)
    Table '#LoginTime'. Scan count 10, logical reads 92, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'spt_values'. Scan count 1728, logical reads 13120, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 10, logical reads 44, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
    CPU time = 5814 ms, elapsed time = 6484 ms.
    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.
    ========================================

    The tally approach is more expensive at low row counts, as a result of all those scans on spt_values. So I bumped the row count up to 13824:

    ========== cte only ============
    (18 row(s) affected)
    Table '#LoginTime'. Scan count 27660, logical reads 3749212, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
    CPU time = 294111 ms, elapsed time = 81638 ms.
    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.
    ========================================

    ========== cte w tally ============
    (15 row(s) affected)
    Table '#LoginTime'. Scan count 10, logical reads 178, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'spt_values'. Scan count 13824, logical reads 104960, physical reads 0, read-ahead reads 5, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 13831, logical reads 37381738, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
    CPU time = 125981 ms, elapsed time = 78011 ms.
    ========================================

    Here the tally approach has already overtaken the cte only method as a result of the lower number of rows in spt_values and the 14 clustered index scans on the cte method.

    So, in conclusion, Ken's tally method performs better at larger row counts and so you should use that.

    Regards, Iain

  • Hi Iain,

    Thanks for the performance comparison.

    We both seem to expect the quirky update to run faster. As it is non-relational I prefer to let people use their own judgement on whether to use it; I have only ever used it for data loads.

    I gave the row difference method of obtaining groups, which I think is attributable to Steve Kass of Drew University, as one option the OP may wish to consider. (In reality one would use a number table with a clustered index, instead of spt_values, so even low row counts should be OK.)
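    A permanent numbers table of the kind mentioned above can be built along these lines (the table name, column name and row count are illustrative, not part of the original discussion):

    ```sql
    -- Illustrative numbers table with a clustered primary key.
    CREATE TABLE dbo.Numbers (
        number int NOT NULL,
        CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (number)
    )

    INSERT dbo.Numbers (number)
    SELECT TOP (10000)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1
    FROM sys.all_columns a
    CROSS JOIN sys.all_columns b
    ```

    The tally query would then join to dbo.Numbers instead of master.dbo.spt_values, dropping the N.[type] = 'P' filter.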

    If the data set is small, then even a cursor may be adequate.

    Regards,

    Ken
