Time intervals with Number of Employees

  • Hello,

    I've written a query that is way to slow and I'm woundering if someone has a smarter (and correct) way to solve my problem and care to share that. Also, my query doesn't produce the perfect outcome, barely good enough.

    What needs to be done is to find how many employees (for each GroupingId) there are at each time during a day, from 00:00 to 23:59 (24 hour time) grouped by min and max times, for every date. If number of employees stays the same the time interval continues.

    The source table has 8 weeks of data and is updated daily (truncate/load)

    Ideally the query is fast enough for a view, that way the daily load can stay the as is. Atm the view is so slow that I have to do it as a separate insert step after the daily update, moving the logic from the view to a proc, because I can't have a view that takes up to 2 minutes to execute.

    In source table each row is allways for one date, so if the work shift continues over midnight it's broken into two parts. But for expected output the time interval can continue over midnight.

    In my query the EndDateTime sometime is the minute before actual EndDateTime which is fine but not optimal.

    It also cuts at midnight, also fine but not optimal.

    Not good is that I get wrong number of Employees if there are the same EndDateTime as StartDateTime (see hour 01:00 and 16:00 from my query)

    Expected output

    GroupingId|StartDateTime|EndDateTime|NumberOfEmployees

    1|2023-10-16 19:00:00|2023-10-17 08:00:00|1

    1|2023-10-17 08:00:00|2023-10-17 10:00:00|2

    1|2023-10-17 10:00:00|2023-10-17 10:30:00|0

    1|2023-10-17 10:30:00|2023-10-17 16:00:00|1

    1|2023-10-17 16:00:00|2023-10-18 00:00:00|2

    Sample data

    create table dbo.TimeShift(
    GroupingId int
    , EmpId int
    , RegDate date
    , StartDateTime datetime2(0)
    , EndDateTime datetime2(0)
    )

    insert into dbo.TimeShift (GroupingId, Empid, RegDate, StartDateTime, EndDateTime) values
    (1, 1, '2023-10-16', '2023-10-16 19:00:00', '2023-10-17 00:00:00'),
    (1, 1, '2023-10-17', '2023-10-17 00:00:00', '2023-10-17 01:00:00'),
    (1, 7, '2023-10-17', '2023-10-17 01:00:00', '2023-10-17 10:00:00'),
    (1, 2, '2023-10-17', '2023-10-17 08:00:00', '2023-10-17 10:00:00'),
    (1, 2, '2023-10-17', '2023-10-17 10:30:00', '2023-10-17 16:00:00'),
    (1, 3, '2023-10-17', '2023-10-17 16:00:00', '2023-10-18 00:00:00'),
    (1, 4, '2023-10-17', '2023-10-17 16:00:00', '2023-10-18 00:00:00')

    http://sqlfiddle.com/#!18/401a43/6

    My query

    WITH CTE_cal AS(
    SELECT
    DATEADD(
    MINUTE
    , rn
    , CAST((SELECT MIN(RegDate) FROM dbo.TimeShift) AS datetime2(0))
    ) AS CalendarMinuteId
    FROM(
    SELECT TOP (83520) --all minutes between "MIN(RegDate) AND 8 weeks ahead plus one day"
    ROW_NUMBER() OVER(ORDER BY a.object_id) - 1 AS rn
    FROM
    sys.all_objects AS a
    CROSS APPLY
    sys.all_objects AS b
    ) AS n
    ), CTE_Presence_Intervals AS(
    SELECT
    GroupingId
    , RegDate
    , NumberOfEmployees
    , MIN(CalendarMinuteId) AS StartDateTime
    , MAX(CalendarMinuteId) AS EndDateTime
    FROM(
    SELECT
    *
    ,COUNT(InitGroup) OVER (ORDER BY GroupingId, RegDate, CalendarMinuteId ROWS UNBOUNDED PRECEDING) AS Grp
    FROM(
    SELECT
    *
    ,DATEDIFF(MINUTE, LAG(CalendarMinuteId, 1) OVER(PARTITION BY GroupingId ORDER BY CalendarMinuteId), CalendarMinuteId) AS diffwithnextrow
    ,CASE WHEN
    DATEDIFF(MINUTE, LAG(CalendarMinuteId, 1) OVER(PARTITION BY GroupingId, NumberOfEmployees ORDER BY CalendarMinuteId), CalendarMinuteId) IS NULL
    THEN 1
    ELSE CASE
    WHEN DATEDIFF(MINUTE, LAG(CalendarMinuteId, 1) OVER(PARTITION BY GroupingId, NumberOfEmployees ORDER BY CalendarMinuteId), CalendarMinuteId) > 1
    THEN 1
    ELSE NULL
    END
    END AS InitGroup
    FROM(
    SELECT
    GroupingId
    , RegDate
    , CalendarMinuteId
    , COUNT(DISTINCT EmpId) AS NumberOfEmployees
    FROM(
    SELECT --top 3000
    tr.GroupingId
    , tr.RegDate
    , cal.CalendarMinuteId
    , tr.EmpId
    FROM
    dbo.TimeShift AS tr
    LEFT OUTER JOIN
    CTE_cal AS cal
    ON cal.CalendarMinuteId BETWEEN tr.StartDateTime AND tr.EndDateTime
    WHERE 1=1
    GROUP BY GroupingId, tr.RegDate, cal.CalendarMinuteId, tr.EmpId
    ) AS n
    GROUP BY GroupingId, RegDate, CalendarMinuteId
    ) AS y
    ) AS x
    ) AS m
    GROUP BY
    GroupingId
    , RegDate
    , NumberOfEmployees
    , Grp
    )

    SELECT
    *
    FROM
    CTE_Presence_Intervals

    UNION ALL

    SELECT
    GroupingId
    , RegDate
    , 0 AS NumberOfEmployees
    , PreviousEndDateTime AS StartDateTime
    , StartDateTime AS EndDateTime
    FROM(
    SELECT
    *
    , DATEDIFF(MINUTE, LAG(EndDateTime, 1) OVER(PARTITION BY GroupingId ORDER BY StartDateTime), StartDateTime) AS DiffPreviousEndDateTime
    , LAG(EndDateTime, 1) OVER(PARTITION BY GroupingId ORDER BY StartDateTime) AS PreviousEndDateTime
    FROM
    CTE_Presence_Intervals
    ) AS n
    WHERE 1=1
    AND DiffPreviousEndDateTime > 1

    ORDER BY 4

     

    Any help is appreciated!

  • Try something like:

    WITH InOuts
    AS
    (
    SELECT T.GroupingId, X.InOut
    ,CASE
    WHEN X.InOut = 1
    THEN DATEADD(minute, DATEDIFF(minute, '2020', T.StartDateTime), '2020')
    ELSE DATEADD(minute, DATEDIFF(minute, '2020', T.EndDateTime), '2020')
    END AS InOutDate
    FROM dbo.TimeShift T
    CROSS APPLY (VALUES (1), (-1)) X (InOut)
    )
    ,Nums
    AS
    (
    SELECT GroupingId, InOutDate
    ,SUM(InOut) OVER (PARTITION BY GroupingId ORDER BY InOutDate) AS NumOfEmps
    FROM InOuts
    )
    ,Boundaries
    AS
    (
    SELECT GroupingId, InOutDate, NumOfEmps
    ,CASE
    WHEN NumOfEmps <> LAG(NumOfEmps) OVER (PARTITION BY GroupingId ORDER BY InOutDate)
    THEN 1
    ELSE 0
    END AS Boundary
    FROM Nums
    )
    ,Grps
    AS
    (
    SELECT GroupingId, InOutDate, NumOfEmps
    ,SUM(Boundary) OVER (PARTITION BY GroupingId ORDER BY InOutDate) AS Grp
    ,LEAD(InOutDate) OVER (PARTITION BY GroupingId ORDER BY InOutDate) AS NextInOutDate
    FROM Boundaries
    )
    SELECT GroupingId
    ,MIN(InOutDate) AS StartDateTime
    ,MAX(NextInOutDate) AS EndDateTime
    ,MIN(NumOfEmps) AS NumOfEmps
    FROM Grps
    WHERE NextInOutDate IS NOT NULL
    GROUP BY GroupingId, Grp
    HAVING MIN(InOutDate) <> MAX(NextInOutDate)
    ORDER BY GroupingId, StartDateTime;
  • Wow that is so amazing, thank you so much!

    One qustion: whats the purpose of

    THEN DATEADD(minute, DATEDIFF(minute, '2020', T.StartDateTime), '2020')
    ELSE DATEADD(minute, DATEDIFF(minute, '2020', T.EndDateTime), '2020')

    instead of

    THEN T.StartDateTime
    ELSE T.EndDateTime

    Looks like it produces the same result, but guess I'm missing something.

  • Alfred wrote:

    One qustion: whats the purpose of

    THEN DATEADD(minute, DATEDIFF(minute, '2020', T.StartDateTime), '2020')
    ELSE DATEADD(minute, DATEDIFF(minute, '2020', T.EndDateTime), '2020')

    instead of

    THEN T.StartDateTime
    ELSE T.EndDateTime

    Looks like it produces the same result, but guess I'm missing something.

    You implied per minute so this rounds the time down to the minute. As your test data has no seconds the results will be the same.

    You might also like to experiment with making the Nums CTE a SELECT DISTINCT. As DISTINCT is expensive I suspect the number of rows, going into the Boundaries CTE, would need to be less than 15% of Nums without DISTINCT to make it worthwhile.

     

     

  • Thanks for your reply.

    I've tested the query on a larger set of testdata and it's very fast. But will experiment with distinct as well.

  • This gives the same results as Ken's query.  (NOTE: I changed your permanent table to a temp table.)

    WITH InOutStatuses AS
    (
    SELECT ts.GroupingId, i.InOutDate, SUM(i.InOutStatus) AS EmployeeChanges, MAX(i.InOutDate) OVER(PARTITION BY ts.GroupingId) AS EndDate
    FROM #TimeShift AS ts
    CROSS APPLY (VALUES(ts.StartDateTime, 1), (ts.EndDateTime, -1)) AS i(InOutDate, InOutStatus)
    GROUP BY ts.GroupingId, i.InOutDate
    HAVING SUM(i.InOutStatus) <> 0
    )
    SELECT i.GroupingId, i.InOutDate, LEAD(i.InOutDate, 1, i.EndDate) OVER(PARTITION BY i.GroupingId ORDER BY i.InOutDate), SUM(i.EmployeeChanges) OVER(ORDER BY i.InOutDate ROWS UNBOUNDED PRECEDING) AS NumEmployees
    FROM InOutStatuses AS i
    WHERE i.InOutDate < i.EndDate
    ORDER BY i.InOutDate

    Here is a comparison of the reads:

    -- Ken's query
    Table 'Worktable'. Scan count 18, logical reads 146, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
    Table '#TimeShift___000000009E6D'. Scan count 1, logical reads 1, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

    -- Drew's query
    Table 'Worktable'. Scan count 3, logical reads 17, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
    Table '#TimeShift___000000009E6D'. Scan count 1, logical reads 1, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Alfred wrote:

    ...

    In source table each row is allways for one date, so if the work shift continues over midnight it's broken into two parts. But for expected output the time interval can continue over midnight.

    In my query the EndDateTime sometime is the minute before actual EndDateTime which is fine but not optimal.

    It also cuts at midnight, also fine but not optimal.

    Imo if the EndDateTime is sometimes the minute before it makes sense to address this issue when the data gets loaded to the table.  Fixing "close but not matching" issues like this in a query is often problematic.  Suppose I just want to fix the interval

     drop TABLE if exists #TimeShift;
    go
    create table #TimeShift(
    GroupingId int
    , EmpId int
    , RegDate date
    , StartDateTime datetime2(0)
    , EndDateTime datetime2(0)
    )

    insert into #TimeShift (GroupingId, Empid, RegDate, StartDateTime, EndDateTime) values
    (1, 1, '2023-10-16', '2023-10-16 19:00:00', '2023-10-17 00:00:00'),
    (1, 1, '2023-10-17', '2023-10-17 00:00:00', '2023-10-17 01:00:00'),
    (1, 7, '2023-10-17', '2023-10-17 01:00:00', '2023-10-17 10:00:00'),
    (1, 2, '2023-10-17', '2023-10-17 08:00:00', '2023-10-17 10:00:00'),
    (1, 2, '2023-10-17', '2023-10-17 10:30:00', '2023-10-17 16:00:00'),
    (1, 3, '2023-10-17', '2023-10-17 16:00:00', '2023-10-18 00:00:00'),
    (1, 4, '2023-10-17', '2023-10-17 16:00:00', '2023-10-18 00:00:00'),

    (2, 1, '2023-10-16', '2023-10-16 19:00:00', '2023-10-16 23:59:00'),/* changed */
    (2, 1, '2023-10-17', '2023-10-17 00:00:00', '2023-10-17 01:00:00'),
    (2, 7, '2023-10-17', '2023-10-17 01:00:00', '2023-10-17 10:00:00'),
    (2, 2, '2023-10-17', '2023-10-17 08:00:00', '2023-10-17 10:00:00'),
    (2, 2, '2023-10-17', '2023-10-17 10:30:00', '2023-10-17 16:00:00'),
    (2, 3, '2023-10-17', '2023-10-17 16:00:00', '2023-10-18 00:00:00'),
    (2, 4, '2023-10-17', '2023-10-17 16:00:00', '2023-10-18 00:00:00'),

    (3, 1, '2023-10-16', '2023-10-16 19:00:00', '2023-10-17 00:00:00'),
    (3, 1, '2023-10-17', '2023-10-17 00:00:00', '2023-10-17 01:00:00'),
    (3, 7, '2023-10-17', '2023-10-17 01:00:00', '2023-10-17 10:00:00'),
    (3, 2, '2023-10-17', '2023-10-17 08:00:00', '2023-10-17 10:00:00'),
    (3, 2, '2023-10-17', '2023-10-17 10:30:00', '2023-10-17 16:00:00'),
    (3, 3, '2023-10-17', '2023-10-17 16:00:00', '2023-10-18 00:00:00'),
    (3, 4, '2023-10-17', '2023-10-17 16:00:00', '2023-10-17 23:58:00'); /* changed */

    /* fix the EndDateTime */
    declare @minutes_to_midnight int=3;

    select ts.*, ddif.dtzm, v.FixedEndDateTime
    from #TimeShift ts
    cross apply (values (1440+datediff(minute, cast(ts.EndDateTime as time), cast('00:00:00' as time)))) ddif(dtzm)
    cross apply (values (iif(ddif.dtzm between 1 and @minutes_to_midnight,
    dateadd(minute, ddif.dtzm, ts.EndDateTime),
    ts.EndDateTime))) v(FixedEndDateTime);


    /* fix the interval */
    select ts.*, ddif.dtzm, v.FixedEndDateTime, coalesce(interval.FinalEndDateTime, v.FixedEndDateTime) FinalEndDateTime
    from #TimeShift ts
    cross apply (values (1440+datediff(minute, cast(ts.EndDateTime as time), cast('00:00:00' as time)))) ddif(dtzm)
    cross apply (values (iif(ddif.dtzm between 1 and @minutes_to_midnight,
    dateadd(minute, ddif.dtzm, ts.EndDateTime),
    ts.EndDateTime))) v(FixedEndDateTime)
    outer apply (select tt.EndDateTime
    from #TimeShift tt
    where tt.GroupingId=ts.GroupingId
    and tt.EmpId=ts.EmpId
    and tt.StartDateTime=v.FixedEndDateTime) interval(FinalEndDateTime)

    where not exists (select 1
    from #TimeShift tt
    where tt.GroupingId=ts.GroupingId
    and tt.EmpId=ts.EmpId
    and tt.EndDateTime=ts.StartDateTime); /* should be the FixedEndDateTime */

    My query was going to OUTER APPLY Drew Allen's query to the fixed interval above.  In the NOT EXISTS condition in the WHERE clause the JOIN condition should use the FixedEndDateTime imo

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • drew.allen wrote:

    This gives the same results as Ken's query.  (NOTE: I changed your permanent table to a temp table.)

    WITH InOutStatuses AS
    (
    SELECT ts.GroupingId, i.InOutDate, SUM(i.InOutStatus) AS EmployeeChanges, MAX(i.InOutDate) OVER(PARTITION BY ts.GroupingId) AS EndDate
    FROM #TimeShift AS ts
    CROSS APPLY (VALUES(ts.StartDateTime, 1), (ts.EndDateTime, -1)) AS i(InOutDate, InOutStatus)
    GROUP BY ts.GroupingId, i.InOutDate
    HAVING SUM(i.InOutStatus) <> 0
    )
    SELECT i.GroupingId, i.InOutDate, LEAD(i.InOutDate, 1, i.EndDate) OVER(PARTITION BY i.GroupingId ORDER BY i.InOutDate), SUM(i.EmployeeChanges) OVER(ORDER BY i.InOutDate ROWS UNBOUNDED PRECEDING) AS NumEmployees
    FROM InOutStatuses AS i
    WHERE i.InOutDate < i.EndDate
    ORDER BY i.InOutDate

    Here is a comparison of the reads:

    -- Ken's query
    Table 'Worktable'. Scan count 18, logical reads 146, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
    Table '#TimeShift___000000009E6D'. Scan count 1, logical reads 1, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

    -- Drew's query
    Table 'Worktable'. Scan count 3, logical reads 17, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
    Table '#TimeShift___000000009E6D'. Scan count 1, logical reads 1, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

    Drew

    Thank you Drew!

     

    Steve Collins wrote:

    Alfred wrote:

    ... In source table each row is allways for one date, so if the work shift continues over midnight it's broken into two parts. But for expected output the time interval can continue over midnight.

    In my query the EndDateTime sometime is the minute before actual EndDateTime which is fine but not optimal.

    It also cuts at midnight, also fine but not optimal.

    Imo if the EndDateTime is sometimes the minute before it makes sense to address this issue when the data gets loaded to the table.  Fixing "close but not matching" issues like this in a query is often problematic.  Suppose I just want to fix the interval

     drop TABLE if exists #TimeShift;
    go
    create table #TimeShift(
    GroupingId int
    , EmpId int
    , RegDate date
    , StartDateTime datetime2(0)
    , EndDateTime datetime2(0)
    )

    insert into #TimeShift (GroupingId, Empid, RegDate, StartDateTime, EndDateTime) values
    (1, 1, '2023-10-16', '2023-10-16 19:00:00', '2023-10-17 00:00:00'),
    (1, 1, '2023-10-17', '2023-10-17 00:00:00', '2023-10-17 01:00:00'),
    (1, 7, '2023-10-17', '2023-10-17 01:00:00', '2023-10-17 10:00:00'),
    (1, 2, '2023-10-17', '2023-10-17 08:00:00', '2023-10-17 10:00:00'),
    (1, 2, '2023-10-17', '2023-10-17 10:30:00', '2023-10-17 16:00:00'),
    (1, 3, '2023-10-17', '2023-10-17 16:00:00', '2023-10-18 00:00:00'),
    (1, 4, '2023-10-17', '2023-10-17 16:00:00', '2023-10-18 00:00:00'),

    (2, 1, '2023-10-16', '2023-10-16 19:00:00', '2023-10-16 23:59:00'),/* changed */
    (2, 1, '2023-10-17', '2023-10-17 00:00:00', '2023-10-17 01:00:00'),
    (2, 7, '2023-10-17', '2023-10-17 01:00:00', '2023-10-17 10:00:00'),
    (2, 2, '2023-10-17', '2023-10-17 08:00:00', '2023-10-17 10:00:00'),
    (2, 2, '2023-10-17', '2023-10-17 10:30:00', '2023-10-17 16:00:00'),
    (2, 3, '2023-10-17', '2023-10-17 16:00:00', '2023-10-18 00:00:00'),
    (2, 4, '2023-10-17', '2023-10-17 16:00:00', '2023-10-18 00:00:00'),

    (3, 1, '2023-10-16', '2023-10-16 19:00:00', '2023-10-17 00:00:00'),
    (3, 1, '2023-10-17', '2023-10-17 00:00:00', '2023-10-17 01:00:00'),
    (3, 7, '2023-10-17', '2023-10-17 01:00:00', '2023-10-17 10:00:00'),
    (3, 2, '2023-10-17', '2023-10-17 08:00:00', '2023-10-17 10:00:00'),
    (3, 2, '2023-10-17', '2023-10-17 10:30:00', '2023-10-17 16:00:00'),
    (3, 3, '2023-10-17', '2023-10-17 16:00:00', '2023-10-18 00:00:00'),
    (3, 4, '2023-10-17', '2023-10-17 16:00:00', '2023-10-17 23:58:00'); /* changed */

    /* fix the EndDateTime */
    declare @minutes_to_midnight int=3;

    select ts.*, ddif.dtzm, v.FixedEndDateTime
    from #TimeShift ts
    cross apply (values (1440+datediff(minute, cast(ts.EndDateTime as time), cast('00:00:00' as time)))) ddif(dtzm)
    cross apply (values (iif(ddif.dtzm between 1 and @minutes_to_midnight,
    dateadd(minute, ddif.dtzm, ts.EndDateTime),
    ts.EndDateTime))) v(FixedEndDateTime);


    /* fix the interval */
    select ts.*, ddif.dtzm, v.FixedEndDateTime, coalesce(interval.FinalEndDateTime, v.FixedEndDateTime) FinalEndDateTime
    from #TimeShift ts
    cross apply (values (1440+datediff(minute, cast(ts.EndDateTime as time), cast('00:00:00' as time)))) ddif(dtzm)
    cross apply (values (iif(ddif.dtzm between 1 and @minutes_to_midnight,
    dateadd(minute, ddif.dtzm, ts.EndDateTime),
    ts.EndDateTime))) v(FixedEndDateTime)
    outer apply (select tt.EndDateTime
    from #TimeShift tt
    where tt.GroupingId=ts.GroupingId
    and tt.EmpId=ts.EmpId
    and tt.StartDateTime=v.FixedEndDateTime) interval(FinalEndDateTime)

    where not exists (select 1
    from #TimeShift tt
    where tt.GroupingId=ts.GroupingId
    and tt.EmpId=ts.EmpId
    and tt.EndDateTime=ts.StartDateTime); /* should be the FixedEndDateTime */

    My query was going to OUTER APPLY Drew Allen's query to the fixed interval above.  In the NOT EXISTS condition in the WHERE clause the JOIN condition should use the FixedEndDateTime imo

    What I meant by a minute before is the result my bad query produced, not the source data. The source time shift rows can start and end at any minute during the day, never in seconds. And never over midnight. So regardless of type of query it should show same EndDateTime as the actual source row.

    I can't try any of the answers now but will do that first thing tomorrow.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply