Finding patterns in rows (date ordered)

  • Well, a staff member just didn't have a shift logged on that date - (usually an admin error explains this - someone forgot to enter them). But the thing is that shouldn't break the "episode" if (as an example) sick shifts are either side of this. There's no seven day week thing going on here, episodes can be hundreds of days (long term sick).

    I have found some CTE code to generate dates, but not sure this is needed.

    I could frig my source data to add in shifts if they are "missing" as an option ?

    Would you mind explaining what the "working" line does in the cross apply line (ie is this the trigger for a new episode ?)

    Regards, Greg.

    PS I often get called Red Bull - we're both fizzy !

  • greg.bull (2/7/2013)


    ...Would you mind explaining what the "working" line does in the cross apply line (ie is this the trigger for a new episode ?)

    Regards, Greg.

    PS I often get called Red Bull - we're both fizzy !

    In this instance, it's a tool to get around repetition of a chunk of code. Here's the same code without;

    SELECT

    staffid,

    periodstart = MIN(shiftdate),

    periodend = MAX(shiftdate),

    shiftdesc = MAX(shiftdesc),

    shifthours = SUM(shifthours)

    FROM (

    SELECT staffid, shiftdesc, shiftdate, shifthours, Working = CASE shiftdesc WHEN 'Working' THEN 1 ELSE 0 END,

    n = shiftdate -

    DENSE_RANK() OVER (PARTITION BY staffid ORDER BY shiftdate, CASE shiftdesc WHEN 'Working' THEN 1 ELSE 0 END)

    FROM @Shifts

    ) a

    GROUP BY staffid, n, Working

    ORDER BY staffid, periodstart

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • OK best I can think of then is to use a calendar table to plug holes in the data - processing each staffid one by one in a supervisory cursor loop.

  • greg.bull (2/8/2013)


    OK best I can think of then is to use a calendar table to plug holes in the data - processing each staffid one by one in a supervisory cursor loop.

    Bull!

    I think you're giving up too fast!

    Is this closer?

    DECLARE @Shifts TABLE

    (staffid int, shiftdesc varchar(20), shiftdate datetime, shifthours float)

    INSERT INTO @Shifts

    SELECT 1, 'Sick', '2012-02-01', 8

    UNION ALL SELECT 1, 'Sick', '2012-02-02', 8

    UNION ALL SELECT 1, 'Sick', '2012-02-02', 0

    UNION ALL SELECT 1, 'Sick', '2012-02-02', 0 -- Two shifts in a day

    UNION ALL SELECT 1, 'Sick', '2012-02-03', 8

    UNION ALL SELECT 1, 'Sick', '2012-02-05', 8 -- We skipped the 4th, but want the period to extend to the 5th, not create a new one.

    UNION ALL SELECT 1, 'Working', '2012-02-06', 8

    UNION ALL SELECT 1, 'Sick', '2012-02-07', 8

    UNION ALL SELECT 2, 'Sick', '2012-02-04', 8

    UNION ALL SELECT 3, 'Working', '2012-02-05', 8

    ;WITH Calendar ([date]) AS (

    SELECT DATEADD(day, n, (SELECT MIN(shiftdate) FROM @Shifts))

    FROM (

    SELECT n=0 UNION ALL

    SELECT TOP (SELECT DATEDIFF(day, MIN(shiftdate), MAX(shiftdate)) FROM @Shifts)

    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns) a)

    SELECT staffid, shiftdesc=MAX(shiftdesc), shiftstart=MIN(shiftdate), shiftend=MAX(shiftdate)

    ,TotalHours=SUM(shifthours)

    FROM (

    SELECT staffid, shiftdesc=MAX(shiftdesc), shiftdate, shifthours=SUM(shifthours)

    ,n=shiftdate-ROW_NUMBER() OVER (PARTITION BY staffid, MAX(shiftdesc) ORDER BY shiftdate)

    FROM (

    SELECT staffid

    ,shiftdesc

    ,shiftdate

    ,shifthours

    FROM @Shifts

    UNION ALL

    SELECT staffid, 'Sick', [date], 0 -- Make 0 --> 8 if you want to count the missing days as sick

    FROM Calendar

    CROSS APPLY (SELECT DISTINCT staffid FROM @Shifts) a

    ) a

    GROUP BY staffid, shiftdate

    ) b

    GROUP BY staffid, n

    ORDER BY staffid, shiftstart


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • greg.bull (2/8/2013)


    OK best I can think of then is to use a calendar table to plug holes in the data - processing each staffid one by one in a supervisory cursor loop.

    Gosh, youngsters these days go belly up sooo quickly! 😀

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Well I must say:

    1) You guys operate at a level of SQL comprehension way beyond me ! It is (and I use the word without reservation) awesome (PS I am not an American).

    2) I did hope, cunningly, that the "threat" of cursors might trigger some kind of reaction from you.

    3) I'm not that young. I'm not very good at SQL either, but I do have other talents - although these (sailing, drinking, writing -- Oh God, I can't think of any...) are of limited value when I'm trying to program stuff.

    I'm going to give this a good test now. Thanks chaps.

    Greg.

  • greg.bull (2/8/2013)


    2) I did hope, cunningly, that the "threat" of cursors might trigger some kind of reaction from you.

    Greg.

    :w00t::crying::-P:angry:!!!!CURSORS!!!!:w00t::crying::-P:angry:

    Grrrrr!!!!!


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Chaps,

    Due to my poor test data, it's not quite there. In fact it's worse, but that's my fault.

    What's happening is that all Off (basically placeholders for "nothing's happening here") shifts are being turned into sick shifts. Feel free to rant about the poor test data and let me know if it's getting too complex.

    Here's better test data and comments:

    DECLARE @Shifts TABLE

    (staffid int, shiftdesc varchar(20), shiftdate datetime, shifthours float)

    INSERT INTO @Shifts

    SELECT 1, 'Sick', '2012-02-01', 8

    UNION ALL SELECT 1, 'Sick', '2012-02-02', 8

    UNION ALL SELECT 1, 'Sick', '2012-02-02', 0

    UNION ALL SELECT 1, 'Sick', '2012-02-03', 8

    UNION ALL SELECT 1, 'Sick', '2012-02-05', 8 -- We skipped the 4th, but want the period to extend to the 5th, not create a new one. This bit works OK.

    UNION ALL SELECT 1, 'Off', '2012-02-06', 0 -- Ignore - it's a filler and doesn't trigger a new episode or extend an previous one.

    UNION ALL SELECT 1, 'Working', '2012-02-07', 8 -- There's just one episode here - from the 7-12

    UNION ALL SELECT 1, 'Working', '2012-02-08', 8

    UNION ALL SELECT 1, 'Off', '2012-02-09', 0 -- See above, these are presently turned into sickness, but are actually part of the working episode.

    UNION ALL SELECT 1, 'Working', '2012-02-11', 8 -- Working is from 07 -> 11. Sickness is 01->05. Leave is 13->18.

    UNION ALL SELECT 1, 'Off', '2012-02-12', 0 -- this is not part of the Working episode, which starts and ends with Working (but may have Offs or gaps in the middle)

    -- Off's don't trigger episode changes. So there's only 3 episodes here

    UNION ALL SELECT 1, 'Leave', '2012-02-13', 8

    UNION ALL SELECT 1, 'Leave', '2012-02-14', 8

    UNION ALL SELECT 1, 'Off', '2012-02-15', 0

    UNION ALL SELECT 1, 'Leave', '2012-02-17', 8

    UNION ALL SELECT 1, 'Leave', '2012-02-18', 8

  • One idea would be for me to delete those Off shifts. The remaining problems may be easier to address:

    Whilst for sickness, gaps in days don't trigger new episodes, they do for working, and...

    leave shifts are ignored.

    Greg

  • It's easy enough to fill in gaps. What you want to do after that is up to you. Here are a couple of ideas.

    SET DATEFORMAT YMD

    DECLARE @Shifts TABLE

    (staffid int, shiftdesc varchar(20), shiftdate datetime, shifthours float)

    INSERT INTO @Shifts

    SELECT 1, 'Sick', '2012-02-01', 8

    UNION ALL SELECT 1, 'Sick', '2012-02-02', 8

    UNION ALL SELECT 1, 'Sick', '2012-02-02', 0

    UNION ALL SELECT 1, 'Sick', '2012-02-03', 8

    UNION ALL SELECT 1, 'Sick', '2012-02-05', 8 -- We skipped the 4th, but want the period to extend to the 5th, not create a new one. This bit works OK.

    UNION ALL SELECT 1, 'Off', '2012-02-06', 0 -- Ignore - it's a filler and doesn't trigger a new episode or extend an previous one.

    UNION ALL SELECT 1, 'Working', '2012-02-07', 8 -- There's just one episode here - from the 7-12

    UNION ALL SELECT 1, 'Working', '2012-02-08', 8

    UNION ALL SELECT 1, 'Off', '2012-02-09', 0 -- See above, these are presently turned into sickness, but are actually part of the working episode.

    UNION ALL SELECT 1, 'Working', '2012-02-11', 8 -- Working is from 07 -> 11. Sickness is 01->05. Leave is 13->18.

    UNION ALL SELECT 1, 'Off', '2012-02-12', 0 -- this is not part of the Working episode, which starts and ends with Working (but may have Offs or gaps in the middle)

    -- Off's don't trigger episode changes. So there's only 3 episodes here

    UNION ALL SELECT 1, 'Leave', '2012-02-13', 8

    UNION ALL SELECT 1, 'Leave', '2012-02-14', 8

    UNION ALL SELECT 1, 'Off', '2012-02-15', 0

    UNION ALL SELECT 1, 'Leave', '2012-02-17', 8

    UNION ALL SELECT 1, 'Leave', '2012-02-18', 8

    -- fill in the gaps

    ;WITH

    e1(n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),

    e2(n) AS (SELECT 1 FROM e1 a cross join e1 b),

    e4(n) AS (SELECT 1 FROM e2 a cross join e2 b),

    Tally(n) AS (SELECT -1+ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM e4 a cross join e2 b),

    DatawithoutDateGaps AS (

    SELECT

    ss.staffid,

    sh.shiftdesc,

    shiftdate = x.Allshiftdate,

    shifthours = CASE WHEN shiftdate < x.Allshiftdate THEN 0 ELSE shifthours END,

    MissingRow = CASE WHEN shiftdate < x.Allshiftdate THEN 1 ELSE 0 END

    FROM ( -- ss get the earliest start and latest end dates per staffid

    SELECT staffid, StartJob = MIN(shiftdate), EndJob = MAX(shiftdate)

    FROM @Shifts

    GROUP BY staffid

    ) ss

    INNER JOIN Tally t ON t.n <= DATEDIFF(dd,ss.StartJob,ss.EndJob)

    CROSS APPLY (SELECT Allshiftdate = DATEADD(dd,t.n,ss.StartJob)) x

    CROSS APPLY (

    SELECT TOP 1 shiftdesc, shiftdate, shifthours

    FROM @Shifts sh

    WHERE sh.staffid = ss.staffid

    AND sh.shiftdate <= x.Allshiftdate

    ORDER BY sh.shiftdate DESC

    ) sh

    )

    -- run results into a #temp table for convenience

    SELECT *

    INTO #DatawithoutDateGaps

    FROM DatawithoutDateGaps

    ORDER BY staffid, shiftdate

    -- take a look

    SELECT * FROM #DatawithoutDateGaps

    -- have a play

    SELECT

    staffid,

    periodstart = MIN(shiftdate),

    periodend = MAX(shiftdate),

    shiftdesc = MAX(CASE Working WHEN 1 THEN 'Working' ELSE 'Other' END),

    shifthours = SUM(shifthours)

    FROM (

    SELECT staffid, shiftdesc, shiftdate, shifthours, x.Working,

    n = shiftdate -

    ROW_NUMBER() OVER (PARTITION BY staffid, x.Working ORDER BY shiftdate)

    FROM #DatawithoutDateGaps s

    CROSS APPLY (SELECT Working = CASE shiftdesc WHEN 'Working' THEN 1 ELSE 0 END) x

    ) a

    GROUP BY staffid, n, Working

    ORDER BY staffid, periodstart

    -- have another play

    SELECT

    staffid,

    periodstart = MIN(shiftdate),

    periodend = MAX(shiftdate),

    shiftdesc,

    shifthours = SUM(shifthours)

    FROM (

    SELECT staffid, shiftdesc, shiftdate, shifthours,

    n = shiftdate -

    ROW_NUMBER() OVER (PARTITION BY staffid, shiftdesc ORDER BY shiftdate)

    FROM #DatawithoutDateGaps s

    ) a

    GROUP BY staffid, n, shiftdesc

    ORDER BY staffid, periodstart

    go

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 10 posts - 16 through 24 (of 24 total)

You must be logged in to reply to this topic. Login to reply