Finding patterns in rows (date ordered)

Question

Finding patterns in rows (date ordered)

greg.bull

SSCrazy

Points: 2093
More actions
February 4, 2013 at 9:36 am

#273769

Dear All,
Here's the background - I have a table of staffids and the shifts/durations worked - with usually 1 shift worked per day. A simple version of the table looks like this:
staffid (int), shiftdate (datetime), shiftdesc (varchar), shifthours (float).
An example pattern of shiftdescs for a staffid could be:
Sick, Sick, Nothing, Sick, Working, Working, Sick.
So I need to hunt through the shifts in date order, and for each staffid to find:
A count of the sickness episodes (unbroken by any other working shift, but we can include "nothing" shifts) - in this case there are 2.
A sum of the hours counted for each episode (sum of shiftlength).
The number of days in each episode - in this case the fiurst eipsode has 4 days, the second is just 1.
The output can be 1 row per episode, which I can then sum up/work out the episode duration etc.
staffid, episodestart, episodeend, totalshifthours.
I am trying to avoid using cursors, and thinking about how to do something column-based. I could do with a few clues on the techniques side.
Regards, Greg.

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply

Cadavre SSC-Forever Points: 41690 More actions · Answer 1

Hello and welcome to SSC,

If you could knock up some sample data and DDL scripts, then your expected results based on the sample data provided that would be extremely useful in allowing people to help you.

Have a read through this link --> http://www.sqlservercentral.com/articles/Best+Practices/61537/%5B/url%5D, if you're unsure how best to lay this information out. Remember that if your sample data and DDL script is readily consumable, then the volunteers for this forum are much more likely to take time out from their work to give you a hand.

Thanks.

Forever trying to learn
My blog - http://www.cadavre.co.uk/
For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Sean Lange SSC Guru Points: 286590 More actions · Answer 2

greg.bull (2/4/2013)
I am trying to avoid using cursors, and thinking about how to do something column-based. I could do with a few clues on the techniques side.
Regards, Greg.

Not to diminish Cadavre's valid point but since you said you want some clues, it sounds like you want to figure it out yourself but don't know where to start. Try reading this article, it explains how to accomplish exactly what you are trying to do. http://www.sqlservercentral.com/articles/T-SQL/71550/[/url]

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

greg.bull SSCrazy Points: 2093 More actions · Answer 3

Well, thanks. I wish I'd been able to find this earlier. As you cleverly noted - I really want to do this myself - so am grateful for the steer.

Regards, Greg.

Dwain Camps SSC Guru Points: 86978 More actions · Answer 4

greg.bull (2/4/2013)
Well, thanks. I wish I'd been able to find this earlier. As you cleverly noted - I really want to do this myself - so am grateful for the steer.
Regards, Greg.

*** SPOILER ALERT!!! ***

In case you're still challenged to make that article work for you (I assume it's Jeff Moden's Grouping Islands of Contiguous Dates article), this should put you over the top.

DECLARE @Shifts TABLE

(staffid int, shiftdesc varchar(20), shiftdate datetime, shifthours float)

INSERT INTO @Shifts

SELECT 1, 'Sick', '2012-02-01', 8

UNION ALL SELECT 1, 'Sick', '2012-02-02', 8

UNION ALL SELECT 1, 'Nothing', '2012-02-03', 8

UNION ALL SELECT 1, 'Sick', '2012-02-04', 8

UNION ALL SELECT 1, 'Working', '2012-02-05', 8

UNION ALL SELECT 1, 'Working', '2012-02-06', 8

UNION ALL SELECT 1, 'Sick', '2012-02-07', 8

SELECT staffid, periodstart=MIN(shiftdate), periodend=MAX(shiftdate)

,shiftdesc=MAX(shiftdesc)

,shifthours=SUM(shifthours)

FROM (

SELECT staffid, shiftdesc, shiftdate, shifthours

,n=shiftdate-ROW_NUMBER() OVER (

PARTITION BY staffid,

CASE shiftdesc WHEN 'Working' THEN 0 ELSE 1 END

ORDER BY shiftdate)

FROM @Shifts) a

GROUP BY staffid, n

ORDER BY staffid, periodstart

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

greg.bull SSCrazy Points: 2093 More actions · Answer 5

Golly, thanks for this - saved me a lot of time. I read Jeff's article and was just beginning to understand bits of it when my daughter cam home from school. Which effectively ended my working day. I've modded the code you posted a bit to suit - I didn't want to count hours for all of the shifts. Sooo, a big thank you for taking the time and trouble to gen test data (yes, I should have done this) and for providing a solution that works. Just got to plug it into my main DB, and I suspect it'll run quickly. Regards, Greg.

Dwain Camps SSC Guru Points: 86978 More actions · Answer 6

greg.bull (2/5/2013)
Golly, thanks for this - saved me a lot of time. I read Jeff's article and was just beginning to understand bits of it when my daughter cam home from school. Which effectively ended my working day. I've modded the code you posted a bit to suit - I didn't want to count hours for all of the shifts. Sooo, a big thank you for taking the time and trouble to gen test data (yes, I should have done this) and for providing a solution that works. Just got to plug it into my main DB, and I suspect it'll run quickly. Regards, Greg.

You're welcome!

Actually, I'll confess that I had some issues understanding what was going on in the technique Jeff used in that article at first. Mucked around with it quite a bit before understanding finally dawned. So I like to practice to make sure I haven't forgotten it.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

greg.bull SSCrazy Points: 2093 More actions · Answer 7

I loved your "Hoo-uh" bit - was just watching Black Hawk Down again last night...

greg.bull SSCrazy Points: 2093 More actions · Answer 8

Oh. Got a bit stuck - when there are two shifts for the same person on a day. This forces a new period, as in the test data/code below. have played around with this, but can't see how to accomodate it. The period for staffid 1 should be 01-04. Any ideas ?

DECLARE @Shifts TABLE

(staffid int, shiftdesc varchar(20), shiftdate datetime, shifthours float)

INSERT INTO @Shifts

SELECT 1, 'Sick', '2012-02-01', 8

UNION ALL SELECT 1, 'SickOff', '2012-02-02', 8

UNION ALL SELECT 1, 'Sick', '2012-02-02', 8 -- Two shifts in a day

UNION ALL SELECT 1, 'Sick', '2012-02-03', 8

UNION ALL SELECT 1, 'SickOff', '2012-02-04', 8

UNION ALL SELECT 2, 'Sick', '2012-02-04', 8

UNION ALL SELECT 3, 'Working', '2012-02-05', 8

UNION ALL SELECT 1, 'Working', '2012-02-06', 8

UNION ALL SELECT 1, 'Sick', '2012-02-07', 8

SELECT staffid, periodstart=MIN(shiftdate), periodend=MAX(shiftdate)

,shiftdesc=MAX(shiftdesc)

,shifthours=SUM(shifthours)

FROM (

SELECT staffid, shiftdesc, shiftdate, shifthours

,n=shiftdate-ROW_NUMBER() OVER (

PARTITION BY staffid,

CASE shiftdesc WHEN 'Working' THEN 0 ELSE 1 END

ORDER BY shiftdate)

FROM @Shifts) a

GROUP BY staffid, n

ORDER BY staffid, periodstart

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 9

DECLARE @Shifts TABLE

(staffid int, shiftdesc varchar(20), shiftdate datetime, shifthours float)

INSERT INTO @Shifts

SELECT 1, 'Sick', '2012-02-01', 8

UNION ALL SELECT 1, 'SickOff', '2012-02-02', 8

UNION ALL SELECT 1, 'Sick', '2012-02-02', 8 -- Two shifts in a day

UNION ALL SELECT 1, 'Sick', '2012-02-03', 8

UNION ALL SELECT 1, 'SickOff', '2012-02-04', 8

UNION ALL SELECT 1, 'Working', '2012-02-06', 8

UNION ALL SELECT 1, 'Sick', '2012-02-07', 8

UNION ALL SELECT 2, 'Sick', '2012-02-04', 8

UNION ALL SELECT 3, 'Working', '2012-02-05', 8

SELECT

staffid,

periodstart = MIN(shiftdate),

periodend = MAX(shiftdate),

shiftdesc = MAX(shiftdesc),

shifthours = SUM(shifthours)

FROM (

SELECT staffid, shiftdesc, shiftdate, shifthours, x.Working,

n = shiftdate -

DENSE_RANK() OVER (PARTITION BY staffid ORDER BY shiftdate, x.Working)

FROM @Shifts

CROSS APPLY (SELECT Working = CASE shiftdesc WHEN 'Working' THEN 1 ELSE 0 END) x

) a

GROUP BY staffid, n, Working

ORDER BY staffid, periodstart

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Dwain Camps SSC Guru Points: 86978 More actions · Answer 10

Maybe like this?

DECLARE @Shifts TABLE

(staffid int, shiftdesc varchar(20), shiftdate datetime, shifthours float)

INSERT INTO @Shifts

SELECT 1, 'Sick', '2012-02-01', 8

UNION ALL SELECT 1, 'SickOff', '2012-02-02', 8

UNION ALL SELECT 1, 'Sick', '2012-02-02', 8 -- Two shifts in a day

UNION ALL SELECT 1, 'Sick', '2012-02-03', 8

UNION ALL SELECT 1, 'SickOff', '2012-02-04', 8

UNION ALL SELECT 2, 'Sick', '2012-02-04', 8

UNION ALL SELECT 3, 'Working', '2012-02-05', 8

UNION ALL SELECT 1, 'Working', '2012-02-06', 8

UNION ALL SELECT 1, 'Sick', '2012-02-07', 8

SELECT staffid

, periodstart=MIN(shiftdate)

, periodend=MAX(shiftdate)

,shiftdesc=MAX(shiftdesc)

,shifthours=SUM(shifthours)

FROM (

SELECT staffid, shiftdesc, shiftdate, shifthours

,n=shiftdate-ROW_NUMBER() OVER (

PARTITION BY staffid,

CASE shiftdesc WHEN 'Working' THEN 0 ELSE 1 END

ORDER BY shiftdate)

FROM (

SELECT staffid, shiftdesc=MAX(shiftdesc), shiftdate, shifthours=SUM(ShiftHours)

FROM @Shifts

GROUP BY staffid, shiftdate) a) a

GROUP BY staffid, n

ORDER BY staffid, periodstart

Not sure if this will work properly when there's a WORK and a SICK/SICKOFF on the same day (2 shifts).

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Dwain Camps SSC Guru Points: 86978 More actions · Answer 11

ChrisM@Work (2/6/2013)
DECLARE @Shifts TABLE (staffid int, shiftdesc varchar(20), shiftdate datetime, shifthours float) INSERT INTO @Shifts SELECT 1, 'Sick', '2012-02-01', 8 UNION ALL SELECT 1, 'SickOff', '2012-02-02', 8 UNION ALL SELECT 1, 'Sick', '2012-02-02', 8 -- Two shifts in a day UNION ALL SELECT 1, 'Sick', '2012-02-03', 8 UNION ALL SELECT 1, 'SickOff', '2012-02-04', 8 UNION ALL SELECT 1, 'Working', '2012-02-06', 8 UNION ALL SELECT 1, 'Sick', '2012-02-07', 8 UNION ALL SELECT 2, 'Sick', '2012-02-04', 8 UNION ALL SELECT 3, 'Working', '2012-02-05', 8 SELECT staffid, periodstart = MIN(shiftdate), periodend = MAX(shiftdate), shiftdesc = MAX(shiftdesc), shifthours = SUM(shifthours) FROM ( SELECT staffid, shiftdesc, shiftdate, shifthours, x.Working, n = shiftdate - DENSE_RANK() OVER (PARTITION BY staffid ORDER BY shiftdate, x.Working) FROM @Shifts CROSS APPLY (SELECT Working = CASE shiftdesc WHEN 'Working' THEN 1 ELSE 0 END) x ) a GROUP BY staffid, n, Working
ORDER BY staffid, periodstart

How did you sneak in there so quickly big boy?

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 12

dwain.c (2/6/2013)

...
How did you sneak in there so quickly big boy?

Had a few minutes spare before a meeting. See that fish in your avatar? I'd like to fry it for dinner...

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

greg.bull SSCrazy Points: 2093 More actions · Answer 13

Perhaps one minor tweak is in order, which I hope you can help with. If there is no shift on a day, I want the period/episode to continue. Err...

Here's some test data - the missing 4th presently creates a new period. If you have a mo., to take a look I'd be grateful !

DECLARE @Shifts TABLE

(staffid int, shiftdesc varchar(20), shiftdate datetime, shifthours float)

INSERT INTO @Shifts

SELECT 1, 'Sick', '2012-02-01', 8

UNION ALL SELECT 1, 'Sick', '2012-02-02', 8

UNION ALL SELECT 1, 'Sick', '2012-02-02', 0

UNION ALL SELECT 1, 'Sick', '2012-02-02', 0 -- Two shifts in a day

UNION ALL SELECT 1, 'Sick', '2012-02-03', 8

UNION ALL SELECT 1, 'Sick', '2012-02-05', 8 -- We skipped the 4th, but want the period to extend to the 5th, not create a new one.

UNION ALL SELECT 1, 'Working', '2012-02-06', 8

UNION ALL SELECT 1, 'Sick', '2012-02-07', 8

UNION ALL SELECT 2, 'Sick', '2012-02-04', 8

UNION ALL SELECT 3, 'Working', '2012-02-05', 8

SELECT

staffid,

periodstart = MIN(shiftdate),

periodend = MAX(shiftdate),

shiftdesc = MAX(shiftdesc),

shifthours = SUM(shifthours)

FROM (

SELECT staffid, shiftdesc, shiftdate, shifthours, x.Working,

n = shiftdate -

DENSE_RANK() OVER (PARTITION BY staffid ORDER BY shiftdate, x.Working)

FROM @Shifts

CROSS APPLY (SELECT Working = CASE shiftdesc WHEN 'Working' THEN 1 ELSE 0 END) x

) a

GROUP BY staffid, n, Working

ORDER BY staffid, periodstart

Dwain Camps SSC Guru Points: 86978 More actions · Answer 14

greg.bull (2/6/2013)
Perhaps one minor tweak is in order, which I hope you can help with. If there is no shift on a day, I want the period/episode to continue. Err...
Here's some test data - the missing 4th presently creates a new period. If you have a mo., to take a look I'd be grateful !
DECLARE @Shifts TABLE
(staffid int, shiftdesc varchar(20), shiftdate datetime, shifthours float)
INSERT INTO @Shifts
SELECT 1, 'Sick', '2012-02-01', 8
UNION ALL SELECT 1, 'Sick', '2012-02-02', 8
UNION ALL SELECT 1, 'Sick', '2012-02-02', 0
UNION ALL SELECT 1, 'Sick', '2012-02-02', 0 -- Two shifts in a day
UNION ALL SELECT 1, 'Sick', '2012-02-03', 8
UNION ALL SELECT 1, 'Sick', '2012-02-05', 8 -- We skipped the 4th, but want the period to extend to the 5th, not create a new one.
UNION ALL SELECT 1, 'Working', '2012-02-06', 8
UNION ALL SELECT 1, 'Sick', '2012-02-07', 8
UNION ALL SELECT 2, 'Sick', '2012-02-04', 8
UNION ALL SELECT 3, 'Working', '2012-02-05', 8
SELECT
staffid,
periodstart = MIN(shiftdate),
periodend = MAX(shiftdate),
shiftdesc = MAX(shiftdesc),
shifthours = SUM(shifthours)
FROM (
SELECT staffid, shiftdesc, shiftdate, shifthours, x.Working,
n = shiftdate -
DENSE_RANK() OVER (PARTITION BY staffid ORDER BY shiftdate, x.Working)
FROM @Shifts
CROSS APPLY (SELECT Working = CASE shiftdesc WHEN 'Working' THEN 1 ELSE 0 END) x
) a
GROUP BY staffid, n, Working
ORDER BY staffid, periodstart

I just love watching requirements evolve! :w00t:

How do you know what skipped days to include? Are you on a 7 day work schedule? Or is it just because a different employee has a shift on that date?

If you need your results to reflect a 7 day work week, it might be best to introduce the use of a Calendar table (you can Google that or search this site). There are many articles to explain it. You don't necessarily have to use a permanent table for this either. You can generate the Calendar in a CTE with minimal effort.

I'll watch for your answer to my questions above, because that would likely help me to decide the best approach to suggest to you.

BTW. Are you any relation to Red (Bull)?

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St