Finding patterns in rows (date ordered)

Question

Finding patterns in rows (date ordered)

Viewing 10 posts - 16 through 24 (of 24 total)

You must be logged in to reply to this topic. Login to reply

greg.bull SSCrazy Points: 2093 More actions · Answer 1

Well, a staff member just didn't have a shift logged on that date - (usually an admin error explains this - someone forgot to enter them). But the thing is that shouldn't break the "episode" if (as an example) sick shifts are either side of this. There's no seven day week thing going on here, episodes can be hundreds of days (long term sick).

I have found some CTE code to generate dates, but not sure this is needed.

I could frig my source data to add in shifts if they are "missing" as an option ?

Would you mind explaining what the "working" line does in the cross apply line (ie is this the trigger for a new episode ?)

Regards, Greg.

PS I often get called Red Bull - we're both fizzy !

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 2

greg.bull (2/7/2013)
...Would you mind explaining what the "working" line does in the cross apply line (ie is this the trigger for a new episode ?)
Regards, Greg.
PS I often get called Red Bull - we're both fizzy !

In this instance, it's a tool to get around repetition of a chunk of code. Here's the same code without;

SELECT

staffid,

periodstart = MIN(shiftdate),

periodend = MAX(shiftdate),

shiftdesc = MAX(shiftdesc),

shifthours = SUM(shifthours)

FROM (

SELECT staffid, shiftdesc, shiftdate, shifthours, Working = CASE shiftdesc WHEN 'Working' THEN 1 ELSE 0 END,

n = shiftdate -

DENSE_RANK() OVER (PARTITION BY staffid ORDER BY shiftdate, CASE shiftdesc WHEN 'Working' THEN 1 ELSE 0 END)

FROM @Shifts

) a

GROUP BY staffid, n, Working

ORDER BY staffid, periodstart

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

greg.bull SSCrazy Points: 2093 More actions · Answer 3

OK best I can think of then is to use a calendar table to plug holes in the data - processing each staffid one by one in a supervisory cursor loop.

Dwain Camps SSC Guru Points: 86948 More actions · Answer 4

greg.bull (2/8/2013)
OK best I can think of then is to use a calendar table to plug holes in the data - processing each staffid one by one in a supervisory cursor loop.

Bull!

I think you're giving up too fast!

Is this closer?

DECLARE @Shifts TABLE

(staffid int, shiftdesc varchar(20), shiftdate datetime, shifthours float)

INSERT INTO @Shifts

SELECT 1, 'Sick', '2012-02-01', 8

UNION ALL SELECT 1, 'Sick', '2012-02-02', 8

UNION ALL SELECT 1, 'Sick', '2012-02-02', 0

UNION ALL SELECT 1, 'Sick', '2012-02-02', 0 -- Two shifts in a day

UNION ALL SELECT 1, 'Sick', '2012-02-03', 8

UNION ALL SELECT 1, 'Sick', '2012-02-05', 8 -- We skipped the 4th, but want the period to extend to the 5th, not create a new one.

UNION ALL SELECT 1, 'Working', '2012-02-06', 8

UNION ALL SELECT 1, 'Sick', '2012-02-07', 8

UNION ALL SELECT 2, 'Sick', '2012-02-04', 8

UNION ALL SELECT 3, 'Working', '2012-02-05', 8

;WITH Calendar ([date]) AS (

SELECT DATEADD(day, n, (SELECT MIN(shiftdate) FROM @Shifts))

FROM (

SELECT n=0 UNION ALL

SELECT TOP (SELECT DATEDIFF(day, MIN(shiftdate), MAX(shiftdate)) FROM @Shifts)

ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

FROM sys.all_columns) a)

SELECT staffid, shiftdesc=MAX(shiftdesc), shiftstart=MIN(shiftdate), shiftend=MAX(shiftdate)

,TotalHours=SUM(shifthours)

FROM (

SELECT staffid, shiftdesc=MAX(shiftdesc), shiftdate, shifthours=SUM(shifthours)

,n=shiftdate-ROW_NUMBER() OVER (PARTITION BY staffid, MAX(shiftdesc) ORDER BY shiftdate)

FROM (

SELECT staffid

,shiftdesc

,shiftdate

,shifthours

FROM @Shifts

UNION ALL

SELECT staffid, 'Sick', [date], 0 -- Make 0 --> 8 if you want to count the missing days as sick

FROM Calendar

CROSS APPLY (SELECT DISTINCT staffid FROM @Shifts) a

) a

GROUP BY staffid, shiftdate

) b

GROUP BY staffid, n

ORDER BY staffid, shiftstart

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 5

greg.bull (2/8/2013)
OK best I can think of then is to use a calendar table to plug holes in the data - processing each staffid one by one in a supervisory cursor loop.

Gosh, youngsters these days go belly up sooo quickly! 😀

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

greg.bull SSCrazy Points: 2093 More actions · Answer 6

Well I must say:

1) You guys operate at a level of SQL comprehension way beyond me ! It is (and I use the word without reservation) awesome (PS I am not an American).

2) I did hope, cunningly, that the "threat" of cursors might trigger some kind of reaction from you.

3) I'm not that young. I'm not very good at SQL either, but I do have other talents - although these (sailing, drinking, writing -- Oh God, I can't think of any...) are of limited value when I'm trying to program stuff.

I'm going to give this a good test now. Thanks chaps.

Greg.

Dwain Camps SSC Guru Points: 86948 More actions · Answer 7

greg.bull (2/8/2013)
2) I did hope, cunningly, that the "threat" of cursors might trigger some kind of reaction from you.
Greg.

:w00t::crying::-P:angry:!!!!CURSORS!!!!:w00t::crying::-P:angry:

Grrrrr!!!!!

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

greg.bull SSCrazy Points: 2093 More actions · Answer 8

Chaps,

Due to my poor test data, it's not quite there. In fact it's worse, but that's my fault.

What's happening is that all Off (basically placeholders for "nothing's happening here") shifts are being turned into sick shifts. Feel free to rant about the poor test data and let me know if it's getting too complex.

Here's better test data and comments:

DECLARE @Shifts TABLE

(staffid int, shiftdesc varchar(20), shiftdate datetime, shifthours float)

INSERT INTO @Shifts

SELECT 1, 'Sick', '2012-02-01', 8

UNION ALL SELECT 1, 'Sick', '2012-02-02', 8

UNION ALL SELECT 1, 'Sick', '2012-02-02', 0

UNION ALL SELECT 1, 'Sick', '2012-02-03', 8

UNION ALL SELECT 1, 'Sick', '2012-02-05', 8 -- We skipped the 4th, but want the period to extend to the 5th, not create a new one. This bit works OK.

UNION ALL SELECT 1, 'Off', '2012-02-06', 0 -- Ignore - it's a filler and doesn't trigger a new episode or extend an previous one.

UNION ALL SELECT 1, 'Working', '2012-02-07', 8 -- There's just one episode here - from the 7-12

UNION ALL SELECT 1, 'Working', '2012-02-08', 8

UNION ALL SELECT 1, 'Off', '2012-02-09', 0 -- See above, these are presently turned into sickness, but are actually part of the working episode.

UNION ALL SELECT 1, 'Working', '2012-02-11', 8 -- Working is from 07 -> 11. Sickness is 01->05. Leave is 13->18.

UNION ALL SELECT 1, 'Off', '2012-02-12', 0 -- this is not part of the Working episode, which starts and ends with Working (but may have Offs or gaps in the middle)

-- Off's don't trigger episode changes. So there's only 3 episodes here

UNION ALL SELECT 1, 'Leave', '2012-02-13', 8

UNION ALL SELECT 1, 'Leave', '2012-02-14', 8

UNION ALL SELECT 1, 'Off', '2012-02-15', 0

UNION ALL SELECT 1, 'Leave', '2012-02-17', 8

UNION ALL SELECT 1, 'Leave', '2012-02-18', 8

greg.bull SSCrazy Points: 2093 More actions · Answer 9

One idea would be for me to delete those Off shifts. The remaining problems may be easier to address:

Whilst for sickness, gaps in days don't trigger new episodes, they do for working, and...

leave shifts are ignored.

Greg

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 10

It's easy enough to fill in gaps. What you want to do after that is up to you. Here are a couple of ideas.

SET DATEFORMAT YMD

DECLARE @Shifts TABLE

(staffid int, shiftdesc varchar(20), shiftdate datetime, shifthours float)

INSERT INTO @Shifts

SELECT 1, 'Sick', '2012-02-01', 8

UNION ALL SELECT 1, 'Sick', '2012-02-02', 8

UNION ALL SELECT 1, 'Sick', '2012-02-02', 0

UNION ALL SELECT 1, 'Sick', '2012-02-03', 8

UNION ALL SELECT 1, 'Sick', '2012-02-05', 8 -- We skipped the 4th, but want the period to extend to the 5th, not create a new one. This bit works OK.

UNION ALL SELECT 1, 'Off', '2012-02-06', 0 -- Ignore - it's a filler and doesn't trigger a new episode or extend an previous one.

UNION ALL SELECT 1, 'Working', '2012-02-07', 8 -- There's just one episode here - from the 7-12

UNION ALL SELECT 1, 'Working', '2012-02-08', 8

UNION ALL SELECT 1, 'Off', '2012-02-09', 0 -- See above, these are presently turned into sickness, but are actually part of the working episode.

UNION ALL SELECT 1, 'Working', '2012-02-11', 8 -- Working is from 07 -> 11. Sickness is 01->05. Leave is 13->18.

UNION ALL SELECT 1, 'Off', '2012-02-12', 0 -- this is not part of the Working episode, which starts and ends with Working (but may have Offs or gaps in the middle)

-- Off's don't trigger episode changes. So there's only 3 episodes here

UNION ALL SELECT 1, 'Leave', '2012-02-13', 8

UNION ALL SELECT 1, 'Leave', '2012-02-14', 8

UNION ALL SELECT 1, 'Off', '2012-02-15', 0

UNION ALL SELECT 1, 'Leave', '2012-02-17', 8

UNION ALL SELECT 1, 'Leave', '2012-02-18', 8

-- fill in the gaps

;WITH

e1(n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),

e2(n) AS (SELECT 1 FROM e1 a cross join e1 b),

e4(n) AS (SELECT 1 FROM e2 a cross join e2 b),

Tally(n) AS (SELECT -1+ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM e4 a cross join e2 b),

DatawithoutDateGaps AS (

SELECT

ss.staffid,

sh.shiftdesc,

shiftdate = x.Allshiftdate,

shifthours = CASE WHEN shiftdate < x.Allshiftdate THEN 0 ELSE shifthours END,

MissingRow = CASE WHEN shiftdate < x.Allshiftdate THEN 1 ELSE 0 END

FROM ( -- ss get the earliest start and latest end dates per staffid

SELECT staffid, StartJob = MIN(shiftdate), EndJob = MAX(shiftdate)

FROM @Shifts

GROUP BY staffid

) ss

INNER JOIN Tally t ON t.n <= DATEDIFF(dd,ss.StartJob,ss.EndJob)

CROSS APPLY (SELECT Allshiftdate = DATEADD(dd,t.n,ss.StartJob)) x

CROSS APPLY (

SELECT TOP 1 shiftdesc, shiftdate, shifthours

FROM @Shifts sh

WHERE sh.staffid = ss.staffid

AND sh.shiftdate <= x.Allshiftdate

ORDER BY sh.shiftdate DESC

) sh

)

-- run results into a #temp table for convenience

SELECT *

INTO #DatawithoutDateGaps

FROM DatawithoutDateGaps

ORDER BY staffid, shiftdate

-- take a look

SELECT * FROM #DatawithoutDateGaps

-- have a play

SELECT

staffid,

periodstart = MIN(shiftdate),

periodend = MAX(shiftdate),

shiftdesc = MAX(CASE Working WHEN 1 THEN 'Working' ELSE 'Other' END),

shifthours = SUM(shifthours)

FROM (

SELECT staffid, shiftdesc, shiftdate, shifthours, x.Working,

n = shiftdate -

ROW_NUMBER() OVER (PARTITION BY staffid, x.Working ORDER BY shiftdate)

FROM #DatawithoutDateGaps s

CROSS APPLY (SELECT Working = CASE shiftdesc WHEN 'Working' THEN 1 ELSE 0 END) x

) a

GROUP BY staffid, n, Working

ORDER BY staffid, periodstart

-- have another play

SELECT

staffid,

periodstart = MIN(shiftdate),

periodend = MAX(shiftdate),

shiftdesc,

shifthours = SUM(shifthours)

FROM (

SELECT staffid, shiftdesc, shiftdate, shifthours,

n = shiftdate -

ROW_NUMBER() OVER (PARTITION BY staffid, shiftdesc ORDER BY shiftdate)

FROM #DatawithoutDateGaps s

) a

GROUP BY staffid, n, shiftdesc

ORDER BY staffid, periodstart

go

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden