December 6, 2010 at 9:36 pm
Comments posted to this topic are about the item Group Islands of Contiguous Dates (SQL Spackle)
--Jeff Moden
Change is inevitable... Change for the better is not.
December 6, 2010 at 10:51 pm
Sorry, I'm dumb. But I completely don't understand how the "test data" dates at the start of the article make any sense without the accompanying comments. Would you mind clarifying please? If so, thanks!
Edit: It's probably not even clear why I'm confused. But I mean, how does anyone know if any of the dates is supposed to be a "start and end date" or just a "start date" or just an "end date" without the comments? I assume I'm missing something obvious here...
December 6, 2010 at 11:21 pm
Relax auto, you're not dumb. If you haven't had a need for it the test data doesn't make sense out of the gate. This comes up a lot, though, for people trying to deal with, say, employee work logging or information from a mechanical switch reporting status, and you want to find 'misses'.
Basically, you're looking for start/end dates for each contiguous grouping of the dates in the test data.
I have a feeling Jeff you're going to be fielding a few questions about the subtracting from the date, but a solid spackle walkthrough. Thanks as always!
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
December 6, 2010 at 11:26 pm
But how can you tell which dates are "start", "end" or "start and end" without any additional information (like the comments)? I don't see any rhyme or reason here, aside from the dates being in chronological order.
December 6, 2010 at 11:38 pm
Nice article as a whole... a bit weak/confusing intro though. But thx to Craigs little comment it became much more clear what/why was being done.
/T
December 6, 2010 at 11:42 pm
Okay, I think this is starting to make sense now. Contiguous is defined here as "at least one date entry per day during a range". Okay. And now the rest of the article makes sense too. Cool.
Wow, now that I understand what's going on, this article is freaking awesome. Jeff rocks. (Craig, too.)
December 7, 2010 at 12:01 am
Alright, let's make the data a little more real world scenario.
You have a contractor who comes by and does work as needed. You're looking for a per problem average of how long it takes for them to fix whatever issue they were brought in for. You've got a log of said person's hours:
1/1/2000 8
1/2/2000 4
1/4/2000 8
1/5/2000 8
1/6/2000 8
1/9/2000 3
1/12/2000 8
1/13/2000 8
1/14/2000 6
So, you need to find a way to 'group' this data, to know which sections are continuous. Using the code and teh data above, there are four distinct groups:
1/1 - 1/2, 1/4 - 1/6, 1/9, and 1/12 - 1/14
Now that you have them grouped, you can group on the result of the code from the article, and SUM( hours). Now you have a per group sum, which you can then average afterwards.
This is also useful if you have non-contiguous ranges and you need to find gaps. For example, instead of grouping hours above, let's say my consultant was lax in remembering to file his paperwork. I can tell him he's missing information on 1/3, 1/7-1/8, and 1/10 - 1/12.
There's a number of applications to this, the above simply being a couple of examples.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
December 7, 2010 at 12:03 am
Awesome, got it. Thanks! I've taken up a good deal of your time today, Craig!
December 7, 2010 at 2:05 am
Nice Jeff, very nice indeed.
Now, all I need is to find somewhere I can use this 🙂
December 7, 2010 at 7:33 am
Nice article.
When Jeff said, "even though the DateGroup looks like a date, it really doesn't mean anything," he was fibbing a bit, probably to keep unnecessary detail out of the article.
To see what the DateGroup means, we can watch what happens to it as we move through the test results from the first set of test data. It starts out as '2009-12-31', and it would keep that value on every row if there were no gaps in the dates. Each time we pass a gap, the number of days in the gap are added to the DateGroup. So the DateGroup means:
The day prior to the first day in our input data, adjusted by the number of days skipped along the way to our current row.
Or, put another way:
The day prior to what the first day in our input data would be, if we shuffled all the dates forward to fill in the gaps up to the current row.
OK, so Jeff was probably right to gloss over this, but hang on a minute. Now that we know what DateGroup is, we can use it to count the skipped days. It's just the number of days between our earliest date and DateGroup. Here I changed Jeff's query to group by DaysSkipped instead of DateGroup.
WITH
cteGroupedDates AS
(SELECT UniqueDate = SomeDate,
DaysSkipped = DATEDIFF(dd, MinDate, DATEADD(dd, - ROW_NUMBER() OVER (ORDER BY SomeDate), SomeDate) + 1)
FROM #MyHead
CROSS JOIN (SELECT MIN(SomeDate) AS MinDate FROM #MyHead) m
GROUP BY MinDate, SomeDate
)
SELECT StartDate = MIN(UniqueDate),
EndDate = MAX(UniqueDate),
Days = DATEDIFF(dd,MIN(UniqueDate),MAX(UniqueDate))+1,
DaysSkipped
FROM cteGroupedDates
GROUP BY DaysSkipped
ORDER BY StartDate
;
This gives:
StartDate EndDate Days DaysSkipped
----------------------- ----------------------- ----------- -----------
2010-01-01 00:00:00.000 2010-01-01 00:00:00.000 1 0
2010-01-03 00:00:00.000 2010-01-03 00:00:00.000 1 1
2010-01-05 00:00:00.000 2010-01-06 00:00:00.000 2 2
2010-01-10 00:00:00.000 2010-01-12 00:00:00.000 3 5
DaysSkipped gives us a running total of the number of days in the gaps, which might be useful in some circumstances.
December 7, 2010 at 9:07 am
This was new to me and you did an excellent job of explaining how it works. I was with you every step of the way. 5 stars.
December 7, 2010 at 9:25 am
As always when I read an article by Jeff two things happen.
1. I find a very useful tip that I'm sure I'll need within 1 month of reading the article
2. I am amazed at how Jeff thinks. That's such an elegant solution to that sort of problem.
December 7, 2010 at 9:43 am
dadam (12/7/2010)
When Jeff said, "even though the DateGroup looks like a date, it really doesn't mean anything," he was fibbing a bit, probably to keep unnecessary detail out of the article.
Agh... Thanks for the correction... I should have said,
"...even though the DateGroup looks like a date, it [font="Arial Black"]simply being a date [/font]doesn't really mean anything,"
You are correct about the significance of the date and, for important a concept as contiguous dates all have the same date in the column, I did gloss over it a bit to keep the "spackle" short and mostly left that realization up to the reader by reading the results and the comments from that one query.
And the addition of DaysSkipped is a clever bit of code. Thanks for posting it. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
December 7, 2010 at 9:46 am
Craig Farrell (12/7/2010)
Alright, let's make the data a little more real world scenario.You have a contractor who comes by and does work as needed. You're looking for a per problem average of how long it takes for them to fix whatever issue they were brought in for. You've got a log of said person's hours:...
Very cool explanation on that post. Thanks a bunch for covering for me, Craig. I was busy at work at just couldn't get the time to respond.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 7, 2010 at 9:53 am
autoexcrement (12/6/2010)
But how can you tell which dates are "start", "end" or "start and end" without any additional information (like the comments)? I don't see any rhyme or reason here, aside from the dates being in chronological order.
I know Craig covered this, but you actually got it right there. The dates are simply in chronolgical order and the StartDate and EndDates identify the "islands" of dates where there are no missing dates.
I think a lot of folks were looking for a much more complicated answer to this simple problem.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 62 total)
You must be logged in to reply to this topic. Login to reply