December 7, 2010 at 9:56 am
tommyh (12/6/2010)
Nice article as a whole... a bit weak/confusing intro though. But thx to Craig's little comment it became much clearer what/why was being done. /T
The folks that know me know where the detail of my articles usually is... in the comments in the code. I should probably get into the habit of stating that, though. :pinch: Thanks for the constructive feedback, Tommy. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
December 7, 2010 at 9:58 am
autoexcrement (12/6/2010)
Okay, I think this is starting to make sense now. Contiguous is defined here as "at least one date entry per day during a range". Okay. And now the rest of the article makes sense too. Cool. Wow, now that I understand what's going on, this article is freaking awesome. Jeff rocks. (Craig, too.)
Now there's a huge compliment. Nah... I don't mean the "Jeff rocks" thing. I mean the huge lightbulb that just turned on. Thanks for the awesome feedback.
Craig... it was your code that clarified it all here. Thanks again for the wonderful "cover" while I was busy.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 7, 2010 at 10:00 am
nigel. (12/7/2010)
Nice Jeff, very nice indeed. Now, all I need is to find somewhere I can use this 🙂
Thanks for the feedback. With a bit of extrapolation, you can also do a similar thing to find "islands" of just about any numeric sequence. Think of it as the inverse of "find missing identities".
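To make that concrete, here's a minimal sketch of the integer version (the #SomeInts table and its sample values are made up for illustration; this isn't code from the article). Contiguous values all produce the same value-minus-row-number result, so grouping on that difference returns one row per island:

-- Hypothetical table of integer IDs with gaps.
CREATE TABLE #SomeInts (SomeInt INT NOT NULL PRIMARY KEY);

INSERT INTO #SomeInts (SomeInt)
SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL    -- island 1
SELECT 7 UNION ALL SELECT 8 UNION ALL                       -- island 2
SELECT 10 UNION ALL SELECT 11 UNION ALL SELECT 12;          -- island 3

-- Contiguous integers share the same (value - row number) result.
WITH Grouped AS
(
    SELECT SomeInt,
           GroupID = SomeInt - ROW_NUMBER() OVER (ORDER BY SomeInt)
      FROM #SomeInts
)
SELECT MIN(SomeInt) AS StartOfIsland,
       MAX(SomeInt) AS EndOfIsland
  FROM Grouped
 GROUP BY GroupID
 ORDER BY StartOfIsland;

DROP TABLE #SomeInts;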
--Jeff Moden
Change is inevitable... Change for the better is not.
December 7, 2010 at 10:02 am
JJ B (12/7/2010)
This was new to me and you did an excellent job of explaining how it works. I was with you every step of the way. 5 stars.
Thanks for the great feedback, JJ. Good to "see" you again. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
December 7, 2010 at 10:05 am
mtassin (12/7/2010)
As always when I read an article by Jeff, two things happen:
1. I find a very useful tip that I'm sure I'll need within 1 month of reading the article.
2. I am amazed at how Jeff thinks. That's such an elegant solution to that sort of problem.
You're too kind, Mark. :blush: Thanks for the awesome compliment. Like I said at the end of the article, it appears that someone else thought of the same thing and published it in a book, so I'm certainly not alone in that way of thinking.
What most people don't understand is that I learned a huge amount of what I know and the way I think from this very site and I'm just trying to return the favor.
Glad to see that "no RBAR" flag is still flying high. 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
December 7, 2010 at 10:25 am
Jeff Moden (12/7/2010)
Craig... it was your code that clarified it all here. Thanks again for the wonderful "cover" while I was busy.
It was my pleasure. No code though, just a different phrasing. After the whole End Of Week calculation that I think 10 people had to chime in on just to get a phrasing that finally made sense to me and made it all click, the value of a different voice with a slightly different viewpoint explaining the same thing has become embedded in me again. 🙂
P.S. What's this stuff about you needing to work and not being on SSC 24/7?! BAH! Free internet forums pay the bills!
Don't they?
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
December 7, 2010 at 12:23 pm
yet another GREAT spackle article. (Polyfilla for the people on the other side of the pond) 😛
thanks Jeff.
December 7, 2010 at 12:43 pm
For the original question, how about this:
[font="Courier New"]
select a.SomeDate,
(select max(b.SomeDate)
from #MyHead b
-- where difference of dates equals
-- count of dates in between
where datediff(d,a.SomeDate,b.SomeDate) =
(select count(distinct c.SomeDate)
from #MyHead c
where c.SomeDate > a.SomeDate
and c.SomeDate <= b.SomeDate)) EndDate
from #MyHead a
[/font]
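If anyone wants to run that on its own, a throwaway #MyHead along these lines will do (the column name matches the query above; the sample dates are invented, not the article's test data):

-- Small test table: three islands of contiguous dates.
IF OBJECT_ID('tempdb..#MyHead') IS NOT NULL DROP TABLE #MyHead;
CREATE TABLE #MyHead (SomeDate DATETIME NOT NULL);

INSERT INTO #MyHead (SomeDate)
SELECT '20100101' UNION ALL
SELECT '20100102' UNION ALL
SELECT '20100103' UNION ALL   -- first island ends here
SELECT '20100110' UNION ALL
SELECT '20100111' UNION ALL   -- second island ends here
SELECT '20100120';            -- a one-day island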
December 7, 2010 at 1:03 pm
RStarr, while your code seems functional (I didn't extensively test it), take a look at the million row test.
If you look near the bottom of the article, Jeff gives a way to build out a very large sample dataset for testing. Try your code out vs. his. You'll (most likely) notice a large discrepancy. This will be primarily because of the necessary data manipulation you'll be performing. I think that will actually run the aggregation structure *per row*.
This is as much about optimization as anything else. A while/cursor loop doing a single pass on the data might outperform that.
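For reference, a generator along these lines will build a big pile of random dates quickly (this is just the usual cross-join trick; it is not necessarily identical to the generator in the article):

-- ~1,000,000 rows of random whole dates spread across roughly 10 years.
IF OBJECT_ID('tempdb..#MyHead') IS NOT NULL DROP TABLE #MyHead;

SELECT TOP (1000000)
       SomeDate = DATEADD(dd, ABS(CHECKSUM(NEWID())) % 3653, '20000101')
  INTO #MyHead
  FROM sys.all_columns ac1
 CROSS JOIN sys.all_columns ac2;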
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
December 7, 2010 at 7:55 pm
Craig Farrell (12/7/2010)
RStarr, while your code seems functional (I didn't extensively test it), take a look at the million row test. If you look near the bottom of the article, Jeff gives a way to build out a very large sample dataset for testing. Try your code out vs. his. You'll (most likely) notice a large discrepancy. This will be primarily because of the necessary data manipulation you'll be performing. I think that will actually run the aggregation structure *per row*.
This is as much about optimization as anything else. A while/cursor loop doing a single pass on the data might outperform that.
Just a follow up... The code from the article is nearly instantaneous even on the first run on 100,000 rows without an index. I stopped the other code after 6-1/2 minutes. Looking at the actual execution plan on a 100 row table, it had to "touch" more than 164,000 internal rows because, as you said, the aggregations are fully executed on a "per row" basis.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 7, 2010 at 8:07 pm
Geoff A (12/7/2010)
yet another GREAT spackle article. (Polyfilla for the people on the other side of the pond) 😛 thanks Jeff.
Thanks for the feedback and for stopping by, Geoff. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
December 8, 2010 at 10:26 am
Great post Jeff!
I had a thought... this could be extended to account for Times in addition to dates... just move the calculation to the least significant field (from day to second). I had to code an algorithm in VB to find the "islands" and the "water" for a scheduling application; this may well drop the need for that...
Again, Great Job!
December 8, 2010 at 11:11 am
venoym (12/8/2010)
Great post Jeff! I had a thought... this could be extended to account for Times in addition to dates... just move the calculation to the least significant field (from day to second). I had to code an algorithm in VB to find the "islands" and the "water" for a scheduling application; this may well drop the need for that...
Again, Great Job!
You might, but my recommendation would be to add a calculated column along the lines of DATEDIFF(ss, MIN(date), <datefield>), and use this technique off the resulting integer offset.
Don't do this from date 0 or something; you want to keep it as tight as possible. There's gonna be a LOT of seconds over the course of a year or three. 🙂
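A rough sketch of that idea, using a made-up #Schedule table (the names and sample rows are just for illustration, not anyone's production code): anchor the seconds offset at the table's own minimum time, then group the contiguous offsets into islands. The grouping below uses the offset-minus-ROW_NUMBER() difference; adapt it to whatever flavor of the technique you prefer.

-- Hypothetical table with one row per second during the "busy" periods.
IF OBJECT_ID('tempdb..#Schedule') IS NOT NULL DROP TABLE #Schedule;
CREATE TABLE #Schedule (EventTime DATETIME NOT NULL PRIMARY KEY);

INSERT INTO #Schedule (EventTime)
SELECT '20101208 10:00:00' UNION ALL
SELECT '20101208 10:00:01' UNION ALL
SELECT '20101208 10:00:02' UNION ALL   -- first island
SELECT '20101208 10:05:00' UNION ALL
SELECT '20101208 10:05:01';            -- second island

-- Keep the offset small by anchoring at MIN(EventTime) instead of date 0.
DECLARE @Anchor DATETIME;
SELECT @Anchor = MIN(EventTime) FROM #Schedule;

WITH Offsets AS
(
    SELECT EventTime,
           SecondsOffset = DATEDIFF(ss, @Anchor, EventTime)
      FROM #Schedule
), Grouped AS
(
    SELECT EventTime,
           GroupID = SecondsOffset - ROW_NUMBER() OVER (ORDER BY SecondsOffset)
      FROM Offsets
)
SELECT MIN(EventTime) AS IslandStart,
       MAX(EventTime) AS IslandEnd
  FROM Grouped
 GROUP BY GroupID
 ORDER BY IslandStart;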
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
December 8, 2010 at 3:40 pm
Thanks, Craig and Jeff, for pointing out the inefficiency. I didn't realize that using the aggregation like that would be so slow.
On the other hand, the original question was: "How can I determine overlapping dates?" That is not a totally trivial problem. My purpose was to suggest a solution that was relatively simple but still got the job done.
I ran a test on 500 dates over the span of four years. It returned in 6 seconds. In the environment I come from that would be sufficient for any practical problems we might come up with.
But I enjoyed reading through the other solutions as well.
December 8, 2010 at 3:46 pm
rstarr-916208 (12/8/2010)
Thanks, Craig and Jeff, for pointing out the inefficiency. I didn't realize that using the aggregation like that would be so slow. On the other hand, the original question was: "How can I determine overlapping dates?" That is not a totally trivial problem. My purpose was to suggest a solution that was relatively simple but still got the job done.
I ran a test on 500 dates over the span of four years. It returned in 6 seconds. In the environment I come from that would be sufficient for any practical problems we might come up with.
But I enjoyed reading through the other solutions as well.
Not sure where the original question about overlapping dates is from; I must have missed it even in the re-read. However, the easiest way to find that is to do a DENSE_RANK on your date field, then do an aggregate COUNT(*) by the rank. Anything with HAVING COUNT(*) > 1 will be a duplicate.
EDIT: You know, I'm apparently not on the ball today; there's no reason to even use the DENSE_RANK to find your duplicates. Just group by your date/other qualifiers and look for a HAVING COUNT(*) > 1.
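In case a worked form helps, the GROUP BY / HAVING version looks like this (written against the same #MyHead/SomeDate layout used earlier in the thread):

-- Any SomeDate that appears more than once is a duplicate/overlap.
SELECT SomeDate,
       COUNT(*) AS Occurrences
  FROM #MyHead
 GROUP BY SomeDate
HAVING COUNT(*) > 1
 ORDER BY SomeDate;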
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA