June 1, 2010 at 12:22 pm
WayneS (6/1/2010)
Jeff Moden (5/31/2010)
WayneS (5/31/2010)
This post is just to include an alternate method. I haven't tested the various methods, but I believe that the "Quirky Update" solution that Jeff presented will probably be the most efficient.
For the very limited number of rows in this example, the two methods take turns winning. Of course, the "Deep Dive" method you presented isn't controversial at all like the Quiky Update is.
Guess I'll have to build a million row test and find out for sure.
Thanks for the great post, Wayne. Alternatives are always good.
Jeff - your mis-spelling, though accidental, is still entirely accurate! :w00t:
Maybe it should be re-named to the Quiky Quirky Update???
😀
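For anyone who wants to rig up that million row test, here's one way it could be built (a rough sketch only; the table name and the range distribution are assumptions, not the actual test harness):
if object_id('tempdb..#RangeTest') is not null drop table #RangeTest;
-- a million non-overlapping ranges: a width of 9 makes a range adjacent to the next one, anything less leaves a gap
select Start = n.N * 10,                               -- each range lives in its own block of 10
       [End] = n.N * 10 + abs(checksum(newid())) % 10  -- random width of 0 to 9
  into #RangeTest
  from (select top (1000000)
               row_number() over (order by (select null)) as N
          from master.sys.all_columns ac1
         cross join master.sys.all_columns ac2) n;
create unique clustered index IX_RangeTest_Start on #RangeTest (Start);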
--Jeff Moden
Change is inevitable... Change for the better is not.
June 2, 2010 at 9:02 am
Jeff, can you please try the sample you have where the row-number-diff code falls off a cliff with the following changes: 1) temp table instead of table variable and 2) permanent numbers table instead of derived one. Thanks in advance.
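In other words, something along these lines (object names are only illustrative, not the actual test code):
-- 1) a temp table instead of a table variable
create table #rangeTable (Start int not null, [End] int not null);
-- 2) a permanent numbers table instead of one derived on the fly,
--    e.g. joining to an existing dbo.Tally table rather than generating
--    numbers with Row_Number() inside the query
select r.Start, r.[End], t.N
  from #rangeTable r
  join dbo.Tally t on t.N between r.Start and r.[End];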
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
June 2, 2010 at 9:27 pm
TheSQLGuru (6/2/2010)
Jeff, can you please try the sample you have where the row-number-diff code falls off a cliff with the following changes: 1) temp table instead of table variable and 2) permanent numbers table instead of derived one. Thanks in advance.
I'm not sure what you mean, Kevin... which sample? I was advocating the Quirky Update method and actually showing that the Tally table method pretty much takes it in the socks on this one.
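For readers who haven't seen the technique, here's a rough sketch of the Quirky Update pattern applied to this problem. It's illustrative only, not the exact code posted earlier, and the usual caveats apply: a clustered index to drive the update order, MAXDOP 1, and the fact that the whole trick relies on undocumented behavior.
if object_id('tempdb..#Ranges') is not null drop table #Ranges;
create table #Ranges
(
    Start      int not null primary key clustered,  -- the clustered index drives the update order
    [End]      int not null,
    GroupStart int null
);
insert #Ranges (Start, [End])
select 1,2 union all select 3,4 union all select 5,6 union all
select 8,8 union all select 9,9 union all select 10,10 union all
select 20,22 union all select 23,25 union all select 26,30;
declare @PrevEnd int, @GroupStart int;
update r
   set @GroupStart = GroupStart = case when Start = @PrevEnd + 1
                                       then @GroupStart  -- contiguous: keep the running group start
                                       else Start        -- gap: start a new group
                                  end,
       @PrevEnd = [End]
  from #Ranges r with (tablockx)
option (maxdop 1);
select GroupStart as Start, max([End]) as [End]
  from #Ranges
 group by GroupStart
 order by GroupStart;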
--Jeff Moden
Change is inevitable... Change for the better is not.
June 3, 2010 at 5:51 am
TheSQLGuru (6/2/2010)
Jeff, can you please try the sample you have where the row-number-diff code falls off a cliff with the following changes: 1) temp table instead of table variable and 2) permanent numbers table instead of derived one. Thanks in advance.
Here are the results of running it on my system with a permanent 3.1 million row tally table.
Jeff's results:
Edit: added Jeff's results. As expected, Quirky Update still rules!
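For anyone following along, a permanent tally table like that is commonly built along these lines (the table name, row count, and fill factor here are just examples):
select top (3100000)
       identity(int, 1, 1) as N
  into dbo.Tally
  from master.sys.all_columns ac1
 cross join master.sys.all_columns ac2;
alter table dbo.Tally
  add constraint PK_Tally_N primary key clustered (N) with (fillfactor = 100);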
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
June 5, 2010 at 9:51 pm
Thanks for all of the suggestions!
I came up with another option using a recursive CTE. Here's how I did it...
declare @rangeTable table (start int, [end] int)
insert @rangeTable
(Start, [End])
select 1, 2 union all
select 3, 4 union all
select 5, 6 union all
select 8, 8 union all
select 9, 9 union all
select 10, 10 union all
select 20, 22 union all
select 23, 25 union all
select 26, 30
;with NewRange as
(
    -- anchor: the first range (by start); seed the running values with that row's own start and end
    select top 1 Row_Number() over(order by start) as row,
           start, [end],
           start as MinStart,
           [end] as PrevEnd
    from @rangeTable
    order by start
    union all
    -- recursive part: walk the remaining rows in start order, restarting MinStart whenever a gap appears
    select rt.row, rt.start, rt.[end],
           case when nr.PrevEnd + 1 <> rt.Start
                then rt.Start    -- gap: begin a new group
                else nr.MinStart -- contiguous: keep the running group start
           end as MinStart,
           rt.[end] as PrevEnd
    from (select Row_Number() over(order by start) as row, start, [end]
          from @rangeTable) rt
    join NewRange nr on nr.row + 1 = rt.row
)
select MinStart as Start, max([End]) as [End]
from NewRange
group by MinStart
option (maxrecursion 32767) -- the default of 100 recursions would stop far short of the 32,767-row limit noted below
The limitation with this approach is that you can't have more than 32,767 rows (the MAXRECURSION hint maxes out at 32,767). But for my case this isn't an issue.
June 6, 2010 at 4:50 am
Did you compare the performance of your solution vs. the other three methods posted so far?
June 6, 2010 at 7:51 am
I didn't, mainly because the CTE idea only works with smaller datasets (fewer than 32,767 rows).
June 6, 2010 at 10:29 am
cgreathouse (6/5/2010)
The limitation with this approach is you can't have more than 32,767 rows. But for my case this isn't an issue.
Heh... why do people insist on building "time bombs" in their code just because they believe that a limited number of rows is an excuse for not doing it right? 😉
You need to check out the actual execution plan of your code and see where the table scan on @rangeTable produces 81 rows, which is the square of the number of rows actually in the table. This "accidental" Cartesian join is a form of "hidden RBAR": having as few as 1,000 rows in @rangeTable forces SQL Server to trudge through a million rows, which took 0:02. Having 10,000 rows in @rangeTable makes SQL Server wade through 100 million rows, which took 4:14 on my humble desktop. That means it took 127 times longer for only 10 times more rows, and that growth pattern isn't linear by any means. Extrapolation suggests that just 32,000 rows would keep SQL Server running for 22:43.
If you don't like the idea of using a Quirky Update for this, then write a cursor, but whatever you do, don't put that CTE into production. ;-)
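For reference, a rough sketch of that cursor alternative (illustrative only, assuming the ranges are sitting in the same @rangeTable in the same batch): walk the rows in Start order and emit a merged range whenever a gap appears.
declare @Start int, @End int, @GroupStart int, @PrevEnd int;
declare @Merged table (Start int, [End] int);
declare cur cursor local fast_forward for
    select Start, [End] from @rangeTable order by Start;
open cur;
fetch next from cur into @Start, @End;
while @@fetch_status = 0
begin
    if @GroupStart is null
        select @GroupStart = @Start, @PrevEnd = @End;   -- first range seeds the group
    else if @Start = @PrevEnd + 1
        set @PrevEnd = @End;                            -- contiguous: extend the current group
    else
    begin
        insert @Merged (Start, [End]) values (@GroupStart, @PrevEnd);  -- gap: close out the group
        select @GroupStart = @Start, @PrevEnd = @End;
    end;
    fetch next from cur into @Start, @End;
end;
if @GroupStart is not null
    insert @Merged (Start, [End]) values (@GroupStart, @PrevEnd);      -- close out the last group
close cur;
deallocate cur;
select Start, [End] from @Merged order by Start;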
--Jeff Moden
Change is inevitable... Change for the better is not.
June 6, 2010 at 10:37 am
Jeff, are you saying a c.u.r.s.o.r. should be preferred instead of the "double-row-number" approach??
June 6, 2010 at 1:52 pm
lmu92 (6/6/2010)
Jeff, are you saying a c.u.r.s.o.r. should be preferred instead of the "double-row-number" approach??
In this case, yes, because of the way the double row number method needs to expand the data to include every number in each start/end range. The example data, with wide start-to-end ranges, bears strong testimony to that. A cursor would only have to touch a small handful of rows, in a manner similar to what the Quirky Update did.
But, even the relatively slow double row number in this case is light speed compared to the accidental cross join formed by the CTE solution on this page.
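For reference, here's a rough sketch of the kind of expansion being described, assuming a permanent dbo.Tally numbers table exists. It isn't anyone's posted code, just an illustration of why wide ranges blow up the row count: every individual number covered by a range has to be materialized before the row-number grouping can happen.
;with Expanded as
(   -- every individual number covered by any range
    select distinct t.N
      from @rangeTable r
      join dbo.Tally t on t.N between r.Start and r.[End]
)
select min(N) as Start, max(N) as [End]
  from (select N,
               N - Row_Number() over (order by N) as Grp  -- constant within each contiguous island
          from Expanded) d
 group by Grp
 order by min(N);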
--Jeff Moden
Change is inevitable... Change for the better is not.
June 6, 2010 at 3:54 pm
@jeff: I'm confused...
When talking about the double row_number approach I was referring to the sample code I provided a few posts back (not the "deep dive" approach, though).
It's showing a consistent execution plan regardless of the range to cover. What am I missing/overlooking?
June 6, 2010 at 4:38 pm
lmu92 (6/6/2010)
@Jeff: I'm confused... When talking about the double row_number approach I was referring to the sample code I provided a few posts back (not the "deep dive" approach, though).
It's showing a consistent execution plan regardless of the range to cover. What am I missing/overlooking?
Sorry... I thought you were talking about the "deep dive" approach. I'll go back and look.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 6, 2010 at 5:43 pm
Looks good, Lutz. I just wish there were a way to keep from having to read the same table 4 times although I believe it'll still smoke a cursor (although I've not actually tried a cursor on this).
--Jeff Moden
Change is inevitable... Change for the better is not.
June 7, 2010 at 4:06 am
I'm not a fan of the "quirky update", so I had a think about this, and dug out some old code that I wrote at work for a very similar problem.
On my tests this is faster than all other supplied methods, including the quirky update, and has the big advantage (for me at least) of running on SQL2000.
Apologies for the layout, I just can't break the habit of putting commas at the start of the line, even though it annoys some...
select s.start
, (
select MIN(e.[end])
from @rangeTable e
where e.[start] > s.[end]
and not exists (
select 1
from @rangeTable e1
where e1.[start] = e.[end] + 1
)
)
from @rangeTable s
where not exists (
select 1
from @rangeTable s1
where s1.[end] = s.start - 1
)
June 7, 2010 at 5:59 am
There's a fly in the ointment, though... try this data. The last two ranges (32-32 and 34-34) aren't contiguous with anything, yet the query returns 32 through 34 for the first and a NULL end for the second...
declare @rangeTable table (start int, [end] int)
insert @rangeTable
(Start, [End])
select 1, 2 union all
select 3, 4 union all
select 5, 6 union all
select 8, 8 union all
select 9, 9 union all
select 10, 10 union all
select 20, 22 union all
select 23, 25 union all
select 26, 30 UNION ALL
select 32, 32 union all
select 34, 34
select s.start
, (
select MIN(e.[end])
from @rangeTable e
where e.[start] > s.[end]
and not exists (
select 1
from @rangeTable e1
where e1.[start] = e.[end] + 1
)
)
from @rangeTable s
where not exists (
select 1
from @rangeTable s1
where s1.[end] = s.start - 1
)
--Jeff Moden
Change is inevitable... Change for the better is not.