Collapsing a Table

Question

Collapsing a Table

Viewing 15 posts - 16 through 30 (of 38 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 1

WayneS (6/1/2010)
Jeff Moden (5/31/2010)
WayneS (5/31/2010)
This post is just to include an alternate method. I haven't tested the various methods, but I believe that the "Quirky Update" solution that Jeff presented will probably be the most efficient.
For the very limited number of rows in this example, the two methods take turns winning. Of course, the "Deep Dive" method you presented isn't controversial at all like the Quiky Update is.
Guess I'll have to build a million row test an find out for sure.
Thanks for the great post, Wayne. Alternatives are always good.
Jeff - your mis-spelling, though accidental, is still entirely accurate! :w00t:
Maybe is should be re-named to the Quiky Quirky Update???

😀

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 2

Jeff, can you please try the sample you have where the row-number-diff code falls off a cliff with the following changes: 1) temp table instead of table variable and 2) permanent numbers table instead of derived one. Thanks in advance.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 3

TheSQLGuru (6/2/2010)
Jeff, can you please try the sample you have where the row-number-diff code falls off a cliff with the following changes: 1) temp table instead of table variable and 2) permanent numbers table instead of derived one. Thanks in advance.

I'm not sure what you mean, Kevin... which sample? I was advocating the Quirky Update method and actually showing that the Tally table method pretty much takes in the socks on this one.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

WayneS SSC Guru Points: 95460 More actions · Answer 4

TheSQLGuru (6/2/2010)
Jeff, can you please try the sample you have where the row-number-diff code falls off a cliff with the following changes: 1) temp table instead of table variable and 2) permanent numbers table instead of derived one. Thanks in advance.

Here's running from my system with a permanent 3.1 million row tally table.

Jeff's results:

Edit: added Jeff's results. As expected, Quirky Update still rules!

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

cgreathouse SSC Eights! Points: 855 More actions · Answer 5

Thanks for all of the suggestions!

I came up with another option using a recursive CTE. Here's how I did it...

declare @rangeTable table (start int, [end] int)

insert @rangeTable

(Start, [End])

select 1, 2 union all

select 3, 4 union all

select 5, 6 union all

select 8, 8 union all

select 9, 9 union all

select 10, 10 union all

select 20, 22 union all

select 23, 25 union all

select 26, 30

;with NewRange as

(

select top 1 Row_Number() over(order by start) as row, start, [end], start MinStart, null PrevEnd

from @rangeTable

union all

select rt.row, rt.start, rt.[end], case

when nr.PrevEnd+1 <> rt.Start

then rt.Start

else nr.MinStart

end as MinStart, rt.[end] PrevEnd

from (select Row_Number() over(order by start) as row, start, [end]

from @rangeTable) rt

join NewRange nr on nr.row+1 = rt.row

)

select MinStart AS Start, max([End]) as [End]

from NewRange

group by MinStart

The limitation with this approach is you can't have more than 32,767 rows. But for my case this isn't an issue.

LutzM SSC Guru Points: 107049 More actions · Answer 6

Did you compare the performance of your solution vs. the other three methods posted so far?

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

cgreathouse SSC Eights! Points: 855 More actions · Answer 7

I didn't. Mainly because the CTE idea only works with smaller datasets (less than 32767 rows).

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 8

cgreathouse (6/5/2010)
The limitation with this approach is you can't have more than 32,767 rows. But for my case this isn't an issue.

Heh... why do people insist on building "time bombs" in their code just because they believe that a limited number of rows is an excuse for not doing it right? 😉

You need to check out the actual execution plan of your code and see where the table scan on @rangeTable produces 81 rows which is the square of the number of rows actually in the table. This "accidental" Cartesian join is known as "hidden RBAR" and having as few as 1000 rows in @rangeTable will cause SQL Server to have to trudge through a million rows and it takes 0:02. Having 10,000 rows in @rangeTable causes SQL Server to wade through 100 million rows and it took 4:14 to run on my humble desktop. That means it took 127 times longer for only 10 times more rows and that growth pattern isn't linear by any means. Interpolation suggests that only 32000 rows will cause SQL Server to run for 22:43.

If you don't like the idea of using a Quirky Update for this, then write a cursor but, whatever you do, don't put that CTE into production.;-)

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

LutzM SSC Guru Points: 107049 More actions · Answer 9

Jeff, are you saying a c.u.r.s.o.r. should be preferred instead of the "double-row-number" approach??

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 10

lmu92 (6/6/2010)
Jeff, are you saying a c.u.r.s.o.r. should be preferred instead of the "double-row-number" approach??

In this case, yes because of the way the double row number problem needs to be expanded to include all numbers in the start/end range. The example data with large ranges between start/end bears strong testimony to that. The cursor would only have to touch a small handlefull of rows in a manner similar to what the Quirky Update did.

But, even the relatively slow double row number in this case is light speed compared to the accidental cross join formed by the CTE solution on this page.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

LutzM SSC Guru Points: 107049 More actions · Answer 11

@jeff: I'm confused...

When talking about the double row_number approach I was referring to the sample code I provided a few posts back (not the "deep dive" approach though).

It's showing a consistent execution plan regardless of the range to cover. What do I miss/overlook?

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 12

lmu92 (6/6/2010)
@Jeff: I'm confused...
When talking about the double row_number approach I was referring to the sample code I provided a few posts back (not the "deep dive" approach though).
It's showing a consistent execution plan regardless of the range to cover. What do I miss/overlook?

Sorry... I thought you were talking about the "deep dive" approach. I'll go back and look.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 13

Looks good, Lutz. I just wish there were a way to keep from having to read the same table 4 times although I believe it'll still smoke a cursor (although I've not actually tried a cursor on this).

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

paul_ramster Ten Centuries Points: 1382 More actions · Answer 14

I'm not a fan of the "quirky update", so I had a think about this, and dug out some old code that I wrote at work for a very similar problem.

On my tests this is faster than all other supplied methods, including the quirky update, and has the big advantage (for me at least) of running on SQL2000.

Apologies for the layout, I just can't break the habit of putting commas at the start of the line, even though it annoys some...

selects.start

,(

selectMIN(e.[end])

from@rangeTable e

wheree.[start] > s.[end]

and not exists (

select1

from@rangeTable e1

wheree1.[start] = e.[end] + 1

)

from@rangeTable s

where not exists (

select1

from@rangeTable s1

wheres1.[end] = s.start - 1

)

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 15

There's a fly in the ointment, though... try this data...

declare @rangeTable table (start int, [end] int)

insert @rangeTable

(Start, [End])

select 1, 2 union all

select 3, 4 union all

select 5, 6 union all

select 8, 8 union all

select 9, 9 union all

select 10, 10 union all

select 20, 22 union all

select 23, 25 union all

select 26, 30 UNION ALL

select 32, 32 union all

select 34, 34

select s.start

, (

select MIN(e.[end])

from @rangeTable e

where e.[start] > s.[end]

and not exists (

select 1

from @rangeTable e1

where e1.[start] = e.[end] + 1

)

from @rangeTable s

where not exists (

select 1

from @rangeTable s1

where s1.[end] = s.start - 1

)

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)