Collapsing a Table

  • WayneS (6/1/2010)


    Jeff Moden (5/31/2010)


    WayneS (5/31/2010)


    This post is just to include an alternate method. I haven't tested the various methods, but I believe that the "Quirky Update" solution that Jeff presented will probably be the most efficient.

    For the very limited number of rows in this example, the two methods take turns winning. Of course, the "Deep Dive" method you presented isn't controversial at all like the Quiky Update is.

    Guess I'll have to build a million row test an find out for sure.

    Thanks for the great post, Wayne. Alternatives are always good.

    Jeff - your mis-spelling, though accidental, is still entirely accurate! :w00t:

    Maybe is should be re-named to the Quiky Quirky Update???

    😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, can you please try the sample you have where the row-number-diff code falls off a cliff with the following changes: 1) temp table instead of table variable and 2) permanent numbers table instead of derived one. Thanks in advance.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (6/2/2010)


    Jeff, can you please try the sample you have where the row-number-diff code falls off a cliff with the following changes: 1) temp table instead of table variable and 2) permanent numbers table instead of derived one. Thanks in advance.

    I'm not sure what you mean, Kevin... which sample? I was advocating the Quirky Update method and actually showing that the Tally table method pretty much takes in the socks on this one.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • TheSQLGuru (6/2/2010)


    Jeff, can you please try the sample you have where the row-number-diff code falls off a cliff with the following changes: 1) temp table instead of table variable and 2) permanent numbers table instead of derived one. Thanks in advance.

    Here's running from my system with a permanent 3.1 million row tally table.

    Jeff's results:

    Edit: added Jeff's results. As expected, Quirky Update still rules!

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Thanks for all of the suggestions!

    I came up with another option using a recursive CTE. Here's how I did it...

    declare @rangeTable table (start int, [end] int)

    insert @rangeTable

    (Start, [End])

    select 1, 2 union all

    select 3, 4 union all

    select 5, 6 union all

    select 8, 8 union all

    select 9, 9 union all

    select 10, 10 union all

    select 20, 22 union all

    select 23, 25 union all

    select 26, 30

    ;with NewRange as

    (

    select top 1 Row_Number() over(order by start) as row, start, [end], start MinStart, null PrevEnd

    from @rangeTable

    union all

    select rt.row, rt.start, rt.[end], case

    when nr.PrevEnd+1 <> rt.Start

    then rt.Start

    else nr.MinStart

    end as MinStart, rt.[end] PrevEnd

    from (select Row_Number() over(order by start) as row, start, [end]

    from @rangeTable) rt

    join NewRange nr on nr.row+1 = rt.row

    )

    select MinStart AS Start, max([End]) as [End]

    from NewRange

    group by MinStart

    The limitation with this approach is you can't have more than 32,767 rows. But for my case this isn't an issue.

  • Did you compare the performance of your solution vs. the other three methods posted so far?



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • I didn't. Mainly because the CTE idea only works with smaller datasets (less than 32767 rows).

  • cgreathouse (6/5/2010)


    The limitation with this approach is you can't have more than 32,767 rows. But for my case this isn't an issue.

    Heh... why do people insist on building "time bombs" in their code just because they believe that a limited number of rows is an excuse for not doing it right? 😉

    You need to check out the actual execution plan of your code and see where the table scan on @rangeTable produces 81 rows which is the square of the number of rows actually in the table. This "accidental" Cartesian join is known as "hidden RBAR" and having as few as 1000 rows in @rangeTable will cause SQL Server to have to trudge through a million rows and it takes 0:02. Having 10,000 rows in @rangeTable causes SQL Server to wade through 100 million rows and it took 4:14 to run on my humble desktop. That means it took 127 times longer for only 10 times more rows and that growth pattern isn't linear by any means. Interpolation suggests that only 32000 rows will cause SQL Server to run for 22:43.

    If you don't like the idea of using a Quirky Update for this, then write a cursor but, whatever you do, don't put that CTE into production.;-)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, are you saying a c.u.r.s.o.r. should be preferred instead of the "double-row-number" approach??



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • lmu92 (6/6/2010)


    Jeff, are you saying a c.u.r.s.o.r. should be preferred instead of the "double-row-number" approach??

    In this case, yes because of the way the double row number problem needs to be expanded to include all numbers in the start/end range. The example data with large ranges between start/end bears strong testimony to that. The cursor would only have to touch a small handlefull of rows in a manner similar to what the Quirky Update did.

    But, even the relatively slow double row number in this case is light speed compared to the accidental cross join formed by the CTE solution on this page.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • @jeff: I'm confused...

    When talking about the double row_number approach I was referring to the sample code I provided a few posts back (not the "deep dive" approach though).

    It's showing a consistent execution plan regardless of the range to cover. What do I miss/overlook?



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • lmu92 (6/6/2010)


    @Jeff: I'm confused...

    When talking about the double row_number approach I was referring to the sample code I provided a few posts back (not the "deep dive" approach though).

    It's showing a consistent execution plan regardless of the range to cover. What do I miss/overlook?

    Sorry... I thought you were talking about the "deep dive" approach. I'll go back and look.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Looks good, Lutz. I just wish there were a way to keep from having to read the same table 4 times although I believe it'll still smoke a cursor (although I've not actually tried a cursor on this).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I'm not a fan of the "quirky update", so I had a think about this, and dug out some old code that I wrote at work for a very similar problem.

    On my tests this is faster than all other supplied methods, including the quirky update, and has the big advantage (for me at least) of running on SQL2000.

    Apologies for the layout, I just can't break the habit of putting commas at the start of the line, even though it annoys some...

    selects.start

    ,(

    selectMIN(e.[end])

    from@rangeTable e

    wheree.[start] > s.[end]

    and not exists (

    select1

    from@rangeTable e1

    wheree1.[start] = e.[end] + 1

    )

    )

    from@rangeTable s

    where not exists (

    select1

    from@rangeTable s1

    wheres1.[end] = s.start - 1

    )

  • There's a fly in the ointment, though... try this data...

    declare @rangeTable table (start int, [end] int)

    insert @rangeTable

    (Start, [End])

    select 1, 2 union all

    select 3, 4 union all

    select 5, 6 union all

    select 8, 8 union all

    select 9, 9 union all

    select 10, 10 union all

    select 20, 22 union all

    select 23, 25 union all

    select 26, 30 UNION ALL

    select 32, 32 union all

    select 34, 34

    select s.start

    , (

    select MIN(e.[end])

    from @rangeTable e

    where e.[start] > s.[end]

    and not exists (

    select 1

    from @rangeTable e1

    where e1.[start] = e.[end] + 1

    )

    )

    from @rangeTable s

    where not exists (

    select 1

    from @rangeTable s1

    where s1.[end] = s.start - 1

    )

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 16 through 30 (of 38 total)

You must be logged in to reply to this topic. Login to reply