The T-SQL Quiz

  • dfalso (5/9/2008)


    Jeff Moden (5/6/2008)No... no Reporting Services or Excel required. Take a peek in Books Online for "cross-tab reports" in SQL Server 2000, or follow this URL...

    http://msdn.microsoft.com/en-us/library/aa172756(sql.80).aspx

    They're very effective, can be made to return more than PIVOT in 2k5, and, in the presence of the correct indexes, just absolutely fly. They're also pretty easy to make in the form of Dynamic SQL and, with the advent of VARCHAR(MAX) in 2k5, can be easily made to return more columns than you'd ever need.

    {I apologize, I was away from the office}

    Ah, I know of that technique (and use it), I just didn't have a formal name for it (I just think of it as a technique to pivot things). Thanks for the tip.
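
    For anyone following along who hasn't seen it, here's a minimal sketch of that CASE-based cross-tab (the dbo.Sales table and its columns are hypothetical stand-ins):

    -- Classic cross-tab: one CASE per output column, collapsed by an aggregate
    SELECT SalesPerson,
           SUM(CASE WHEN Quarter = 1 THEN Amount ELSE 0 END) AS Q1,
           SUM(CASE WHEN Quarter = 2 THEN Amount ELSE 0 END) AS Q2,
           SUM(CASE WHEN Quarter = 3 THEN Amount ELSE 0 END) AS Q3,
           SUM(CASE WHEN Quarter = 4 THEN Amount ELSE 0 END) AS Q4
      FROM dbo.Sales
     GROUP BY SalesPerson

    Each CASE picks out one output column's worth of values and the aggregate collapses the rows, which is also why the right index makes these fly.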

    Yes, I do appreciate that one can make a PIVOT statement effectively (in fact, that's how I currently solve this problem). I just feel that there should be a more efficient CURSOR solution. My reasoning is in two parts:

    1) if you look at the process to build and run your SQL, you make one pass over the table to get all the row id values (which go into a variable). Then, you make another pass over the information_schema or sys.columns view to create the SQL for each column, and finally you actually run the SQL, which then issues a separate PIVOT statement for each column in the table (but UNIONs them all together). So you've evaluated all the rows, then all the columns, to create a statement to evaluate each row of each column. It seems like there should be a more efficient way to go there.

    Actually - the "first pass" you talk about would only conceptually happen when there are NO appropriate supporting indexes. With the appropriate indexing, the top levels of the B-tree would be used to find the values.

    As to the rest of the process - that's not what I've seen. The advantage of the set-based approach is that all of the column values are being accumulated AT THE SAME TIME, and NOT serially. That's what the worktable is tracking. You're still describing the procedural approach to solving the problem, which is by nature serial, rote and inefficient when compared to an implicitly much more parallel method of computing these values.

    As to cursors - one of the big "weaknesses" of T-SQL cursors (versus other cursor-like objects in procedural languages) is that each row requires individual "commits". There's no way to "accumulate intermediate results, and then flush them all" with T-SQL cursors, which is ultimately why cursors don't stand a chance performance-wise against decent set-based code. The whole point of set-based is to allow you to perform operations on (large) chunks of data, and flush the changes/commit in *batches*, not row-by-row.
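
    To make the batch-versus-row-by-row contrast concrete, here's a hedged sketch of the two styles (the dbo.Accounts table and its columns are hypothetical):

    -- Set-based: one statement, the engine processes the whole chunk as a batch
    UPDATE dbo.Accounts
       SET Balance = Balance * 1.05
     WHERE AccountType = 'SAV'

    -- Cursor: the same change, fetched, applied, and committed row by row
    DECLARE @ID INT
    DECLARE cur CURSOR LOCAL FAST_FORWARD FOR
        SELECT AccountID FROM dbo.Accounts WHERE AccountType = 'SAV'
    OPEN cur
    FETCH NEXT FROM cur INTO @ID
    WHILE @@FETCH_STATUS = 0
    BEGIN
        UPDATE dbo.Accounts SET Balance = Balance * 1.05 WHERE AccountID = @ID
        FETCH NEXT FROM cur INTO @ID
    END
    CLOSE cur
    DEALLOCATE cur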

    If you do in fact come up with a cursor method that is faster/more efficient than a set-based method for crosstabulation, please share it. There would be a HUGE group of people who would love to get their hands on more efficient ways to aggregate data like that. I think you will however find that set-based will crush the cursor methods, each and every time.

    2) the problem is not a set-level problem, it's a cell-level problem. That is, there's no common operation applied to each set member. There's a common heuristic, sure, but each column is evaluated separately, and each row for that column is evaluated separately still, so in effect each cell has its own custom operation (regardless of the syntax we use). Clearly, an array-based solution is the best, so I *think* a cursor is the closest to that. (How well does CLR stored-procedure code do? Can it access SQL Server's memory directly for I/O, or is it another layer? Because if it's efficient, that's probably the best choice... I haven't done work with CLR stored procs yet, I confess)

    CLR might have some interesting prospects for beating the cursor methods, but having tried myself, there would be precious few opportunities to flat out beat T-SQL code in this case. It would likely have to be some bizarre aggregation that SQL Server doesn't naturally provide (which is ultimately what it takes for CLR to really shine: finding a spot where T-SQL is just plain not very good). Odd running totals, peculiarly scoped percentages, maybe - but most of the calculations would fly through T-SQL (either using aggregate functions or the new windowed function calls), and again - eclipse CLR in terms of perf. Combine that with its "challenged" resource management, and CLR is not something I would lean on overly heavily for LARGE amounts of data processing (which dovetails directly into Microsoft's own recommendations as to best practices for CLR usage).

    Just my experience on it.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (5/9/2008)Actually - the "first pass" you talk about would only conceptually happen when there are NO appropriate supporting indexes. With the appropriate indexing, the top levels of the B-tree would be used to find the values.

    As to the rest of the process - that's not what I've seen. The advantage of the set-based approach is that all of the column values are being accumulated AT THE SAME TIME, and NOT serially. That's what the worktable is tracking. You're still describing the procedural approach to solving the problem, which is by nature serial, rote and inefficient when compared to an implicitly much more parallel method of computing these values.

    To clarify: I was describing the process and issues of building and running the SQL statement. If you're using a PIVOT, your syntax is of the form

    SELECT { ID values }
    FROM (
        SELECT {ID field}, {column to transpose}
        FROM mytable
    ) AS sourcetab
    PIVOT (
        AVG({column to transpose})
        FOR {ID field} IN ( {ID values} )
    ) AS pivotted

    So, if we're building this dynamically, we'd first create the variable string for the {ID values} part, assuming we don't need to rename. So that's the initial SELECT over the table.

    Then we need to build a PIVOT statement per {column to transpose} field, each of which uses our ID list variable. That's the information_schema SELECT.

    We'll concatenate each string together, but we still have to actually run the code. And the code runs each SELECT and UNIONs them together, although committing them as a batch, like you said. (A sketch of the whole build-and-run sequence follows below.)
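
    A hedged sketch of that build-and-run sequence (dbo.mytable, its IDField key, and the AVG aggregate are hypothetical stand-ins, and the data columns are assumed to share a numeric type):

    DECLARE @idlist VARCHAR(MAX), @sql VARCHAR(MAX)

    -- Step 1: one pass over the table to build the {ID values} list, e.g. '[1],[2],[3]',
    -- using the classic (if officially undocumented) variable-concatenation trick
    SELECT @idlist = ISNULL(@idlist + ',', '') + QUOTENAME(CAST(IDField AS VARCHAR(10)))
      FROM (SELECT DISTINCT IDField FROM dbo.mytable) d

    -- Step 2: one pass over the metadata to build one PIVOT per column, UNIONed together
    SELECT @sql = ISNULL(@sql + ' UNION ALL ', '')
        + 'SELECT ''' + COLUMN_NAME + ''' AS ColName, ' + @idlist
        + ' FROM (SELECT IDField, ' + QUOTENAME(COLUMN_NAME) + ' FROM dbo.mytable) s'
        + ' PIVOT (AVG(' + QUOTENAME(COLUMN_NAME) + ') FOR IDField IN (' + @idlist + ')) p'
      FROM INFORMATION_SCHEMA.COLUMNS
     WHERE TABLE_NAME = 'mytable' AND COLUMN_NAME <> 'IDField'

    -- Step 3: actually run the concatenated statement
    EXEC (@sql)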

    Unfortunately, the CASE WHEN technique cited earlier has the same steps necessary to construct and run the string.

    Informally, I was trying to describe how, in this situation, each cell ultimately has a separate operation because no two rows share the same operator. It's a bit like having to write a CASE expression with a WHEN clause for each ID in the table. Yes, the operation happens as a set, but we're not really doing set-based work, so there should be a more efficient way to accomplish the task. That was my point.

    As to cursors - one of the big "weaknesses" of T-SQL cursors (versus other cursor-like objects in procedural languages) is that each row requires individual "commits". There's no way to "accumulate intermediate results, and then flush them all" with T-SQL cursors, which is ultimately why cursors don't stand a chance performance-wise against decent set-based code. The whole point of set-based is to allow you to perform operations on (large) chunks of data, and flush the changes/commit in *batches*, not row-by-row.

    Thanks for the insight, and that's some good information.

  • Chris had the most elegant solution; all he had to do was rearrange the order of his IF statements, i.e...

    declare @i int
    set @i = 1
    while (@i < 101)
    begin
        if @i%3 = 0 and @i%5 = 0 print 'BizzBuzz'
        else if @i%5 = 0 print 'Buzz'
        else if @i%3 = 0 print 'Bizz'
        else print @i
        set @i = @i + 1
    end

    Done.

  • Ummm... you think the use of a While loop is "elegant"? :blink: :Whistling:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Of the solutions printed in the article, his had the fastest computing time.

    Please excuse my ignorance, but what is wrong with while loops?

  • zxn001 (6/27/2008)


    Of the solutions printed in the article, his had the fastest computing time.

    Please excuse my ignorance, but what is wrong with while loops?

    nothing per se, but run this in text mode...

    set nocount on
    declare @start datetime, @end datetime

    select @start = GetDate()

    select case N%5 + N%3
             when 0 then 'BizzBuzz'
             else case N%5
                    when 0 then 'Buzz'
                    else case N%3
                           when 0 then 'Bizz'
                           else cast(N as varchar(10))
                         end
                  end
           end
    from tally t1 --tally is a table of one (int) column, containing numbers 1 - 11,000
    where N < 10001

    select @end = GetDate()
    select datediff(ms, @start, @end) as SetTheoryLoopTime

    select @start = GetDate()

    declare @i int
    set @i = 1
    while (@i < 10001)
    begin
        if @i%3 = 0 and @i%5 = 0 print 'BizzBuzz'
        else if @i%5 = 0 print 'Buzz'
        else if @i%3 = 0 print 'Bizz'
        else print @i
        set @i = @i + 1
    end

    select @end = GetDate()
    select datediff(ms, @start, @end) as whileLoopTime

    What do you get back in performance times?

    Dave J


    http://glossopian.co.uk/
    "I don't know what I don't know."

  • The set was much faster.

    However, I must note that in order for the set to work, you must have the pre-existing data for it to compute, which would require either another loop or someone to type in the data.

    Is there a way to add a few lines to SetTheoryLoop to first create the table of integers and then work on it, AND have it still be faster than the while loop?

    But still, thanks for showing me the Set Loop; if I happen to have a pre-existing table, I'll be sure to use it rather than the WHILE.

  • zxn001 (6/27/2008)


    The set was much faster.

    However, I must note that in order for the set to work, you must have the pre-existing data for it to compute, which would require either another loop or someone to type in the data.

    Is there a way to add a few lines to SetTheoryLoop to first create the table of integers and then work on it, AND have it still be faster than the while loop?

    But still, thanks for showing me the Set Loop; if I happen to have a pre-existing table, I'll be sure to use it rather than the WHILE.

    Just so you're convinced:

    if object_id('tempdb..#tally') is not null
        drop table #tally

    set nocount on
    declare @start datetime, @end datetime

    select @start = GetDate()

    select top 11000 identity(int,1,1) N
    into #tally
    from sys.all_columns sc1, sys.all_columns sc2

    create unique clustered index ixn on #tally(N)

    select case N%5 + N%3
             when 0 then 'BizzBuzz'
             else case N%5
                    when 0 then 'Buzz'
                    else case N%3
                           when 0 then 'Bizz'
                           else cast(N as varchar(10))
                         end
                  end
           end
    from #tally t1 --#tally is built on the fly: one (int) column containing numbers 1 - 11,000
    where N < 10001

    select @end = GetDate()
    select datediff(ms, @start, @end) as SetTheoryLoopTime

    select @start = GetDate()

    declare @i int
    set @i = 1
    while (@i < 10001)
    begin
        if @i%3 = 0 and @i%5 = 0 print 'BizzBuzz'
        else if @i%5 = 0 print 'Buzz'
        else if @i%3 = 0 print 'Bizz'
        else print @i
        set @i = @i + 1
    end

    select @end = GetDate()
    select datediff(ms, @start, @end) as whileLoopTime

    Really - forget loops even exist. The likelihood they will outperform something set-based is minute.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Insert this at the top of the script:

    --===== Conditionally drop
    IF OBJECT_ID('dbo.Tally') IS NOT NULL
        DROP TABLE dbo.Tally

    --===== Create and populate the Tally table on the fly
    SELECT TOP 11000 --equates to more than 30 years of dates
           IDENTITY(INT,1,1) AS N
      INTO dbo.Tally
      FROM Master.dbo.SysColumns sc1,
           Master.dbo.SysColumns sc2

    --===== Add a Primary Key to maximize performance
    ALTER TABLE dbo.Tally
        ADD CONSTRAINT PK_Tally_N
        PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100

    No loop required! 🙂

    On my old-ish server it runs in 80 ms

    Hats off to Jeff M for teaching us this trick
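
    (As an aside, the "30 years of dates" comment hints at how the Tally table typically gets used; a hedged sketch, with an arbitrary anchor date:)

    --===== e.g., ~30 years of consecutive dates, no loop in sight
    SELECT DATEADD(dd, N - 1, '20000101') AS TheDate
      FROM dbo.Tally
     WHERE N <= 10958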

    Edit: I see Matt beat me to it, but he is spot on, forget loops if you can.

    Dave


    http://glossopian.co.uk/
    "I don't know what I don't know."

  • I didn't know SQL could do that.

    And you made me a believer, a very confused one, but still believing.

    So now I'm really curious: WHY does the set loop work faster? Is it the way SQL is implemented, or is it true in general?

    It almost seems like the set does one comparison against 3 and 5 at the same time, but bytes are binary, so I did some testing:

    If you switch the 3 and 5, i.e. test 3 then 5, it increases the speed, which implies that it doesn't test them at the same time, because if it did, the order shouldn't matter. If it tests 3 first and only tests 5 when that fails, the increase in speed makes sense, since multiples of 3 occur more often than multiples of 5.

    So what exactly does the Set Loop do?

    P.S. thanks everyone for enlightening me

  • zxn001 (6/27/2008)


    I didn't know SQL could do that.

    And you made me a believer, a very confused one, but still believing.

    So now I'm really curious: WHY does the set loop work faster? Is it the way SQL is implemented, or is it true in general?

    It almost seems like the set does one comparison against 3 and 5 at the same time, but bytes are binary, so I did some testing:

    If you switch the 3 and 5, i.e. test 3 then 5, it increases the speed, which implies that it doesn't test them at the same time, because if it did, the order shouldn't matter. If it tests 3 first and only tests 5 when that fails, the increase in speed makes sense, since multiples of 3 occur more often than multiples of 5.

    So what exactly does the Set Loop do?

    P.S. thanks everyone for enlightening me

    The simple answer is: SQL Server is a SET engine. That's how it thinks, and how it processes. Set-based actions occur on many rows simultaneously (assuming we stay out of SQL Server's way and let it decide how best to do things).

    The set operation here isn't actually a loop (that's a misnomer). The Tally table provides us with a high-speed ordered list of numbers on which we can perform the rules. Because it's set-based - several hundred/thousand rows are being processed in one shot, instead of the one by one you get out of the loop.

    The difference gets even more pronounced when you move to "bigger" sets. The advantage of a set-based approach is that performance doesn't have to degrade linearly as the set gets bigger.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • zxn001 (6/27/2008)


    I didn't know SQL could do that.

    And you made me a believer, a very confused one, but still believing.

    So now I'm really curious: WHY does the set loop work faster? Is it the way SQL is implemented, or is it true in general?

    It almost seems like the set does one comparison against 3 and 5 at the same time, but bytes are binary, so I did some testing:

    If you switch the 3 and 5, i.e. test 3 then 5, it increases the speed, which implies that it doesn't test them at the same time, because if it did, the order shouldn't matter. If it tests 3 first and only tests 5 when that fails, the increase in speed makes sense, since multiples of 3 occur more often than multiples of 5.

    So what exactly does the Set Loop do?

    P.S. thanks everyone for enlightening me

    I happen to have a link to an article that explains much of that...

    http://www.sqlservercentral.com/articles/TSQL/62867/

    Most of the performance-based articles that I've written use either a Tally table to form the set-based loop, or other methods. For example, most folks use cursors or While loops or (worse) triangular joins to do running totals... they take comparatively forever to run... rather than list each of the articles individually, take a look at the set-based solutions that use some of these "set based loops" in the following collection of articles... try the code in each of them... you'll be amazed...

    http://www.sqlservercentral.com/Authors/Articles/Jeff_Moden/80567/
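
    (For context, the "triangular join" pattern being warned against looks like this sketch, reusing the thread's dbo.Tally table; each row re-sums everything before it, which is why it scales quadratically rather than linearly:)

    -- Triangular-join running total: row N touches N rows, so ~N*N/2 reads overall
    SELECT a.N, SUM(b.N) AS RunningTotal
      FROM dbo.Tally a
      JOIN dbo.Tally b ON b.N <= a.N
     GROUP BY a.N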

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • zxn001 (6/27/2008)


    I didn't know SQL could do that.

    And you made me a believer, a very confused one, but still believing.

    So now I'm really curious: WHY does the set loop work faster? Is it the way SQL is implemented, or is it true in general?

    It almost seems like the set does one comparison against 3 and 5 at the same time, but bytes are binary, so I did some testing:

    If you switch the 3 and 5, i.e. test 3 then 5, it increases the speed, which implies that it doesn't test them at the same time, because if it did, the order shouldn't matter. If it tests 3 first and only tests 5 when that fails, the increase in speed makes sense, since multiples of 3 occur more often than multiples of 5.

    So what exactly does the Set Loop do?

    P.S. thanks everyone for enlightening me

    There are some assumptions in what I'm about to say.

    Behind the scenes, SQL Server is always using cursors.

    When you run a set-based operation, SQL Server creates a cursor and iterates through the set to perform the requested operations. On a simple set, it will create its cursor, do the work with locking, unlocking, etc., then deliver the results to you.

    One >< (request/return) trip behind the scenes.

    When you set up a loop, every iteration requires a trip behind the scenes. If you are looping through a cursor, then you are (see the sketch after this list):

    1. > requesting a cursor

    2. < receiving a reference

    3. > requesting a move

    4. < receive new references

    5. > close the cursor

    6. < return

    7. > kill the cursor

    8. < return

    I've left out the locking because it varies.

    ... and if you are just looping without a cursor but doing some data manipulation in each pass, then each iteration's behind-the-scenes request will cause a new SQL Server cursor to be created, used, and destroyed just to resolve your request.
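
    For reference, a hedged T-SQL sketch of that life cycle, with comments mapping to the numbered steps above (dbo.SomeTable and SomeColumn are hypothetical):

    DECLARE @val INT

    DECLARE c CURSOR FOR             -- steps 1/2: request a cursor, receive a reference
        SELECT SomeColumn FROM dbo.SomeTable
    OPEN c

    FETCH NEXT FROM c INTO @val      -- steps 3/4: request a move, receive new references
    WHILE @@FETCH_STATUS = 0
    BEGIN
        PRINT @val                   -- the per-row work happens here
        FETCH NEXT FROM c INTO @val  -- steps 3/4 repeat, once per row
    END

    CLOSE c                          -- steps 5/6: close the cursor
    DEALLOCATE c                     -- steps 7/8: kill the cursor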

    Tom Garth
    Vertical Solutions

    "There are three kinds of men. The one that learns by reading. The few who learn by observation. The rest of them have to pee on the electric fence for themselves." -- Will Rogers
  • Tom Garth (6/27/2008)


    There are some assumptions in what I'm about to say.

    That's a heck of a good explanation, Tom...

    To put it in laymen's terms, there are two ways to buy groceries...

    1. Go to the store, buy all the groceries, come home and unpack. 1 Trip (Set based... you bought a set of groceries).

    ... or... the "RBAR" method

    1. Go to the store

    2. Pick out 1 item

    3. Stand in line at checkout

    4. Pay for item

    5. Take item home

    6. Repeat until all groceries have been bought

    Which one do you think will take the longest and which one do you think will be easier on your gasoline bill?

    Same idea with set based technology vs RBAR.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (6/27/2008)


    Tom Garth (6/27/2008)


    There are some assumptions in what I'm about to say.

    That's a heck of a good explanation, Tom...

    To put it in laymen's terms, there are two ways to buy groceries...

    1. Go to the store, buy all the groceries, come home and unpack. 1 Trip (Set based... you bought a set of groceries).

    ... or... the "RBAR" method

    1. Go to the store

    2. Pick out 1 item

    3. Stand in line at checkout

    4. Pay for item

    5. Take item home

    6. Repeat until all groceries have been bought

    Which one do you think will take the longest and which one do you think will be easier on your gasoline bill?

    Same idea with set based technology vs RBAR.

    :hehe:

    That's just pure poetry, Jeff.

    Tom Garth
    Vertical Solutions

    "There are three kinds of men. The one that learns by reading. The few who learn by observation. The rest of them have to pee on the electric fence for themselves." -- Will Rogers
