The "Numbers" or "Tally" Table: What it is and how it replaces a loop

• You folks are absolutely the Best! I'm humbled by your awesome compliments and the testing that a lot of you have taken the time to do and post.

    I've got a lot of replies to make to all of you and I'll jump on those tonight... contrary to popular myth, I actually do work between answering posts 😛 Thanks again, folks! Keep 'em coming! 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

• What - only 12 useful tips? In a single article? :hehe:

    Stop it now - you're setting the bar too high for the rest of us....

    (superlative as usual).....

    ----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part... unless you're my manager... or a director and above... or a really loud-spoken end-user... All right - what was my emergency again?

  • I am using the Tally table explanation from this article to parse out an incoming string that consists of whole records. By design, I know that the record has 3 values: an int, a datetime, and a float. What I need to get out are sets of those records parsed from the input string. The following SQL works fine (but I have questions):

    [font="Courier New"]declare @param varchar(8000)

    set @param = '101,2008.04.02,1.1,102,2008.04.03,2.2,103,2008.04.04,3.3'

    declare @result table(idx int identity(1,1), val int, ord datetime, fval float)

    declare @raw table (idx int identity(1,1), val varchar(50))

    set @param = ',' + @param + ','

    insert into @raw (val)

    select substring(@param, N+1, charindex(',',@param,N+1)-N-1)

    from dbo.Tally where N<len(@param) and substring(@param,N,1)=','

    insert into @result(val,ord,fval)

    select cast(R1.val as int), convert(datetime, R2.val, 102), convert(float, R3.val)

    from dbo.Tally T

    join @raw R1 on T.N=R1.idx

    join @raw R2 on T.N+1=R2.idx

    join @raw R3 on T.N+2=R3.idx

    where (T.N-1)%3 = 0

    select * from @result[/font]

    As you can see I am parsing into a raw table that is varchar data with an index, then using that table to move sets of 3 items into a new table with the correct data type conversion.

    My questions are:

    1) Do I need to use the intermediate table or is there a way to do this going straight from the parameter string to the final table (while still being somewhat readable and understandable)?

    2) If I remove the identity index from the @result table this stops working. The data conversions fail unless the @result table has the first column as an identity. All I do is insert the 3 converted items into @result, so I can't even begin to imagine why I need the index column. Does anyone else know why that happens?

    Thanks for the extremely interesting and useful article, and for any assistance with these questions.

  • Thanks for taking the time to explain this, Jeff. It was very helpful.

  • Carl Federl (5/7/2008)


    Another great article ! Thanks

    The first reference for a tally table that I can recall is in "Guide to Sybase and SQL Server" by C. J. Date and D.McGoveran published in June of 1992.

    Does anyone know of an earlier reference ?

    Carl, what is the page number of that? As it happens, I have the '92 edition in front of me (no idea why!) and couldn't find it readily.

    It was a great book for its time, it was one of the first two books that I bought when we got 4.21a in the door.

    Thanks!

    -----
    [font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]

• Excellent article, Jeff! I came across an interesting date table article on Tech Republic that was linked here yesterday after it re-surfaced at BuilderAU.Com.au. I updated the loader to run from a tally table, and the speed difference was just amazing.
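For anyone curious, the tally-driven version of such a loader boils down to something like this (a rough sketch only; the table and column names here are made up for illustration):

[font="Courier New"]-- Hypothetical date-dimension loader driven by a tally table:
-- one row per value of N instead of one loop iteration per day.
declare @StartDate datetime
set @StartDate = '20000101'

insert into dbo.DateDim (TheDate)
select dateadd(dd, N - 1, @StartDate)
from dbo.Tally
where N <= 3653 -- roughly ten years of consecutive dates[/font]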

    I have not yet gotten in to string parsing with a tally table, but I'll get there eventually. I did have an amusing little time re-writing the Bizz/Buzz test using it:

    [font="Courier New"]select case when (Number % 15 = 0) then 'BizzBuzz'

    when (Number % 3 = 0) then 'Bizz'

    when (Number % 5 = 0) then 'Buzz'

    else cast(Number as char)

    end

    from zNumbers

    where Number < 100[/font]

    Keep up the good work, Jeff!

    -----
    [font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]

• For testing the use of a tally table or any other mechanism, populating a variable with sample data like the above is completely wrong! Those delimited values would never be passed by an application. The proper approach for generating sample data is to use RANDOM values.

    John,

Since the only important bit of the data for the performance test is the number of commas, using random data is overkill for this. I would, however, agree that using an ordered set is a mistake, since you don't notice that the routine ends up sorting the results instead of returning them in the order they were passed, as would be necessary for parameter passing.

    --

    JimFive

• Gotta throw one more on the pile. Great article, Jeff. I've been using Numbers tables for a while now, but this is the best explanation I've ever seen on the topic.

    Thanks.

    -Luke.

To help us help you, read this.
For better help with performance problems, please read this.

  • 1) Do I need to use the intermediate table or is there a way to do this going straight from the parameter string to the final table (while still being somewhat readable and understandable)?

Probably, but I haven't figured it out yet. One thing I would consider, instead of splitting everything out at once: first split everything into groups of 3, and then split the groups of 3 into your final table. That is, instead of selecting every comma, select every 3rd comma (using N%3 = 0), then use that query as the source for your actual query; a rough sketch of the idea follows below. This would also allow you to get rid of the identity column, since you are keeping your data grouped together. (See next response.)
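Something along these lines (a hypothetical sketch of the "every 3rd comma" idea, assuming the dbo.Tally table from the article and SQL Server 2005's ROW_NUMBER; untested):

[font="Courier New"]declare @param varchar(8000)
set @param = ',101,2008.04.02,1.1,102,2008.04.03,2.2,103,2008.04.04,3.3,'

-- Number every comma, then pair each record-opening comma (1st, 4th, 7th, ...)
-- with the comma three positions later to cut out one whole record at a time.
;with Commas as (
    select row_number() over (order by N) as CommaNum, N as Pos
    from dbo.Tally
    where N <= len(@param) and substring(@param, N, 1) = ','
)
select substring(@param, C1.Pos + 1, C2.Pos - C1.Pos - 1) as Record
from Commas C1
join Commas C2 on C2.CommaNum = C1.CommaNum + 3
where (C1.CommaNum - 1) % 3 = 0
-- Each Record ('101,2008.04.02,1.1', ...) can then be split and converted
-- in a second, simpler pass.[/font]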

    2) If I remove the identity index from the @result table this stops working. The data conversions fail unless the @result table has the first column as an identity. All I do is insert the 3 converted items into @result, so I can't even begin to imagine why I need the index column. Does anyone else know why that happens?

    Since tables are unordered, if you remove the identity then your data won't be kept adjacent in the intermediate table.

    --

    JimFive

  • Jeff Marcus (5/7/2008)


    I am using the Tally table explanation from this article to parse out an incoming string that consists of whole records. By design, I know that the record has 3 values: an int, a datetime, and a float. What I need to get out are sets of those records parsed from the input string. The following SQL works fine (but I have questions):

    [font="Courier New"]declare @param varchar(8000)

    set @param = '101,2008.04.02,1.1,102,2008.04.03,2.2,103,2008.04.04,3.3'

    declare @result table(idx int identity(1,1), val int, ord datetime, fval float)

    declare @raw table (idx int identity(1,1), val varchar(50))

    set @param = ',' + @param + ','

    insert into @raw (val)

    select substring(@param, N+1, charindex(',',@param,N+1)-N-1)

    from dbo.Tally where N<len(@param) and substring(@param,N,1)=','

    insert into @result(val,ord,fval)

    select cast(R1.val as int), convert(datetime, R2.val, 102), convert(float, R3.val)

    from dbo.Tally T

    join @raw R1 on T.N=R1.idx

    join @raw R2 on T.N+1=R2.idx

    join @raw R3 on T.N+2=R3.idx

    where (T.N-1)%3 = 0

    select * from @result[/font]

    As you can see I am parsing into a raw table that is varchar data with an index, then using that table to move sets of 3 items into a new table with the correct data type conversion.

    My questions are:

    1) Do I need to use the intermediate table or is there a way to do this going straight from the parameter string to the final table (while still being somewhat readable and understandable)?

    2) If I remove the identity index from the @result table this stops working. The data conversions fail unless the @result table has the first column as an identity. All I do is insert the 3 converted items into @result, so I can't even begin to imagine why I need the index column. Does anyone else know why that happens?

    Thanks for the extremely interesting and useful article, and for any assistance with these questions.

> 1) Do I need to use the intermediate table or is there a way to do this going straight from the parameter string to the final table (while still being somewhat readable and understandable)?

You're using the intermediate table to generate ordinal values. If you are using SQL Server 2005, you can use ROW_NUMBER instead. You can also replace the three joins with a single GROUP BY [in my experience this is more efficient]:

[font="Courier New"]with cte as (
    select row_number() over(order by N) - 1 as num,
           substring(@param, N+1, charindex(',', @param, N+1) - N - 1) as val
    from dbo.Tally
    where N < len(@param) and substring(@param, N, 1) = ','
)
insert into @result(val, ord, fval)
select cast(max(case when num % 3 = 0 then val end) as int)            as val,
       convert(datetime, max(case when num % 3 = 1 then val end), 102) as ord,
       convert(float,    max(case when num % 3 = 2 then val end))      as fval
from cte
group by num / 3

select * from @result[/font]

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • John,

Since the only important bit of the data for the performance test is the number of commas, using random data is overkill for this.

I disagree a bit. The data also affects the amount of parsing that is being done, so using random values of varying lengths is the proper approach. In addition, the length of the string will also affect the results, particularly when you cross over the 8,000 character (4,000 for UNICODE) limit and get into [N]VARCHAR(MAX).

    I've been involved in this tally table vs. a table-valued function discussion before. I did some benchmarking and found some very interesting results when the list of values was not random data and/or the length of the data was short vs. long.

I'm not knocking the Tally table. I'm just stating that proper performance testing requires the use of good random data; a quick sketch of generating some is below. I've been around the block for a long time and have seen too many instances where the application worked fine in the performance tests with "generated" data but had major performance problems when "real" data was used -- as in customer data. In the "post-mortem", we'd find that the "generated" data was of such a nature that it was "helping out" the database.
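For example, something along these lines builds a comma-delimited string of random-length elements (a hypothetical sketch, itself tally-driven; NEWID() supplies the randomness):

[font="Courier New"]declare @csv varchar(8000)

-- Concatenate 100 random elements of 1 to 10 characters each
select @csv = isnull(@csv + ',', '')
            + left(replace(cast(newid() as varchar(36)), '-', ''),
                   abs(checksum(newid())) % 10 + 1)
from dbo.Tally
where N <= 100

select @csv[/font]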


    [font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

• This is probably a ridiculous question, but how can any of your "using the tally table" examples work if you don't declare N? I guess I just don't see how N is getting created/assigned? Thanks!

  • buktseat (5/7/2008)


This is probably a ridiculous question, but how can any of your "using the tally table" examples work if you don't declare N? I guess I just don't see how N is getting created/assigned? Thanks!

N is a column (or rather: the column) in the Tally table; a minimal example of how one gets built is below.
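For reference, here is a minimal sketch of building such a table (a common variation, not necessarily the article's exact code; master.sys.all_columns assumes SQL Server 2005 or later):

[font="Courier New"]-- Build a one-column table holding the sequential integers 1 through 11000
select top 11000
       identity(int, 1, 1) as N
into dbo.Tally
from master.sys.all_columns ac1
cross join master.sys.all_columns ac2

alter table dbo.Tally
  add constraint PK_Tally_N primary key clustered (N)[/font]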


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

• HEEEEYY!!! It's funny how it all snaps into focus once you get past those mundane details! 🙂 I knew it was something ridiculous I was missing...

    Thanks!

  • Jeff,

Please don't forget to reply to me with the suggestions and help I requested on page 1 of this discussion.

    Shamshad Ali.
