The "Numbers" or "Tally" Table: What it is and how it replaces a loop

  • I did my own optimization on simple Tally tables, and it looks very good in my initial tests with a 99881 characters long list of IDs I am splitting. The result came by adding a few new tweaks to an example I saw in another article, so I like some feedback!

    Here is my code (replace smiley with ')', it is added by forum and not under my control):

    create function dbo.fnTally( @input varchar(max) ) returns table

    as

    return

    (

    select top ( isnull( len( @input ), 0 ) )

    row_number() over ( order by anchor.constant ) as 'N'

    from

    ( select 0 as constant ) anchor

    cross join master.sys.All_Columns as t1 with( nolock )

    cross join master.sys.All_Columns as t2 with( nolock )

    )

    ;

    Test code (100001 chars):

    declare @STR varchar(max)

    set @STR = replicate( convert( varchar(max), '1,32,' ), 20000 ) + '5';

    select len( @STR );

    select * from dbo.fnTally( @STR ) order by N;

    As you can see, I evaded a SQL constraint of not being able to deal with constants in the mandatory 'over ( order by ... )'. This resulted in the effect that only the number of rows and not the values in them matter. This opens a way to (yet unexplored) denser source tables.

    Another thing I did instinctively, but not measured its effect off is to use no locking (by means of a table hint) in a bid to prevent read locks coming into play and limiting resource use.

    In my testing of using the shown tally function on the test input the estimated sub-tree cost of my whole query wend from 0.0131634 to 0.0071212 which is an improvement of over 45% !!! The benefit comes entirely from a no longer performed merge join that resulted from sorting on a column from the source table.

    Another thing I wanted to try but did not do yet do is to make the 2nd cross join a conditional one by encapsulating it in a derived query that handles the condition. This would then just return 1 row in case the cross join is really not needed in full. The optimizer might already do this silently, but it wouldn't hurt to try and cut out some more I/O. It is just an idea.....might not work at all or simply backfire. I will try such a thing when I find some time and if it works will post here again!

  • peter (9/25/2008)


    In my testing of using the shown tally function on the test input the estimated sub-tree cost of my whole query wend from 0.0131634 to 0.0071212 which is an improvement of over 45% !!! The benefit comes entirely from a no longer performed merge join that resulted from sorting on a column from the source table.

    Cool... let's see the code for the test setup you did.

    Another thing I wanted to try but did not do yet do is to make the 2nd cross join a conditional one by encapsulating it in a derived query that handles the condition. This would then just return 1 row in case the cross join is really not needed in full. The optimizer might already do this silently, but it wouldn't hurt to try and cut out some more I/O. It is just an idea.....might not work at all or simply backfire. I will try such a thing when I find some time and if it works will post here again!

    I've found that it's not necessary to do that. A cross join will first grab all the rows from just one of the tables before it starts on the other. Even after that, it's very linear.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • As you can see Jeff, I have some *** synthetic *** test code in my post simulating the counting portion of the process. The 45% improvement is what the query plan shows when I compare a version with this optimization against one without. The version without is not using a derived query and performs an order by on a real field instead of the fabricated constant. The improvement in practice will be less of course as the counting is just part of a larger task to be performed. I mostly like the function because of its simplicity and having the option to replace it with something better if that comes along without modifying existing queries.

    Quietly I was hoping you could put this function into one of your no doubt many practical tests and see what impact it will give. I just kind of bumped into this and have no direct non synthetic problem at hand that is big enough to show a clear outcome. In fact I have no access to any production server this and coming days at all (moving office).

  • Heh... I can show you a query plan for two queries... one comes up with 100% of the batch size and the other 0%... when you run the queries, the 100% of the batch size takes a tiny, tiny fraction of what the 0% one does. I don't trust % of batch nor do I trust sub-tree costs... nor should you.

    When I get home from work, I'll see if I can find those queries and post them.

    I'm not sure that I'll actually have the time to test your good code, Peter... new gig has be driving 1:15:00 one way.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • As I should, I will do some thorough testing with a recognizable real world problem. It will take me a few days before I can do so and am already chewing on another idea :).

    Stay tuned!

  • Jeff, this is a great article.

    I do have a question as I am trying to apply your solution to my needs.

    I have hundreds of addresses in a single address column like this 1111 west street

    When I employ you solution I get rows like this:

    1111

    west

    street

    Now I need to pivot the results into their own columns like this:

    Number,name,suffix

    1111,west,street

    would be be so kind as to show me how?

    Thanks

  • GF (10/14/2008)


    Jeff, this is a great article.

    I do have a question as I am trying to apply your solution to my needs.

    I have hundreds of addresses in a single address column like this 1111 west street

    When I employ you solution I get rows like this:

    1111

    west

    street

    Now I need to pivot the results into their own columns like this:

    Number,name,suffix

    1111,west,street

    would be be so kind as to show me how?

    Thanks

    Hi GF,

    If that's what you need, then forget about the solution using a table of numbers. That solution is great for breaking components up into several rows. If you need them in several columns, you need a different technique.

    About two years ago, I wrote a blog post demonstrating such a technique. It uses the example of first name; middle initial; last name, but the techniques to be used remain the same. Here's the URL:

    http://sqlblog.com/blogs/hugo_kornelis/archive/2006/10/12/Stuff.aspx

    Edit: Fixed the URL


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • GF (10/14/2008)


    Jeff, this is a great article.

    I do have a question as I am trying to apply your solution to my needs.

    I have hundreds of addresses in a single address column like this 1111 west street

    When I employ you solution I get rows like this:

    1111

    west

    street

    Now I need to pivot the results into their own columns like this:

    Number,name,suffix

    1111,west,street

    would be be so kind as to show me how?

    Thanks

    Thanks for the feedback, GF.

    I can show you how to pivot the columns back pretty easily, but the problem with addresses is that they don't always have the same components. For example, you may have an address that looks like 123 East 39th Street. I'd recommend purchasing something like ZP4 or some other CASS certification program that will not only do the split correctly for you, but will also validate the address as being real or not.

    For your "pivot", do you have a maximum number of columns an address would be split into?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • peter (9/30/2008)


    As I should, I will do some thorough testing with a recognizable real world problem. It will take me a few days before I can do so and am already chewing on another idea :).

    Stay tuned!

    Peter, (or anyone else interested)...

    Here's the code where the "Percent of Batch" is, quite literally, 100% wrong.

    SET NOCOUNT ON

    --=======================================================================================

    -- Recursive method shown by (Name with-held)

    --=======================================================================================

    PRINT '========== Recursive method =========='

    --===== Turn on some performance counters ===============================================

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    DECLARE @Bitbucket DATETIME --Holds display output so display times aren't measured.

    --===== Execute the code being tested ===================================================

    DECLARE @DateVal DATETIME

    SET @DateVal = '2008-01-01'

    ;with mycte as

    (

    select @DateVal AS DateVal

    union all

    select DateVal + 1

    from mycte

    where DateVal + 1 < DATEADD(yy, 5, @DateVal)

    )

    select @Bitbucket = d.dateval

    from mycte d

    OPTION (MAXRECURSION 0)

    --===== Turn off the performance counters and print a separator =========================

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT REPLICATE('=',90)

    GO

    --=======================================================================================

    -- Tally table method by Jeff Moden

    --=======================================================================================

    PRINT '========== Tally table method =========='

    --===== Turn on some performance counters ===============================================

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    DECLARE @Bitbucket DATETIME --Holds display output so display times aren't measured.

    --===== Execute the code being tested ===================================================

    DECLARE @StartDate AS DATETIME

    SET @StartDate = '2008-01-01'

    SELECT TOP (DATEDIFF(dd,@StartDate,DATEADD(yy,5,@StartDate)))

    @Bitbucket = @StartDate-1+t.N

    FROM Tally t

    ORDER BY N

    --===== Turn off the performance counters and print a separator =========================

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT REPLICATE('=',90)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, thank you for taking the time to listen.

    My current needs are to split an address into two parts.

    example: 1234 south street needs to be split into 1234 and south street. I am already able to do this part.

    the problem I am having is when there is only a number like 4567 and no street name.

    example: 4567

    So what I am try to get to is this:

    column1 column2

    1234 south street

    4567

    876 west

    What I currently have is

    column1

    123 north ave

    Thanks in advance.

    Gary

  • Hey, Gary. I just worked on cleaning up a big address table, talk about fun! Anyway, how about something like this:

    select left(Address, charindex(' ', Address) - 1) as StreetNum,

    right(rtrim(Address), len(rtrim(Address)) - charindex(' ', Address)) as Street

    from [whatever]

    where len(rtrim(Address)) > 1

    -----
    [font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]

  • GF (10/15/2008)


    Jeff, thank you for taking the time to listen.

    My current needs are to split an address into two parts.

    example: 1234 south street needs to be split into 1234 and south street. I am already able to do this part.

    the problem I am having is when there is only a number like 4567 and no street name.

    example: 4567

    So what I am try to get to is this:

    column1 column2

    1234 south street

    4567

    876 west

    What I currently have is

    column1

    123 north ave

    Thanks in advance.

    Gary

    Gary.... I didn't answer because I thought that Wayne West's code probably did it for you. But, I gotta check... are you all set?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Wayne West (10/16/2008)


    Hey, Gary. I just worked on cleaning up a big address table, talk about fun! Anyway, how about something like this:

    select left(Address, charindex(' ', Address) - 1) as StreetNum,

    right(rtrim(Address), len(rtrim(Address)) - charindex(' ', Address)) as Street

    from [whatever]

    where len(rtrim(Address)) > 1

    Since LEN ignores trailing spaces the RTRIM is not strictly necessary but then who knows if MS will change this in the future 😉

    ...the problem I am having is when there is only a number like 4567 and no street name...

    The above query will not cope with this situation

    This is my solution based on Waynes query

    SELECT LEFT(Address, CHARINDEX(' ', Address + ' ') - 1) AS [StreetNum],

    SUBSTRING(Address, CHARINDEX(' ',Address+' ') + 1, 255) AS [Street]

    FROM

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Huh, somehow I never noticed that in the documentation about Len() ignoring trailing spaces. Then again, chances are that I never opened BOL for Len() and just pulled it from previous programming experience. Our ERP system has trailing spaces on char fields, so I make it a habit of always trimming when dealing with any form of string manipulation.

    Thanks for the catch, David. Nice solution. 🙂

    -----
    [font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]

  • Wayne West (10/29/2008)


    Huh, somehow I never noticed that in the documentation about Len() ignoring trailing spaces. Then again, chances are that I never opened BOL for Len() and just pulled it from previous programming experience. Our ERP system has trailing spaces on char fields, so I make it a habit of always trimming when dealing with any form of string manipulation.

    Thanks for the catch, David. Nice solution. 🙂

    Thanks Wayne 😀

    Don't know if or where it is documented, just came across it when writing sql in the past and it is mentioned on threads on this forum.

    I too trim data a lot as well, especially on data import.

    Far away is close at hand in the images of elsewhere.
    Anon.

Viewing 15 posts - 181 through 195 (of 511 total)

You must be logged in to reply to this topic. Login to reply