The "Numbers" or "Tally" Table: What it is and how it replaces a loop

  • David, I'm working on a blog post about importing data and have this little code fragment that you might find useful (I love playing with system tables):

    select name + ' = ltrim(rtrim(' + name + ')),'

    from syscolumns

    where id = (select id

    from sysobjects

    where name = '[Whatever]')

    and xtype in (35, 98, 99, 167, 175, 231, 239)

    I always import files straight into varchars and do data conversion through a sproc. Add an Update [Whatever] at the start of the block, delete the final comma, and you've got a quickie cleanup of spaces for text fields.

    Yeah, BOL says:

    Returns the number of characters, rather than the number of bytes, of the given string expression, excluding trailing blanks.

    -----
    [font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]

  • Ok, I'm trying to use the tally as a loop.

    The problem. In one of our databases we use a table to store among other things a reference (DocumentId) to a scanned document, the number of pages and a created date. The documents are stored in the location S:\Document\YYYY\MM\DD\DocumentId-PageNumber (no extension)

    So a document created yesterday may have a document Id of 39757385977, number of pages = 3 and '5 Nov 2008' as a created date

    This maps to 3 files held on the system as

    S:\Document\2008\11\05\39757385977-1

    S:\Document\2008\11\05\39757385977-2

    S:\Document\2008\11\05\39757385977-3

    We need to export the paths of these files as we are moving to a new system.

    I have cracked it with two nested While loops, but as there are over 500,000 records, and some documents have up to 140 pages (Itemised phone bills), I guess it wont run well other than on my small test set. But then again it's a one time exercise, so don't sweat over this, but I'd like to use my Tally 🙂

    Here's the test data setup

    -- Set up test data

    IF OBJECT_ID ( 'tempdb..#t1') IS NOT NULL

    DROP TABLE #t1

    CREATE TABLE #t1 (documentId VARCHAR(30), pages INT, DateCreated datetime)

    INSERT INTO #t1 (documentId, pages, DateCreated)

    select 39757385955, 1 ,'05/11/2008' UNION

    select 39757385956, 2 ,'05/11/2008' UNION

    select 39757385957, 2 ,'05/11/2008' UNION

    select 39757385959, 1 ,'05/11/2008' UNION

    select 39757385960, 1 ,'05/11/2008' UNION

    select 39757385961, 1 ,'05/11/2008' UNION

    select 39757385962, 3 ,'05/11/2008' UNION

    select 39757385963, 1 ,'05/11/2008' UNION

    select 39757385964, 1 ,'05/11/2008' UNION

    select 39757385965, 3 ,'05/11/2008' UNION

    select 39757385966, 3 ,'05/11/2008' UNION

    select 39757385967, 1 ,'05/11/2008' UNION

    select 39757385968, 9 ,'05/11/2008' UNION

    select 39757385969, 5 ,'05/11/2008' UNION

    select 39757385971, 5 ,'05/11/2008' UNION

    select 39757385972, 1 ,'05/11/2008' UNION

    select 39757385973, 3 ,'05/11/2008' UNION

    select 39757385974, 2 ,'05/11/2008' UNION

    select 39757385975, 1 ,'05/11/2008' UNION

    select 39757385976, 1 ,'05/11/2008' UNION

    select 39757385977, 6 ,'05/11/2008' UNION

    select 39757385978, 1 ,'05/11/2008' UNION

    select 39757385980, 10 ,'05/11/2008'

    Here is my solution, using nested loops

    -- My solution

    IF OBJECT_ID ( 'tempdb..#t2') IS NOT NULL

    DROP TABLE #t2

    IF OBJECT_ID ( 'tempdb..#t3') IS NOT NULL

    DROP TABLE #t3

    CREATE TABLE #t2 (documentId VARCHAR(30), pages INT, docPath VARCHAR(1000))

    CREATE TABLE #t3 (documentId VARCHAR(30), pages INT, docPath VARCHAR(1000))

    INSERT INTO #t2

    SELECT DocumentId, pages, 'S:\Document\'

    + CAST(YEAR(DateCreated) AS VARCHAR(10))

    + '\' + RIGHT('00' + CAST(MONTH(DateCreated) AS VARCHAR(30)),2)

    + '\' + RIGHT('00' + CAST(Day(DateCreated) AS VARCHAR(6)),2)

    + '\' + CAST(DocumentId AS VARCHAR(30)) AS docPath

    FROM #t1

    WHERE DateCreated > '3 nov 2008'

    DECLARE @maxPage INT, @currentpage int

    select @maxPage = MAX(pages) FROM #t2 t

    WHILE @maxPage <> 0

    BEGIN

    SELECT @currentpage = @maxPage

    WHILE @currentPage <> 0

    BEGIN

    INSERT INTO #t3 (documentId,

    pages,

    docPath )

    SELECT documentId, pages, docPath + '-' + CAST(@currentPage AS VARCHAR(4)) docPath

    FROM #t2

    WHERE pages = @maxPage

    SELECT @currentPage = @currentPage - 1

    END

    SELECT @maxPage = @maxPage - 1

    END

    SELECT * FROM #t3

    ORDER BY docpath

    Any thoughts? 😉

    Dave J


    http://glossopian.co.uk/
    "I don't know what I don't know."

  • David Jackson (11/6/2008)


    Any thoughts? 😉

    How about this:

    INSERT INTO #t3

    (documentId, pages, docPath)

    SELECT d.DocumentId, d.pages,

    'S:\Document\' + REPLACE(CONVERT(char(10), DateCreated, 111),'/','\')

    + '\' + CAST(DocumentId AS varchar(30)) + '-'

    + CAST(n.Number AS varchar(4))

    FROM #t1 AS d

    INNER JOIN dbo.Numbers AS n

    ON n.Number BETWEEN 1 AND d.pages;

    (edit: Corrected an error in the code - I had forgotten to REPLACE the forward slashes from the date formatting with backward slashing)


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Excellent. 😀

    I particularly like the date formatting trick. Why didn't I think of that! :w00t:

    Many thanks Hugo. I'll run the extract later today and post some stats on the two methods. (I'm waiting to be told exactly what other columns they want extracting.) Guess which one my money's on?

    Cheers

    Dave J


    http://glossopian.co.uk/
    "I don't know what I don't know."

  • 🙂 You'll like this

    Hugo's Tally Method first, my hopeless loop second.

    [font="Courier New"]

    CPU Reads Writes Duration

    --------------------------------------------------------------

    014594 880797 0000 15203 (15 Seconds)

    832344 31705470 2223 856936 (14 minutes 15 Seconds)[/font]

    We'll call that a result then 😉

    Dave J

    (edited to try and get the results to line up)


    http://glossopian.co.uk/
    "I don't know what I don't know."

  • David Jackson (11/7/2008)


    🙂 You'll like this

    I do. Thanks for sharing your results, David! 🙂


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • David Jackson (11/7/2008)


    🙂 You'll like this

    Hugo's Tally Method first, my hopeless loop second.

    [font="Courier New"]

    CPU Reads Writes Duration

    --------------------------------------------------------------

    014594 880797 0000 15203 (15 Seconds)

    832344 31705470 2223 856936 (14 minutes 15 Seconds)[/font]

    We'll call that a result then 😉

    Wow! Those results are amazing! Great solution, and the DateFormat replace was perfect!

  • David Jackson (11/7/2008)


    Excellent. 😀

    I particularly like the date formatting trick. Why didn't I think of that! :w00t:

    Many thanks Hugo. I'll run the extract later today and post some stats on the two methods. (I'm waiting to be told exactly what other columns they want extracting.) Guess which one my money's on?

    Cheers

    Dave J

    My recommendation would be to NOT store files in a dated path... date name the files using the ISO yyyymmdd format with a suffix, instead. That way, you can get to all the files easily instead of having to go through a bazillion directories and they can still list in date sorted order.

    The rest is spot on.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I'm seeing the light on Tally tables. Especially the solution Hugo provided for David Jackson. This a great article and has given me reason to re-think and re-visit a bunch of code.

    Thanks!:Wow:

    -Mike

  • MikeM7 (11/14/2008)


    I'm seeing the light on Tally tables. Especially the solution Hugo provided for David Jackson. This a great article and has given me reason to re-think and re-visit a bunch of code.

    Thanks!:Wow:

    -Mike

    You bet... thanks for the good feedback, Mike.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Jeff for an interesting, understandable and useful article. I just helped my colleague with a view were we made use of a Tally table and Row_number Over. It was a huge improvement over what we had originally planned in simplicity as well as time. And as a bonus, it's much easier to copy this process to future ones of a similar nature.

    Julie

  • Outstanding. Thanks for taking the time to post your great feedback, Julie... it would be very interesting to see your code along with brief explanation if that's possible and you have the time. A lot of folks still don't understand the performance you can get out of some very simple Tally table code and the more they see, the more they'll understand. Thanks either way.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • A column is populated from a webform that lists the box(es) that were checked in response to a question. The items are in one column, separated by commas (for example: 'abc,dcba,xy' or 'abc'). We needed to create a single row for each item with a sequential number for each item per id. In this case, attribs_hs_activities is the column with the list. Using a tally table (Wrt_tally) with numbers N from 1 to 10000, we were able to create a row for each item in the list. We used row_number over to create a sequential list for each id. Note: The selection below was used to create our view. Then we used the view to insert the records into the desired table.

    --=== If the test table already exist, drop it

    If object_id('tempdb..#CAN_U','U') is not null

    drop table #CAN_U

    --=== create the test table with

    create table #CAN_U

    (

    id_num int not null primary key clustered,

    ATTRIBS_HS_ACTIVITIES varchar(240) null

    )

    INSERT INTO #CAN_U

    (ID_NUM, ATTRIBS_HS_ACTIVITIES)

    select '15431','ARGHS,ASNHS,ASGHS' union all

    select '92093','ABDHS,ABBHS'

    SELECT id_num,

    SUBSTRING(','+attribs_hs_activities+',',N+1,CHARINDEX(',',','+attribs_hs_activities+',',N+1)-N-1) AS attrib_cde,

    ROW_NUMBER() OVER(partition by id_num order by id_num ) as ATTRIB_SEQ

    FROM #can_u

    CROSS join wrt_Tally

    where N < LEN(','+attribs_hs_activities+',')

    AND SUBSTRING(','+attribs_hs_activities+',',N,1) = ','

    and attribs_hs_activities IS NOT NULL

    -- RESULTS

    id_numattrib_cdeATTRIB_SEQ

    15431ARGHS1

    15431ASNHS2

    15431ASGHS3

    92093ABDHS1

    92093ABBHS2

  • Very cool and great explanation of the problem you were trying to solve. Thanks a huge amount for taking the time to post it, Julie! 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • By the way, Julie... what does the prefix "Wrt" stand for?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 196 through 210 (of 511 total)

You must be logged in to reply to this topic. Login to reply