Temp number list table

  • What is the most efficient way of creating a tempory number list table where there is only a int key column and the number of rows in the the column = @variable? 

  • Here's one way... nasty fast... takes 60-123 milliseconds for 11,000 rows (more than 30 years worth of dates if needed) and 5.8 to 9.2 seconds for a million rows (if TempDB doesn't need to grow)...

    --===== If the temporary Tally (numbers) table exists, drop it

         IF OBJECT_ID('TempDB..#Tally') IS NOT NULL

      BEGIN

            TRUNCATE TABLE #Tally --Very fast and makes the drop a bit faster

            DROP TABLE #Tally

        END

    --===== Declare and set a variable to hold the number of desired rows

    DECLARE @Rows INT

        SET @Rows = 11000

    --===== Limit the processing to the number of rows desired

        SET ROWCOUNT @Rows

        SET NOCOUNT ON

    --===== Create and populate the Tally table on the fly

     SELECT IDENTITY(INT,1,1) AS N

       INTO #Tally

       FROM Master.dbo.SysColumns sc1,

            Master.dbo.SysColumns sc2

    --===== Change the processing limit back to normal

        SET ROWCOUNT 0

    --===== Add a Primary Key to maximize performance

      ALTER TABLE #Tally

            ADD CONSTRAINT PK_tmpTally_N PRIMARY KEY CLUSTERED (N)

    My question would be (because there might be something better)... why don't you just create a permanent tally table?  What is it that you are doing?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • As new data is added to the base table the number of rows required may change. 

    The statement used is like:

    INSERT INTO Deaths (

     Year_of_death,

     Ageyears,

     Sex,

     Undcause,

     ICD,

     OAcode)

    select  Year_ref, 

     Age_start,

     Sex,

     ICD_ref, 

     ICD_number, 

     'XXXXXXXXXX' 

    from Eng_Wales_Deaths t

    join Number_list n

    on n.number < t.Person_count

    WHERE  n.number < @Max_person_count

    Update:

    I have just found that I don't think that I have enough memory to do this statement.  Can anyone offer an alternative approach?

     

     

     

  • Is the column name in the #Tally table: PK_tmpTally_N?

    Update:

    I can see now that it is N

  • I guess I don't have enough information... first, I'm not sure why you need this...

    join Number_list n

    on n.number < t.Person_count --This makes a triangular join!!!

    WHERE  n.number < @Max_person_count

    Second, are you aware that this makes triangular join (a bit more than half a cross join)??? Depending on the number of records you have, this could make millions of apparent joins (sometimes, billions).... the formula for the number of apparent records in a triangular join is...

    (X2+X)/2

    ... where X would be 1 less than t.Person_Count in this case.  If that number is, for example, 10,000 (very small table), then the number of apparent records processed will be (9,9992 + 9,999)/2 or a whopping big 49,995,000 records!!!  If you have 100,000 records (considered to be small to medium sized table), then number becomes insanely large at 4,999,950,000!!!!!!!!!!  That's right... almost 5 billion apparent joined records will need to be resolved!

    Like I said, I think you may have a flaw in your logic but I don't have enough info to help you figure it out.  Help me help you... please explain why you think the join to the numbers table is necessary...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • What's the difference between a triangular and half cross join...

     

    and when the heck do I need to use that !?!?

  • The system was designed for small geographical area statistics.  But my boss wants it to perform the same operation for England and Wales data.  The small area statistics has a seperate row for each death.  The England and Wales data uses Person count instead.  I need to get the England and Wales data in to the same form as the local area data.  This is why I perform the triangular join.  After I perform the triangular join it offers me, for 2003,  363049 rows.  But the number of rows should be 538254.  I changed a few less-than signs in the equation above to <=.  Maybe I should try again but limit the number of years I filter on.  This would tell me if I have a memory problem or not.

    Update:

    Reducing the size of the table does not make any difference.  I don't therefore think that it is a capacity problem.  There must be somthing wrong with the code or the base table.  Update B :  It is a base table problem.  I'm still using your code above though.  Very useful.  Thanks.

  • X/2 records is the difference

    Not sure why Mark thinks one is necessary here even with his latest explanation.  Still trying to figure it out...

    So far as triangular joins go, have seen lot's of folks make the mistake of using it to make running totals, running counts, and the like... get's real slow if more than 10k records are in the source table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • quoteUpdate B :  It is a base table problem.  I'm still using your code above though.  Very useful.  Thanks.

    Thanks Mark... I'm still trying to figure out why the triangular join is necessary at all...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Any exemple of when it should be used??

  • Sure, Remi... thanks for the interest...

    Personally, I think triangular joins should never be used because it's an absolutely insane form of RBAR (a "Modenism" for "Row By Agonizing Row").  Here's an example that calculates both a running total and a running count... this won't look bad in the estimated execution plan (another trap! The correlated subqueries hide the problem!!) but takes huge amounts of time after you approach the 10k record mark as you'll see if you take a look at the record counts in the actual execution plan (the row counts are, in fact, (X2+X)/2) ...

        USE NorthWind

     SELECT x.OrderID,

            x.Freight,

            (SELECT SUM(y.Freight)   FROM dbo.Orders y WHERE y.OrderID <= x.OrderID) AS RunningTotal,

            (SELECT COUNT(y.Freight) FROM dbo.Orders y WHERE y.OrderID <= x.OrderID) AS RunningCount

      FROM dbo.Orders X

     ORDER BY x.OrderID

    ... as you can see (triangular joins in bold blue), each correlated subquery must be executed for each row in the outer select.  A triangular join is included in each subquery which will really bog things down on larger tables.  It IS a very simple way to code running totals/counts and even ranking (2005 has a better way) but it is NOT scalable... the number of apparent rows grows exponentionally...

    If you have a Tally or Numbers table that goes to 10K, you can try this...

    --===== This example "only" takes 1 second to process...

         -- ... I think 1000 rows per second is horribly slow ... but wait for it...

     SELECT x.N,

            (SELECT SUM(y.N)   FROM dbo.Tally y WHERE y.N <= x.N) AS RunningTotal,

            (SELECT COUNT(y.N) FROM dbo.Tally y WHERE y.N <= x.N) AS RunningCount

      FROM dbo.Tally X

     WHERE x.N<=1000

     ORDER BY x.N

    --===== This example takes 56 times longer but only has 10 times the number of records

     SELECT x.N,

            (SELECT SUM(y.N)   FROM dbo.Tally y WHERE y.N <= x.N) AS RunningTotal,

            (SELECT COUNT(y.N) FROM dbo.Tally y WHERE y.N <= x.N) AS RunningCount

      FROM dbo.Tally X

     WHERE x.N<=10000

     ORDER BY x.N

    And those are just normal running totals/counts... not even grouped by user or anything... you can just imagine what the next order of magnitude would be 

    By the way... the last row of the RunningTotal column, in this case, just happens to contain the value of (X2+X)/2 (the number of apparent records processed), but because we have two triangular joins, you have to double that number to appreciate how hard these darned triangular joins make the computer work.

    In fact, I'm writing an article for Steve Jones on how to get around this hidden performance sapper as we speak... The method I'm working on for the article will do a grouped running total on 1 MILLION records in about 13 seconds or so and 4 MILLION records in about 58 seconds.  Takes about the same amount of time to do a flat running total, too.  And, it's scalable... the duration grows in a close to linear fashion when compared to the number of records.

    I won't even get into how bad triangular joins are when they are used in a WHERE clause in a single select 'cause we kind of covered that in the correlated subqueries above.

    (UPDATE) Hope this doesn't sound to much like diatribe but these forms of hidden RBAR are a very serious problem in most Developer's code and (1) they never see it coming and (2) they either don't know how to fix it or will resort to a cursor to fix it.  Like I said, I'm writing an article on the problem and the fix (will be part II of a performance series of articles), but here's a tease...

    SQL Server 2005 has some very neat tricks for doing running counts using the new "Ranking" functions, but what about those of us still using SQL Server 2000? Are we stuck with Cursors or the horrendously slow triangular joins that use "<" to create running totals or running counts? Not hardly…

    SQL Server has a very unique form of UPDATE that most never give a second thought to. Here’s the syntax…

    UPDATE sometablename

       SET @somevariable = somecolumnname = someexpresion/calculation

    Yep, there it is in black and white… you can set the value of a variable and a column at the same time!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ok I confess I wrote a few of those... Can't wait for the other solution now .

  • Yap... we've all written "those"... see the update I just posted to my previous message...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ain't that update based on the clustered index???  I already wrote that for something else and I was always wary that something could go wrong if I didn't have exclusive access on the table...

  • Yep... and you've nailed the key to it... exclusive access... now, what table(s) will a developer ALWAYS have exclusive access to?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply