To Generate More than One Billion Numbers on the fly

  • First, recurrsion is nothing more than RBAR on steriods, only it's a bit more difficult to troubleshoot if something goes wrong. 😛

    Second, this is an SQL Server 2000 forum and I'm pretty sure everyone knows that CTE's, recursive or not, are not available in 2000.

    Third, if you do have 2k5, then the overhead recurrsion is absolutely NOT necessary...

    WITH

    L0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows

    L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),--4 rows

    L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),--16 rows

    L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),--256 rows

    L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B),--65536 rows

    L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B),--4294967296 rows

    Nums AS (SELECT ROW_NUMBER() OVER (ORDER BY C) AS n FROM L5)

    SELECT N FROM Nums WHERE n <= 1400000

    Thank you brother Itzek. Of course, the 1400000 can be a variable.

    Another way to do it in 2k5 (a bit faster on second runs) is...

    SELECT TOP (1400000 )

    N = ROW_NUMBER() OVER(ORDER BY sc1.Object_ID)

    FROM sys.all_columns sc1,

    sys.all_columns sc2

    Again, the 1400000 can be a variable.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Nice one Jeff!

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • EdVassie (1/2/2008)


    Aye, but you can use OPTION (MAXRECURSION 0) to allow unlimited recursion, and get billions of rows in the result set if you need them.

    Cool. Didn't notice that. Thanks

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Jeff Moden (12/31/2007)


    Jason! Hey! How's the new job???

    Just getting settled in now. I'll be better suited to give an answer/opinion in a week or so. 😀

    ______________________________________________________________________

    Personal Motto: Why push the envelope when you can just open it?

    If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.

    Jason L. Selburg
  • Jeff Moden (1/2/2008)


    SELECT TOP (1400000 )

    N = ROW_NUMBER() OVER(ORDER BY sc1.Object_ID)

    FROM sys.all_columns sc1,

    sys.all_columns sc2

    Again, the 1400000 can be a variable.

    Now - here's where roles get reversed (Jeff advances a 2005 solution, and I shoot it down:)):

    One smallish caution on using Row_number() over IDENTITY(int,1,1): if you plan on using the dynamically generated number as a PK, or cluster index the sucker, SELECT INTO will mark ROW_NUMBER() as nullable, and IDENTITY as not nullable. So - you lose the perf advantage of select INTO, or you have to throw in an ISNULL or COALLESCE to patch that up.

    From what I've seen, select into+IDENTITY and then alter table..ADD PRIMARY KEY is STILL the fastest method. Of course - as long as you can live with a brand new table...

    For reference purposes - relative times for creating a 10M record tally table (Code used below results):

    Method: Time (ms)

    SELECT INTO/IDENTITY/ADD PRIMARY KEY 31203

    SELECT INTO/ISNULL(ROW_NUMBER(),0)/ADD PK 37546

    SELECT INTO/ROW_NUMBER()/NOT NULL/ADD PK 45953

    CREATE TABLE/INSERT ROW_NUMBER()/ADD PK 103466

    CREATE TABLE with PK/INSERT ROW_NUMBER() 108873

    Of course - IDENTITY(int,1,1) doesn't work on an INSERT, so ROW_NUMBER() can be handy there.

    Here are the queries used - let me know if I missed one:

    declare @rows int

    declare @g datetime

    select @rows=10000000

    --======================================

    select @g=getdate()

    drop table p2

    SELECT TOP (@rows )

    N = identity(int,7,2)

    into p2

    FROM sys.all_columns sc1,

    sys.all_columns sc2

    ALTER TABLE P2

    ADD PRIMARY KEY CLUSTERED (n)

    select 'SELECT INTO/IDENTITY/ADD PRIMARY KEY'+cast(datediff(ms,@g,getdate()) as char)

    --======================================

    select @g=getdate()

    drop table p6

    SELECT TOP (@rows )

    N = isnull(ROW_NUMBER() OVER(ORDER BY sc1.Object_ID),0)

    into p6

    FROM sys.all_columns sc1,

    sys.all_columns sc2

    --ALTER TABLE P1

    --alter column N int NOT NULL

    ALTER TABLE P6

    add PRIMARY KEY CLUSTERED (N)

    select 'SELECT INTO/ISNULL(ROW_NUMBER(),0)/ADD PK:',datediff(ms,@g,getdate())

    select @g=getdate()

    drop table p1

    SELECT TOP (@rows )

    N = (ROW_NUMBER() OVER(ORDER BY sc1.Object_ID))

    into p1

    FROM sys.all_columns sc1,

    sys.all_columns sc2

    ALTER TABLE P1

    alter column N int NOT NULL

    ALTER TABLE P1

    add PRIMARY KEY CLUSTERED (N)

    select 'SELECT INTO/ROW_NUMBER()/NOT NULL/ADD PK:',datediff(ms,@g,getdate())

    --==============================================================

    select @g=getdate()

    drop table p5

    create table p5 (n INT not null)

    insert p5

    SELECT TOP (@rows )

    N = (ROW_NUMBER() OVER(ORDER BY sc1.Object_ID))

    --into p1

    FROM sys.all_columns sc1,

    sys.all_columns sc2

    ALTER TABLE P5

    ADD PRIMARY KEY CLUSTERED (n)

    select 'CREATE TABLE/INSERT ROW_NUMBER()/ADD PK:',datediff(ms,@g,getdate())

    --==============================================================

    select @g=getdate()

    drop table p3

    create table p3 (n INT primary key clustered)

    insert p3

    SELECT TOP (@rows )

    N = (ROW_NUMBER() OVER(ORDER BY sc1.Object_ID))

    --into p1

    FROM sys.all_columns sc1,

    sys.all_columns sc2

    select 'CREATE TABLE with PK/INSERT ROW_NUMBER()',datediff(ms,@g,getdate())

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Aye... nicely done... glad to see you back in the swing of things.

    Yep, I'm aware that ROW_NUMBER doesn't imply NOT NULL like IDENTITY does...

    ... but both Itzek's method and my method are great for tricking ID-TEN-T DBA's that will neither allow a permanent Tally table nor the constuction of Temp Tables and can still get the job done. Itzek's is especially useful because it also doesn't make any reference to the forbidden system tables that many DBA's forbid the use of. 😀 Just remove the comments and a good number of them simply glaze over when they see the code during a review. Not my habit to go about tricking DBA's but I just won't let anyone deprive me of the power of the Tally :w00t:

    Of course, since I rarely need anything larger than 11k rows in the Tally table (or CTE or Derived Table), I'd probably remove the L5 row from Itzek's Tally CTE just to make the code even less obtrusive and maybe a tiny bit faster :hehe:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • One thing I note on the test script is no tlog truncation or checkpoint or delay to ensure that those things (tlog growth for example) didn't affect the timing. I would hope that the data file into which all of this data was serially inserted didn't grow during the successive iterations either. 🙂

    Very nice example nontheless!!

    And you just gotta love the efficiency of the generator BG provided!!!

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • You're right, but it's a database with plenty of space and lots of trans log space in simple recovery mode (as in 5GB free in the data are and 4GB in the Tlogs for temp tables making up 160MB each). Nothing It just seemed like overkill.

    I also rotated the order around while posting to show best to worse. In reality - the slower ones were technically on top (i.e. they had "first crack" at the resources). Running them in various order a dozen times over essentially didn't change the overal picture (sure - a few MS in either direction each time, but not much.)

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 8 posts - 16 through 22 (of 22 total)

You must be logged in to reply to this topic. Login to reply