To Generate More than One Billion Numbers on the fly

Question

To Generate More than One Billion Numbers on the fly

Viewing 8 posts - 16 through 22 (of 22 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004405 More actions · Answer 1

First, recurrsion is nothing more than RBAR on steriods, only it's a bit more difficult to troubleshoot if something goes wrong. 😛

Second, this is an SQL Server 2000 forum and I'm pretty sure everyone knows that CTE's, recursive or not, are not available in 2000.

Third, if you do have 2k5, then the overhead recurrsion is absolutely NOT necessary...

WITH

L0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows

L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),--4 rows

L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),--16 rows

L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),--256 rows

L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B),--65536 rows

L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B),--4294967296 rows

Nums AS (SELECT ROW_NUMBER() OVER (ORDER BY C) AS n FROM L5)

SELECT N FROM Nums WHERE n <= 1400000

Thank you brother Itzek. Of course, the 1400000 can be a variable.

Another way to do it in 2k5 (a bit faster on second runs) is...

SELECT TOP (1400000 )

N = ROW_NUMBER() OVER(ORDER BY sc1.Object_ID)

FROM sys.all_columns sc1,

sys.all_columns sc2

Again, the 1400000 can be a variable.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

EdVassie SSC Guru Points: 60476 More actions · Answer 2

Nice one Jeff!

Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 3

EdVassie (1/2/2008)
Aye, but you can use OPTION (MAXRECURSION 0) to allow unlimited recursion, and get billions of rows in the result set if you need them.

Cool. Didn't notice that. Thanks

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Jason Selburg SSC-Insane Points: 24641 More actions · Answer 4

Jeff Moden (12/31/2007)
Jason! Hey! How's the new job???

Just getting settled in now. I'll be better suited to give an answer/opinion in a week or so. 😀

______________________________________________________________________

Personal Motto: Why push the envelope when you can just open it?

If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.

Jason L. Selburg

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 5

Jeff Moden (1/2/2008)
SELECT TOP (1400000 )
N = ROW_NUMBER() OVER(ORDER BY sc1.Object_ID)
FROM sys.all_columns sc1,
sys.all_columns sc2
Again, the 1400000 can be a variable.

Now - here's where roles get reversed (Jeff advances a 2005 solution, and I shoot it down:)):

One smallish caution on using Row_number() over IDENTITY(int,1,1): if you plan on using the dynamically generated number as a PK, or cluster index the sucker, SELECT INTO will mark ROW_NUMBER() as nullable, and IDENTITY as not nullable. So - you lose the perf advantage of select INTO, or you have to throw in an ISNULL or COALLESCE to patch that up.

From what I've seen, select into+IDENTITY and then alter table..ADD PRIMARY KEY is STILL the fastest method. Of course - as long as you can live with a brand new table...

For reference purposes - relative times for creating a 10M record tally table (Code used below results):

Method: Time (ms)

SELECT INTO/IDENTITY/ADD PRIMARY KEY 31203

SELECT INTO/ISNULL(ROW_NUMBER(),0)/ADD PK 37546

SELECT INTO/ROW_NUMBER()/NOT NULL/ADD PK 45953

CREATE TABLE/INSERT ROW_NUMBER()/ADD PK 103466

CREATE TABLE with PK/INSERT ROW_NUMBER() 108873

Of course - IDENTITY(int,1,1) doesn't work on an INSERT, so ROW_NUMBER() can be handy there.

Here are the queries used - let me know if I missed one:

declare @rows int

declare @g datetime

select @rows=10000000

--======================================

select @g=getdate()

drop table p2

SELECT TOP (@rows )

N = identity(int,7,2)

into p2

FROM sys.all_columns sc1,

sys.all_columns sc2

ALTER TABLE P2

ADD PRIMARY KEY CLUSTERED (n)

select 'SELECT INTO/IDENTITY/ADD PRIMARY KEY'+cast(datediff(ms,@g,getdate()) as char)

--======================================

select @g=getdate()

drop table p6

SELECT TOP (@rows )

N = isnull(ROW_NUMBER() OVER(ORDER BY sc1.Object_ID),0)

into p6

FROM sys.all_columns sc1,

sys.all_columns sc2

--ALTER TABLE P1

--alter column N int NOT NULL

ALTER TABLE P6

add PRIMARY KEY CLUSTERED (N)

select 'SELECT INTO/ISNULL(ROW_NUMBER(),0)/ADD PK:',datediff(ms,@g,getdate())

select @g=getdate()

drop table p1

SELECT TOP (@rows )

N = (ROW_NUMBER() OVER(ORDER BY sc1.Object_ID))

into p1

FROM sys.all_columns sc1,

sys.all_columns sc2

ALTER TABLE P1

alter column N int NOT NULL

ALTER TABLE P1

add PRIMARY KEY CLUSTERED (N)

select 'SELECT INTO/ROW_NUMBER()/NOT NULL/ADD PK:',datediff(ms,@g,getdate())

--==============================================================

select @g=getdate()

drop table p5

create table p5 (n INT not null)

insert p5

SELECT TOP (@rows )

N = (ROW_NUMBER() OVER(ORDER BY sc1.Object_ID))

--into p1

FROM sys.all_columns sc1,

sys.all_columns sc2

ALTER TABLE P5

ADD PRIMARY KEY CLUSTERED (n)

select 'CREATE TABLE/INSERT ROW_NUMBER()/ADD PK:',datediff(ms,@g,getdate())

--==============================================================

select @g=getdate()

drop table p3

create table p3 (n INT primary key clustered)

insert p3

SELECT TOP (@rows )

N = (ROW_NUMBER() OVER(ORDER BY sc1.Object_ID))

--into p1

FROM sys.all_columns sc1,

sys.all_columns sc2

select 'CREATE TABLE with PK/INSERT ROW_NUMBER()',datediff(ms,@g,getdate())

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Jeff Moden SSC Guru Points: 1004405 More actions · Answer 6

Aye... nicely done... glad to see you back in the swing of things.

Yep, I'm aware that ROW_NUMBER doesn't imply NOT NULL like IDENTITY does...

... but both Itzek's method and my method are great for tricking ID-TEN-T DBA's that will neither allow a permanent Tally table nor the constuction of Temp Tables and can still get the job done. Itzek's is especially useful because it also doesn't make any reference to the forbidden system tables that many DBA's forbid the use of. 😀 Just remove the comments and a good number of them simply glaze over when they see the code during a review. Not my habit to go about tricking DBA's but I just won't let anyone deprive me of the power of the Tally :w00t:

Of course, since I rarely need anything larger than 11k rows in the Tally table (or CTE or Derived Table), I'd probably remove the L5 row from Itzek's Tally CTE just to make the code even less obtrusive and maybe a tiny bit faster :hehe:

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 7

One thing I note on the test script is no tlog truncation or checkpoint or delay to ensure that those things (tlog growth for example) didn't affect the timing. I would hope that the data file into which all of this data was serially inserted didn't grow during the successive iterations either. 🙂

Very nice example nontheless!!

And you just gotta love the efficiency of the generator BG provided!!!

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 8

You're right, but it's a database with plenty of space and lots of trans log space in simple recovery mode (as in 5GB free in the data are and 4GB in the Tlogs for temp tables making up 160MB each). Nothing It just seemed like overkill.

I also rotated the order around while posting to show best to worse. In reality - the slower ones were technically on top (i.e. they had "first crack" at the resources). Running them in various order a dozen times over essentially didn't change the overal picture (sure - a few MS in either direction each time, but not much.)

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?