March 26, 2007 at 6:45 pm
Comments posted here are about the content posted at http://www.sqlservercentral.com/columnists/jSebastian/2944.asp.
May 21, 2007 at 10:42 pm
Jacob,
Your idea is great but your code is too complex for what it does and it takes too long to run... for example, you say the code to produce a million numbers takes a minute or two to run... the following does the same thing and only takes 21 seconds to run...
--===== Create and populate the test table on the fly
 SELECT TOP 10000000
        IDENTITY(INT,1,1) AS Number
   INTO dbo.TestTable
   FROM Master.dbo.SysColumns sc1,
        Master.dbo.SysColumns sc2
...and still you end up with nothing real to test on.
Thought I'd take this opportunity to show you the test table I use for performance testing on a great number of different methods... the basis of the code can be used to gen just about any test scenario you please... and it's stupid simple...
--===== Create and populate a test table.
--      Column "SomeDate" has a range of >= 01/01/2000 and < 01/01/2010
 SELECT TOP 10000000
        RowNum     = IDENTITY(INT,1,1),
        SomeInt    = CAST(RAND(CAST(NEWID() AS VARBINARY))*50000+1 AS INT),
        SomeString = CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65))
                   + CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65)),
        SomeCSV    = 'Part1,Part2,Part3,Part4,Part5,Part6,Part7,Part8,Part9,Part10',
        SomeNumber = CAST(RAND(CAST(NEWID() AS VARBINARY))*100 AS MONEY),
        SomeDate   = CAST(RAND(CAST(NEWID() AS VARBINARY))*3653.0+36524.0 AS DATETIME)
   INTO dbo.JBMTest
   FROM Master.dbo.SysColumns sc1,
        Master.dbo.SysColumns sc2 --Lack of join criteria makes this a CROSS-JOIN

--===== Add a primary key just to make it a "real" table...
  ALTER TABLE dbo.JBMTest
    ADD PRIMARY KEY CLUSTERED (RowNum)
Of course, you can easily change the number of rows and, with a bit of forethought, you can change the ranges of numbers/values produced by simply changing some of the multipliers or additive offsets.
The above code will gen a million rows (including the Clustered PK) in about 43 seconds and 10 million rows in a bit over 8 minutes.
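For example, here is a variation with the multipliers and offsets changed (these particular constants and the table name dbo.JBMTestSmall are just an illustration, not something I benchmarked): SomeInt falls between 1 and 100, and SomeDate covers only calendar year 2005, since 38351 is the number of days from the DATETIME base date of 01/01/1900 to 01/01/2005 and the 365 multiplier spans that non-leap year.
--===== Illustration only: SomeInt in the range 1-100,
--      SomeDate >= 01/01/2005 and < 01/01/2006
 SELECT TOP 1000000
        RowNum   = IDENTITY(INT,1,1),
        SomeInt  = CAST(RAND(CAST(NEWID() AS VARBINARY))*100+1 AS INT),             --multiplier 100, offset 1
        SomeDate = CAST(RAND(CAST(NEWID() AS VARBINARY))*365.0+38351.0 AS DATETIME) --365 days starting 01/01/2005
   INTO dbo.JBMTestSmall
   FROM Master.dbo.SysColumns sc1,
        Master.dbo.SysColumns sc2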
Like I said, I really appreciate your efforts in your article, but there are easier ways to gen volumes of test data...
--Jeff Moden
Change is inevitable... Change for the better is not.
May 21, 2007 at 11:39 pm
Hi Jeff,
Thanks for sharing your code. This is indeed a nice approach.
If you look at my last example, I had said 2.45 minutes for 5 million rows, which also includes some application-specific logic applied to the generated records. I used a CTE to present the idea that rows really can be generated ON THE FLY. If we used a physical table instead of a CTE, the query would be much faster.
I appreciate your feedback.
Since the introduction of Visual Studio Team Edition for Database Professionals, widely known as DBPro or "Data Dude", test data generation has become much easier and more productive. DBPro has tools that can generate test data automatically for the unit tests that you write.
Thanks again
Jacob
May 21, 2007 at 11:57 pm
Heh... Thanks Jacob...
Ok... apples to apples on the last bit of code... the following takes 1:09 including the PK.
--===== Create and populate a test table.
--      Column "SomeDate" has a range of >= 01/01/2000 and < 01/01/2010
 SELECT TOP 5000000
        RowNum   = IDENTITY(INT,1,1),
        SomeInt  = CAST(RAND(CAST(NEWID() AS VARBINARY))*50000+1 AS INT),
        SomeDate = CAST(FLOOR(RAND(CAST(NEWID() AS VARBINARY))*3653.0+36524.0) AS DATETIME)
   INTO dbo.JBMTest
   FROM Master.dbo.SysColumns sc1,
        Master.dbo.SysColumns sc2 --Lack of join criteria makes this a CROSS-JOIN

--===== Add a PK
  ALTER TABLE dbo.JBMTest
    ADD PRIMARY KEY CLUSTERED (RowNum)
Again, you can tweak the ranges of data to anything you'd like.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 22, 2007 at 12:41 am
Hi Jeff,
I agree with you. You won 🙂
Jacob
May 22, 2007 at 5:32 am
I thought the article was badly named - it should have been called something like 'Building Test Data Faster with TSQL'.
May 22, 2007 at 6:26 am
Nice article!
It is a pretty good example of a CTE.... if you put some other logic in the CTE, something more complex than just generating 10 rows, it would still be much faster than using a loop.... something like the sketch below.
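Here is a rough sketch of what I mean (the recursive CTE, the 1000-row limit, and the "Squared" column are only made-up illustrations, not code from the article):
--===== Rough sketch: a recursive CTE that generates 1000 numbers and applies
--      some extra logic to each row (names and logic are illustrative only)
;WITH Numbers (N) AS
(
    SELECT 1
    UNION ALL
    SELECT N + 1 FROM Numbers WHERE N < 1000
)
 SELECT N,
        N * N AS Squared         --the "other logic" would go here
   FROM Numbers
 OPTION (MAXRECURSION 0)         --lift the default limit of 100 recursions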
Mark
May 22, 2007 at 7:20 am
Actually, I'm sorry... I normally don't turn things into a race. Just meant to show an alternative.
No matter what method you use, your article hits upon something that a lot of developers overlook... testing for performance and creating volumes of data to do just that. Good article, Jacob.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 22, 2007 at 7:36 am
Hi jeff,
I have taken your comment in the CORRECT sense. It was really valuable, and I see that the approach you suggested is better. I would welcome your comments on future articles too.
Believe me, I did not see it as a race either. It was just a friendly conclusion to my previous comment.
Jacob
May 22, 2007 at 8:16 am
Thanks for the intro to CTEs! I know I've written the same SELECT statement in two parts of a query before. No more... ah... once we get all our customers to switch to 2005!
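For what it's worth, this is the kind of thing I mean (the table and column names are made up): define the repeated SELECT once as a CTE and reference it as many times as needed in the same statement.
--===== Made-up example: the aggregate is written once and referenced twice
;WITH MonthlySales AS
(
    SELECT CustomerID, SUM(Amount) AS Total
      FROM dbo.Sales
     GROUP BY CustomerID
)
 SELECT ms.CustomerID, ms.Total
   FROM MonthlySales ms
  WHERE ms.Total > (SELECT AVG(Total) FROM MonthlySales)  --second reference, no copy/paste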
May 22, 2007 at 10:07 am
I ran across a way to generate the numbers list in Itzik Ben-Gan's book "Inside Microsoft SQL Server 2005 T-SQL Querying". Itzik is a genius, and I highly recommend his book. Here is the code:
WITH
L0  AS (SELECT 1 AS C UNION ALL SELECT 1),                 --2 rows
L1  AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),              --4 rows
L2  AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),              --16 rows
L3  AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),              --256 rows
L4  AS (SELECT 1 AS C FROM L3 AS A, L3 AS B),              --65536 rows
L5  AS (SELECT 1 AS C FROM L4 AS A, L4 AS B),              --4294967296 rows
num AS (SELECT ROW_NUMBER() OVER(ORDER BY C) AS N FROM L5)
SELECT N FROM num WHERE N <= 1000000;
Aunt Kathi Data Platform MVP
Author of Expert T-SQL Window Functions
Simple-Talk Editor
May 22, 2007 at 3:00 pm
Jacob,
Use UNION ALL instead of UNION unless you explicitly need to remove duplicates - this might perform faster.
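I don't know exactly how the article's CTE is written, but the difference is simply this (a made-up fragment): UNION has to check for and remove duplicates even when there are none, while UNION ALL just appends the rows.
--===== Illustration only: both statements seed two rows
SELECT 1 AS C UNION     SELECT 2;   --duplicate removal happens even with no duplicates
SELECT 1 AS C UNION ALL SELECT 2;   --no duplicate check, so it is cheaper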
May 22, 2007 at 5:36 pm
Just out of curiosity, can someone who has SQL Server 2005 (unfortunately, I don't) tell me how long that bit of code takes to run? Thanks.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 23, 2007 at 3:51 am
Jeff,
I ran the code on SQL Server 2005 and it generated 1000000 numbers in 9 seconds. Quite impressive.
(The server in question has 4 CPUs and 2 GB of memory.)
Paul
May 23, 2007 at 4:12 am
This is quite interesting!
Took 8 seconds on my laptop running SQL Server Express edition.
(Toshiba Satellite Pro, 1 GB RAM, single processor)
- Jacob