March 26, 2007 at 6:45 pm
Comments posted here are about the content posted at http://www.sqlservercentral.com/columnists/jSebastian/2944.asp.
May 21, 2007 at 10:42 pm
Jacob,
Your idea is great but your code is too complex for what it does and it takes too long to run... for example, you say the code to produce a million numbers takes a minute or two to run... the following does the same thing and only takes 21 seconds to run...
--===== Create and populate the test table on the fly
 SELECT TOP 10000000
        IDENTITY(INT,1,1) AS Number
   INTO dbo.TestTable
   FROM Master.dbo.SysColumns sc1,
        Master.dbo.SysColumns sc2
...and still you end up with nothing real to test on.
Thought I'd take this opportunity to show you the test table I use for performance testing on a great number of different methods... the basis of the code can be used to gen just about any test scenario you please... and it's stupid simple...
--===== Create and populate a test table.
--      Column "SomeDate" has a range of >= 01/01/2000 and < 01/01/2010
 SELECT TOP 10000000
        RowNum     = IDENTITY(INT,1,1),
        SomeInt    = CAST(RAND(CAST(NEWID() AS VARBINARY))*50000+1 AS INT),
        SomeString = CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65))
                   + CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65)),
        SomeCSV    = 'Part1,Part2,Part3,Part4,Part5,Part6,Part7,Part8,Part9,Part10',
        SomeNumber = CAST(RAND(CAST(NEWID() AS VARBINARY))*100 AS MONEY),
        SomeDate   = CAST(RAND(CAST(NEWID() AS VARBINARY))*3653.0+36524.0 AS DATETIME)
   INTO dbo.JBMTest
   FROM Master.dbo.SysColumns sc1,
        Master.dbo.SysColumns sc2 --Lack of join criteria makes this a CROSS-JOIN

--===== Add a primary key just to make it a "real" table...
  ALTER TABLE dbo.JBMTest
    ADD PRIMARY KEY CLUSTERED (RowNum)
Of course, you can easily change the number of rows and, with a bit of forethought, you can change the ranges of numbers/values produced by simply changing some of the multipliers or additive offsets.
The above code will gen a million rows (including the Clustered PK) in about 43 seconds and 10 million rows in a bit over 8 minutes.
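For example, here is a variation with the multipliers and offsets changed (these particular constants and the table name dbo.JBMTestSmall are just an illustration, not something I benchmarked): SomeInt falls between 1 and 100, and SomeDate covers only calendar year 2005, since 38351 is the number of days from the DATETIME base date of 01/01/1900 to 01/01/2005 and the 365 multiplier spans that non-leap year.
--===== Illustration only: SomeInt in the range 1-100,
--      SomeDate >= 01/01/2005 and < 01/01/2006
 SELECT TOP 1000000
        RowNum   = IDENTITY(INT,1,1),
        SomeInt  = CAST(RAND(CAST(NEWID() AS VARBINARY))*100+1 AS INT),             --multiplier 100, offset 1
        SomeDate = CAST(RAND(CAST(NEWID() AS VARBINARY))*365.0+38351.0 AS DATETIME) --365 days starting 01/01/2005
   INTO dbo.JBMTestSmall
   FROM Master.dbo.SysColumns sc1,
        Master.dbo.SysColumns sc2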
Like I said, I really appreciate your efforts in your article, but there are easier ways to gen volumes of test data...
--Jeff Moden
Change is inevitable... Change for the better is not.
May 21, 2007 at 11:39 pm
Hi Jeff,
Thanks for sharing your code. This is indeed a nice approach.
If you look at my last example, I had said 2.45 minutes for 5 million rows, which also includes some application-specific logic applied to the generated records. I used a CTE to present the idea that rows really can be generated ON THE FLY. If we used a physical table instead of a CTE, the query would be much faster.
I appreciate your feedback.
Since the introduction of Visual Studio Team Edition for Database Professionals, widely known as DBPro or "Data Dude", test data generation has become much easier and more productive. DBPro has tools that can generate test data automatically for the unit tests that you write.
Thanks again
Jacob
May 21, 2007 at 11:57 pm
Heh... Thanks Jacob...
Ok... apples to apples on the last bit of code... the following takes 1:09 including the PK.
--===== Create and populate a test table.
--      Column "SomeDate" has a range of >= 01/01/2000 and < 01/01/2010
 SELECT TOP 5000000
        RowNum   = IDENTITY(INT,1,1),
        SomeInt  = CAST(RAND(CAST(NEWID() AS VARBINARY))*50000+1 AS INT),
        SomeDate = CAST(FLOOR(RAND(CAST(NEWID() AS VARBINARY))*3653.0+36524.0) AS DATETIME)
   INTO dbo.JBMTest
   FROM Master.dbo.SysColumns sc1,
        Master.dbo.SysColumns sc2 --Lack of join criteria makes this a CROSS-JOIN

--===== Add a PK
  ALTER TABLE dbo.JBMTest
    ADD PRIMARY KEY CLUSTERED (RowNum)
Again, you can tweak the ranges of data to anything you'd like.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 22, 2007 at 12:41 am
Hi Jeff,
I agree with you. You won 🙂
Jacob
May 22, 2007 at 5:32 am
I thought the article was badly named - it should have been called something like 'Building Test Data Faster with TSQL'.
May 22, 2007 at 6:26 am
Nice article!
It is a pretty good example of a CTE.... if you put some other logic in the CTE, something more complex than just generating 10 rows, it would still be much faster than using a loop.... something like the sketch below.
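Here is a rough sketch of what I mean (the recursive CTE, the 1000-row limit, and the "Squared" column are only made-up illustrations, not code from the article):
--===== Rough sketch: a recursive CTE that generates 1000 numbers and applies
--      some extra logic to each row (names and logic are illustrative only)
;WITH Numbers (N) AS
(
    SELECT 1
    UNION ALL
    SELECT N + 1 FROM Numbers WHERE N < 1000
)
 SELECT N,
        N * N AS Squared         --the "other logic" would go here
   FROM Numbers
 OPTION (MAXRECURSION 0)         --lift the default limit of 100 recursions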
Mark
May 22, 2007 at 7:20 am
Actually, I'm sorry... I normally don't turn things into a race. Just meant to show an alternative.
No matter what method you use, your article hits upon something that a lot of developers overlook... testing for performance and creating volumes of data to do just that. Good article, Jacob.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 22, 2007 at 7:36 am
Hi jeff,
I have taken your comment in the CORRECT sense. It was really valuable, and I see that the approach you suggested is better. I would welcome your comments on future articles too.
Believe me, I did not see it as a race either. It was just a friendly conclusion to my previous comment.
Jacob
May 22, 2007 at 8:16 am
Thanks for the intro to CTEs! I know I've written the same SELECT statement in two parts of a query before. No more... ah... once we get all our customers to switch to 2005!
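For what it's worth, this is the kind of thing I mean (the table and column names are made up): define the repeated SELECT once as a CTE and reference it as many times as needed in the same statement.
--===== Made-up example: the aggregate is written once and referenced twice
;WITH MonthlySales AS
(
    SELECT CustomerID, SUM(Amount) AS Total
      FROM dbo.Sales
     GROUP BY CustomerID
)
 SELECT ms.CustomerID, ms.Total
   FROM MonthlySales ms
  WHERE ms.Total > (SELECT AVG(Total) FROM MonthlySales)  --second reference, no copy/paste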
May 22, 2007 at 10:07 am
I ran across a way to generate the numbers list in Itzik Ben-Gan's book "Inside Microsoft SQL Server 2005 T-SQL Querying". Itzik is a genius, and I highly recommend his book. Here is the code:
WITH
L0  AS (SELECT 1 AS C UNION ALL SELECT 1),                 --2 rows
L1  AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),              --4 rows
L2  AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),              --16 rows
L3  AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),              --256 rows
L4  AS (SELECT 1 AS C FROM L3 AS A, L3 AS B),              --65536 rows
L5  AS (SELECT 1 AS C FROM L4 AS A, L4 AS B),              --4294967296 rows
num AS (SELECT ROW_NUMBER() OVER(ORDER BY C) AS N FROM L5)
SELECT N FROM num WHERE N <= 1000000;
Aunt Kathi Data Platform MVP
Author of Expert T-SQL Window Functions
Simple-Talk Editor
May 22, 2007 at 3:00 pm
Jacob,
Use UNION ALL instead of UNION unless you explicitly need to remove duplicates - this might perform faster.
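I don't know exactly how the article's CTE is written, but the difference is simply this (a made-up fragment): UNION has to check for and remove duplicates even when there are none, while UNION ALL just appends the rows.
--===== Illustration only: both statements seed two rows
SELECT 1 AS C UNION     SELECT 2;   --duplicate removal happens even with no duplicates
SELECT 1 AS C UNION ALL SELECT 2;   --no duplicate check, so it is cheaper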
May 22, 2007 at 5:36 pm
Just out of curiosity, can someone who has SQL Server 2005 (unfortunately, I don't) tell me how long that bit of code takes to run? Thanks.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 23, 2007 at 3:51 am
Jeff,
I ran the code on SQL Server 2005 and it generated 1000000 numbers in 9 seconds. Quite impressive.
(The server in question has 4 CPUs and 2 GB of memory.)
Paul
May 23, 2007 at 4:12 am
This is quite interesting!
Took 8 seconds on my laptop running SQL Server Express edition.
(Toshiba Satellite Pro, 1 GB RAM, single processor)
- Jacob