January 23, 2007 at 11:19 am
Good article. Interesting that my create table/insert WHILE loop (to 10,000 rows) executed in 4 seconds while the set-based solution executed in 17 seconds.
January 23, 2007 at 4:41 pm
James,
Would you mind posting both versions? I'd like to do a little testing to see what's up... thanks.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 10, 2007 at 10:00 pm
Hi James,
Not sure why that would happen. I had the exact opposite experience on a few different SQL Server instances I tested on. There could be a lot of factors involved, however. One thing that comes to mind is if your SQL Server decided it needed to AutoGrow the database while creating the table, or some other oddity. If you could post the code you ran I'd love to test it out and see if I can reproduce it.
Thanks!
July 26, 2007 at 2:13 pm
I wrote the function used in the code below to quickly generate number tables.
Executing this code to load a table with 1,000,000 numbers took 6.780 seconds. Loading a table with 10,000,000 numbers took about 136 seconds; 100,000 rows took 0.610 seconds, and 10,000 rows took 0.063 seconds.
declare @t datetime
declare @n table( number int )
set @t = getdate()

insert into @n
select number
-- Function F_TABLE_NUMBER_RANGE available at this link:
-- http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=47685
from F_TABLE_NUMBER_RANGE(1,1000000)

select ElapsedTime = getdate()-@t
July 26, 2007 at 6:46 pm
If you always want the numbers table to start at 1, this demonstrates both an alternative to the function and a comparison of run times from 10 rows to 10 million rows...
--===== Setup the test environment
SET NOCOUNT ON --Suppress the auto-display of rowcounts for appearance & speed
--===== Declare local variables
DECLARE @DesiredRows INT --Number of rows desired in the result object
DECLARE @StartTime DATETIME --For calculation of duration times
--================================================================================
--===== Loop through the tests from 10 to 10 million =============================
SET @DesiredRows = 10
WHILE @DesiredRows <= 10000000
BEGIN
--================================================================================
-- Test the SELECT/INTO Method (temp table)
--================================================================================
--===== If the temporary Tally (Numbers) table exists, drop it
-- (Included just for test repeatability. Should not have to do in real life
-- because won't exist when new connection starts)
IF OBJECT_ID('TempDB..#Tally','U') IS NOT NULL
DROP TABLE #Tally
--===== Limit the desired number of rows
SET ROWCOUNT @DesiredRows
--===== Start the timer and run the test
SET @StartTime = GETDATE()
SELECT IDENTITY(INT,1,1) AS N
INTO #Tally
FROM Master.dbo.SysColumns sc1 WITH (NOLOCK),
Master.dbo.SysColumns sc2 WITH (NOLOCK)
--===== Report the rowcount and duration in seconds
PRINT STR(@@ROWCOUNT) + ' Rows Inserted '
+ CONVERT(CHAR(13),GETDATE()-@StartTime,114) + ' Duration SELECT/INTO Method'
--===== Return to normal unlimited rowcounts
SET ROWCOUNT 0
--================================================================================
-- Test the F_TABLE_NUMBER_RANGE Method (table variables)
-- Function F_TABLE_NUMBER_RANGE available on this link
-- http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=47685
--================================================================================
--===== Start the timer and run the test
SET @StartTime = GETDATE()
DECLARE @N TABLE(Number INT) --Not sure why this works in a loop, but it does
INSERT INTO @N
SELECT Number
FROM dbo.F_TABLE_NUMBER_RANGE(1,@DesiredRows)
--===== Report the rowcount and duration in seconds
PRINT STR(@@ROWCOUNT) + ' Rows Inserted '
+ CONVERT(CHAR(13),GETDATE()-@StartTime,114) + ' Duration F_TABLE_NUMBER_RANGE Method'
--===== Delete rows from the table variable so as not to build up rows between tests
--TRUNCATE TABLE @N --Doesn't work on table variables...
--DROP TABLE @N --Neither does this...
DELETE @N --But this does.
PRINT REPLICATE('=',78)
--================================================================================
--===== End of test loop =========================================================
SET @DesiredRows = @DesiredRows * 10
END
... and here are the results it produced on my humble 1.8 GHz, 1 GB RAM SQL Server 2000 SP4 Developer Edition desktop box at home...
10 Rows Inserted 00:00:00:000 Duration SELECT/INTO Method
10 Rows Inserted 00:00:00:000 Duration F_TABLE_NUMBER_RANGE Method
==============================================================================
100 Rows Inserted 00:00:00:000 Duration SELECT/INTO Method
100 Rows Inserted 00:00:00:013 Duration F_TABLE_NUMBER_RANGE Method
==============================================================================
1000 Rows Inserted 00:00:00:000 Duration SELECT/INTO Method
1000 Rows Inserted 00:00:00:017 Duration F_TABLE_NUMBER_RANGE Method
==============================================================================
10000 Rows Inserted 00:00:00:033 Duration SELECT/INTO Method
10000 Rows Inserted 00:00:00:077 Duration F_TABLE_NUMBER_RANGE Method
==============================================================================
100000 Rows Inserted 00:00:00:203 Duration SELECT/INTO Method
100000 Rows Inserted 00:00:00:750 Duration F_TABLE_NUMBER_RANGE Method
==============================================================================
1000000 Rows Inserted 00:00:02:000 Duration SELECT/INTO Method
1000000 Rows Inserted 00:00:08:093 Duration F_TABLE_NUMBER_RANGE Method
==============================================================================
10000000 Rows Inserted 00:00:20:253 Duration SELECT/INTO Method
10000000 Rows Inserted 00:01:48:123 Duration F_TABLE_NUMBER_RANGE Method
==============================================================================
--Jeff Moden
Change is inevitable... Change for the better is not.
July 26, 2007 at 7:06 pm
Oh yeah... almost forgot... if you want a nice clustered primary key on both, the test and test results are as follows:
--===== Setup the test environment
SET NOCOUNT ON --Suppress the auto-display of rowcounts for appearance & speed
--===== Declare local variables
DECLARE @DesiredRows INT --Number of rows desired in the result object
DECLARE @MyCount INT --Keeps track of row counts
DECLARE @StartTime DATETIME --For calculation of duration times
--================================================================================
--===== Loop through the tests from 10 to 10 million =============================
SET @DesiredRows = 10
WHILE @DesiredRows <= 10000000
BEGIN
--================================================================================
-- Test the SELECT/INTO Method (temp table)
--================================================================================
--===== If the temporary Tally (Numbers) table exists, drop it
-- (Included just for test repeatability. Should not have to do in real life
-- because won't exist when new connection starts)
IF OBJECT_ID('TempDB..#Tally','U') IS NOT NULL
DROP TABLE #Tally
--===== Limit the desired number of rows
SET ROWCOUNT @DesiredRows
--===== Start the timer and run the test
SET @StartTime = GETDATE()
SELECT IDENTITY(INT,1,1) AS N
INTO #Tally
FROM Master.dbo.SysColumns sc1 WITH (NOLOCK),
Master.dbo.SysColumns sc2 WITH (NOLOCK)
SET @MyCount = @@ROWCOUNT
--===== Add a primary key to the new table
ALTER TABLE #Tally
ADD PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
--===== Report the rowcount and duration in seconds
PRINT STR(@MyCount) + ' Rows Inserted '
+ CONVERT(CHAR(13),GETDATE()-@StartTime,114) + ' Duration SELECT/INTO Method'
--===== Return to normal unlimited rowcounts
SET ROWCOUNT 0
--================================================================================
-- Test the F_TABLE_NUMBER_RANGE Method (table variables)
-- Function F_TABLE_NUMBER_RANGE available on this link
-- http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=47685
--================================================================================
--===== Start the timer and run the test
SET @StartTime = GETDATE()
DECLARE @N TABLE(Number INT PRIMARY KEY CLUSTERED WITH FILLFACTOR = 100)
INSERT INTO @N
SELECT Number
FROM dbo.F_TABLE_NUMBER_RANGE(1,@DesiredRows)
--===== Report the rowcount and duration in seconds
PRINT STR(@@ROWCOUNT) + ' Rows Inserted '
+ CONVERT(CHAR(13),GETDATE()-@StartTime,114) + ' Duration F_TABLE_NUMBER_RANGE Method'
--===== Delete rows from the table variable so as not to build up rows between tests
--TRUNCATE TABLE @N --Doesn't work on table variables...
--DROP TABLE @N --Neither does this...
DELETE @N --But this does.
PRINT REPLICATE('=',78)
--================================================================================
--===== End of test loop =========================================================
SET @DesiredRows = @DesiredRows * 10
END
10 Rows Inserted 00:00:00:110 Duration SELECT/INTO Method
10 Rows Inserted 00:00:00:000 Duration F_TABLE_NUMBER_RANGE Method
==============================================================================
100 Rows Inserted 00:00:00:110 Duration SELECT/INTO Method
100 Rows Inserted 00:00:00:000 Duration F_TABLE_NUMBER_RANGE Method
==============================================================================
1000 Rows Inserted 00:00:00:107 Duration SELECT/INTO Method
1000 Rows Inserted 00:00:00:017 Duration F_TABLE_NUMBER_RANGE Method
==============================================================================
10000 Rows Inserted 00:00:00:140 Duration SELECT/INTO Method
10000 Rows Inserted 00:00:00:093 Duration F_TABLE_NUMBER_RANGE Method
==============================================================================
100000 Rows Inserted 00:00:00:500 Duration SELECT/INTO Method
100000 Rows Inserted 00:00:00:907 Duration F_TABLE_NUMBER_RANGE Method
==============================================================================
1000000 Rows Inserted 00:00:04:063 Duration SELECT/INTO Method
1000000 Rows Inserted 00:00:09:907 Duration F_TABLE_NUMBER_RANGE Method
==============================================================================
10000000 Rows Inserted 00:00:51:033 Duration SELECT/INTO Method
10000000 Rows Inserted 00:01:59:123 Duration F_TABLE_NUMBER_RANGE Method
==============================================================================
--Jeff Moden
Change is inevitable... Change for the better is not.
July 26, 2007 at 7:11 pm
Like Jackie Chan's Uncle would say "Oh! One more thing"...
I keep a "Tally" table of 11,000 numbers as a permanent table... good for more than 30 years of dates by day (if you need that type of functionality) and certainly good enough for an 8K-byte VARCHAR split function (if you need that). Here's how I build it...
--===== Create and populate the Tally table on the fly
SELECT TOP 11000 --equates to more than 30 years of dates
IDENTITY(INT,1,1) AS N
INTO dbo.Tally
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2
--===== Add a Primary Key to maximize performance
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N)
--===== Allow the general public to use it
GRANT SELECT ON dbo.Tally TO PUBLIC
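For illustration, here's a minimal sketch of the kind of set-based CSV split the Tally table makes possible. This is an example, not the original poster's split function; it assumes the dbo.Tally table built above, a comma delimiter, and strings under 8,000 characters:

```sql
--===== Sketch of a Tally-based split (assumes dbo.Tally from above)
DECLARE @CSV VARCHAR(8000)
SET @CSV = ',' + 'a,b,c' + ','  --Wrap with delimiters to simplify boundary logic

--===== Each row of Tally that lands on a delimiter marks the start of an element
SELECT SUBSTRING(@CSV, t.N + 1,
                 CHARINDEX(',', @CSV, t.N + 1) - t.N - 1) AS Item
  FROM dbo.Tally t
 WHERE t.N < LEN(@CSV)
   AND SUBSTRING(@CSV, t.N, 1) = ','
```

The query returns one row per element ('a', 'b', 'c') with no loop at all, which is the whole point of keeping the table around.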
--Jeff Moden
Change is inevitable... Change for the better is not.
July 27, 2007 at 12:18 pm
Jeff, I think your tests may have taken advantage of having master.dbo.syscolumns in cache, something that may or may not be the case. Also, one was loading a temp table with a SELECT INTO, and the other was doing an INSERT into a declared table.
I created a test script that clears the data cache and procedure buffers, and does a SELECT INTO for both methods.
My testing suggests a closer match in performance. The crosstab on master.dbo.syscolumns was faster than F_TABLE_NUMBER_RANGE in many tests, but not always, as you can see from the test below with 15,000,000 rows.
These tests were run on my SQL Server 2005 Developer Edition on my desktop PC.
--Removes all clean buffers from the buffer pool
dbcc dropcleanbuffers
--Removes all elements from the procedure cache
dbcc freeproccache
go
print 'Test SysColumns crosstab'
declare @starttime datetime
set @starttime = getdate()
set rowcount 15000000
select number = identity(int,1,1)
into #t
from master.dbo.syscolumns sc1 with (nolock)
cross join master.dbo.syscolumns sc2 with (nolock)
select Elapsed_Time_Cross = convert(char(13),getdate()-@starttime,114)
go
drop table #t
go
--Removes all clean buffers from the buffer pool
dbcc dropcleanbuffers
--Removes all elements from the procedure cache
dbcc freeproccache
go
print 'Test F_TABLE_NUMBER_RANGE'
declare @starttime datetime
set @starttime = getdate()
select number
into #t
from dbo.F_TABLE_NUMBER_RANGE(1,15000000)
select Elapsed_Time_Function = convert(char(13),getdate()-@starttime,114)
go
drop table #t
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
Test SysColumns crosstab

(15000000 row(s) affected)

Elapsed_Time_Cross
------------------
00:00:37:140

(1 row(s) affected)

DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
Test F_TABLE_NUMBER_RANGE

(15000000 row(s) affected)

Elapsed_Time_Function
---------------------
00:00:36:940

(1 row(s) affected)
July 27, 2007 at 4:04 pm
Cool... thanks, Michael.
Yeah, clearing the cache didn't seem to matter in my testing, so I didn't include it in my code. And I very much like the idea of using your function to populate a temp table rather than a table variable. Thanks for running the test!
--Jeff Moden
Change is inevitable... Change for the better is not.
January 22, 2008 at 4:44 pm
I'm developing a system for tracking how many days it takes to complete a task from the time it's requested, but I have to exclude weekends and holidays. The original system was written in Access, and the contractor had a very clunky solution that looped through a date table, touching every row (RBAR) to do the calculation. I realized that I, too, would need a date table, so I built one, and then came the problem of loading it. I originally wrote a loop to insert the roughly 104 weekend dates, and then realized that I could do it with a numbers table! I was very happy; this was the first time I got to use one. Here's the code:
CREATE TABLE [dbo].[DatesWeekendsHolidays] (
[WHDate] [smalldatetime] NOT NULL)
GO
--(The table Numbers contains a single smallint field, Number.)
declare @ThisYear char(2)
select @ThisYear = '08'
-- DECLARE @StartDate smalldatetime
-- DECLARE @EndDate smalldatetime
-- DECLARE @LoopDate smalldatetime
--
-- set @StartDate = cast(('1/1/' + @ThisYear) as smalldatetime)
-- set @EndDate = cast(('12/31/' + @ThisYear) as smalldatetime)
-- set @LoopDate = @StartDate
--Using the Numbers table lets me insert all of the dates IN ONE INSERT STATEMENT!
insert DatesWeekendsHolidays
select dateadd(dw, Number, cast(('1/1/' + @ThisYear) as smalldatetime))
from Numbers
where Number between 1 and 365
and (datepart(dw, dateadd(dw, Number, cast(('1/1/' + @ThisYear) as smalldatetime))) = 1
or datepart(dw, dateadd(dw, Number, cast(('1/1/' + @ThisYear) as smalldatetime))) = 7)
--The loop method of inserting dates: 104 inserts.
-- while @LoopDate <= @EndDate
-- BEGIN
-- set @dow = datepart(dw, @LoopDate)
--
-- if @dow = 1 or @dow = 7
-- insert DatesWeekendsHolidays select @LoopDate
-- select @LoopDate = dateadd(d, 1, @LoopDate)
-- END
--New Year's Day
insert DatesWeekendsHolidays
select cast(('1/1/' + @ThisYear) as smalldatetime)
where not exists(select WHDate
from DatesWeekendsHolidays
where WHDate = cast(('1/1/' + @ThisYear) as smalldatetime))
--insert 11/11, 12/25 -- holidays with fixed dates
--user inserts variable date holidays (Thanksgiving) via Access/VB app
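For the fixed-date holidays mentioned in the comment above, the same NOT EXISTS pattern used for New Year's Day extends naturally. This is a sketch (my addition, not the original poster's code) that assumes the @ThisYear variable and table from the script above:

```sql
--Sketch: fixed-date holidays, following the New Year's Day insert pattern.
--Assumes @ThisYear is declared and set as in the script above.
insert DatesWeekendsHolidays
select h.HolidayDate
  from (select cast(('11/11/' + @ThisYear) as smalldatetime) as HolidayDate --Veterans Day
        union all
        select cast(('12/25/' + @ThisYear) as smalldatetime)                --Christmas
       ) h
 where not exists (select WHDate
                     from DatesWeekendsHolidays
                    where WHDate = h.HolidayDate)
```

The NOT EXISTS check keeps the insert idempotent, so a holiday that happens to fall on a weekend (and is therefore already in the table) won't be duplicated.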
Ultimately I'll write a SQL job that will run at 00:01 on 1/1 of every year and automatically populate at least the raw holidays since it is easy enough to extract the year from getdate(). It would also send an email to the responsible parties to make sure that the variable date holidays get entered.
The loop insert method took one second to run in Query Analyzer; the numbers-table insert took no measurable time. Looking at the statistics, trace, and execution plan is quite amusing; seeing the difference between the single insert statement and 104 separate inserts is pretty funny.
-----
Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it. --Samuel Johnson
November 19, 2008 at 4:47 pm
I had a need for a large numbers table, from 1 to 1 billion in a single table, so neither of these solutions worked for me. I used a CTE to get what I wanted. I was able to do 1 million rows in under 6 seconds on a Pentium D with 7,200 RPM drives, whereas the fastest solution presented thus far took about a minute and a half.
1 billion rows in an hour and a half is awesome performance; I didn't have 3 days to wait for the others to finish, so I thought I'd share my code.
After the WITH, each level you add squares the previous row count, so with this example we can generate just over 4 billion rows. The WHERE clause limits the amount returned.
After it is all inserted, we add the primary key and clustered index.
SET NOCOUNT ON;
IF OBJECT_ID('dbo.NUMBERS_TEST','U') IS NOT NULL
    DROP TABLE dbo.NUMBERS_TEST;
WITH
N5(i) AS ( SELECT 1 UNION SELECT 0 ), -- 2
N4(i) AS ( SELECT 1 FROM N5 D1 CROSS JOIN N5 D2 ), --4
N3(i) AS ( SELECT 1 FROM N4 D1 CROSS JOIN N4 D2 ), --16
N2(i) AS ( SELECT 1 FROM N3 D1 CROSS JOIN N3 D2 ), --256
N1(i) AS ( SELECT 1 FROM N2 D1 CROSS JOIN N2 D2 ), --65536
N0(i) AS ( SELECT 1 FROM N1 D1 CROSS JOIN N1 D2 ) --4294967296
SELECT i INTO dbo.NUMBERS_TEST
FROM ( SELECT ROW_NUMBER() OVER (ORDER BY i)
FROM N0) D(i)
WHERE i <= 1000000000 ; --Put your limit here
GO
ALTER TABLE dbo.NUMBERS_TEST
ALTER COLUMN i BIGINT NOT NULL;
GO
ALTER TABLE dbo.NUMBERS_TEST
ADD CONSTRAINT PK_NUMBERS_TEST PRIMARY KEY CLUSTERED (i);
GO
November 19, 2008 at 6:46 pm
I've just gotta ask... what did you need the 1 Billion row numbers table for?
--Jeff Moden
Change is inevitable... Change for the better is not.
November 21, 2008 at 10:57 am
It was something of a math experiment. I was doing a specialized inner join of the table on itself to find all the prime numbers from 1 to a billion. It's been going for about 3 days now. Hahaha.
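The poster's exact query wasn't shared, but for illustration, one set-based way to pull primes from such a numbers table looks like this. It's a sketch only (assuming the dbo.NUMBERS_TEST table from the earlier post), and at a billion rows it would still be brutally slow:

```sql
--Illustrative sketch, not the poster's actual query:
--keep i when no candidate divisor up to sqrt(i) divides it evenly.
SELECT n.i
  FROM dbo.NUMBERS_TEST n
 WHERE n.i > 1
   AND NOT EXISTS ( SELECT 1
                      FROM dbo.NUMBERS_TEST d
                     WHERE d.i BETWEEN 2 AND SQRT(n.i)
                       AND n.i % d.i = 0 )
```

Limiting the divisor scan to SQRT(n.i) is what keeps this from being a full self cross join, and the clustered PK on i is what lets the inner range predicate seek instead of scan.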
November 21, 2008 at 7:24 pm
Sounds like a lot of fun... thanks for the feedback. Did you index the numbers table with a clustered PK by any chance?
--Jeff Moden
Change is inevitable... Change for the better is not.
November 21, 2008 at 7:48 pm
Yeah, I added the clustered index after the insert to improve performance. The insert took just over an hour and a half, but adding the clustered index took around 5 and a half hours. Haha.