What is the most efficient method of generating a range of values
A lead-developer friend asked me to review a T-SQL code they had inherited which generated a
number range between 2 integer values.
He believed that the T-SQL was a prime example of spaghetti code and wanted to see some alternatives.
Of course, first port of call was SQLServerCentral to see what alternatives might exist and sure enough I found two scripts.
Chris Cubley's solution was closest to the one I was given to review where as DavidT's script would probably be the one that most procedural programmers would come up with.
I paraphrased all three versions into functions and measured the performance as each of them produced the number
range 5 to 567720.
Author | Function | Average execution time |
---|---|---|
Chris Cubley | dbo.fnCrossJoinRange | 30 secs. |
DavidT | dbo.fnRange | 22 secs. |
?????? | dbo.fnUnionRange | 12 secs. |
Function based on DavidT's script
See Generate Number Range 2 for the original script.
CREATE FUNCTION dbo.fnRange ( @first int , --##PARAM @first The lowest value in the range. @last int --##PARAM @last The highest value in the range. ) RETURNS @values TABLE ( value int primary key ) AS BEGIN DECLARE @temp int -- If the values have been supplied transposed then swap them to the correct way around. IF @first > @last BEGIN SET @temp = @first SET @first = @last SET @last = @temp END WHILE @first<=@last BEGIN INSERT INTO @values VALUES(@first) SET @first = @first+1 END RETURN END GO
The above function is straight forward as it simply loops between the first and last value
inserting records into a table variable that is then returned.
This function has three main advantages
- It is simple.
- Values are inserted in order.
- For small number ranges it is very efficient
- The range is limited only by the bounds of the SQL INT datatype. This can be extended by using BIGINT
Its disadvantage is that as number ranges get progressively larger its execution time increases. The reality is that execution times become excessive once the range gets into 7 figures.
Chris Cubley's version
See Generate Number Range for the original script.
In Chris's original he used a temporary table to hold the values 0 to 9. As this is a static dataset I decided to use a view instead.
CREATE VIEW dbo.Digits AS select 0 as value union allselect 1 as value union allselect 2 as value union allselect 3 as value union allselect 4 as value union allselect 5 as value union allselect 6 as value union allselect 7 as value union allselect 8 as value union allselect 9 as value GO --##SUMMARY This function provides a table of integers between the two specified arguments. --##REMARKS The procedure allows for the two arguments being transposed. CREATE FUNCTION dbo.fnCrossJoinRange ( @first int , --##PARAM @first The lowest value in the range. @last int --##PARAM @last The highest value in the range. ) RETURNS @values TABLE ( value int primary key ) AS BEGIN INSERT INTO @values(value) SELECTnum = units.value + (tens.value * 10) + (hundreds.value * 100) + (Thousands.value * 1000) + (TenThousands.value * 10000) + (CThousands.value * 100000) + (Millions.value * 1000000) FROMdbo.Digits units CROSS JOIN dbo.Digits tens CROSS JOIN dbo.Digits hundreds CROSS JOIN dbo.Digits Thousands CROSS JOIN dbo.Digits TenThousands CROSS JOIN dbo.Digits CThousands CROSS JOIN dbo.Digits Millions where units.value + (tens.value * 10) + (hundreds.value * 100) + (Thousands.value * 1000) + (TenThousands.value * 10000) + (CThousands.value * 100000) + (Millions.value * 1000000) BETWEEN @first and @last RETURN END GO
The idea behind this function is that instead of doing a row by row insert we are playing to the strengths of SQL Server and are using set based operations. The theory behind this is that we cross join our number range (0 to 9) to itself repeatedly.
- units contain the values 0 to 9
- tens contain the values 10 to 90 in steps of 10 and zero
- hundreds contain the values 100 to 900 in steps of 100 and zero
- ...etc
By performing a CROSS JOIN we are getting the sum of every possible combination of each item.
Although this function was by far the slowest its lack of performance is principally due to the sheer volume of
records it generates before applying a filter. If you genuinely need to generate a range of numbers in the millions then it will actually out-perform
the dbo.fnRange function.
Unknown author's version
The function shown below combines the technique that Chris used in his script with a filter to limit the number of records that are combined.
Like Chris's version it is effectively doing 70 SELECT statements
- 0 to 9
- <10 to 90 in steps of 10 and zero
- <100 to 900 in steps of 100 and zero
- <1000 to 9000 in steps of 1000 and zero
- <10000 to 90000 in steps of 10000 and zero
- <100000 to 900000 in steps of 100000 and zero
- <1000000 to 9000000 in steps of 1000000 and zero
Where it has the advantage over Chris's version is that it does not select values beyond the end of the range. Chris's version selects ALL values then applies a WHERE clause. This version pre-selects values prior to applying a final WHERE clause
CREATE FUNCTION dbo.fnUnionRange ( @first int , --##PARAM @first The lowest value in the range. @last int --##PARAM @last The highest value in the range. ) RETURNS @values TABLE ( value int primary key ) AS BEGIN INSERT INTO @values(value) select units.value +tens.value +hundreds.value +Thousands.value +TenThousands.value +CThousands.value +Millions.value AS list from( select 0 as value union allselect 1 as value where 1 <= @last union allselect 2 as value where 2 <= @last union allselect 3 as value where 3 <= @last union allselect 4 as value where 4 <= @last union allselect 5 as value where 5 <= @last union allselect 6 as value where 6 <= @last union allselect 7 as value where 7 <= @last union allselect 8 as value where 8 <= @last union allselect 9 as value where 9 <= @last ) AS Units , ( select 0 as value union allselect 10 as value where 10 <= @last union allselect 20 as value where 20 <= @last union allselect 30 as value where 30<= @last union allselect 40 as value where 40 <= @last union allselect 50 as value where 50 <= @last union allselect 60 as value where 60 <= @last union allselect 70 as value where 70 <= @last union allselect 80 as value where 80 <= @last union allselect 90 as value where 90 <= @last ) AS Tens, ( select 0 as value union allselect 100 as value where 100 <= @last union allselect 200 as value where 200 <= @last union allselect 300 as value where 300 <= @last union allselect 400 as value where 400 <= @last union allselect 500 as value where 500 <= @last union allselect 600 as value where 600 <= @last union allselect 700 as value where 700 <= @last union allselect 800 as value where 800 <= @last union allselect 900 as value where 900 <= @last ) AS Hundreds, ( select 0 as value union allselect 1000 as value where 1000 <= @last union allselect 2000 as value where 2000 <= @last union allselect 3000 as value where 3000 <= @last union allselect 4000 as value where 4000 <= @last union allselect 5000 as value where 5000 <= @last union allselect 6000 as value where 6000 <= @last union allselect 7000 as value where 7000 <= @last union allselect 8000 as value where 8000 <= @last union allselect 9000 as value where 9000 <= @last ) AS Thousands, ( select 0 as value union allselect 10000 as value where 10000 <= @last union allselect 20000 as value where 20000 <= @last union allselect 30000 as value where 30000 <= @last union allselect 40000 as value where 40000 <= @last union allselect 50000 as value where 50000 <= @last union allselect 60000 as value where 60000 <= @last union allselect 70000 as value where 70000 <= @last union allselect 80000 as value where 80000 <= @last union allselect 90000 as value where 90000 <= @last ) AS TenThousands, ( select 0 as value union allselect 100000 as value where 100000 <= @last union allselect 200000 as value where 200000 <= @last union allselect 300000 as value where 300000 <= @last union allselect 400000 as value where 400000 <= @last union allselect 500000 as value where 500000 <= @last union allselect 600000 as value where 600000 <= @last union allselect 700000 as value where 700000 <= @last union allselect 800000 as value where 800000 <= @last union allselect 900000 as value where 900000 <= @last ) AS CThousands, ( select 0 as value union allselect 1000000 as value where 1000000 <= @last union allselect 2000000 as value where 2000000 <= @last union allselect 3000000 as value where 3000000 <= @last union allselect 4000000 as value where 4000000 <= @last union allselect 5000000 as value where 5000000 <= @last union allselect 6000000 as value where 6000000 <= @last union allselect 7000000 as value where 7000000 <= @last union allselect 8000000 as value where 8000000 <= @last union allselect 9000000 as value where 9000000 <= @last ) AS Millions where units.value +tens.value +hundreds.value +Thousands.value +TenThousands.value +CThousands.value +Millions.value between @first and @last RETURN END GO
Verbose though the above function is, it is streets ahead of the others in terms of performance. I decided to combine the filtering techniques in this script with the elegance of Chris's version to produce a final version.
The final version
This final version offers similar performance to the verbose version. but, to my eyes at least, gains in readability.
ALTER FUNCTION dbo.fnCrossJoinRange2 ( @first int , --##PARAM @first The lowest value in the range. @last int --##PARAM @last The highest value in the range. ) RETURNS @values TABLE ( value int primary key ) AS BEGIN INSERT INTO @values(value) SELECTnum = units.value + (tens.value) + (hundreds.value ) + (Thousands.value ) + (TenThousands.value ) + (CThousands.value ) + (Millions.value ) FROMdbo.Digits units CROSS JOIN (SELECT value * 10 as value from dbo.Digits WHERE value * 10 <=@last) tens CROSS JOIN (SELECT value * 100 as value from dbo.Digits WHERE value * 100 <=@last) hundreds CROSS JOIN (SELECT value * 1000 as value from dbo.Digits WHERE value * 1000 <=@last) Thousands CROSS JOIN (SELECT value * 10000 as value from dbo.Digits WHERE value * 10000 <=@last) TenThousands CROSS JOIN (SELECT value * 100000 as value from dbo.Digits WHERE value * 100000 <=@last) CThousands CROSS JOIN (SELECT value * 1000000 as value from dbo.Digits WHERE value * 1000000 <=@last) Millions where units.value + (tens.value ) + (hundreds.value) + (Thousands.value ) + (TenThousands.value ) + (CThousands.value ) + (Millions.value ) BETWEEN @first and @last RETURN END GO
Conclusions
The important thing to remember with SQL is that it is optimised for set based operations. A piece of code that would be very efficient in a procedural language won't necessarily be the best way to perform a task in T-SQL. This trips up most newbies and also the ORACLE crowd who tend to be much more dependent on cursors in their code.
One insert of 10,000 records is better than 10,000 inserts of 1 record.
However, I found that for small ranges, say under 500 records, there was little difference in performance between the procedural and set based versions. Once beyond the 500 threshold the set based version won hands down.