April 5, 2006 at 12:27 pm
Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/dpoole/generatingarange.asp
April 12, 2006 at 1:28 am
Hello,
Personally, I would use statements along the lines of the ones below (taken from an article by Y. Ben-Gan on http://www.sqlmag.com; search Google for the words CREATE TABLE NUMS). The idea is to populate the table exponentially, doubling the number of records at each iteration.
To achieve the desired results, one would populate the table by setting @max = (@end - @start + 1), where @start and @end denote the desired range, and then issue an update statement like
UPDATE Nums SET n = n - 1 + @start
I wonder how this solution would compare to the others in terms of performance ...
Sincerely,
Emmanuel Nanchen
CREATE TABLE Nums(n INT NOT NULL);
DECLARE @max AS INT, @rc AS INT;
SET @max = 1000; -- revise max number according to your needs
SET @rc = 1;
BEGIN TRAN
INSERT INTO Nums VALUES(1);
WHILE @rc * 2 <= @max
BEGIN
INSERT INTO Nums SELECT n + @rc FROM Nums;
SET @rc = @rc * 2;
END
INSERT INTO Nums
SELECT n + @rc FROM Nums WHERE n + @rc <= @max;
COMMIT TRAN
ALTER TABLE Nums ADD CONSTRAINT PK_Nums PRIMARY KEY(n);
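For example, a minimal sketch of the shift step described above (assuming the script has been run with @max = @end - @start + 1; the range 101 to 600 is illustrative):
DECLARE @start INT, @end INT;
SET @start = 101;
SET @end = 600;
-- Nums currently holds 1..(@end - @start + 1), i.e. 1..500
UPDATE Nums SET n = n - 1 + @start; -- Nums now holds @start..@end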
Emmanuel Nanchen
April 12, 2006 at 2:05 am
I have been dealing with number ranges for a long time, and I have tried approaches similar to the ones described in this article. Sadly, the only really fast solution here is the dumbest one.
Create a table NUMS and fill it with as many numbers as needed. Add an index or primary key. Then use this pre-filled table in any SQL statement that requires a sequential range of numbers. This is really fast. It takes less than a second in my application.
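A minimal sketch of that pre-filled approach (the table name NUMS comes from the post; the one-million upper bound is an assumption):
-- Create and fill the table once, then reuse it everywhere
CREATE TABLE NUMS (n INT NOT NULL)
DECLARE @i INT
SET @i = 1
INSERT INTO NUMS VALUES (1)
WHILE @i < 1000000 -- illustrative upper bound
BEGIN
INSERT INTO NUMS SELECT n + @i FROM NUMS WHERE n + @i <= 1000000
SET @i = @i * 2
END
ALTER TABLE NUMS ADD CONSTRAINT PK_NUMS PRIMARY KEY CLUSTERED (n)
-- Any statement needing a sequential range just filters the table
SELECT n FROM NUMS WHERE n BETWEEN 500 AND 1500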
April 12, 2006 at 4:26 am
Hi everybody,
I ran some tests of the exponential approach a while back and posted the results on SSC; it was faster than any other approach (we are talking about large result sets, not those with 1000 records). I agree with aka's opinion that having a static table is the fastest way, but it has limitations, too. You don't always know the maximum number of records in advance, and the table still has to be populated somehow at the beginning. So, if you ask me, the really useful solution is a table-valued UDF that returns a recordset it populates internally, using the exponential approach. When I tested the UDFs I posted on SSC a few years ago, the final cross join code completed in 4 minutes, while the UDF completed in 13 seconds. In both cases, I executed
select count(*) from the_UDF_name (5, 567720)
Regards,
Goce.
April 12, 2006 at 6:32 am
Hi,
Echoing the earlier post: the algorithm suggested above is by Itzik Ben Gan. It is the one I regularly use, and after a brief performance test against the best performing query in the article, I am getting average results of
fnCrossJoinRange2(5,567720): 14.5 s
and
fn_nums(5,567720): 11.2 s
I have taken Itzik's algorithm and added the min and max logic.
CREATE FUNCTION dbo.fn_nums(@min int, @max int)
RETURNS @Nums TABLE(n int NOT NULL)
AS
BEGIN
IF @max >= @min
BEGIN
DECLARE @factor AS int
SET @factor = 1
INSERT INTO @Nums VALUES(@min)
-- double the row count until it covers the size of the requested range
WHILE @factor * 2 <= @max - @min + 1
BEGIN
INSERT INTO @Nums SELECT n + @factor FROM @Nums
SET @factor = @factor * 2
END
-- top up with the remaining rows, capped at @max
INSERT INTO @Nums
SELECT n + @factor FROM @Nums WHERE n + @factor <= @max
END
RETURN
END
Thanks
Alex Weatherall
April 12, 2006 at 7:00 am
I can't find the article by Itzik Ben Gan - can someone please post a link.
Any idea when he 'came up with' the algorithm? I ask because it is suspiciously like the one I posted on here in September 2002!
http://www.sqlservercentral.com/scripts/viewscript.asp?scriptid=321
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
April 12, 2006 at 7:22 am
he 'came up with' the algorithm
Hi, those quotes are yours. I'm not trying to imply that it is *his* algorithm (even though that was what I said), but that he has promoted it as an efficient way of generating numbers on the fly.
Yes, your script does perform the generation in a similar manner. But it is conceivable that you've independently come up with the same idea 🙂
Cheers
Alex
April 12, 2006 at 7:42 am
If a very specific range is required, a mixed approach can be used: the predefined NUMS table serves as the source, with one or two outer SELECTs that multiply, add to, or otherwise transform the source range.
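A minimal sketch of that mixed approach (assuming a pre-filled 1..N NUMS table as described earlier; the range values are illustrative):
-- Shift the pre-filled table into the range 1001000..1001500
DECLARE @start INT, @end INT
SET @start = 1001000
SET @end = 1001500
SELECT @start + n - 1 AS n
FROM NUMS
WHERE n <= @end - @start + 1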
April 12, 2006 at 8:44 am
I disagree with David’s conclusion that
"However, I found that for small ranges, say under 500 records, there was little difference in performance between the procedural and set based versions. Once beyond the 500 threshold the set based version won hands down."
I am working with a result set of 500 rows, but my initial value is a large number, and the set based approach on my test machine takes 6 seconds to run while the procedural approach takes less than a second:
SELECT * FROM [dbo].[fnCrossJoinRange2](1001000, 1001500) Run time 6 sec
SELECT * FROM [dbo].[fnRange](1001000, 1001500) Run time 0 sec
I like David’s approach, but it does not perform as well for large initial values.
April 12, 2006 at 11:20 am
Here is a solution using CLR integration. It comes in at around 5 seconds on my machine for the range 5 to 567720.
public partial class UserDefinedFunctions
{
    [Microsoft.SqlServer.Server.SqlFunction(FillRowMethodName="NumberRangeFillRow",
        TableDefinition="number int",
        IsPrecise=true,
        IsDeterministic=true)]
    public static System.Collections.IEnumerable NumberRange(int first, int last)
    {
        // Stream the numbers one at a time; SQL Server turns each yielded
        // value into a row via the fill-row method below
        int counter = first;
        while(counter <= last)
        {
            yield return counter;
            counter++;
        }
    }

    public static void NumberRangeFillRow(object obj, out int number)
    {
        number = (int)obj;
    }
}
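For completeness, a hedged sketch of how such a CLR function might be registered and called from T-SQL (the assembly name RangeFunctions and the DLL path are assumptions):
-- Enable CLR and register the assembly (names and path are illustrative)
EXEC sp_configure 'clr enabled', 1
RECONFIGURE
CREATE ASSEMBLY RangeFunctions FROM 'C:\assemblies\RangeFunctions.dll'
CREATE FUNCTION dbo.NumberRange(@first INT, @last INT)
RETURNS TABLE (number INT)
AS EXTERNAL NAME RangeFunctions.UserDefinedFunctions.NumberRange
-- Then the timing test from this thread becomes
SELECT COUNT(*) FROM dbo.NumberRange(5, 567720)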
April 12, 2006 at 12:24 pm
I must admit I didn't look at the possibility of a high initial value but with a small range.
Following it through logically, all the selects would come into play, so in this scenario it would perform much slower.
If you produce the range using
SELECT 1000000+value FROM [dbo].[fnCrossJoinRange2](0, 500)
what sort of results do you get?
The one thing that does show up here is that there are many, many solutions and it is always worth spending a little time researching alternatives.
April 13, 2006 at 10:49 pm
Well done, David. It's nice to see it when someone takes multiple methods and does a really good performance comparison.
It seems that generating a range of numbers in a temporary storage area is something that a lot of folks want to do and have always had to resort to some form of loop or a complex setup where they create mini-tables to grow bigger ones using cross-joins. I believe the reason for this is because the IDENTITY function does not allow variables for the seed nor the increment and the TOP clause is not programmable in versions less than 2005.
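As an aside, a hedged sketch of what the programmable TOP clause allows on 2005 (not part of the 2000-compatible solution below; names are illustrative):
-- SQL Server 2005 only: TOP accepts a variable expression, so a shifted
-- range can be generated in a single set-based statement
DECLARE @LoVal INT, @HiVal INT
SET @LoVal = 500000
SET @HiVal = 1000000
SELECT TOP (@HiVal - @LoVal + 1)
       @LoVal - 1 + ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N
INTO #Tally2005
FROM Master.dbo.SysColumns sc1,
     Master.dbo.SysColumns sc2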
Speaking of alternatives, I've solved that problem for temp tables in the code below... and the meat of the code uses a single insert in the spirit of the closing remarks in your article's conclusion. It won't work for table variables so it cannot be used in a function.
As a reminder to everyone, table variables do not use statistics, nor can they be made to do so. As a result, they are much slower than their more effective temp table "siblings". There are a lot of things that are wrong with table variables and a lot of incorrect information about temp tables. But don't take my word for it... check out the following URL. You may never use a table variable again except when a multi-row structure is required as a return from a function...
http://support.microsoft.com/default.aspx?scid=kb;en-us;305977&Product=sql2k
OK, here's the programmable numeric range generation code I promised. The very first time you run it, it takes about 7 seconds on 500,000 rows, including the addition of the clustered primary key. Once compiled, subsequent runs take less than 4 seconds to generate a range of 500,000 records no matter what the starting number is, and that includes the addition of the clustered primary key. Obviously, the smaller the range, the faster this one will run; it generates 500 records in about 16 milliseconds regardless of the starting number...
--===== Declare local performance measuring variable
-- This would not be part of the normal code
DECLARE @StartTime DATETIME
--===== Start the timer
SET @StartTime = GETDATE()
--===== Declare local variables
DECLARE @LoVal INT
DECLARE @HiVal INT
DECLARE @Vals INT
DECLARE @SQL NVARCHAR(300)
--===== Set the local variables
SET @LoVal = 500000 --Could be param for sp
SET @HiVal = 1000000 --Could be param for sp
SET @Vals = @HiVal-@LoVal+1
--===== If the temp table already exists, drop it
IF OBJECT_ID('TempDB..#Tally') IS NOT NULL
DROP TABLE #Tally
--===== Create the new temp table
-- The "One" column is a dummy target
CREATE TABLE #Tally (N INT IDENTITY(1,1) ,One TINYINT)
--===== Change the starting value of IDENTITY
DBCC CHECKIDENT (#Tally, RESEED, @LoVal) --Doesn't work on table variables
--===== Ready to rock... populate the table
-- using a nasty ol' cross join
SET ROWCOUNT @Vals
INSERT INTO #Tally (One)
SELECT 1
FROM Master.dbo.SysColumns sc1, --Contains >4800 records
Master.dbo.SysColumns sc2 --so good to about 23 million records
SET ROWCOUNT 0
--===== Add a clustered Primary key
ALTER TABLE #Tally
ADD PRIMARY KEY CLUSTERED (N)
--===== Display the duration of the run
PRINT STR(DATEDIFF(ms,@StartTime,GETDATE())/1000.0,10,3) + ' Seconds total duration'
The key to the programmability of the above code is the use of the DBCC CHECKIDENT command, which does allow a variable for the RESEED value. It's also the reason why the table must be created first, instead of using something like the following code (hats off to my DBA, Andy Dowswell, for first showing me this little trick a while back)...
SET ROWCOUNT @HiVal
SELECT IDENTITY(INT,1,1) AS N
INTO #Tally
FROM Master.dbo.SysColumns sc1, --Contains >4800 records
Master.dbo.SysColumns sc2 --so good to about 23 million records
SET ROWCOUNT 0
Admittedly, the programmable code ends up with an extra column of numbers, but TINYINT (or BIT, for that matter) only takes up 1 byte per row (VARCHAR(1) would take 6 bytes even if NULL). Of course, the extra column could be a datetime column, where you update the column with a starting date plus the N column +/- some offset to make things work.
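A hedged sketch of that datetime variant (assuming the #Tally table built above; the column name, starting date, and the hard-coded @LoVal of 500000 are illustrative):
ALTER TABLE #Tally ADD TheDate DATETIME
GO
-- Populate the new column: starting date plus the row's offset within the range
UPDATE #Tally
   SET TheDate = DATEADD(dd, N - 500000, '20060101') -- 500000 = @LoVal from the run above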
In one spot in the code, I say that a couple of variables could be used in a stored procedure. Allow me to clarify: you would have to first create the #Tally temp table in the procedure that calls the range-generation proc in order for this to work as an SP, because the scope of non-global temp tables is "downward", not "upward", in calls.
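A sketch of that calling pattern (the procedure name usp_BuildTally is hypothetical; it would wrap the reseed-and-populate steps above, minus the table creation):
-- Caller creates #Tally first so the called proc can see and fill it
CREATE TABLE #Tally (N INT IDENTITY(1,1), One TINYINT)
EXEC dbo.usp_BuildTally @LoVal = 500000, @HiVal = 1000000
SELECT COUNT(*) AS RowCnt FROM #Tally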
Anyway, thanks for posting a great article and for all the other contributions and posts you've made on this and other SQL forums... you're definitely one of the good guys.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 14, 2006 at 12:36 am
Wow, the message above looks even better than the original article.
April 14, 2006 at 8:23 am
Thanks to everyone who has posted.
One feature I would really like to see on SQL Server Central (though it falls into big-bucks content management territory) is the ability for an author to revise their article on the basis of feedback.
Clearly the algorithm for the progressive doubling of the number range should be included, as should the IDENTITY variation.
I did consider the idea of storing a range in a specific table with a clustered index on it, but rejected it on the grounds that storing 2.1 billion records to allow for all variations of a 32-bit int wouldn't be popular.
If I were to use the static table approach, I would probably stick it in its own database so that it wouldn't artificially inflate the backups.