Generating a Range

  • Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/dpoole/generatingarange.asp

  • Hello,

    personnally, I would use some statements along the lines of the ones below (taken from an article by Y.Ben-Gan, on http://www.sqlmag.com; search in google for the words CREATE TABLE NUMS). The idea is to populate the table exponentially, by doubling the number of records at each iterations.

    In order to achieve the desired results, one would populate the table by setting @max-2 = (@end - @start +1) where @start and @end denotes the desired range, and then issuing an update statement like

    UPDATE nums set n = n -1 + @start

    I wonder how this solution would compare to the others in terms of performance ...

    Sincerly,

    Emmanuel Nanchen

     

    CREATE TABLE Nums(n INT NOT NULL);

    DECLARE @max-2 AS INT, @rc AS INT;

    SET @max-2 = 1000; -- revise max number according to your needs

    SET @rc = 1;

    BEGIN TRAN

      INSERT INTO Nums VALUES(1);

      WHILE @rc * 2 <= @max-2

      BEGIN

        INSERT INTO Nums SELECT n + @rc FROM Nums;

        SET @rc = @rc * 2;

      END

      INSERT INTO Nums

        SELECT n + @rc FROM Nums WHERE n + @rc <= @max-2;

    COMMIT TRAN

    ALTER TABLE Nums ADD CONSTRAINT PK_Nums PRIMARY KEY(n);


    Kindest Regards,

    Emmanuel Nanchen

  • I am dealing long enough with number ranges. I did tried approaches similar to ones described in this article. Sadly, the only really fast solution here is the dumbest one.

    Create a table NUMS and fill it with as many numbers as needed. Add index or primary key. Then use this pre-filled table in any sql statements that require sequential range of numbes. This is really fast. It takes less that a second in my application.

  • Hi everybody,

    I used to make some tests of the exponential approach, also posted some results on SSC and it was faster than any other approach (we are talking about large result sets, not those with 1000 records ). I agree with aka'a oppinion,that having a static table is the fastest way, but it has limitations, too. Not always you know in advance the maximum number of records, and eventually it has to be populated somehow at the beginning. So, if you ask me, the really usefull solution is a table-valued UDF that returns a recordset that it populates internally, using the exponential approach. I tested with UDFs I posted on SSC few years ago, and the final cross join code completed in 4 minutes, while the UDF completed in 13 seconds. In both cases, I executed

    select count(*) from the_UDF_name (5, 567720)

    Regards,

    Goce.

  • Hi,

    Echoing the post about the algorithm suggested above that is by Itzik Ben Gan. This is the one I regularly use and after doing a brief performance test against the best performing query in the article I am getting average results of

    fnCrossJoinRange2(5,567720): 14.5 s

    and

    fn_nums(5,567720): 11.2 s

    I have taken Itziks algorithm and added the min and max logic.

    CREATE FUNCTION dbo.fn_nums(@min int, @max-2 int)

    RETURNS @Nums TABLE(n int NOT NULL)

    AS

    BEGIN

    IF @max-2 > 0

    BEGIN

    DECLARE @factor AS int

    SET @factor = 1

    INSERT INTO @Nums VALUES(@min)

    WHILE @factor * 2 <= @max-2

    BEGIN

    INSERT INTO @Nums SELECT n + @factor FROM @nums

    SET @factor = @factor * 2

    END

    INSERT INTO @Nums

    SELECT n + @factor FROM @nums WHERE n + @factor <= @max-2

    END

    RETURN

    END

    Thanks

    Alex Weatherall

  • I can't find the article by Itzik Ben Gan - can someone please post a link.

    Any idea when he 'came up with' the algorithm? I ask because it is suspiciously like the one I posted on here in September 2002!

    http://www.sqlservercentral.com/scripts/viewscript.asp?scriptid=321

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • he 'came up with' the algorithm

    Hi, those quotes are yours. I'm not trying to imply that it is *his* algorithm (even though that was what I said), but that he has promoted it as a efficient way of generating numbers on the fly.

    Yes, your script does perform the generation in a similar manner. But it is conceivable that you've independently come up with the same idea 🙂

    Cheers

    Alex

  • Here's one link - scroll down 2/3rds or search for itzik ben-gan...& you know what they say about great minds...







    **ASCII stupid question, get a stupid ANSI !!!**

  • If a very specific range is required, a mixed approach can be used. So the predefined NUMS table is used as a source, with one or two outer SQLs that multiply, add or whatever the source range.

  • I disagree with David’s conclusion that

     "However, I found that for small ranges, say under 500 records, there was little difference in performance between the procedural and set based versions. Once beyond the 500 threshold the set based version won hands down."

    I am working with a result set of 500 rows but my initial value is a large number and the set based approach on my test machine takes 6 seconds to run and the procedural approach takes less then a second

    SELECT * FROM [dbo].[fnCrossJoinRange2](1001000, 1001500) Run time 6 sec

    SELECT * FROM [dbo].[fnRange](1001000, 1001500) Run time 0 sec

    I like David’s approach but it does not perform as well for large initial values

  • Here is a solution using CLR integration. It comes in around 5 seconds on my machine for the range range 5 to 567720.

    public partial class UserDefinedFunctions

    {

         [Microsoft.SqlServer.Server.SqlFunction(FillRowMethodName="NumberRangeFillRow",

              TableDefinition="number int",

              IsPrecise=true,

              IsDeterministic=true)]

         public static System.Collections.IEnumerable NumberRange(int first, int last) 

         { 

              int counter = first; 

              while(counter <= last) 

              { 

                   yield return counter; 

                   counter++; 

              } 

         } 

         public static void NumberRangeFillRow(object obj, out int number) 

         { 

              number = (int)obj; 

         } 

    }; 

  • I must admit I didn't look at the possibility of a high initial value but with a small range.

    Following it through logically all the selects would come into play so in this scenario it would perform much slower.

    If you produce the range using

    SELECT 1000000+value FROM [dbo].[fnCrossJoinRange2](0, 500)

    what sort of results do you get?

    The one thing that does show up here is that there are many, many solutions and it is always worth spending a little time researching alternatives.

  • Well done, David.  It's nice to see it when someone takes multiple methods and does a really good performance comparison.

    It seems that generating a range of numbers in a temporary storage area is something that a lot of folks want to do and have always had to resort to some form of loop or a complex setup where they create mini-tables to grow bigger ones using cross-joins.  I believe the reason for this is because the IDENTITY function does not allow variables for the seed nor the increment and the TOP clause is not programmable in versions less than 2005.

    Speaking of alternatives, I've solved that problem for temp tables in the code below... and the meat of the code uses a single insert in the spirit of the closing remarks in your article's conclusion.  It won't work for table variables so it cannot be used in a function.

    As a reminder to everyone, table variables do not use statisics nor can they be made to do so.  As a result, they are much slower than their more effective temp table "siblings".  There are a lot of things that are wrong with table variables and a lot of incorrect information about temp tables.  But don't take my word for it... check out the following URL.  You may never use a table variable again except when a multi-row structure is required as a return from a function...

    http://support.microsoft.com/default.aspx?scid=kb;en-us;305977&Product=sql2k

    OK, here's the programmable numeric range generation code I promised.  The very first time you run it, it will take about 7 seconds to run on 500,000 rows including the addition of the clustered primary key.  Once compiled, subsequent runs take less than 4 seconds to generate a range of 500,000 records no matter what the starting number is and that includes the addition of the clustered primary key... obviously, the smaller the range, the faster this one will run.  It generates 500 records in about 16 milli-seconds no matter the starting number...

    --===== Declare local performance measuring variable

         -- This would not be part of the normal code

    DECLARE @StartTime DATETIME

    --===== Start the timer

        SET @StartTime = GETDATE()

    --===== Declare local variables

    DECLARE @LoVal INT

    DECLARE @HiVal INT

    DECLARE @Vals  INT

    DECLARE @sql   NVARCHAR(300)

    --===== Set the local variables

        SET @LoVal = 500000     --Could be param for sp

        SET @HiVal = 1000000 --Could be param for sp

        SET @Vals  = @HiVal-@LoVal+1

    --===== If the temp table already exists, drop it

         IF OBJECT_ID('TempDB..#Tally') IS NOT NULL

            DROP TABLE #Tally

    --===== Create the new temp table

         -- The "One" column is a dummy target

     CREATE TABLE #Tally (N INT IDENTITY(1,1) ,One TINYINT)

    --===== Change the starting value of IDENTITY

       DBCC CHECKIDENT (#Tally, RESEED, @LoVal) --Doesn't work on table variables

    --===== Ready to rock... populate the table

         -- using a nasty ol' cross join

        SET ROWCOUNT @Vals

     INSERT INTO #Tally (One)

     SELECT 1

       FROM Master.dbo.SysColumns sc1,  --Contains >4800 records

            Master.dbo.SysColumns sc2   --so good to about 23 million records

        SET ROWCOUNT 0

    --===== Add a clustered Primary key

      ALTER TABLE #Tally

            ADD PRIMARY KEY CLUSTERED (N)

    --===== Display the duration of the run

      PRINT STR(DATEDIFF(ms,@StartTime,GETDATE())/1000.0,10,3) + ' Seconds total duration'

    The key to the programmability of the above code is the use of the DBCC CHECKIDENT command which does allow a variable for the RESEED value.  It's also the reason why the table must be created first instead of using something like the following code (hat's off to my DBA, Andy Dowswell, for first showing me this little trick a while back) ...

        SET ROWCOUNT @HiVal

     SELECT IDENTITY(INT,1,1) AS N

       INTO #Tally

       FROM Master.dbo.SysColumns sc1,  --Contains >4800 records

            Master.dbo.SysColumns sc2   --so good to about 23 million records

        SET ROWCOUNT 0

    Admittedly, the programmable code ends up with an extra column of numbers but TINYINT (or BIT for that matter) only take up 1 byte per row.  (Varchar(1) would take 6 bytes even if NULL).  Of course, the extra column could be a datetime column where you update the column with a starting date + the N column +/- some offset to make things work.

    In one spot in the code, I say that a couple of variables could be used in a stored procedure... allow me to clarify... you would have to first create the #Tally temp table in the procedure that calls range generation proc in order for this to work as an SP because the scope of non-global temp tables is "downward", not "upward" in calls.

    Anyway, thanks for posting a great article and for all the other contributions and posts you've made on this and other SQL forums... you're definitely one of the good guys.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Wow, the message above looks even better than original article.

  • Thanks to everyone who has posted.

    One feature I would really like to see on SQL Server Central (but it is in the big bucks content management arean) is the ability for an author to revise their article on the basis of feedback.

    Clearly the algorithmn for the progressive doubling of the number range should be included, as should the IDENTITY variation.

    I did consider the idea of storing a range in a specific table with a clustered index on it but rejected it on the grounds that storing 2.1 billion records to allow for all variations of a 32 bit int wouldn't be popular.

    If I was to use the static table approach I would probably stick it in its own database so that it wouldn't artificially inflate the backups.

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply