Searching smallest unused number

  • Dear damirmi

    My simple question is how do you identify which are the number that have already used of. If i am not wrong all the usernumber that are in table are free to use...

    then you can just use MIN() function to find the least unused number

    Select Min(UserNumber) From Table

    It find the least number from the table if there is anything that will let you know which are the number that are in use then keep that cateriea into where clause and use it.

    Thanks & Regards
    Syed Sami Ur Rehman
    SQL-Server (Developer)
    Hyderabad
    Email-sami.sqldba@gmail.com

  • To bring this story to a temporarily end. I tried to verify the previous shown behaviour

    with up to 500.000.000 Rows in #testEnvironment. (timeconsuming business !!, had to move transaction log

    because its blowing up to about 100 GB)

    But to my opinion, if you come to the situation to find unused Numbers out of 10E09 used ID's

    there is something basically wrong with the concept. (well formed sequence Tables might be a better approach)

    I intentionally only post the IO/CPU time results. The scrips are already well known here.

    (Sorry for german output, but I think you can read it)

    -- Temp Table with one missing Number: Total elapsed Time about 30 min

    (100.000.000 Zeile(n) betroffen)

    (1 Zeile(n) betroffen)

    ========== BASELINE ==========

    (1 Zeile(n) betroffen)

    SQL Server-Ausführungszeiten:

    , CPU-Zeit = 0 ms, verstrichene Zeit = 0 ms.

    ========== CELKO's first Solution ==========

    (1 Zeile(n) betroffen)

    #testEnvironment00000000000B-Tabelle. Scananzahl 6, logische Lesevorgänge 321546, physische Lesevorgänge 0, Read-Ahead-Lesevorgänge 0, logische LOB-Lesevorgänge 0, physische LOB-Lesevorgänge 0, Read-Ahead-LOB-Lesevorgänge 0.

    Worktable-Tabelle. Scananzahl 0, logische Lesevorgänge 0, physische Lesevorgänge 0, Read-Ahead-Lesevorgänge 0, logische LOB-Lesevorgänge 0, physische LOB-Lesevorgänge 0, Read-Ahead-LOB-Lesevorgänge 0.

    SQL Server-Ausführungszeiten:

    , CPU-Zeit = 192848 ms, verstrichene Zeit = 444526 ms.

    ========== TALLY TABLE Solution ==========

    (1 Zeile(n) betroffen)

    #testEnvironment00000000000B-Tabelle. Scananzahl 3, logische Lesevorgänge 160773, physische Lesevorgänge 0, Read-Ahead-Lesevorgänge 0, logische LOB-Lesevorgänge 0, physische LOB-Lesevorgänge 0, Read-Ahead-LOB-Lesevorgänge 0.

    Worktable-Tabelle. Scananzahl 0, logische Lesevorgänge 0, physische Lesevorgänge 0, Read-Ahead-Lesevorgänge 0, logische LOB-Lesevorgänge 0, physische LOB-Lesevorgänge 0, Read-Ahead-LOB-Lesevorgänge 0.

    SQL Server-Ausführungszeiten:

    , CPU-Zeit = 16785 ms, verstrichene Zeit = 8810 ms.

    ========== ROW_NUMBER Solution ==========

    (1 Zeile(n) betroffen)

    #testEnvironment00000000000B-Tabelle. Scananzahl 4, logische Lesevorgänge 160773, physische Lesevorgänge 0, Read-Ahead-Lesevorgänge 0, logische LOB-Lesevorgänge 0, physische LOB-Lesevorgänge 0, Read-Ahead-LOB-Lesevorgänge 0.

    SQL Server-Ausführungszeiten:

    , CPU-Zeit = 50826 ms, verstrichene Zeit = 30194 ms.

    ========== CELKO's second Solution ==========

    (1 Zeile(n) betroffen)

    #testEnvironment00000000000B-Tabelle. Scananzahl 12, logische Lesevorgänge 643092, physische Lesevorgänge 0, Read-Ahead-Lesevorgänge 0, logische LOB-Lesevorgänge 0, physische LOB-Lesevorgänge 0, Read-Ahead-LOB-Lesevorgänge 0.

    Worktable-Tabelle. Scananzahl 0, logische Lesevorgänge 0, physische Lesevorgänge 0, Read-Ahead-Lesevorgänge 0, logische LOB-Lesevorgänge 0, physische LOB-Lesevorgänge 0, Read-Ahead-LOB-Lesevorgänge 0.

    SQL Server-Ausführungszeiten:

    , CPU-Zeit = 423636 ms, verstrichene Zeit = 888386 ms.

    -- Temp Table with Index: Total elapsed Time about 5 min

    (46.666.668 Zeile(n) betroffen)

    ========== BASELINE ==========

    (1 Zeile(n) betroffen)

    #testEnvironment00000000000C-Tabelle. Scananzahl 1, logische Lesevorgänge 3, physische Lesevorgänge 0, Read-Ahead-Lesevorgänge 0, logische LOB-Lesevorgänge 0, physische LOB-Lesevorgänge 0, Read-Ahead-LOB-Lesevorgänge 0.

    SQL Server-Ausführungszeiten:

    , CPU-Zeit = 0 ms, verstrichene Zeit = 0 ms.

    ========== CELKO's first Solution ==========

    (1 Zeile(n) betroffen)

    #testEnvironment00000000000C-Tabelle. Scananzahl 2, logische Lesevorgänge 150614, physische Lesevorgänge 0, Read-Ahead-Lesevorgänge 0, logische LOB-Lesevorgänge 0, physische LOB-Lesevorgänge 0, Read-Ahead-LOB-Lesevorgänge 0.

    SQL Server-Ausführungszeiten:

    , CPU-Zeit = 20483 ms, verstrichene Zeit = 20747 ms.

    ========== TALLY TABLE Solution ==========

    (1 Zeile(n) betroffen)

    #testEnvironment00000000000C-Tabelle. Scananzahl 0, logische Lesevorgänge 200722, physische Lesevorgänge 0, Read-Ahead-Lesevorgänge 0, logische LOB-Lesevorgänge 0, physische LOB-Lesevorgänge 0, Read-Ahead-LOB-Lesevorgänge 0.

    Worktable-Tabelle. Scananzahl 0, logische Lesevorgänge 0, physische Lesevorgänge 0, Read-Ahead-Lesevorgänge 0, logische LOB-Lesevorgänge 0, physische LOB-Lesevorgänge 0, Read-Ahead-LOB-Lesevorgänge 0.

    SQL Server-Ausführungszeiten:

    , CPU-Zeit = 187 ms, verstrichene Zeit = 113 ms.

    ========== ROW_NUMBER Solution ==========

    (1 Zeile(n) betroffen)

    #testEnvironment00000000000C-Tabelle. Scananzahl 2, logische Lesevorgänge 6, physische Lesevorgänge 0, Read-Ahead-Lesevorgänge 0, logische LOB-Lesevorgänge 0, physische LOB-Lesevorgänge 0, Read-Ahead-LOB-Lesevorgänge 0.

    SQL Server-Ausführungszeiten:

    , CPU-Zeit = 0 ms, verstrichene Zeit = 0 ms.

    ========== CELKO's second Solution ==========

    (1 Zeile(n) betroffen)

    #testEnvironment00000000000C-Tabelle. Scananzahl 4, logische Lesevorgänge 301228, physische Lesevorgänge 0, Read-Ahead-Lesevorgänge 0, logische LOB-Lesevorgänge 0, physische LOB-Lesevorgänge 0, Read-Ahead-LOB-Lesevorgänge 0.

    SQL Server-Ausführungszeiten:

    , CPU-Zeit = 40248 ms, verstrichene Zeit = 40310 ms.

    -- Temp Table with Index Total elapsed Time about 1h

    (466.666.668 Zeile(n) betroffen)

    ========== BASELINE ==========

    (1 Zeile(n) betroffen)

    #testEnvironment00000000000D-Tabelle. Scananzahl 1, logische Lesevorgänge 4, physische Lesevorgänge 1, Read-Ahead-Lesevorgänge 0, logische LOB-Lesevorgänge 0, physische LOB-Lesevorgänge 0, Read-Ahead-LOB-Lesevorgänge 0.

    SQL Server-Ausführungszeiten:

    , CPU-Zeit = 0 ms, verstrichene Zeit = 12 ms.

    ========== CELKO's first Solution ==========

    (1 Zeile(n) betroffen)

    #testEnvironment00000000000D-Tabelle. Scananzahl 2, logische Lesevorgänge 1506118, physische Lesevorgänge 877, Read-Ahead-Lesevorgänge 487104, logische LOB-Lesevorgänge 0, physische LOB-Lesevorgänge 0, Read-Ahead-LOB-Lesevorgänge 0.

    SQL Server-Ausführungszeiten:

    , CPU-Zeit = 207996 ms, verstrichene Zeit = 258317 ms.

    ========== TALLY TABLE Solution ==========

    (1 Zeile(n) betroffen)

    #testEnvironment00000000000D-Tabelle. Scananzahl 0, logische Lesevorgänge 266804, physische Lesevorgänge 68, Read-Ahead-Lesevorgänge 5, logische LOB-Lesevorgänge 0, physische LOB-Lesevorgänge 0, Read-Ahead-LOB-Lesevorgänge 0.

    Worktable-Tabelle. Scananzahl 0, logische Lesevorgänge 0, physische Lesevorgänge 0, Read-Ahead-Lesevorgänge 0, logische LOB-Lesevorgänge 0, physische LOB-Lesevorgänge 0, Read-Ahead-LOB-Lesevorgänge 0.

    SQL Server-Ausführungszeiten:

    , CPU-Zeit = 264 ms, verstrichene Zeit = 476 ms.

    ========== ROW_NUMBER Solution ==========

    (1 Zeile(n) betroffen)

    #testEnvironment00000000000D-Tabelle. Scananzahl 2, logische Lesevorgänge 8, physische Lesevorgänge 0, Read-Ahead-Lesevorgänge 0, logische LOB-Lesevorgänge 0, physische LOB-Lesevorgänge 0, Read-Ahead-LOB-Lesevorgänge 0.

    SQL Server-Ausführungszeiten:

    , CPU-Zeit = 0 ms, verstrichene Zeit = 4 ms.

    ========== CELKO's second Solution ==========

    (1 Zeile(n) betroffen)

    #testEnvironment00000000000D-Tabelle. Scananzahl 4, logische Lesevorgänge 3012236, physische Lesevorgänge 604, Read-Ahead-Lesevorgänge 1479661, logische LOB-Lesevorgänge 0, physische LOB-Lesevorgänge 0, Read-Ahead-LOB-Lesevorgänge 0.

    SQL Server-Ausführungszeiten:

    , CPU-Zeit = 432388 ms, verstrichene Zeit = 1726745 ms.

    Conclusion:

    Celko's solutions both are beyond all affairs, not only presenting wrong results, but also excruciatingly slow.

    #testEnvironment without Index: Tally beats my RowNumber

    #testEnvironment with Index: my RowNumber beats Tally

    Never give up 😉

  • rahu21 (11/16/2011)


    Conclusion:

    Celko's solutions both are beyond all affairs, not only presenting wrong results, but also excruciatingly slow.

    #testEnvironment without Index: Tally beats my RowNumber

    #testEnvironment with Index: my RowNumber beats Tally

    Never give up 😉

    Agreed. Although in my tests the TALLY table fairs better and the ROW_NUMBER fairs worse, I did find that generally if there was an index that the ROW_NUMBER solution was faster (the TALLY table that I posted that beat the ROW_NUMBER on an indexed table turned out to have a bug in it).

    With that in mind, is there any reason to keep the ORDER BY that you've used?

    SELECT TOP 1 RowNumber as FirstUnusedID

    FROM (SELECT ROW_NUMBER() OVER(ORDER BY ID) AS RowNumber,ID

    FROM #testEnvironment )T

    WHERE RowNumber <> ID

    ORDER BY ID

    The above appears to get the same results but has less IO.

    IF object_id('tempdb..#testEnvironment') IS NOT NULL

    BEGIN

    DROP TABLE #testEnvironment

    END

    DECLARE @numberToBeDeleted INT

    SELECT @numberToBeDeleted = (ABS(CHECKSUM(NEWID())) % 60000) + 1

    SELECT TOP 1000000 IDENTITY(INT,1,1) AS ID

    INTO #testEnvironment

    FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3

    CREATE NONCLUSTERED INDEX [test_index]

    ON #testEnvironment (ID)

    DELETE FROM #testEnvironment

    WHERE ID = @numberToBeDeleted

    PRINT '========== BASELINE =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT MIN(ID)

    FROM #testEnvironment

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT '========== CELKO''s first Solution =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT MIN (ID +1) AS unused_user_id_min

    FROM #testEnvironment

    WHERE (ID + 1) NOT IN (SELECT ID FROM #testEnvironment);

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT '========== TALLY TABLE Solution =========='

    --Use a real tally table for performance, here's one on the fly for testing.

    --If you're unsure about what a tally table is, or what it's for, please read

    --this article --> http://www.sqlservercentral.com/articles/T-SQL/62867/

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    ;WITH t1 AS (SELECT 1 N UNION ALL SELECT 1 N),

    t2 AS (SELECT 1 N FROM t1 x, t1 y),

    t3 AS (SELECT 1 N FROM t2 x, t2 y),

    t4 AS (SELECT 1 N FROM t3 x, t3 y),

    tally AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS num

    FROM t4 x, t4 y)

    SELECT MIN(num) AS nextID

    FROM #testEnvironment a

    RIGHT OUTER JOIN tally b ON a.ID = b.num

    WHERE ID IS NULL

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT '========== ROW_NUMBER Solution =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT TOP 1 RowNumber as FirstUnusedID FROM (SELECT ROW_NUMBER() OVER(ORDER BY ID)

    AS RowNumber,ID FROM #testEnvironment)T

    WHERE RowNumber <> ID

    ORDER BY 1,((SELECT MAX(ID) + 1 FROM #testEnvironment))

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT '========== ROW_NUMBER v2 Solution =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT TOP 1 RowNumber as FirstUnusedID

    FROM (SELECT ROW_NUMBER() OVER(ORDER BY ID) AS RowNumber,ID

    FROM #testEnvironment )T

    WHERE RowNumber <> ID

    ORDER BY ID

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT '========== CELKO''s second Solution =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    ;WITH Available (ID) AS (

    SELECT ID -1

    FROM #testEnvironment

    WHERE (ID -1) NOT IN (SELECT ID FROM #testEnvironment)

    UNION ALL

    SELECT ID +1

    FROM #testEnvironment

    WHERE (ID +1) NOT IN (SELECT ID FROM #testEnvironment))

    SELECT MIN (ID)

    FROM Available

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    --CleanUp

    IF object_id('tempdb..#testEnvironment') IS NOT NULL

    BEGIN

    DROP TABLE #testEnvironment

    END

    ========== BASELINE ==========

    (1 row(s) affected)

    Table '#testEnvironment____________________________________________________________________________________________________00000000008C'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    ========== CELKO's first Solution ==========

    (1 row(s) affected)

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#testEnvironment____________________________________________________________________________________________________00000000008C'. Scan count 2, logical reads 3734, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 1060 ms, elapsed time = 1113 ms.

    ========== TALLY TABLE Solution ==========

    (1 row(s) affected)

    Table '#testEnvironment____________________________________________________________________________________________________00000000008C'. Scan count 1, logical reads 132, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 31 ms, elapsed time = 37 ms.

    ========== ROW_NUMBER Solution ==========

    (1 row(s) affected)

    Table '#testEnvironment____________________________________________________________________________________________________00000000008C'. Scan count 2, logical reads 65, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 15 ms, elapsed time = 8 ms.

    ========== ROW_NUMBER v2 Solution ==========

    (1 row(s) affected)

    Table '#testEnvironment____________________________________________________________________________________________________00000000008C'. Scan count 1, logical reads 62, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 8 ms.

    ========== CELKO's second Solution ==========

    (1 row(s) affected)

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#testEnvironment____________________________________________________________________________________________________00000000008C'. Scan count 4, logical reads 7468, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 1981 ms, elapsed time = 1996 ms.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Can someone please do a test where there are 1M rows (or 10M or more) and the MIN missing value is one from the highest number in the range (i.e. 999,999 if you do a million row test)? I want to see if row_number/tally table solutions are affected by having a very high min number available. I would test but my laptop has a pair of SSDs and 16GB of RAM so everything is fast. Well, except for Joe Celko's solutions! :w00t:

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (11/16/2011)


    Can someone please do a test where there are 1M rows (or 10M or more) and the MIN missing value is one from the highest number in the range (i.e. 999,999 if you do a million row test)? I want to see if row_number/tally table solutions are affected by having a very high min number available. I would test but my laptop has a pair of SSDs and 16GB of RAM so everything is fast. Well, except for Joe Celko's solutions! :w00t:

    Sorry, forgot about this. I've set it running on a virtual machine with a 600GB hdd, but only had 4GB of RAM to give it. Will let you know 🙂

    USE [Test]

    SET NOCOUNT ON

    PRINT '========== Ensure Table is ready =========='

    IF object_id('[Test].dbo.testEnvironment') IS NOT NULL

    BEGIN

    DROP TABLE [Test].dbo.testEnvironment

    END

    PRINT REPLICATE('=',80)

    PRINT '========== Insert 500,000,000(!!) rows =========='

    SELECT TOP 500000000 IDENTITY(INT,1,1) AS ID

    INTO [Test].dbo.testEnvironment

    FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3,

    master.dbo.syscolumns sc4, master.dbo.syscolumns sc5, master.dbo.syscolumns sc6,

    master.dbo.syscolumns sc7, master.dbo.syscolumns sc8, master.dbo.syscolumns sc9,

    master.dbo.syscolumns sc10, master.dbo.syscolumns sc11, master.dbo.syscolumns sc12

    PRINT REPLICATE('=',80)

    PRINT '========== Add Index =========='

    CREATE NONCLUSTERED INDEX [test_index]

    ON [Test].dbo.testEnvironment (ID)

    PRINT REPLICATE('=',80)

    PRINT '========== Delete 1 ID - 485,001,270 =========='

    DELETE FROM [Test].dbo.testEnvironment

    WHERE ID = 485001270

    PRINT REPLICATE('=',80)

    PRINT '========== Queries!! =========='

    PRINT REPLICATE('=',80)

    PRINT '========== BASELINE =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT MIN(ID)

    FROM [Test].dbo.testEnvironment

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT REPLICATE('=',80)

    PRINT '========== ROW_NUMBER Solution =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT TOP 1 RowNumber as FirstUnusedID FROM (SELECT ROW_NUMBER() OVER(ORDER BY ID)

    AS RowNumber,ID FROM [Test].dbo.testEnvironment)T

    WHERE RowNumber <> ID

    ORDER BY 1,((SELECT MAX(ID) + 1 FROM [Test].dbo.testEnvironment))

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT REPLICATE('=',80)

    PRINT '========== ROW_NUMBER v2 Solution =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT TOP 1 RowNumber as FirstUnusedID

    FROM (SELECT ROW_NUMBER() OVER(ORDER BY ID) AS RowNumber,ID

    FROM [Test].dbo.testEnvironment )T

    WHERE RowNumber <> ID

    ORDER BY ID

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT REPLICATE('=',80)


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • 47 minutes 31 seconds later. . . 🙂

    ========== Ensure Table is ready ==========

    ================================================================================

    ========== Insert 500,000,000(!!) rows ==========

    ================================================================================

    ========== Add Index ==========

    ================================================================================

    ========== Delete 1 ID - 485,001,270 ==========

    ================================================================================

    ========== Queries!! ==========

    ================================================================================

    ========== BASELINE ==========

    Table 'testEnvironment'. Scan count 1, logical reads 4, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 29 ms.

    ================================================================================

    ========== ROW_NUMBER Solution ==========

    Table 'testEnvironment'. Scan count 2, logical reads 902183, physical reads 571, read-ahead reads 877616, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 162662 ms, elapsed time = 173297 ms.

    ================================================================================

    ========== ROW_NUMBER v2 Solution ==========

    Table 'testEnvironment'. Scan count 1, logical reads 902179, physical reads 712, read-ahead reads 908889, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 163832 ms, elapsed time = 176550 ms.

    ================================================================================

    So both of the ROW_NUMBER solutions are looking at around 170 seconds on 500m rows. At some point I'll set-up a tally table with 500m rows that can attempt to compete.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • TheSQLGuru (11/16/2011)


    Can someone please do a test where there are 1M rows (or 10M or more) and the MIN missing value is one from the highest number in the range (i.e. 999,999 if you do a million row test)? I want to see if row_number/tally table solutions are affected by having a very high min number available. I would test but my laptop has a pair of SSDs and 16GB of RAM so everything is fast. Well, except for Joe Celko's solutions! :w00t:

    And here's 1m rows where the earliest missing number is 999,999.

    IF object_id('tempdb..#testEnvironment') IS NOT NULL

    BEGIN

    DROP TABLE #testEnvironment

    END

    SELECT TOP 1000000 IDENTITY(INT,1,1) AS ID

    INTO #testEnvironment

    FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3

    CREATE NONCLUSTERED INDEX [test_index]

    ON #testEnvironment (ID)

    DELETE FROM #testEnvironment

    WHERE ID = 999999

    PRINT '========== BASELINE =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT MIN(ID)

    FROM #testEnvironment

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT '========== CELKO''s first Solution =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT MIN (ID +1) AS unused_user_id_min

    FROM #testEnvironment

    WHERE (ID + 1) NOT IN (SELECT ID FROM #testEnvironment);

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT '========== TALLY TABLE Solution =========='

    --If you're unsure about what a tally table is, or what it's for, please read

    --this article --> http://www.sqlservercentral.com/articles/T-SQL/62867/

    --===== Create and populate the Tally table on the fly

    IF object_id('dbo.Tally') IS NOT NULL

    BEGIN

    DROP TABLE dbo.Tally

    END

    SELECT TOP 1000000

    IDENTITY(INT,1,1) AS num

    INTO dbo.Tally

    FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3

    --===== Add a Primary Key to maximize performance

    ALTER TABLE dbo.Tally

    ADD CONSTRAINT PK_Tally_num

    PRIMARY KEY CLUSTERED (num) WITH FILLFACTOR = 100

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT MIN(num) AS nextID

    FROM #testEnvironment a

    RIGHT OUTER JOIN dbo.Tally b ON a.ID = b.num

    WHERE ID IS NULL

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT '========== ROW_NUMBER Solution =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT TOP 1 RowNumber as FirstUnusedID FROM (SELECT ROW_NUMBER() OVER(ORDER BY ID)

    AS RowNumber,ID FROM #testEnvironment)T

    WHERE RowNumber <> ID

    ORDER BY 1,((SELECT MAX(ID) + 1 FROM #testEnvironment))

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT '========== ROW_NUMBER v2 Solution =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT TOP 1 RowNumber as FirstUnusedID

    FROM (SELECT ROW_NUMBER() OVER(ORDER BY ID) AS RowNumber,ID

    FROM #testEnvironment )T

    WHERE RowNumber <> ID

    ORDER BY ID

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT '========== CELKO''s second Solution =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    ;WITH Available (ID) AS (

    SELECT ID -1

    FROM #testEnvironment

    WHERE (ID -1) NOT IN (SELECT ID FROM #testEnvironment)

    UNION ALL

    SELECT ID +1

    FROM #testEnvironment

    WHERE (ID +1) NOT IN (SELECT ID FROM #testEnvironment))

    SELECT MIN (ID)

    FROM Available

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    ========== BASELINE ==========

    (1 row(s) affected)

    Table '#testEnvironment____________________________________________________________________________________________________0000000000C3'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    ========== CELKO's first Solution ==========

    (1 row(s) affected)

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#testEnvironment____________________________________________________________________________________________________0000000000C3'. Scan count 2, logical reads 3734, physical reads 0, read-ahead reads 15, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 967 ms, elapsed time = 974 ms.

    ========== TALLY TABLE Solution ==========

    (1000000 row(s) affected)

    (1 row(s) affected)

    Table '#testEnvironment____________________________________________________________________________________________________0000000000C3'. Scan count 1, logical reads 1867, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Tally'. Scan count 1, logical reads 1616, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 405 ms, elapsed time = 405 ms.

    ========== ROW_NUMBER Solution ==========

    (1 row(s) affected)

    Table '#testEnvironment____________________________________________________________________________________________________0000000000C3'. Scan count 2, logical reads 1870, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 281 ms, elapsed time = 287 ms.

    ========== ROW_NUMBER v2 Solution ==========

    (1 row(s) affected)

    Table '#testEnvironment____________________________________________________________________________________________________0000000000C3'. Scan count 1, logical reads 1867, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 328 ms, elapsed time = 326 ms.

    ========== CELKO's second Solution ==========

    (1 row(s) affected)

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#testEnvironment____________________________________________________________________________________________________0000000000C3'. Scan count 4, logical reads 7468, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 2106 ms, elapsed time = 2126 ms.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Last one big test.

    USE [Test]

    SET NOCOUNT ON

    PRINT '========== Ensure Table is ready =========='

    IF object_id('[Test].dbo.testEnvironment') IS NOT NULL

    BEGIN

    DROP TABLE [Test].dbo.testEnvironment

    END

    PRINT REPLICATE('=',80)

    PRINT '========== Insert 500,000,000(!!) rows =========='

    SELECT TOP 500000000 IDENTITY(INT,1,1) AS ID

    INTO [Test].dbo.testEnvironment

    FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3,

    master.dbo.syscolumns sc4, master.dbo.syscolumns sc5, master.dbo.syscolumns sc6,

    master.dbo.syscolumns sc7, master.dbo.syscolumns sc8, master.dbo.syscolumns sc9,

    master.dbo.syscolumns sc10, master.dbo.syscolumns sc11, master.dbo.syscolumns sc12

    PRINT REPLICATE('=',80)

    PRINT '========== Add Index =========='

    CREATE NONCLUSTERED INDEX [test_index]

    ON [Test].dbo.testEnvironment (ID)

    PRINT REPLICATE('=',80)

    PRINT '========== Delete 1 ID - 485,001,270 =========='

    DELETE FROM [Test].dbo.testEnvironment

    WHERE ID = 485001270

    PRINT REPLICATE('=',80)

    PRINT '========== Queries!! =========='

    PRINT REPLICATE('=',80)

    PRINT '========== BASELINE =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT MIN(ID)

    FROM [Test].dbo.testEnvironment

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT REPLICATE('=',80)

    PRINT '========== ROW_NUMBER Solution =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT TOP 1 RowNumber as FirstUnusedID FROM (SELECT ROW_NUMBER() OVER(ORDER BY ID)

    AS RowNumber,ID FROM [Test].dbo.testEnvironment)T

    WHERE RowNumber <> ID

    ORDER BY 1,((SELECT MAX(ID) + 1 FROM [Test].dbo.testEnvironment))

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT REPLICATE('=',80)

    PRINT '========== ROW_NUMBER v2 Solution =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT TOP 1 RowNumber as FirstUnusedID

    FROM (SELECT ROW_NUMBER() OVER(ORDER BY ID) AS RowNumber,ID

    FROM [Test].dbo.testEnvironment )T

    WHERE RowNumber <> ID

    ORDER BY ID

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT REPLICATE('=',80)

    PRINT '========== TALLY TABLE Solution =========='

    --If you're unsure about what a tally table is, or what it's for, please read

    --this article --> http://www.sqlservercentral.com/articles/T-SQL/62867/

    --===== Create and populate the Tally table on the fly

    IF object_id('[Test].dbo.Tally') IS NOT NULL

    BEGIN

    DROP TABLE [Test].dbo.Tally

    END

    SELECT TOP 500000000

    IDENTITY(INT,1,1) AS num

    INTO dbo.Tally

    FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3,

    master.dbo.syscolumns sc4, master.dbo.syscolumns sc5, master.dbo.syscolumns sc6,

    master.dbo.syscolumns sc7, master.dbo.syscolumns sc8, master.dbo.syscolumns sc9,

    master.dbo.syscolumns sc10, master.dbo.syscolumns sc11, master.dbo.syscolumns sc12

    --===== Add a Primary Key to maximize performance

    ALTER TABLE [Test].dbo.Tally

    ADD CONSTRAINT PK_Tally_num

    PRIMARY KEY CLUSTERED (num) WITH FILLFACTOR = 100

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT MIN(num) AS nextID

    FROM [Test].dbo.testEnvironment a

    RIGHT OUTER JOIN [Test].dbo.Tally b ON a.ID = b.num

    WHERE ID IS NULL

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT REPLICATE('=',80)

    PRINT '========== CELKO''s first Solution =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT MIN (ID +1) AS unused_user_id_min

    FROM [Test].dbo.testEnvironment

    WHERE (ID + 1) NOT IN (SELECT ID FROM [Test].dbo.testEnvironment);

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT REPLICATE('=',80)

    PRINT '========== Clean Up =========='

    IF object_id('[Test].dbo.testEnvironment') IS NOT NULL

    BEGIN

    DROP TABLE [Test].dbo.testEnvironment

    END

    IF object_id('[Test].dbo.Tally') IS NOT NULL

    BEGIN

    DROP TABLE [Test].dbo.Tally

    END

    PRINT REPLICATE('=',80)

    ========== Ensure Table is ready ==========

    ================================================================================

    ========== Insert 500,000,000(!!) rows ==========

    ================================================================================

    ========== Add Index ==========

    ================================================================================

    ========== Delete 1 ID - 485,001,270 ==========

    ================================================================================

    ========== Queries!! ==========

    ================================================================================

    ========== BASELINE ==========

    Table 'testEnvironment'. Scan count 1, logical reads 4, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 13 ms.

    ================================================================================

    ========== ROW_NUMBER Solution ==========

    Table 'testEnvironment'. Scan count 2, logical reads 902182, physical reads 1434, read-ahead reads 471399, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 152304 ms, elapsed time = 157702 ms.

    ================================================================================

    ========== ROW_NUMBER v2 Solution ==========

    Table 'testEnvironment'. Scan count 1, logical reads 902178, physical reads 499, read-ahead reads 476329, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 162007 ms, elapsed time = 186641 ms.

    ================================================================================

    ========== TALLY TABLE Solution ==========

    Table 'testEnvironment'. Scan count 1, logical reads 930060, physical reads 768, read-ahead reads 930045, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Tally'. Scan count 1, logical reads 805158, physical reads 1102, read-ahead reads 491970, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 218323 ms, elapsed time = 263700 ms.

    ================================================================================

    ========== CELKO's first Solution ==========

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'testEnvironment'. Scan count 2, logical reads 1860120, physical reads 560, read-ahead reads 930049, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 513556 ms, elapsed time = 519127 ms.

    ================================================================================

    ========== Clean Up ==========

    ================================================================================


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Thanks for the awesome testing! Seems that the rownumber solution is a fair bit quicker than the tally table solution for larger numbers where the missing value is high up the value range.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing 9 posts - 31 through 38 (of 38 total)

You must be logged in to reply to this topic. Login to reply