Searching smallest unused number

  • Hi all,

    I need to create sql script which will go through all records and find smallest unused number in list of unsorted numbers.

    I have table with ID(primary key), UserNumber(int), FullName(varchar) and there are 2000 records in this table. UserNumber is some random number, currently smallest one is 989 and largest is 9877 nothing i sorted. Now i want to create automatic script which will check what is smallest unused number(UserNumber) and use that number when adding new record.

    Is this possible?

    Thank you.

  • Please post table definitions, sample data and desired output. Read this to see the best way to post this to get quick responses.

    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Any reason you don't have an identity column in the Users' table and use that instead in that table?

  • damirmi (10/24/2011)


    Hi all,

    I need to create sql script which will go through all records and find smallest unused number in list of unsorted numbers.

    I have table with ID(primary key), UserNumber(int), FullName(varchar) and there are 2000 records in this table. UserNumber is some random number, currently smallest one is 989 and largest is 9877 nothing i sorted. Now i want to create automatic script which will check what is smallest unused number(UserNumber) and use that number when adding new record.

    Is this possible?

    Thank you.

    Hello and welcome to SSC!

    As others have already pointed out, your DDL script has become detached from your post, or perhaps you were unaware of the benefits of providing one.

    When you have time, please read this article[/url] about the best way to provide us with working sample data and DDL scripts. This will allow the unpaid volunteers of this site to provide you with working, tested code for your particular problem.

    For now, I've had a "best guess" at your issue.

    First, lets create a test environment to play with: -

    IF object_id('tempdb..#testEnvironment') IS NOT NULL

    BEGIN

    DROP TABLE #testEnvironment

    END

    --1,000,000 Random rows of data

    SELECT TOP 1000000 ABS(CHECKSUM(NewId())) AS ID

    INTO #testEnvironment

    FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3

    --DELETE non-unique rows

    DELETE FROM #testEnvironment

    WHERE ID IN (SELECT ID

    FROM #testEnvironment

    GROUP BY ID

    HAVING COUNT(ID) > 1)

    OK, now on to the query.

    --Use a real tally table for performance, here's one on the fly for testing.

    --If you're unsure about what a tally table is, or what it's for, please read

    --this article --> http://www.sqlservercentral.com/articles/T-SQL/62867/

    ;WITH t1 AS (SELECT 1 N UNION ALL SELECT 1 N),

    t2 AS (SELECT 1 N FROM t1 x, t1 y),

    t3 AS (SELECT 1 N FROM t2 x, t2 y),

    t4 AS (SELECT 1 N FROM t3 x, t3 y),

    tally AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS num

    FROM t4 x, t4 y)

    SELECT MIN(num) AS nextID

    FROM #testEnvironment a

    RIGHT OUTER JOIN tally b ON a.ID = b.num

    WHERE ID IS NULL


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Thank you very much for answer, I will look into into it as soon as possible and get back to you with the result

    BR

    Damir

  • CELKO (10/25/2011)


    >> I need to create SQL script which will go through all records [sic: rows are not records] and find smallest unused number in list [sic: tables are not lists] of unsorted [sic: tables have no ordering, they are sets] numbers. <,

    That was a lot of ignorance in one paragraph

    >> I have table with id (primary key), user_nbr number (int), fullname (varchar) and there are 2000 records [sic] in this table. <<

    Please post DDL, so that people do not have to guess what the keys, constraints, Declarative Referential Integrity, data types, etc. in your schema are. Please learn to follow ISO-11179 data element naming conventions and formatting rules. Temporal data should use ISO-8601 formats. Code should be in Standard SQL as much as possible and not local dialect.

    Sample data is also a good idea, along with clear specifications. It is very hard to debug code when you do not let us see it. If you want to learn how to ask a question on a Newsgroup, look at: http://www.catb.org/~esr/faqs/smart-questions.html

    What is the magical, generic “id”? I hope you did not use an IDENTITY or other non-relational construct when you have a proper key in user_nbr. Here is my guess at your table:

    CREATE TABLE Users

    (user_nbr INTEGER NOT NULL PRIMARY KEY

    CHECK (user_id > 0),

    user_name VARCHAR (25) NOT NULL);

    SELECT MIN (user_id +1) AS unused_user_id_min

    FROM Users

    WHERE (user_id + 1) NOT IN (SELECT user_id FROM Users);

    Ouch, was all of that entirely necessary?

    Lets examine your solution and compare to the tally table version that I had already posted.

    IF object_id('tempdb..#testEnvironment') IS NOT NULL

    BEGIN

    DROP TABLE #testEnvironment

    END

    --1,000,000 random rows of data

    SELECT TOP 1000000 ABS(CHECKSUM(NewId())) AS ID

    INTO #testEnvironment

    FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3

    --DELETE non-unique rows

    DELETE FROM #testEnvironment

    WHERE ID IN (SELECT ID

    FROM #testEnvironment

    GROUP BY ID

    HAVING COUNT(ID) > 1)

    PRINT '========== BASELINE =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT COUNT(*)

    FROM #testEnvironment

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT '========== CELKO Solution =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT MIN (ID +1) AS unused_user_id_min

    FROM #testEnvironment

    WHERE (ID + 1) NOT IN (SELECT ID FROM #testEnvironment);

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT '========== TALLY TABLE Solution =========='

    --Use a real tally table for performance, here's one on the fly for testing.

    --If you're unsure about what a tally table is, or what it's for, please read

    --this article --> http://www.sqlservercentral.com/articles/T-SQL/62867/

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    ;WITH t1 AS (SELECT 1 N UNION ALL SELECT 1 N),

    t2 AS (SELECT 1 N FROM t1 x, t1 y),

    t3 AS (SELECT 1 N FROM t2 x, t2 y),

    t4 AS (SELECT 1 N FROM t3 x, t3 y),

    tally AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS num

    FROM t4 x, t4 y)

    SELECT MIN(num) AS nextID

    FROM #testEnvironment a

    RIGHT OUTER JOIN tally b ON a.ID = b.num

    WHERE ID IS NULL

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    --CleanUp

    IF object_id('tempdb..#testEnvironment') IS NOT NULL

    BEGIN

    DROP TABLE #testEnvironment

    END

    First, completely random numbers and no index on the table.

    ========== BASELINE ==========

    (1 row(s) affected)

    Table '#testEnvironment____________________________________________________________________________________________________000000000080'. Scan count 1, logical reads 1609, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 63 ms, elapsed time = 70 ms.

    ========== CELKO Solution ==========

    (1 row(s) affected)

    Table '#testEnvironment____________________________________________________________________________________________________000000000080'. Scan count 18, logical reads 2008750, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 179170 ms, elapsed time = 46838 ms.

    ========== TALLY TABLE Solution ==========

    (1 row(s) affected)

    Table '#testEnvironment____________________________________________________________________________________________________000000000080'. Scan count 5, logical reads 1609, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 250 ms, elapsed time = 83 ms.

    So the CELKO solution takes 46838 ms compared to the TALLY taking 83 ms.

    Also, the CELKO solution doesn't pick up that the lowest number is 1, instead reporting the next lowest number (4657).

    OK, perhaps my results are skewed? Lets try a slightly less random set of data, where we already have an ID of "1" so that we can be sure that the CELKO solution gets the correct lowest unused ID.

    IF object_id('tempdb..#testEnvironment2') IS NOT NULL

    BEGIN

    DROP TABLE #testEnvironment2

    END

    CREATE TABLE #testEnvironment2

    (ID INTEGER NOT NULL PRIMARY KEY

    CHECK (ID > 0));

    ;WITH t1 AS (SELECT 1 N UNION ALL SELECT 1 N),

    t2 AS (SELECT 1 N FROM t1 x, t1 y),

    t3 AS (SELECT 1 N FROM t2 x, t2 y),

    t4 AS (SELECT 1 N FROM t3 x, t3 y),

    t5 AS (SELECT 1 N FROM t4 x, t4 y),

    tally AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS ID

    FROM t5 x, t5 y)

    INSERT INTO #testEnvironment2

    SELECT ID

    FROM tally

    WHERE ID <= 1000000 AND

    (ID%3 = 0 OR ID%5 = 0 OR ID = 1)

    PRINT '========== BASELINE =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT COUNT(*)

    FROM #testEnvironment2

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT '========== CELKO Solution =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT MIN (ID +1) AS unused_user_id_min

    FROM #testEnvironment2

    WHERE (ID + 1) NOT IN (SELECT ID FROM #testEnvironment2);

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT '========== TALLY TABLE Solution =========='

    --Use a real tally table for performance, here's one on the fly for testing.

    --If you're unsure about what a tally table is, or what it's for, please read

    --this article --> http://www.sqlservercentral.com/articles/T-SQL/62867/

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    ;WITH t1 AS (SELECT 1 N UNION ALL SELECT 1 N),

    t2 AS (SELECT 1 N FROM t1 x, t1 y),

    t3 AS (SELECT 1 N FROM t2 x, t2 y),

    t4 AS (SELECT 1 N FROM t3 x, t3 y),

    tally AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS num

    FROM t4 x, t4 y)

    SELECT MIN(num) AS nextID

    FROM #testEnvironment2 a

    RIGHT OUTER JOIN tally b ON a.ID = b.num

    WHERE ID IS NULL

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    --CleanUp

    IF object_id('tempdb..#testEnvironment2') IS NOT NULL

    BEGIN

    DROP TABLE #testEnvironment2

    END

    This time we have an index, which should hopefully benefit the CELKO solution.

    ========== BASELINE ==========

    (1 row(s) affected)

    Table '#testEnvironment2___________________________________________________________________________________________________000000000081'. Scan count 1, logical reads 754, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 31 ms, elapsed time = 32 ms.

    ========== CELKO Solution ==========

    (1 row(s) affected)

    Table '#testEnvironment2___________________________________________________________________________________________________00000000008B'. Scan count 2, logical reads 1510, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 205 ms, elapsed time = 203 ms.

    ========== TALLY TABLE Solution ==========

    (1 row(s) affected)

    Table '#testEnvironment2___________________________________________________________________________________________________000000000081'. Scan count 1, logical reads 54, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 31 ms, elapsed time = 32 ms.

    The plus side is that both the CELKO solution and the Tally solution now report the same lowest ID (2), but once again we have a massive difference in time.

    First run

    No index - random numbers in ID

    Baseline - CPU time = 63 ms, elapsed time = 70 ms.

    CELKO - CPU time = 179170 ms, elapsed time = 46838 ms.

    Tally - CPU time = 250 ms, elapsed time = 83 ms.

    Index

    Baseline - CPU time = 31 ms, elapsed time = 32 ms.

    CELKO - CPU time = 205 ms, elapsed time = 203 ms.

    Tally - CPU time = 31 ms, elapsed time = 32 ms.

    Second run

    No index - random numbers in ID

    Baseline - CPU time = 78 ms, elapsed time = 70 ms.

    CELKO - CPU time = 169609 ms, elapsed time = 45142 ms.

    Tally - CPU time = 343 ms, elapsed time = 84 ms.

    Index

    Baseline - CPU time = 31 ms, elapsed time = 34 ms.

    CELKO - CPU time = 227 ms, elapsed time = 207 ms.

    Tally - CPU time = 31 ms, elapsed time = 29 ms.

    Third run

    No index - random numbers in ID

    Baseline - CPU time = 63 ms, elapsed time = 70 ms.

    CELKO - CPU time = 176298 ms, elapsed time = 46788 ms.

    Tally - CPU time = 328 ms, elapsed time = 85 ms.

    Index

    Baseline - CPU time = 47 ms, elapsed time = 33 ms.

    CELKO - CPU time = 235 ms, elapsed time = 222 ms.

    Tally - CPU time = 31 ms, elapsed time = 28 ms.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • @cadavre We stopped wasting our time proving wrong JC (not short for Jesus-Christ).

    However this is a great proof that theory is not real life and that performance is still king (once you have the correct answer obviously).

    @jc-2 There's NOTHING in this world forcing every DBA / programmer to know those standards and what you teach. While I agree that standards are critical to any profession, your way to go about it is completely wrong.

    Maybe you should try to force every governement to implement them and then force us to go back to school to learn that stuff.

    Otherwise I think you're just wasting everyone's time rather than helping.

  • @cadavre

    Thank you very much this helped me a lot, it works perfectly.

    P.S.

    Next time I ask a question I will try to follow forum rules.:-D

  • damirmi (10/31/2011)


    @Cadavre

    Thank you very much this helped me a lot, it works perfectly.

    P.S.

    Next time I ask a question I will try to follow forum rules.:-D

    More "guidelines" than "rules". They just ensure that we manage to give you correct and tested code for your specific query 🙂

    Glad you have your answer, if you have any questions on how anything works then feel free to ask.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Ninja's_RGR'us (10/26/2011)


    @Cadavre We stopped wasting our time proving wrong JC (not short for Jesus-Christ).

    No we didn't. 🙂 And I rather enjoyed the display of prowess that Cadavre put on.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (10/31/2011)


    Ninja's_RGR'us (10/26/2011)


    @Cadavre We stopped wasting our time proving wrong JC (not short for Jesus-Christ).

    No we didn't. 🙂 And I rather enjoyed the display of prowess that Cadavre put on.

    Ya me too.

    But it was different that just shut up @jc-2 or the usual answers.

    :hehe:

  • Thanks guys. The post was more to show future Google searchers the power of the tally table (which I learnt from Jeff's posts and articles :-)) than to correct CELKO. I'm aware that my knowledge of SQL Server is far below his, but everyone has something to learn.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • 1) I LOVE the Celko bashing response! MUCH better than the simple "go away Joe" I was gonna post. 🙂

    2) Hate to say this but the rownumber solution doesn't give the correct answer either. The definition of the column is INT, which contrary to popular belief does NOT start at ONE!!! The negative limit of the integer datatype (-2.1xx B) should be the answer for the test case generated, but it is obviously 1 due to rownumber tally table method. You could use a full-blown permanent numbers table to get the right answer.

    3) The elephant in the room that no one is mentioning is the CONCURRENCY issue here. It is QUITE difficult (without the OUTPUT clause anyway, which is thankfully available now) to do sequencing like this. Without proper locking controls in an environment with more than one concurrent user you can wind up either a) keeping scalability down to a crawl or b) much worse wind up reusing the same number more than once.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (11/3/2011)


    1) I LOVE the Celko bashing response! MUCH better than the simple "go away Joe" I was gonna post. 🙂

    Really wasn't my intention to appear to be bashing anyone 🙁

    TheSQLGuru (11/3/2011)


    2) Hate to say this but the rownumber solution doesn't give the correct answer either. The definition of the column is INT, which contrary to popular belief does NOT start at ONE!!! The negative limit of the integer datatype (-2.1xx B) should be the answer for the test case generated, but it is obviously 1 due to rownumber tally table method. You could use a full-blown permanent numbers table to get the right answer.

    I suppose from a mathematical stand-point, you're correct. Honestly, I assumed that the OP wanted "ID" numbers that started at 1 (actually went back to the OP's first post to double check it wasn't part of the requirements).

    TheSQLGuru (11/3/2011)


    3) The elephant in the room that no one is mentioning is the CONCURRENCY issue here. It is QUITE difficult (without the OUTPUT clause anyway, which is thankfully available now) to do sequencing like this. Without proper locking controls in an environment with more than one concurrent user you can wind up either a) keeping scalability down to a crawl or b) much worse wind up reusing the same number more than once.

    This I have no answer for :w00t: I hadn't actually considered the concurrency issue since I'm unsure what use the OP was intending on getting from the script.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Cadavre (11/3/2011)


    TheSQLGuru (11/3/2011)


    1) I LOVE the Celko bashing response! MUCH better than the simple "go away Joe" I was gonna post. 🙂

    Really wasn't my intention to appear to be bashing anyone 🙁

    TheSQLGuru (11/3/2011)


    2) Hate to say this but the rownumber solution doesn't give the correct answer either. The definition of the column is INT, which contrary to popular belief does NOT start at ONE!!! The negative limit of the integer datatype (-2.1xx B) should be the answer for the test case generated, but it is obviously 1 due to rownumber tally table method. You could use a full-blown permanent numbers table to get the right answer.

    I suppose from a mathematical stand-point, you're correct. Honestly, I assumed that the OP wanted "ID" numbers that started at 1 (actually went back to the OP's first post to double check it wasn't part of the requirements).

    TheSQLGuru (11/3/2011)


    3) The elephant in the room that no one is mentioning is the CONCURRENCY issue here. It is QUITE difficult (without the OUTPUT clause anyway, which is thankfully available now) to do sequencing like this. Without proper locking controls in an environment with more than one concurrent user you can wind up either a) keeping scalability down to a crawl or b) much worse wind up reusing the same number more than once.

    This I have no answer for :w00t: I hadn't actually considered the concurrency issue since I'm unsure what use the OP was intending on getting from the script.

    A) I really wish Joe Celko would just stop "helping" SQL Server people. Many of us have been after him to stop for years now. Besides being incredibly abrasive/abusive he puts out REALLY bad advice for SQL Server. Odd thing is that in person he is about the nicest guy you could meet.

    B) You and virtually everyone else just starts all their identities from 1, giving up half the potential values on all but tinyints. I have come across numerous clients that have had to jump through (sometimes painful) hoops to address running out of values on an identity column because of this.

    C) I have likewise come across numerous clients that have had duplicate value collisions and/or concurrency problems with "getting next value" scenarios.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing 15 posts - 1 through 15 (of 38 total)

You must be logged in to reply to this topic. Login to reply