Searching smallest unused number

Question

Searching smallest unused number

damirmi

SSC Veteran

Points: 253
More actions
October 24, 2011 at 5:30 am

#246819

Hi all,
I need to create sql script which will go through all records and find smallest unused number in list of unsorted numbers.
I have table with ID(primary key), UserNumber(int), FullName(varchar) and there are 2000 records in this table. UserNumber is some random number, currently smallest one is 989 and largest is 9877 nothing i sorted. Now i want to create automatic script which will check what is smallest unused number(UserNumber) and use that number when adding new record.
Is this possible?
Thank you.

Viewing 15 posts - 1 through 15 (of 39 total)

You must be logged in to reply to this topic. Login to reply

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 1

Please post table definitions, sample data and desired output. Read this to see the best way to post this to get quick responses.

http://www.sqlservercentral.com/articles/Best+Practices/61537/

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 2

Any reason you don't have an identity column in the Users' table and use that instead in that table?

Cadavre SSC-Forever Points: 41690 More actions · Answer 3

damirmi (10/24/2011)
Hi all,
I need to create sql script which will go through all records and find smallest unused number in list of unsorted numbers.
I have table with ID(primary key), UserNumber(int), FullName(varchar) and there are 2000 records in this table. UserNumber is some random number, currently smallest one is 989 and largest is 9877 nothing i sorted. Now i want to create automatic script which will check what is smallest unused number(UserNumber) and use that number when adding new record.
Is this possible?
Thank you.

Hello and welcome to SSC!

As others have already pointed out, your DDL script has become detached from your post, or perhaps you were unaware of the benefits of providing one.

When you have time, please read this article[/url] about the best way to provide us with working sample data and DDL scripts. This will allow the unpaid volunteers of this site to provide you with working, tested code for your particular problem.

For now, I've had a "best guess" at your issue.

First, lets create a test environment to play with: -

IF object_id('tempdb..#testEnvironment') IS NOT NULL

BEGIN

DROP TABLE #testEnvironment

END

--1,000,000 Random rows of data

SELECT TOP 1000000 ABS(CHECKSUM(NewId())) AS ID

INTO #testEnvironment

FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3

--DELETE non-unique rows

DELETE FROM #testEnvironment

WHERE ID IN (SELECT ID

FROM #testEnvironment

GROUP BY ID

HAVING COUNT(ID) > 1)

OK, now on to the query.

--Use a real tally table for performance, here's one on the fly for testing.

--If you're unsure about what a tally table is, or what it's for, please read

--this article --> http://www.sqlservercentral.com/articles/T-SQL/62867/

;WITH t1 AS (SELECT 1 N UNION ALL SELECT 1 N),

t2 AS (SELECT 1 N FROM t1 x, t1 y),

t3 AS (SELECT 1 N FROM t2 x, t2 y),

t4 AS (SELECT 1 N FROM t3 x, t3 y),

tally AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS num

FROM t4 x, t4 y)

SELECT MIN(num) AS nextID

FROM #testEnvironment a

RIGHT OUTER JOIN tally b ON a.ID = b.num

WHERE ID IS NULL

Forever trying to learn
My blog - http://www.cadavre.co.uk/
For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

damirmi SSC Veteran Points: 253 More actions · Answer 4

Thank you very much for answer, I will look into into it as soon as possible and get back to you with the result

BR

Damir

Cadavre SSC-Forever Points: 41690 More actions · Answer 5

CELKO (10/25/2011)
>> I need to create SQL script which will go through all records [sic: rows are not records] and find smallest unused number in list [sic: tables are not lists] of unsorted [sic: tables have no ordering, they are sets] numbers. <,
That was a lot of ignorance in one paragraph
>> I have table with id (primary key), user_nbr number (int), fullname (varchar) and there are 2000 records [sic] in this table. <<
Please post DDL, so that people do not have to guess what the keys, constraints, Declarative Referential Integrity, data types, etc. in your schema are. Please learn to follow ISO-11179 data element naming conventions and formatting rules. Temporal data should use ISO-8601 formats. Code should be in Standard SQL as much as possible and not local dialect.
Sample data is also a good idea, along with clear specifications. It is very hard to debug code when you do not let us see it. If you want to learn how to ask a question on a Newsgroup, look at: http://www.catb.org/~esr/faqs/smart-questions.html
What is the magical, generic “id”? I hope you did not use an IDENTITY or other non-relational construct when you have a proper key in user_nbr. Here is my guess at your table:
CREATE TABLE Users
(user_nbr INTEGER NOT NULL PRIMARY KEY
CHECK (user_id > 0),
user_name VARCHAR (25) NOT NULL);
SELECT MIN (user_id +1) AS unused_user_id_min
FROM Users
WHERE (user_id + 1) NOT IN (SELECT user_id FROM Users);

Ouch, was all of that entirely necessary?

Lets examine your solution and compare to the tally table version that I had already posted.

IF object_id('tempdb..#testEnvironment') IS NOT NULL

BEGIN

DROP TABLE #testEnvironment

END

--1,000,000 random rows of data

SELECT TOP 1000000 ABS(CHECKSUM(NewId())) AS ID

INTO #testEnvironment

FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3

--DELETE non-unique rows

DELETE FROM #testEnvironment

WHERE ID IN (SELECT ID

FROM #testEnvironment

GROUP BY ID

HAVING COUNT(ID) > 1)

PRINT '========== BASELINE =========='

SET STATISTICS IO ON

SET STATISTICS TIME ON

SELECT COUNT(*)

FROM #testEnvironment

SET STATISTICS TIME OFF

SET STATISTICS IO OFF

PRINT '========== CELKO Solution =========='

SET STATISTICS IO ON

SET STATISTICS TIME ON

SELECT MIN (ID +1) AS unused_user_id_min

FROM #testEnvironment

WHERE (ID + 1) NOT IN (SELECT ID FROM #testEnvironment);

SET STATISTICS TIME OFF

SET STATISTICS IO OFF

PRINT '========== TALLY TABLE Solution =========='

--Use a real tally table for performance, here's one on the fly for testing.

--If you're unsure about what a tally table is, or what it's for, please read

--this article --> http://www.sqlservercentral.com/articles/T-SQL/62867/

SET STATISTICS IO ON

SET STATISTICS TIME ON

;WITH t1 AS (SELECT 1 N UNION ALL SELECT 1 N),

t2 AS (SELECT 1 N FROM t1 x, t1 y),

t3 AS (SELECT 1 N FROM t2 x, t2 y),

t4 AS (SELECT 1 N FROM t3 x, t3 y),

tally AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS num

FROM t4 x, t4 y)

SELECT MIN(num) AS nextID

FROM #testEnvironment a

RIGHT OUTER JOIN tally b ON a.ID = b.num

WHERE ID IS NULL

SET STATISTICS TIME OFF

SET STATISTICS IO OFF

--CleanUp

IF object_id('tempdb..#testEnvironment') IS NOT NULL

BEGIN

DROP TABLE #testEnvironment

END

First, completely random numbers and no index on the table.

========== BASELINE ==========

(1 row(s) affected)

Table '#testEnvironment____________________________________________________________________________________________________000000000080'. Scan count 1, logical reads 1609, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 63 ms, elapsed time = 70 ms.

========== CELKO Solution ==========

(1 row(s) affected)

Table '#testEnvironment____________________________________________________________________________________________________000000000080'. Scan count 18, logical reads 2008750, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 179170 ms, elapsed time = 46838 ms.

========== TALLY TABLE Solution ==========

(1 row(s) affected)

Table '#testEnvironment____________________________________________________________________________________________________000000000080'. Scan count 5, logical reads 1609, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 250 ms, elapsed time = 83 ms.

So the CELKO solution takes 46838 ms compared to the TALLY taking 83 ms.

Also, the CELKO solution doesn't pick up that the lowest number is 1, instead reporting the next lowest number (4657).

OK, perhaps my results are skewed? Lets try a slightly less random set of data, where we already have an ID of "1" so that we can be sure that the CELKO solution gets the correct lowest unused ID.

IF object_id('tempdb..#testEnvironment2') IS NOT NULL

BEGIN

DROP TABLE #testEnvironment2

END

CREATE TABLE #testEnvironment2

(ID INTEGER NOT NULL PRIMARY KEY

CHECK (ID > 0));

;WITH t1 AS (SELECT 1 N UNION ALL SELECT 1 N),

t2 AS (SELECT 1 N FROM t1 x, t1 y),

t3 AS (SELECT 1 N FROM t2 x, t2 y),

t4 AS (SELECT 1 N FROM t3 x, t3 y),

t5 AS (SELECT 1 N FROM t4 x, t4 y),

tally AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS ID

FROM t5 x, t5 y)

INSERT INTO #testEnvironment2

SELECT ID

FROM tally

WHERE ID <= 1000000 AND

(ID%3 = 0 OR ID%5 = 0 OR ID = 1)

PRINT '========== BASELINE =========='

SET STATISTICS IO ON

SET STATISTICS TIME ON

SELECT COUNT(*)

FROM #testEnvironment2

SET STATISTICS TIME OFF

SET STATISTICS IO OFF

PRINT '========== CELKO Solution =========='

SET STATISTICS IO ON

SET STATISTICS TIME ON

SELECT MIN (ID +1) AS unused_user_id_min

FROM #testEnvironment2

WHERE (ID + 1) NOT IN (SELECT ID FROM #testEnvironment2);

SET STATISTICS TIME OFF

SET STATISTICS IO OFF

PRINT '========== TALLY TABLE Solution =========='

--Use a real tally table for performance, here's one on the fly for testing.

--If you're unsure about what a tally table is, or what it's for, please read

--this article --> http://www.sqlservercentral.com/articles/T-SQL/62867/

SET STATISTICS IO ON

SET STATISTICS TIME ON

;WITH t1 AS (SELECT 1 N UNION ALL SELECT 1 N),

t2 AS (SELECT 1 N FROM t1 x, t1 y),

t3 AS (SELECT 1 N FROM t2 x, t2 y),

t4 AS (SELECT 1 N FROM t3 x, t3 y),

tally AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS num

FROM t4 x, t4 y)

SELECT MIN(num) AS nextID

FROM #testEnvironment2 a

RIGHT OUTER JOIN tally b ON a.ID = b.num

WHERE ID IS NULL

SET STATISTICS TIME OFF

SET STATISTICS IO OFF

--CleanUp

IF object_id('tempdb..#testEnvironment2') IS NOT NULL

BEGIN

DROP TABLE #testEnvironment2

END

This time we have an index, which should hopefully benefit the CELKO solution.

========== BASELINE ==========

(1 row(s) affected)

Table '#testEnvironment2___________________________________________________________________________________________________000000000081'. Scan count 1, logical reads 754, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 31 ms, elapsed time = 32 ms.

========== CELKO Solution ==========

(1 row(s) affected)

Table '#testEnvironment2___________________________________________________________________________________________________00000000008B'. Scan count 2, logical reads 1510, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 205 ms, elapsed time = 203 ms.

========== TALLY TABLE Solution ==========

(1 row(s) affected)

Table '#testEnvironment2___________________________________________________________________________________________________000000000081'. Scan count 1, logical reads 54, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 31 ms, elapsed time = 32 ms.

The plus side is that both the CELKO solution and the Tally solution now report the same lowest ID (2), but once again we have a massive difference in time.

First run

No index - random numbers in ID

Baseline - CPU time = 63 ms, elapsed time = 70 ms.

CELKO - CPU time = 179170 ms, elapsed time = 46838 ms.

Tally - CPU time = 250 ms, elapsed time = 83 ms.

Index

Baseline - CPU time = 31 ms, elapsed time = 32 ms.

CELKO - CPU time = 205 ms, elapsed time = 203 ms.

Tally - CPU time = 31 ms, elapsed time = 32 ms.

Second run

No index - random numbers in ID

Baseline - CPU time = 78 ms, elapsed time = 70 ms.

CELKO - CPU time = 169609 ms, elapsed time = 45142 ms.

Tally - CPU time = 343 ms, elapsed time = 84 ms.

Index

Baseline - CPU time = 31 ms, elapsed time = 34 ms.

CELKO - CPU time = 227 ms, elapsed time = 207 ms.

Tally - CPU time = 31 ms, elapsed time = 29 ms.

Third run

No index - random numbers in ID

Baseline - CPU time = 63 ms, elapsed time = 70 ms.

CELKO - CPU time = 176298 ms, elapsed time = 46788 ms.

Tally - CPU time = 328 ms, elapsed time = 85 ms.

Index

Baseline - CPU time = 47 ms, elapsed time = 33 ms.

CELKO - CPU time = 235 ms, elapsed time = 222 ms.

Tally - CPU time = 31 ms, elapsed time = 28 ms.

Forever trying to learn
My blog - http://www.cadavre.co.uk/
For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 6

@Cadavre We stopped wasting our time proving wrong JC (not short for Jesus-Christ).

However this is a great proof that theory is not real life and that performance is still king (once you have the correct answer obviously).

@jc-2 There's NOTHING in this world forcing every DBA / programmer to know those standards and what you teach. While I agree that standards are critical to any profession, your way to go about it is completely wrong.

Maybe you should try to force every governement to implement them and then force us to go back to school to learn that stuff.

Otherwise I think you're just wasting everyone's time rather than helping.

damirmi SSC Veteran Points: 253 More actions · Answer 7

@Cadavre

Thank you very much this helped me a lot, it works perfectly.

P.S.

Next time I ask a question I will try to follow forum rules.:-D

Cadavre SSC-Forever Points: 41690 More actions · Answer 8

damirmi (10/31/2011)
@Cadavre
Thank you very much this helped me a lot, it works perfectly.
P.S.
Next time I ask a question I will try to follow forum rules.:-D

More "guidelines" than "rules". They just ensure that we manage to give you correct and tested code for your specific query 🙂

Glad you have your answer, if you have any questions on how anything works then feel free to ask.

Forever trying to learn
My blog - http://www.cadavre.co.uk/
For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 9

Ninja's_RGR'us (10/26/2011)
@Cadavre We stopped wasting our time proving wrong JC (not short for Jesus-Christ).

No we didn't. 🙂 And I rather enjoyed the display of prowess that Cadavre put on.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 10

Jeff Moden (10/31/2011)
Ninja's_RGR'us (10/26/2011)
@Cadavre We stopped wasting our time proving wrong JC (not short for Jesus-Christ).
No we didn't. 🙂 And I rather enjoyed the display of prowess that Cadavre put on.

Ya me too.

But it was different that just shut up @jc-2 or the usual answers.

:hehe:

Cadavre SSC-Forever Points: 41690 More actions · Answer 11

Thanks guys. The post was more to show future Google searchers the power of the tally table (which I learnt from Jeff's posts and articles :-)) than to correct CELKO. I'm aware that my knowledge of SQL Server is far below his, but everyone has something to learn.

Forever trying to learn
My blog - http://www.cadavre.co.uk/
For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 12

1) I LOVE the Celko bashing response! MUCH better than the simple "go away Joe" I was gonna post. 🙂

2) Hate to say this but the rownumber solution doesn't give the correct answer either. The definition of the column is INT, which contrary to popular belief does NOT start at ONE!!! The negative limit of the integer datatype (-2.1xx B) should be the answer for the test case generated, but it is obviously 1 due to rownumber tally table method. You could use a full-blown permanent numbers table to get the right answer.

3) The elephant in the room that no one is mentioning is the CONCURRENCY issue here. It is QUITE difficult (without the OUTPUT clause anyway, which is thankfully available now) to do sequencing like this. Without proper locking controls in an environment with more than one concurrent user you can wind up either a) keeping scalability down to a crawl or b) much worse wind up reusing the same number more than once.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Cadavre SSC-Forever Points: 41690 More actions · Answer 13

TheSQLGuru (11/3/2011)
1) I LOVE the Celko bashing response! MUCH better than the simple "go away Joe" I was gonna post. 🙂

Really wasn't my intention to appear to be bashing anyone 🙁

TheSQLGuru (11/3/2011)
2) Hate to say this but the rownumber solution doesn't give the correct answer either. The definition of the column is INT, which contrary to popular belief does NOT start at ONE!!! The negative limit of the integer datatype (-2.1xx B) should be the answer for the test case generated, but it is obviously 1 due to rownumber tally table method. You could use a full-blown permanent numbers table to get the right answer.

I suppose from a mathematical stand-point, you're correct. Honestly, I assumed that the OP wanted "ID" numbers that started at 1 (actually went back to the OP's first post to double check it wasn't part of the requirements).

TheSQLGuru (11/3/2011)
3) The elephant in the room that no one is mentioning is the CONCURRENCY issue here. It is QUITE difficult (without the OUTPUT clause anyway, which is thankfully available now) to do sequencing like this. Without proper locking controls in an environment with more than one concurrent user you can wind up either a) keeping scalability down to a crawl or b) much worse wind up reusing the same number more than once.

This I have no answer for :w00t: I hadn't actually considered the concurrency issue since I'm unsure what use the OP was intending on getting from the script.

Forever trying to learn
My blog - http://www.cadavre.co.uk/
For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 14

Cadavre (11/3/2011)
TheSQLGuru (11/3/2011)
1) I LOVE the Celko bashing response! MUCH better than the simple "go away Joe" I was gonna post. 🙂
Really wasn't my intention to appear to be bashing anyone 🙁
TheSQLGuru (11/3/2011)
2) Hate to say this but the rownumber solution doesn't give the correct answer either. The definition of the column is INT, which contrary to popular belief does NOT start at ONE!!! The negative limit of the integer datatype (-2.1xx B) should be the answer for the test case generated, but it is obviously 1 due to rownumber tally table method. You could use a full-blown permanent numbers table to get the right answer.
I suppose from a mathematical stand-point, you're correct. Honestly, I assumed that the OP wanted "ID" numbers that started at 1 (actually went back to the OP's first post to double check it wasn't part of the requirements).
TheSQLGuru (11/3/2011)
3) The elephant in the room that no one is mentioning is the CONCURRENCY issue here. It is QUITE difficult (without the OUTPUT clause anyway, which is thankfully available now) to do sequencing like this. Without proper locking controls in an environment with more than one concurrent user you can wind up either a) keeping scalability down to a crawl or b) much worse wind up reusing the same number more than once.
This I have no answer for :w00t: I hadn't actually considered the concurrency issue since I'm unsure what use the OP was intending on getting from the script.

A) I really wish Joe Celko would just stop "helping" SQL Server people. Many of us have been after him to stop for years now. Besides being incredibly abrasive/abusive he puts out REALLY bad advice for SQL Server. Odd thing is that in person he is about the nicest guy you could meet.

B) You and virtually everyone else just starts all their identities from 1, giving up half the potential values on all but tinyints. I have come across numerous clients that have had to jump through (sometimes painful) hoops to address running out of values on an identity column because of this.

C) I have likewise come across numerous clients that have had duplicate value collisions and/or concurrency problems with "getting next value" scenarios.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service