October 24, 2011 at 5:30 am
Hi all,
I need to create a SQL script which will go through all records and find the smallest unused number in a list of unsorted numbers.
I have a table with ID (primary key), UserNumber (int) and FullName (varchar), and there are 2000 records in this table. UserNumber is some random number; currently the smallest one is 989 and the largest is 9877, and nothing is sorted. Now I want to create an automatic script which will check what the smallest unused number (UserNumber) is and use that number when adding a new record.
Is this possible?
Thank you.
October 24, 2011 at 5:37 am
Please post table definitions, sample data and desired output. Read this to see the best way to post your question so you get quick responses.
http://www.sqlservercentral.com/articles/Best+Practices/61537/
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 24, 2011 at 5:57 am
Any reason you don't have an identity column on the Users table and use that instead?
October 24, 2011 at 6:30 am
damirmi (10/24/2011)
Hi all, I need to create a SQL script which will go through all records and find the smallest unused number in a list of unsorted numbers.
I have a table with ID (primary key), UserNumber (int) and FullName (varchar), and there are 2000 records in this table. UserNumber is some random number; currently the smallest one is 989 and the largest is 9877, and nothing is sorted. Now I want to create an automatic script which will check what the smallest unused number (UserNumber) is and use that number when adding a new record.
Is this possible?
Thank you.
Hello and welcome to SSC!
As others have already pointed out, your DDL script has become detached from your post, or perhaps you were unaware of the benefits of providing one.
When you have time, please read this article about the best way to provide us with working sample data and DDL scripts. This will allow the unpaid volunteers of this site to provide you with working, tested code for your particular problem.
For now, I've had a "best guess" at your issue.
First, let's create a test environment to play with:
IF object_id('tempdb..#testEnvironment') IS NOT NULL
BEGIN
DROP TABLE #testEnvironment
END
--1,000,000 Random rows of data
SELECT TOP 1000000 ABS(CHECKSUM(NewId())) AS ID
INTO #testEnvironment
FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3
--DELETE non-unique rows
DELETE FROM #testEnvironment
WHERE ID IN (SELECT ID
FROM #testEnvironment
GROUP BY ID
HAVING COUNT(ID) > 1)
OK, now on to the query.
--Use a real tally table for performance, here's one on the fly for testing.
--If you're unsure about what a tally table is, or what it's for, please read
--this article --> http://www.sqlservercentral.com/articles/T-SQL/62867/
;WITH t1 AS (SELECT 1 N UNION ALL SELECT 1 N),
t2 AS (SELECT 1 N FROM t1 x, t1 y),
t3 AS (SELECT 1 N FROM t2 x, t2 y),
t4 AS (SELECT 1 N FROM t3 x, t3 y),
tally AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS num
FROM t4 x, t4 y)
SELECT MIN(num) AS nextID
FROM #testEnvironment a
RIGHT OUTER JOIN tally b ON a.ID = b.num
WHERE ID IS NULL
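If the aim is to use that number straight away when adding a row, something along these lines might work. This is only a sketch: dbo.Users, UserNumber and FullName are my guesses at your table (we don't have the real DDL), and 'New User' is just placeholder data.
DECLARE @nextUserNumber INT;
--Find the lowest unused UserNumber using the same on-the-fly tally
;WITH t1 AS (SELECT 1 N UNION ALL SELECT 1 N),
t2 AS (SELECT 1 N FROM t1 x, t1 y),
t3 AS (SELECT 1 N FROM t2 x, t2 y),
t4 AS (SELECT 1 N FROM t3 x, t3 y),
tally AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS num
FROM t4 x, t4 y)
SELECT @nextUserNumber = MIN(num)
FROM dbo.Users a
RIGHT OUTER JOIN tally b ON a.UserNumber = b.num
WHERE a.UserNumber IS NULL;
--Insert the new row with the number we just found
INSERT INTO dbo.Users (UserNumber, FullName)
VALUES (@nextUserNumber, 'New User');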
October 25, 2011 at 2:35 am
Thank you very much for the answer. I will look into it as soon as possible and get back to you with the result.
BR
Damir
October 26, 2011 at 2:29 am
CELKO (10/25/2011)
>> I need to create SQL script which will go through all records [sic: rows are not records] and find smallest unused number in list [sic: tables are not lists] of unsorted [sic: tables have no ordering, they are sets] numbers. <<
That was a lot of ignorance in one paragraph.
>> I have table with id (primary key), user_nbr number (int), fullname (varchar) and there are 2000 records [sic] in this table. <<
Please post DDL, so that people do not have to guess what the keys, constraints, Declarative Referential Integrity, data types, etc. in your schema are. Please learn to follow ISO-11179 data element naming conventions and formatting rules. Temporal data should use ISO-8601 formats. Code should be in Standard SQL as much as possible and not local dialect.
Sample data is also a good idea, along with clear specifications. It is very hard to debug code when you do not let us see it. If you want to learn how to ask a question on a Newsgroup, look at: http://www.catb.org/~esr/faqs/smart-questions.html
What is the magical, generic “id”? I hope you did not use an IDENTITY or other non-relational construct when you have a proper key in user_nbr. Here is my guess at your table:
CREATE TABLE Users
(user_nbr INTEGER NOT NULL PRIMARY KEY
CHECK (user_nbr > 0),
user_name VARCHAR (25) NOT NULL);
SELECT MIN (user_nbr + 1) AS unused_user_nbr_min
FROM Users
WHERE (user_nbr + 1) NOT IN (SELECT user_nbr FROM Users);
Ouch, was all of that entirely necessary?
Let's examine your solution and compare it to the tally table version that I had already posted.
IF object_id('tempdb..#testEnvironment') IS NOT NULL
BEGIN
DROP TABLE #testEnvironment
END
--1,000,000 random rows of data
SELECT TOP 1000000 ABS(CHECKSUM(NewId())) AS ID
INTO #testEnvironment
FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3
--DELETE non-unique rows
DELETE FROM #testEnvironment
WHERE ID IN (SELECT ID
FROM #testEnvironment
GROUP BY ID
HAVING COUNT(ID) > 1)
PRINT '========== BASELINE =========='
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT COUNT(*)
FROM #testEnvironment
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
PRINT '========== CELKO Solution =========='
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT MIN (ID +1) AS unused_user_id_min
FROM #testEnvironment
WHERE (ID + 1) NOT IN (SELECT ID FROM #testEnvironment);
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
PRINT '========== TALLY TABLE Solution =========='
--Use a real tally table for performance, here's one on the fly for testing.
--If you're unsure about what a tally table is, or what it's for, please read
--this article --> http://www.sqlservercentral.com/articles/T-SQL/62867/
SET STATISTICS IO ON
SET STATISTICS TIME ON
;WITH t1 AS (SELECT 1 N UNION ALL SELECT 1 N),
t2 AS (SELECT 1 N FROM t1 x, t1 y),
t3 AS (SELECT 1 N FROM t2 x, t2 y),
t4 AS (SELECT 1 N FROM t3 x, t3 y),
tally AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS num
FROM t4 x, t4 y)
SELECT MIN(num) AS nextID
FROM #testEnvironment a
RIGHT OUTER JOIN tally b ON a.ID = b.num
WHERE ID IS NULL
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
--CleanUp
IF object_id('tempdb..#testEnvironment') IS NOT NULL
BEGIN
DROP TABLE #testEnvironment
END
First, completely random numbers and no index on the table.
========== BASELINE ==========
(1 row(s) affected)
Table '#testEnvironment____________________________________________________________________________________________________000000000080'. Scan count 1, logical reads 1609, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 63 ms, elapsed time = 70 ms.
========== CELKO Solution ==========
(1 row(s) affected)
Table '#testEnvironment____________________________________________________________________________________________________000000000080'. Scan count 18, logical reads 2008750, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 179170 ms, elapsed time = 46838 ms.
========== TALLY TABLE Solution ==========
(1 row(s) affected)
Table '#testEnvironment____________________________________________________________________________________________________000000000080'. Scan count 5, logical reads 1609, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 250 ms, elapsed time = 83 ms.
So the CELKO solution takes 46838 ms compared to the TALLY taking 83 ms.
Also, the CELKO solution doesn't pick up that the lowest unused number is 1, instead reporting a higher gap (4657).
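For completeness, the MIN(ID + 1) approach can be patched to consider 1 as a candidate as well; an untested sketch against the same temp table might look like this (it should return 1 when 1 is missing, and the lowest gap otherwise):
SELECT MIN(gap) AS unused_user_id_min
FROM (--candidate 1, in case the numbering hasn't started yet
      SELECT 1 AS gap
      WHERE NOT EXISTS (SELECT 1 FROM #testEnvironment WHERE ID = 1)
      UNION ALL
      --otherwise, the smallest existing ID whose successor is missing
      SELECT o.ID + 1
      FROM #testEnvironment o
      WHERE NOT EXISTS (SELECT 1 FROM #testEnvironment i WHERE i.ID = o.ID + 1)
     ) d;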
OK, perhaps my results are skewed? Let's try a slightly less random set of data, where we already have an ID of "1" so that we can be sure the CELKO solution gets the correct lowest unused ID.
IF object_id('tempdb..#testEnvironment2') IS NOT NULL
BEGIN
DROP TABLE #testEnvironment2
END
CREATE TABLE #testEnvironment2
(ID INTEGER NOT NULL PRIMARY KEY
CHECK (ID > 0));
;WITH t1 AS (SELECT 1 N UNION ALL SELECT 1 N),
t2 AS (SELECT 1 N FROM t1 x, t1 y),
t3 AS (SELECT 1 N FROM t2 x, t2 y),
t4 AS (SELECT 1 N FROM t3 x, t3 y),
t5 AS (SELECT 1 N FROM t4 x, t4 y),
tally AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS ID
FROM t5 x, t5 y)
INSERT INTO #testEnvironment2
SELECT ID
FROM tally
WHERE ID <= 1000000 AND
(ID%3 = 0 OR ID%5 = 0 OR ID = 1)
PRINT '========== BASELINE =========='
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT COUNT(*)
FROM #testEnvironment2
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
PRINT '========== CELKO Solution =========='
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT MIN (ID +1) AS unused_user_id_min
FROM #testEnvironment2
WHERE (ID + 1) NOT IN (SELECT ID FROM #testEnvironment2);
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
PRINT '========== TALLY TABLE Solution =========='
--Use a real tally table for performance, here's one on the fly for testing.
--If you're unsure about what a tally table is, or what it's for, please read
--this article --> http://www.sqlservercentral.com/articles/T-SQL/62867/
SET STATISTICS IO ON
SET STATISTICS TIME ON
;WITH t1 AS (SELECT 1 N UNION ALL SELECT 1 N),
t2 AS (SELECT 1 N FROM t1 x, t1 y),
t3 AS (SELECT 1 N FROM t2 x, t2 y),
t4 AS (SELECT 1 N FROM t3 x, t3 y),
tally AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS num
FROM t4 x, t4 y)
SELECT MIN(num) AS nextID
FROM #testEnvironment2 a
RIGHT OUTER JOIN tally b ON a.ID = b.num
WHERE ID IS NULL
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
--CleanUp
IF object_id('tempdb..#testEnvironment2') IS NOT NULL
BEGIN
DROP TABLE #testEnvironment2
END
This time we have an index, which should hopefully benefit the CELKO solution.
========== BASELINE ==========
(1 row(s) affected)
Table '#testEnvironment2___________________________________________________________________________________________________000000000081'. Scan count 1, logical reads 754, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 32 ms.
========== CELKO Solution ==========
(1 row(s) affected)
Table '#testEnvironment2___________________________________________________________________________________________________00000000008B'. Scan count 2, logical reads 1510, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 205 ms, elapsed time = 203 ms.
========== TALLY TABLE Solution ==========
(1 row(s) affected)
Table '#testEnvironment2___________________________________________________________________________________________________000000000081'. Scan count 1, logical reads 54, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 32 ms.
The plus side is that both the CELKO solution and the Tally solution now report the same lowest ID (2), but once again we have a massive difference in time.
First run
No index - random numbers in ID
Baseline - CPU time = 63 ms, elapsed time = 70 ms.
CELKO - CPU time = 179170 ms, elapsed time = 46838 ms.
Tally - CPU time = 250 ms, elapsed time = 83 ms.
Index
Baseline - CPU time = 31 ms, elapsed time = 32 ms.
CELKO - CPU time = 205 ms, elapsed time = 203 ms.
Tally - CPU time = 31 ms, elapsed time = 32 ms.
Second run
No index - random numbers in ID
Baseline - CPU time = 78 ms, elapsed time = 70 ms.
CELKO - CPU time = 169609 ms, elapsed time = 45142 ms.
Tally - CPU time = 343 ms, elapsed time = 84 ms.
Index
Baseline - CPU time = 31 ms, elapsed time = 34 ms.
CELKO - CPU time = 227 ms, elapsed time = 207 ms.
Tally - CPU time = 31 ms, elapsed time = 29 ms.
Third run
No index - random numbers in ID
Baseline - CPU time = 63 ms, elapsed time = 70 ms.
CELKO - CPU time = 176298 ms, elapsed time = 46788 ms.
Tally - CPU time = 328 ms, elapsed time = 85 ms.
Index
Baseline - CPU time = 47 ms, elapsed time = 33 ms.
CELKO - CPU time = 235 ms, elapsed time = 222 ms.
Tally - CPU time = 31 ms, elapsed time = 28 ms.
October 26, 2011 at 5:39 am
@Cadavre We stopped wasting our time proving JC wrong (not short for Jesus Christ).
However, this is great proof that theory is not real life and that performance is still king (once you have the correct answer, obviously).
@jc-2 There's NOTHING in this world forcing every DBA / programmer to know those standards and what you teach. While I agree that standards are critical to any profession, your way of going about it is completely wrong.
Maybe you should try to force every government to implement them and then force us to go back to school to learn that stuff.
Otherwise I think you're just wasting everyone's time rather than helping.
October 31, 2011 at 9:30 am
damirmi (10/31/2011)
@Cadavre Thank you very much, this helped me a lot; it works perfectly.
P.S.
Next time I ask a question I will try to follow the forum rules. :-D
More "guidelines" than "rules". They just ensure that we manage to give you correct and tested code for your specific query 🙂
Glad you have your answer. If you have any questions about how anything works, feel free to ask.
October 31, 2011 at 10:40 pm
Ninja's_RGR'us (10/26/2011)
@Cadavre We stopped wasting our time proving JC wrong (not short for Jesus Christ).
No we didn't. 🙂 And I rather enjoyed the display of prowess that Cadavre put on.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 1, 2011 at 4:38 am
Jeff Moden (10/31/2011)
Ninja's_RGR'us (10/26/2011)
@Cadavre We stopped wasting our time proving JC wrong (not short for Jesus Christ).
No we didn't. 🙂 And I rather enjoyed the display of prowess that Cadavre put on.
Ya, me too.
But it was different than just "shut up @jc-2" or the usual answers.
:hehe:
November 2, 2011 at 3:43 am
Thanks guys. The post was more to show future Google searchers the power of the tally table (which I learnt from Jeff's posts and articles :-)) than to correct CELKO. I'm aware that my knowledge of SQL Server is far below his, but everyone has something to learn.
November 3, 2011 at 7:48 am
1) I LOVE the Celko bashing response! MUCH better than the simple "go away Joe" I was gonna post. 🙂
2) Hate to say this but the ROW_NUMBER solution doesn't give the correct answer either. The definition of the column is INT, which, contrary to popular belief, does NOT start at ONE!!! The negative limit of the integer datatype (-2.1xx billion) should be the answer for the test case generated, but it is obviously 1 due to the ROW_NUMBER tally table method. You could use a full-blown permanent numbers table to get the right answer.
3) The elephant in the room that no one is mentioning is the CONCURRENCY issue here. It is QUITE difficult (without the OUTPUT clause anyway, which is thankfully available now) to do sequencing like this. Without proper locking controls in an environment with more than one concurrent user, you can wind up either a) keeping scalability down to a crawl or b) much worse, reusing the same number more than once.
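To illustrate point 3, here is one possible (untested) pattern rather than a recommendation, assuming the OP's table is dbo.Users with UserNumber and FullName columns: wrap the find-and-insert in a transaction that takes restrictive locks, so two sessions can't claim the same number.
BEGIN TRANSACTION;
DECLARE @nextUserNumber INT;
--UPDLOCK + HOLDLOCK (serializable) stops other sessions grabbing the same gap
--until this transaction commits; it also serialises the inserts, so scalability
--suffers - which is exactly the trade-off described above.
SELECT @nextUserNumber = MIN(u.UserNumber + 1)
FROM dbo.Users u WITH (UPDLOCK, HOLDLOCK)
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.Users u2
                  WHERE u2.UserNumber = u.UserNumber + 1);
INSERT INTO dbo.Users (UserNumber, FullName)
VALUES (@nextUserNumber, 'New User');
COMMIT TRANSACTION;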
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
November 3, 2011 at 8:06 am
TheSQLGuru (11/3/2011)
1) I LOVE the Celko bashing response! MUCH better than the simple "go away Joe" I was gonna post. 🙂
Really wasn't my intention to appear to be bashing anyone 🙁
TheSQLGuru (11/3/2011)
2) Hate to say this but the ROW_NUMBER solution doesn't give the correct answer either. The definition of the column is INT, which, contrary to popular belief, does NOT start at ONE!!! The negative limit of the integer datatype (-2.1xx billion) should be the answer for the test case generated, but it is obviously 1 due to the ROW_NUMBER tally table method. You could use a full-blown permanent numbers table to get the right answer.
I suppose from a mathematical stand-point, you're correct. Honestly, I assumed that the OP wanted "ID" numbers that started at 1 (actually went back to the OP's first post to double check it wasn't part of the requirements).
TheSQLGuru (11/3/2011)
3) The elephant in the room that no one is mentioning is the CONCURRENCY issue here. It is QUITE difficult (without the OUTPUT clause anyway, which is thankfully available now) to do sequencing like this. Without proper locking controls in an environment with more than one concurrent user, you can wind up either a) keeping scalability down to a crawl or b) much worse, reusing the same number more than once.
This I have no answer for :w00t: I hadn't actually considered the concurrency issue, since I'm unsure what use the OP was intending to get from the script.
November 3, 2011 at 10:39 am
Cadavre (11/3/2011)
TheSQLGuru (11/3/2011)
1) I LOVE the Celko bashing response! MUCH better than the simple "go away Joe" I was gonna post. 🙂
Really wasn't my intention to appear to be bashing anyone 🙁
TheSQLGuru (11/3/2011)
2) Hate to say this but the ROW_NUMBER solution doesn't give the correct answer either. The definition of the column is INT, which, contrary to popular belief, does NOT start at ONE!!! The negative limit of the integer datatype (-2.1xx billion) should be the answer for the test case generated, but it is obviously 1 due to the ROW_NUMBER tally table method. You could use a full-blown permanent numbers table to get the right answer.
I suppose from a mathematical stand-point, you're correct. Honestly, I assumed that the OP wanted "ID" numbers that started at 1 (actually went back to the OP's first post to double check it wasn't part of the requirements).
TheSQLGuru (11/3/2011)
3) The elephant in the room that no one is mentioning is the CONCURRENCY issue here. It is QUITE difficult (without the OUTPUT clause anyway, which is thankfully available now) to do sequencing like this. Without proper locking controls in an environment with more than one concurrent user, you can wind up either a) keeping scalability down to a crawl or b) much worse, reusing the same number more than once.
This I have no answer for :w00t: I hadn't actually considered the concurrency issue, since I'm unsure what use the OP was intending to get from the script.
A) I really wish Joe Celko would just stop "helping" SQL Server people. Many of us have been after him to stop for years now. Besides being incredibly abrasive/abusive, he puts out REALLY bad advice for SQL Server. Odd thing is that in person he is about the nicest guy you could meet.
B) You and virtually everyone else just start all their identities from 1, giving up half the potential values on all but tinyints (a quick sketch of the alternative follows after these points). I have come across numerous clients that have had to jump through (sometimes painful) hoops to address running out of values on an identity column because of this.
C) I have likewise come across numerous clients that have had duplicate value collisions and/or concurrency problems with "getting next value" scenarios.
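As a quick illustration of point B (the table name here is made up), an identity can be seeded at the bottom of the INT range instead of at 1, which roughly doubles the values available:
--Seed the identity at the negative limit of INT rather than at 1
CREATE TABLE dbo.WideRangeDemo
(
ID INT IDENTITY(-2147483648, 1) NOT NULL PRIMARY KEY,
Name VARCHAR(50) NOT NULL
);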
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service