July 13, 2010 at 6:27 am
Or even:
DECLARE @temp integer;

WHILE EXISTS
(
    SELECT *
    FROM #EventPersonInfo
    WHERE app_id IS NULL
)
BEGIN
    -- Get a new random number
    SET @temp = ABS(CHECKSUM(NEWID()));

    -- Use it for one NULL row, as long as it is not already in use
    UPDATE TOP (1)
        #EventPersonInfo
    SET app_id = @temp
    WHERE app_id IS NULL
    AND NOT EXISTS
    (
        SELECT *
        FROM #EventPersonInfo WITH (UPDLOCK, HOLDLOCK)
        WHERE app_id = @temp
    );
END;
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
July 13, 2010 at 6:40 am
Beyond the call of duty Paul 😉
July 13, 2010 at 8:54 am
Nice, Paul. I assume those are provided for testing, or other solutions and not for this problem. It's overkill and inefficient for what the OP is trying to accomplish here.
July 13, 2010 at 9:05 am
You didn't get it. It is just a joke!
You should: Ha! Ha! Ha!
Am I right?
If not, I have an even better solution. It will use multiple temp tables, a few CTEs, Jeff's favoured "quirky update" technique and, of course, a couple of loops and a couple of cursors (possibly three)!
Do you want to see it?
😀
July 13, 2010 at 9:10 am
Steve Jones - Editor (7/13/2010)
Nice, Paul. I assume those are provided for testing, or other solutions and not for this problem.
No.
It's overkill and inefficient for what the OP is trying to accomplish here.
Thanks for your opinion.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
July 13, 2010 at 9:21 am
Eugene Elutin (7/13/2010)
You didn't get it. It is just a joke! You should: Ha! Ha! Ha! Am I right?
Not entirely, but I can see why you would say that.
Nobody here (except Tasnim) has enough information to say how this problem could best be solved. It may be, for example, that the design cannot be changed, and a random number must indeed be generated for rows that currently contain NULL.
It may also be that the table only contains a few hundred rows and is read-only except for an overnight (off-line) batch process. We just don't know.
Rather than just join in with the predictable "your design is wrong, you need to write a set-based solution" response, I decided to show a couple of examples of row-by-row processing code, incorporating some of the features that would be required to make it robust and suitable for concurrent processing.
If I get feedback from Tasnim, I might go on to explain how my code shows that row-by-row processing can get quite complex, and will usually not perform adequately. The thread might then lead on to be an interesting and positive learning experience. It could happen.
That said, if the requirement is to add 'n' new unique random positive integers to replace an unknown number of NULLs, the set-based solutions are non-trivial too, and may not perform any better.
Notwithstanding Steve's attitude, I intend to pick back up on the theme tomorrow.
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
July 13, 2010 at 9:31 am
Paul White NZ (7/13/2010)
...the set-based solutions are non-trivial too, and may not perform any better.
...
You are absolutely right. Actually, strictly speaking, it is impossible to guarantee absolute uniqueness in a set-based solution.
You will still need to check for duplicates. Basically, any such solution will contain some sort of loop. In one of the recent posts I explained to another OP how to generate a set of random numbers:
Something along these lines (a quick sketch follows at the end of this post):
1. GENERATE SET OF RANDOM NUMBERS
2. CHECK THE GENERATED SET FOR DUPLICATES
3. IF NOT FOUND THEN EXIT
ELSE:
REGENERATE RANDOM NUMBERS FOR DUPLICATES ONLY
AND GOTO STEP 2
Post #945594 in http://www.sqlservercentral.com/Forums/Topic945541-338-1.aspx
Actually, RAND([base]) can be replaced with manipulation on NEWID() as explained in the further posts on the above topic.
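In case it helps, here is a minimal sketch of those steps in T-SQL. The #Candidates table, the @n variable and the row source are illustrative choices only, not anything from the linked post:
-- Sketch only: generate @n candidate random numbers, then regenerate
-- just the duplicated ones until the whole set is unique.
DECLARE @n integer;
SET @n = 10000;

CREATE TABLE #Candidates
(
    row_id integer PRIMARY KEY,
    rnd    integer NOT NULL
);

-- Step 1: generate the initial set of random numbers
INSERT #Candidates (row_id, rnd)
SELECT TOP (@n)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
       ABS(CHECKSUM(NEWID()))
FROM sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;

-- Steps 2 and 3: while duplicates remain, re-randomize only the duplicated rows
WHILE EXISTS
(
    SELECT rnd
    FROM #Candidates
    GROUP BY rnd
    HAVING COUNT(*) > 1
)
BEGIN
    ;WITH Dupes AS
    (
        SELECT rnd,
               ROW_NUMBER() OVER (PARTITION BY rnd ORDER BY row_id) AS rn
        FROM #Candidates
    )
    UPDATE Dupes
    SET rnd = ABS(CHECKSUM(NEWID()))
    WHERE rn > 1;
END;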
July 13, 2010 at 9:44 am
Thanks, Eugene. I'm glad someone else appreciates the subtleties.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
July 13, 2010 at 7:20 pm
Hi Everyone!
First of all, let me take the time to thank all of you guys (Paul, Steve, Old Hand) for helping with my case here. Really appreciate it, guys 🙂
To Paul: A few things about my table. I am a database programmer (a new one). My table will have some 90,000+ rows and I need to maintain unique app_IDs for each of them. The table rows will increase gradually, so every time new rows are added, a unique app_ID should be generated. As stated previously, an identity column will solve the problem, but it's not random and therefore the idea is not very robust.
I am a C/C++ programmer and I have easily achieved such objectives with arrays (as tables) and rand() functions. Thanks to if/else and while loops in C, and because I am familiar with it, I could write robust code in C to get this job done. I have described the algorithm before and you all understood it correctly; I just needed it done with unique random numbers instead of in the identity column style.
July 13, 2010 at 7:45 pm
tasnim.siddiqi (7/13/2010)
...To Paul: A few things about my table. I am a database programmer (a new one). My table will have some 90,000+ rows and I need to maintain unique app_IDs for each of them. The table rows will increase gradually, so every time new rows are added, a unique app_ID should be generated. As stated previously, an identity column will solve the problem, but it's not random and therefore the idea is not very robust.
I am a C/C++ programmer and I have easily achieved such objectives with arrays (as tables) and rand() functions. Thanks to if/else and while loops in C, and because I am familiar with it, I could write robust code in C to get this job done. I have described the algorithm before and you all understood it correctly; I just needed it done with unique random numbers instead of in the identity column style.
1. Thank the Lord you are not a former assembly programmer (as I am, for example), otherwise you would want to implement something even more robust than C/C++. 😀
2. If you want to be a database programmer (specifically T-SQL), you should learn how things should properly be done in T-SQL. A query language is not procedural; it is based on an entirely different concept. But even if it were, just think for a moment what you would end up with if a VB6.0 programmer started coding in C++ using VB6.0 practices, saying that he knows they work in VB and therefore he is going to use them in C++...
3. Could you please enlighten us a bit about why and how an identity column in SQL Server is not robust enough? What particular aspect of using it worries you?
4. From what you have described so far about your table, defining App_id as an identity column is the right design solution in SQL Server (see the sketch at the end of this post). Nothing in your details, except the fact that you are a C/C++ programmer and like using random numbers in your C/C++ code, justifies using a random number for app_id. You will create a headache for yourself and for future developers.
This forum provides one of the best opportunities to learn SQL Server and helps you become a professional database programmer. Of course, it will only work if you want it to. If your choice is to use C/C++ practices while developing in T-SQL, no one here can stop you.
Just be aware that any normal T-SQL programmer who sees your code will call it (and you too, I'm afraid) some not very nice names...
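For point 4, a minimal sketch of what that might look like. The table name follows the rest of the thread; the column list beyond App_ID is purely illustrative:
-- Sketch only: let SQL Server assign App_ID automatically via IDENTITY.
-- Table and column names follow the thread; the other column is a placeholder.
CREATE TABLE dbo.Event_Person_Info
(
    App_ID      integer IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    Person_Name varchar(100) NOT NULL
    -- ...other columns...
);

-- New rows get the next App_ID with no extra code:
INSERT dbo.Event_Person_Info (Person_Name)
VALUES ('Example');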
July 14, 2010 at 12:13 am
tasnim.siddiqi (7/13/2010)
As stated previously, an identity column will solve the problem, but it's not random and therefore the idea is not very robust.
Can you explain that a little more? I am unsure *why* you feel random numbers would be more 'robust'. Robust in what scenario? Just interested.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
July 14, 2010 at 12:22 am
Robust in the sense that it will be more reliable. I mean that with just ascending, ordered numbers the app_IDs will have a biased numbering pattern. It is like skewness, if you have done statistics before. It is better to avoid such things.
I hope that clarifies it for you.
July 14, 2010 at 1:16 am
tasnim.siddiqi (7/14/2010)
Robust in the sense that it will be more reliable. I mean that with just ascending, ordered numbers the app_IDs will have a biased numbering pattern. It is like skewness, if you have done statistics before. It is better to avoid such things.
Ok, I understand the general concern - but why is that skewness important? Is there something about the application or design that *requires* a random distribution? The more you can tell us about it, the better really.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
July 14, 2010 at 3:06 am
In the meantime, in case you are just looking for something simple to get you started:
DECLARE @temp integer;

BEGIN TRANSACTION;

-- Find a new random number that does not already exist in the table
WHILE (@temp IS NULL
       OR EXISTS (SELECT * FROM Event_Person_Info WITH (UPDLOCK, HOLDLOCK) WHERE App_ID = @temp))
    SET @temp = ABS(CHECKSUM(NEWID()));

-- Update the first NULL App_ID
UPDATE TOP (1)
    Event_Person_Info
SET App_ID = @temp
WHERE App_ID IS NULL;

COMMIT TRANSACTION;
That's based on your original code, and just updates the first NULL App_ID found to a random positive integer that does not already exist in the table. The code I presented previously shows various ways to extend this idea.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
July 14, 2010 at 9:00 am
tasnim.siddiqi (7/14/2010)
Robust in the sense that it will be more reliable. I mean that with just ascending, ordered numbers the app_IDs will have a biased numbering pattern. It is like skewness, if you have done statistics before. It is better to avoid such things. I hope that clarifies it for you.
There's nothing about an increasing key that makes it more or less reliable or robust. You are fundamentally missing the point. Even in a hash, if you can hash to a known unique item, it is fine. There's no need for a distribution.
In terms of statistics, I'm not sure why skewing matters. I don't think it's "better" to avoid skewing in general. There are places it is desirable, and places it's not.
The reason an increasing key helps is performance. You can read here, about halfway down, where it talks about performance implications. When you insert into the middle of a SQL table, you potentially cause a page split. Think about inserting something in the middle of a large file on disk. It takes time and resources, way more than using the end of the file.
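If you want to see that effect for yourself, one rough check (a sketch only; it assumes the Event_Person_Info table from earlier in the thread, with a clustered index on the key being tested) is to load the table once with random keys and once with an increasing key, then compare clustered index fragmentation each time:
-- Sketch only: compare clustered index fragmentation after loading the table
-- with random keys versus an increasing (identity) key.
SELECT index_id,
       avg_fragmentation_in_percent,
       page_count
FROM sys.dm_db_index_physical_stats
(
    DB_ID(),                             -- current database
    OBJECT_ID(N'dbo.Event_Person_Info'), -- table to examine
    NULL,                                -- all indexes
    NULL,                                -- all partitions
    'LIMITED'                            -- lightweight scan mode
);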