Need to insert a new integer that is pseudo-random and unique to a group (not unique to the entire table)

Question

Need to insert a new integer that is pseudo-random and unique to a group (not unique to the entire table)

Viewing 13 posts - 16 through 27 (of 27 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 1

SQL Kiwi (6/4/2012)
You might like that because it can generate multiple values per query, unlike RAND.

I had considered it only so briefly because it's about 8 times slower than the traditional NEWID() method for generating integers and about 5 times slower than RAND which can actually be used for more than 1 row if you (ironically) have a random seed such as NEWID().

SET STATISTICS TIME ON;

DECLARE @Bitbucket INT

SELECT TOP 2000000

@Bitbucket = ABS(CONVERT(int, CRYPT_GEN_RANDOM(8)))%1000000+1

FROM sys.all_columns ac1

CROSS JOIN sys.all_columns ac2

GO

DECLARE @Bitbucket INT

SELECT TOP 2000000

@Bitbucket = ABS(CHECKSUM(CRYPT_GEN_RANDOM(8)))%1000000+1

FROM sys.all_columns ac1

CROSS JOIN sys.all_columns ac2

GO

DECLARE @Bitbucket INT

SELECT TOP 2000000

@Bitbucket = ABS(CHECKSUM(NEWID()))%1000000+1

FROM sys.all_columns ac1

CROSS JOIN sys.all_columns ac2

GO

DECLARE @Bitbucket FLOAT

SELECT TOP 2000000

@Bitbucket = RAND(CHECKSUM(NEWID()))*1000000+1

FROM sys.all_columns ac1

CROSS JOIN sys.all_columns ac2

GO

SET STATISTICS TIME OFF;

Results...

SQL Server Execution Times:

CPU time = 6864 ms, elapsed time = 6869 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:

CPU time = 6848 ms, elapsed time = 6847 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:

CPU time = 858 ms, elapsed time = 851 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:

CPU time = 1202 ms, elapsed time = 1209 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 0 ms.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Paul White SSC Guru Points: 150468 More actions · Answer 2

Jeff Moden (6/4/2012)
I had considered it only so briefly because it's about 8 times slower than the traditional NEWID() method...

Well sure, but (a) I was just making sure you were aware of its existence; and (b) you're not comparing apples with apples - see Cryptographically secure pseudorandom number generator.

As far as raw performance is concerned, the methods presented so far to generate a million rows run for around 15-20 seconds on my laptop. On SQL Server 2012, the LCG I showed generated a million rows via a SQLCLR streaming function in around 5 seconds. The same SQLCLR function using the bulk load API did the same job in 900ms.

And, as I mentioned before, the distribution of values from Joe's code (using your test rig) is probably not what is required in many cases:

SELECT *

FROM #Proof1 AS p

WHERE SeqNum BETWEEN 16 AND 29

ORDER BY SeqNum;

SeqNumKeyVal

1665536

1732768

1816384

198192

204096

212048

221024

23512

24256

25128

2664

2732

2816

298

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 3

SQL Kiwi (6/4/2012)
the methods presented so far to generate a million rows run for around 15-20 seconds on my laptop

Sorry... which methods? Are you talking about the solutions for the OP or the code I presented for comparison with the crypto functrtion?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 4

SQL Kiwi (6/4/2012)
Jeff Moden (6/4/2012)
I had considered it only so briefly because it's about 8 times slower than the traditional NEWID() method...
Well sure, but (a) I was just making sure you were aware of its existence; and (b) you're not comparing apples with apples - see Cryptographically secure pseudorandom number generator.

Umm... true enough, Paul. But you did suggest it as a possible replacement for RAND which is what I tested it against. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Paul White SSC Guru Points: 150468 More actions · Answer 5

Jeff Moden (6/4/2012)
SQL Kiwi (6/4/2012)
the methods presented so far to generate a million rows run for around 15-20 seconds on my laptop
Sorry... which methods? Are you talking about the solutions for the OP or the code I presented for comparison with the crypto functrtion?

The ones that generate a million unique random rows (the comparison generates two million non-unique rows).

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 6

SQL Kiwi (6/4/2012)
Jeff Moden (6/4/2012)
SQL Kiwi (6/4/2012)
the methods presented so far to generate a million rows run for around 15-20 seconds on my laptop
Sorry... which methods? Are you talking about the solutions for the OP or the code I presented for comparison with the crypto functrtion?
The ones that generate a million unique random rows (the comparison generates two million non-unique rows).

Didn't know if you changed the 2 million row test to a million or not.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Paul White SSC Guru Points: 150468 More actions · Answer 7

Jeff Moden (6/4/2012)
SQL Kiwi (6/4/2012)
Jeff Moden (6/4/2012)
I had considered it only so briefly because it's about 8 times slower than the traditional NEWID() method...
Well sure, but (a) I was just making sure you were aware of its existence; and (b) you're not comparing apples with apples - see Cryptographically secure pseudorandom number generator.
Umm... true enough, Paul. But you did suggest it as a possible replacement for RAND which is what I tested it against. 😉

Just checking you were aware of its existence, Jeff. After all, it was new for SQL Server 2008 😉

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Paul White SSC Guru Points: 150468 More actions · Answer 8

Jeff Moden (6/4/2012)
Didn't know if you changed the 2 million row test to a million or not.

No, I didn't.

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 9

SQL Kiwi (6/4/2012)
Jeff Moden (6/4/2012)
SQL Kiwi (6/4/2012)
Jeff Moden (6/4/2012)
I had considered it only so briefly because it's about 8 times slower than the traditional NEWID() method...
Well sure, but (a) I was just making sure you were aware of its existence; and (b) you're not comparing apples with apples - see Cryptographically secure pseudorandom number generator.
Umm... true enough, Paul. But you did suggest it as a possible replacement for RAND which is what I tested it against. 😉
Just checking you were aware of its existence, Jeff. After all, it was new for SQL Server 2008 😉

Despite the smiley face, I don't appreciate that at all, Paul.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 10

SQL Kiwi (6/4/2012)
Jeff Moden (6/4/2012)
Didn't know if you changed the 2 million row test to a million or not.
No, I didn't.

Understood. Just didn't want to assume so I asked.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Paul White SSC Guru Points: 150468 More actions · Answer 11

Jeff Moden (6/4/2012)
SQL Kiwi (6/4/2012)
Jeff Moden (6/4/2012)
SQL Kiwi (6/4/2012)
Jeff Moden (6/4/2012)
I had considered it only so briefly because it's about 8 times slower than the traditional NEWID() method...
Well sure, but (a) I was just making sure you were aware of its existence; and (b) you're not comparing apples with apples - see Cryptographically secure pseudorandom number generator.
Umm... true enough, Paul. But you did suggest it as a possible replacement for RAND which is what I tested it against. 😉
Just checking you were aware of its existence, Jeff. After all, it was new for SQL Server 2008 😉
Despite the smiley face, I don't appreciate that at all, Paul.

Yep well that makes two of us then. At least I didn't start off with 'umm..." and finish with a 'smiley'. Look, in any case, CRYPT_GEN_RANDOM is not a very well-known function, and you have stated many times here that you mostly use SQL Server 2005. Shrug.

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 12

Really? I thought you were just taking a swipe at me. Ok.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Dwain Camps SSC Guru Points: 86978 More actions · Answer 13

I know that this is probably a little out of line of me, trying to bring the thread back on topic and all, but this problem was bugging me all morning. Random numbers without replacement that is. I think I've found a way to do what the OP has requested without resorting to conguential random whatevers and need someone else to check me to see if this is a reasonable approach (somewhat complicated though). Consider this:

CREATE TABLE #t (Category VARCHAR(5), rn INT)

INSERT INTO #t

SELECT 'AA', 106 UNION ALL SELECT 'BB', 105

GO

;WITH NewData (Category) AS (

SELECT 'AA' UNION ALL SELECT 'AA' UNION ALL SELECT 'AA' UNION ALL SELECT 'AA' UNION ALL SELECT 'AA'

UNION ALL SELECT 'BB' UNION ALL SELECT 'BB' UNION ALL SELECT 'BB' UNION ALL SELECT 'BB' UNION ALL SELECT 'BB'),

Tally (n) AS (

SELECT TOP 10 100+ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

FROM sys.all_columns c1),

NDplusCounts AS (

SELECT DISTINCT Category

,COUNT(Category) OVER (PARTITION BY Category) AS CountofCat

FROM NewData),

Final AS (

SELECT nd.Category, n, CountofCat, r=ROW_NUMBER() OVER (PARTITION BY nd.Category ORDER BY NEWID())

FROM NDplusCounts nd

CROSS APPLY Tally n

LEFT OUTER JOIN #t t ON nd.Category = t.Category and n.n = t.rn

WHERE t.rn IS NULL )

INSERT INTO #t

SELECT Category, n

FROM Final

WHERE r <= CountofCat

GO 2

SELECT * FROM #t -- ORDER BY Category, rn

DROP TABLE #t

To explain

1) I set up a table with two "category" entries, where each was assigned a random number (rn) between 101 and 110.

2) I then use a Tally table to construct my "valid" random numbers, i.e., adjust the start and the range as you need it. In my case, it is 101 to 110.

3) I now append (from the NewData CTE) some more records for each of the two categories, and I'm going to try to apply a random number to each.

4) The LEFT JOIN attempts to remove any previously used random numbers.

5) I execute the batch including the INSERT 2 times, which should add 10 records but only adds 8 because it exhausts the supply of random numbers because I'm selecting previously unselected random numbers each time.

The result, if you order by Category, rn (commented out) shows that none of the random numbers were duplicated.

Can anyone tell me if this even applies to the OP's original need? 🙂

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St