June 4, 2012 at 7:56 pm
SQL Kiwi (6/4/2012)
You might like that because it can generate multiple values per query, unlike RAND.
I had considered it only so briefly because it's about 8 times slower than the traditional NEWID() method for generating integers and about 5 times slower than RAND which can actually be used for more than 1 row if you (ironically) have a random seed such as NEWID().
SET STATISTICS TIME ON;
DECLARE @Bitbucket INT
SELECT TOP 2000000
@Bitbucket = ABS(CONVERT(int, CRYPT_GEN_RANDOM(8)))%1000000+1
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
GO
DECLARE @Bitbucket INT
SELECT TOP 2000000
@Bitbucket = ABS(CHECKSUM(CRYPT_GEN_RANDOM(8)))%1000000+1
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
GO
DECLARE @Bitbucket INT
SELECT TOP 2000000
@Bitbucket = ABS(CHECKSUM(NEWID()))%1000000+1
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
GO
DECLARE @Bitbucket FLOAT
SELECT TOP 2000000
@Bitbucket = RAND(CHECKSUM(NEWID()))*1000000+1
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
GO
SET STATISTICS TIME OFF;
Results...
SQL Server Execution Times:
CPU time = 6864 ms, elapsed time = 6869 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 6848 ms, elapsed time = 6847 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 858 ms, elapsed time = 851 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 1202 ms, elapsed time = 1209 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 4, 2012 at 8:32 pm
Jeff Moden (6/4/2012)
I had considered it only so briefly because it's about 8 times slower than the traditional NEWID() method...
Well sure, but (a) I was just making sure you were aware of its existence; and (b) you're not comparing apples with apples - see Cryptographically secure pseudorandom number generator.
As far as raw performance is concerned, the methods presented so far to generate a million rows run for around 15-20 seconds on my laptop. On SQL Server 2012, the LCG I showed generated a million rows via a SQLCLR streaming function in around 5 seconds. The same SQLCLR function using the bulk load API did the same job in 900ms.
And, as I mentioned before, the distribution of values from Joe's code (using your test rig) is probably not what is required in many cases:
SELECT *
FROM #Proof1 AS p
WHERE SeqNum BETWEEN 16 AND 29
ORDER BY SeqNum;
SeqNumKeyVal
1665536
1732768
1816384
198192
204096
212048
221024
23512
24256
25128
2664
2732
2816
298
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
June 4, 2012 at 8:59 pm
SQL Kiwi (6/4/2012)
the methods presented so far to generate a million rows run for around 15-20 seconds on my laptop
Sorry... which methods? Are you talking about the solutions for the OP or the code I presented for comparison with the crypto functrtion?
--Jeff Moden
Change is inevitable... Change for the better is not.
June 4, 2012 at 9:05 pm
SQL Kiwi (6/4/2012)
Jeff Moden (6/4/2012)
I had considered it only so briefly because it's about 8 times slower than the traditional NEWID() method...Well sure, but (a) I was just making sure you were aware of its existence; and (b) you're not comparing apples with apples - see Cryptographically secure pseudorandom number generator.
Umm... true enough, Paul. But you did suggest it as a possible replacement for RAND which is what I tested it against. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
June 4, 2012 at 9:07 pm
Jeff Moden (6/4/2012)
SQL Kiwi (6/4/2012)
the methods presented so far to generate a million rows run for around 15-20 seconds on my laptopSorry... which methods? Are you talking about the solutions for the OP or the code I presented for comparison with the crypto functrtion?
The ones that generate a million unique random rows (the comparison generates two million non-unique rows).
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
June 4, 2012 at 9:09 pm
SQL Kiwi (6/4/2012)
Jeff Moden (6/4/2012)
SQL Kiwi (6/4/2012)
the methods presented so far to generate a million rows run for around 15-20 seconds on my laptopSorry... which methods? Are you talking about the solutions for the OP or the code I presented for comparison with the crypto functrtion?
The ones that generate a million unique random rows (the comparison generates two million non-unique rows).
Didn't know if you changed the 2 million row test to a million or not.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 4, 2012 at 9:19 pm
Jeff Moden (6/4/2012)
SQL Kiwi (6/4/2012)
Jeff Moden (6/4/2012)
I had considered it only so briefly because it's about 8 times slower than the traditional NEWID() method...Well sure, but (a) I was just making sure you were aware of its existence; and (b) you're not comparing apples with apples - see Cryptographically secure pseudorandom number generator.
Umm... true enough, Paul. But you did suggest it as a possible replacement for RAND which is what I tested it against. 😉
Just checking you were aware of its existence, Jeff. After all, it was new for SQL Server 2008 😉
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
June 4, 2012 at 9:20 pm
Jeff Moden (6/4/2012)
Didn't know if you changed the 2 million row test to a million or not.
No, I didn't.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
June 4, 2012 at 9:58 pm
SQL Kiwi (6/4/2012)
Jeff Moden (6/4/2012)
SQL Kiwi (6/4/2012)
Jeff Moden (6/4/2012)
I had considered it only so briefly because it's about 8 times slower than the traditional NEWID() method...Well sure, but (a) I was just making sure you were aware of its existence; and (b) you're not comparing apples with apples - see Cryptographically secure pseudorandom number generator.
Umm... true enough, Paul. But you did suggest it as a possible replacement for RAND which is what I tested it against. 😉
Just checking you were aware of its existence, Jeff. After all, it was new for SQL Server 2008 😉
Despite the smiley face, I don't appreciate that at all, Paul.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 4, 2012 at 9:59 pm
SQL Kiwi (6/4/2012)
Jeff Moden (6/4/2012)
Didn't know if you changed the 2 million row test to a million or not.No, I didn't.
Understood. Just didn't want to assume so I asked.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 4, 2012 at 11:47 pm
Jeff Moden (6/4/2012)
SQL Kiwi (6/4/2012)
Jeff Moden (6/4/2012)
SQL Kiwi (6/4/2012)
Jeff Moden (6/4/2012)
I had considered it only so briefly because it's about 8 times slower than the traditional NEWID() method...Well sure, but (a) I was just making sure you were aware of its existence; and (b) you're not comparing apples with apples - see Cryptographically secure pseudorandom number generator.
Umm... true enough, Paul. But you did suggest it as a possible replacement for RAND which is what I tested it against. 😉
Just checking you were aware of its existence, Jeff. After all, it was new for SQL Server 2008 😉
Despite the smiley face, I don't appreciate that at all, Paul.
Yep well that makes two of us then. At least I didn't start off with 'umm..." and finish with a 'smiley'. Look, in any case, CRYPT_GEN_RANDOM is not a very well-known function, and you have stated many times here that you mostly use SQL Server 2005. Shrug.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
June 5, 2012 at 6:02 am
Really? I thought you were just taking a swipe at me. Ok.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 6, 2012 at 12:30 am
I know that this is probably a little out of line of me, trying to bring the thread back on topic and all, but this problem was bugging me all morning. Random numbers without replacement that is. I think I've found a way to do what the OP has requested without resorting to conguential random whatevers and need someone else to check me to see if this is a reasonable approach (somewhat complicated though). Consider this:
CREATE TABLE #t (Category VARCHAR(5), rn INT)
INSERT INTO #t
SELECT 'AA', 106 UNION ALL SELECT 'BB', 105
GO
;WITH NewData (Category) AS (
SELECT 'AA' UNION ALL SELECT 'AA' UNION ALL SELECT 'AA' UNION ALL SELECT 'AA' UNION ALL SELECT 'AA'
UNION ALL SELECT 'BB' UNION ALL SELECT 'BB' UNION ALL SELECT 'BB' UNION ALL SELECT 'BB' UNION ALL SELECT 'BB'),
Tally (n) AS (
SELECT TOP 10 100+ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns c1),
NDplusCounts AS (
SELECT DISTINCT Category
,COUNT(Category) OVER (PARTITION BY Category) AS CountofCat
FROM NewData),
Final AS (
SELECT nd.Category, n, CountofCat, r=ROW_NUMBER() OVER (PARTITION BY nd.Category ORDER BY NEWID())
FROM NDplusCounts nd
CROSS APPLY Tally n
LEFT OUTER JOIN #t t ON nd.Category = t.Category and n.n = t.rn
WHERE t.rn IS NULL )
INSERT INTO #t
SELECT Category, n
FROM Final
WHERE r <= CountofCat
GO 2
SELECT * FROM #t -- ORDER BY Category, rn
DROP TABLE #t
To explain
1) I set up a table with two "category" entries, where each was assigned a random number (rn) between 101 and 110.
2) I then use a Tally table to construct my "valid" random numbers, i.e., adjust the start and the range as you need it. In my case, it is 101 to 110.
3) I now append (from the NewData CTE) some more records for each of the two categories, and I'm going to try to apply a random number to each.
4) The LEFT JOIN attempts to remove any previously used random numbers.
5) I execute the batch including the INSERT 2 times, which should add 10 records but only adds 8 because it exhausts the supply of random numbers because I'm selecting previously unselected random numbers each time.
The result, if you order by Category, rn (commented out) shows that none of the random numbers were duplicated.
Can anyone tell me if this even applies to the OP's original need? 🙂
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
Viewing 13 posts - 16 through 27 (of 27 total)
You must be logged in to reply to this topic. Login to reply