Need to insert a new integer that is pseudo-random and unique to a group (not unique to the entire table)

  • SQL Kiwi (6/4/2012)


    You might like that because it can generate multiple values per query, unlike RAND.

    I had considered it only so briefly because it's about 8 times slower than the traditional NEWID() method for generating integers and about 5 times slower than RAND which can actually be used for more than 1 row if you (ironically) have a random seed such as NEWID().

    SET STATISTICS TIME ON;

    DECLARE @Bitbucket INT

    SELECT TOP 2000000

    @Bitbucket = ABS(CONVERT(int, CRYPT_GEN_RANDOM(8)))%1000000+1

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    GO

    DECLARE @Bitbucket INT

    SELECT TOP 2000000

    @Bitbucket = ABS(CHECKSUM(CRYPT_GEN_RANDOM(8)))%1000000+1

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    GO

    DECLARE @Bitbucket INT

    SELECT TOP 2000000

    @Bitbucket = ABS(CHECKSUM(NEWID()))%1000000+1

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    GO

    DECLARE @Bitbucket FLOAT

    SELECT TOP 2000000

    @Bitbucket = RAND(CHECKSUM(NEWID()))*1000000+1

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    GO

    SET STATISTICS TIME OFF;

    Results...

    SQL Server Execution Times:

    CPU time = 6864 ms, elapsed time = 6869 ms.

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 6848 ms, elapsed time = 6847 ms.

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 858 ms, elapsed time = 851 ms.

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 1202 ms, elapsed time = 1209 ms.

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 0 ms.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (6/4/2012)


    I had considered it only so briefly because it's about 8 times slower than the traditional NEWID() method...

    Well sure, but (a) I was just making sure you were aware of its existence; and (b) you're not comparing apples with apples - see Cryptographically secure pseudorandom number generator.

    As far as raw performance is concerned, the methods presented so far to generate a million rows run for around 15-20 seconds on my laptop. On SQL Server 2012, the LCG I showed generated a million rows via a SQLCLR streaming function in around 5 seconds. The same SQLCLR function using the bulk load API did the same job in 900ms.

    And, as I mentioned before, the distribution of values from Joe's code (using your test rig) is probably not what is required in many cases:

    SELECT *

    FROM #Proof1 AS p

    WHERE SeqNum BETWEEN 16 AND 29

    ORDER BY SeqNum;

    SeqNumKeyVal

    1665536

    1732768

    1816384

    198192

    204096

    212048

    221024

    23512

    24256

    25128

    2664

    2732

    2816

    298

  • SQL Kiwi (6/4/2012)


    the methods presented so far to generate a million rows run for around 15-20 seconds on my laptop

    Sorry... which methods? Are you talking about the solutions for the OP or the code I presented for comparison with the crypto functrtion?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • SQL Kiwi (6/4/2012)


    Jeff Moden (6/4/2012)


    I had considered it only so briefly because it's about 8 times slower than the traditional NEWID() method...

    Well sure, but (a) I was just making sure you were aware of its existence; and (b) you're not comparing apples with apples - see Cryptographically secure pseudorandom number generator.

    Umm... true enough, Paul. But you did suggest it as a possible replacement for RAND which is what I tested it against. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (6/4/2012)


    SQL Kiwi (6/4/2012)


    the methods presented so far to generate a million rows run for around 15-20 seconds on my laptop

    Sorry... which methods? Are you talking about the solutions for the OP or the code I presented for comparison with the crypto functrtion?

    The ones that generate a million unique random rows (the comparison generates two million non-unique rows).

  • SQL Kiwi (6/4/2012)


    Jeff Moden (6/4/2012)


    SQL Kiwi (6/4/2012)


    the methods presented so far to generate a million rows run for around 15-20 seconds on my laptop

    Sorry... which methods? Are you talking about the solutions for the OP or the code I presented for comparison with the crypto functrtion?

    The ones that generate a million unique random rows (the comparison generates two million non-unique rows).

    Didn't know if you changed the 2 million row test to a million or not.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (6/4/2012)


    SQL Kiwi (6/4/2012)


    Jeff Moden (6/4/2012)


    I had considered it only so briefly because it's about 8 times slower than the traditional NEWID() method...

    Well sure, but (a) I was just making sure you were aware of its existence; and (b) you're not comparing apples with apples - see Cryptographically secure pseudorandom number generator.

    Umm... true enough, Paul. But you did suggest it as a possible replacement for RAND which is what I tested it against. 😉

    Just checking you were aware of its existence, Jeff. After all, it was new for SQL Server 2008 😉

  • Jeff Moden (6/4/2012)


    Didn't know if you changed the 2 million row test to a million or not.

    No, I didn't.

  • SQL Kiwi (6/4/2012)


    Jeff Moden (6/4/2012)


    SQL Kiwi (6/4/2012)


    Jeff Moden (6/4/2012)


    I had considered it only so briefly because it's about 8 times slower than the traditional NEWID() method...

    Well sure, but (a) I was just making sure you were aware of its existence; and (b) you're not comparing apples with apples - see Cryptographically secure pseudorandom number generator.

    Umm... true enough, Paul. But you did suggest it as a possible replacement for RAND which is what I tested it against. 😉

    Just checking you were aware of its existence, Jeff. After all, it was new for SQL Server 2008 😉

    Despite the smiley face, I don't appreciate that at all, Paul.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • SQL Kiwi (6/4/2012)


    Jeff Moden (6/4/2012)


    Didn't know if you changed the 2 million row test to a million or not.

    No, I didn't.

    Understood. Just didn't want to assume so I asked.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (6/4/2012)


    SQL Kiwi (6/4/2012)


    Jeff Moden (6/4/2012)


    SQL Kiwi (6/4/2012)


    Jeff Moden (6/4/2012)


    I had considered it only so briefly because it's about 8 times slower than the traditional NEWID() method...

    Well sure, but (a) I was just making sure you were aware of its existence; and (b) you're not comparing apples with apples - see Cryptographically secure pseudorandom number generator.

    Umm... true enough, Paul. But you did suggest it as a possible replacement for RAND which is what I tested it against. 😉

    Just checking you were aware of its existence, Jeff. After all, it was new for SQL Server 2008 😉

    Despite the smiley face, I don't appreciate that at all, Paul.

    Yep well that makes two of us then. At least I didn't start off with 'umm..." and finish with a 'smiley'. Look, in any case, CRYPT_GEN_RANDOM is not a very well-known function, and you have stated many times here that you mostly use SQL Server 2005. Shrug.

  • Really? I thought you were just taking a swipe at me. Ok.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I know that this is probably a little out of line of me, trying to bring the thread back on topic and all, but this problem was bugging me all morning. Random numbers without replacement that is. I think I've found a way to do what the OP has requested without resorting to conguential random whatevers and need someone else to check me to see if this is a reasonable approach (somewhat complicated though). Consider this:

    CREATE TABLE #t (Category VARCHAR(5), rn INT)

    INSERT INTO #t

    SELECT 'AA', 106 UNION ALL SELECT 'BB', 105

    GO

    ;WITH NewData (Category) AS (

    SELECT 'AA' UNION ALL SELECT 'AA' UNION ALL SELECT 'AA' UNION ALL SELECT 'AA' UNION ALL SELECT 'AA'

    UNION ALL SELECT 'BB' UNION ALL SELECT 'BB' UNION ALL SELECT 'BB' UNION ALL SELECT 'BB' UNION ALL SELECT 'BB'),

    Tally (n) AS (

    SELECT TOP 10 100+ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns c1),

    NDplusCounts AS (

    SELECT DISTINCT Category

    ,COUNT(Category) OVER (PARTITION BY Category) AS CountofCat

    FROM NewData),

    Final AS (

    SELECT nd.Category, n, CountofCat, r=ROW_NUMBER() OVER (PARTITION BY nd.Category ORDER BY NEWID())

    FROM NDplusCounts nd

    CROSS APPLY Tally n

    LEFT OUTER JOIN #t t ON nd.Category = t.Category and n.n = t.rn

    WHERE t.rn IS NULL )

    INSERT INTO #t

    SELECT Category, n

    FROM Final

    WHERE r <= CountofCat

    GO 2

    SELECT * FROM #t -- ORDER BY Category, rn

    DROP TABLE #t

    To explain

    1) I set up a table with two "category" entries, where each was assigned a random number (rn) between 101 and 110.

    2) I then use a Tally table to construct my "valid" random numbers, i.e., adjust the start and the range as you need it. In my case, it is 101 to 110.

    3) I now append (from the NewData CTE) some more records for each of the two categories, and I'm going to try to apply a random number to each.

    4) The LEFT JOIN attempts to remove any previously used random numbers.

    5) I execute the batch including the INSERT 2 times, which should add 10 records but only adds 8 because it exhausts the supply of random numbers because I'm selecting previously unselected random numbers each time.

    The result, if you order by Category, rn (commented out) shows that none of the random numbers were duplicated.

    Can anyone tell me if this even applies to the OP's original need? 🙂


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 13 posts - 16 through 27 (of 27 total)

You must be logged in to reply to this topic. Login to reply