Code generation

Question

Post reply

Code generation

Forum Etiquette: How to post Reporting Services problems

[/url]

Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
[/url]

How to Post Performance Problems - by Gail Shaw
[/url]

scziege SSC Veteran Points: 247 More actions · Answer 1

scziege

SSC Veteran

Points: 247

May 11, 2010 at 2:35 pm

#1165835

Absolutly Great job

Jeff Moden SSC Guru Points: 1004432 More actions · Answer 2

Lowell (5/11/2010)
Jeff Moden (5/11/2010)
1.6 million random 8 character codes that no one can guess and they're guaranteed to be unique within the set... takes about 25 seconds on my 8 year old desktop...
you must have a monster "old" desktop; I've got a decent 2.8Gig developer machine and it took 5 times that long; 01:29 to run your code,01:28 on second run.
25 seconds....your old developer machine plugged into a SAN? or did you use DBCC TIMEWARP with this?

While it's true that I keep the flux capacitor on the hardware version of DBCC TIMEWARP loaded up with fresh hampster poo for the ultimate in "flash warping" when bit 77 is set, I forgot to mention that the SELECT * FROM #MyHead was not included in that time period.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004432 More actions · Answer 3

scziege (5/11/2010)
Absolutly Great job

I love a satisfied customer. Thanks for the feedback. Most folks on this thread had the right idea to begin with...

Now... a favor from you, please. Could you tell us why and what you need these for and why a simple IDENTITY column wouldn't have been good enough? It would really be nice to know the details behind the business decisions for things like this. Thanks for your time.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

mister.magoo SSC-Forever Points: 47174 More actions · Answer 4

Jeff, that is an interesting take on Random..... :w00t:

The first ten codes in the sample I tried...

10000180

10000F52

10001875

10001D17

1000221F

10003478

100035E7

10003763

10006415

100081B2

Is there some bias towards groups of characters in that solution ....

Or am I seeing it wrong?

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

Jeff Moden SSC Guru Points: 1004432 More actions · Answer 5

mister.magoo (5/11/2010)
Jeff, that is an interesting take on Random..... :w00t:
The first ten codes in the sample I tried...
10000180
10000F52
10001875
10001D17
1000221F
10003478
100035E7
10003763
10006415
100081B2
Is there some bias towards groups of characters in that solution ....
Or am I seeing it wrong?

Heh... good question, actually. The "bias" isn't really "bias" It's because we have 1.6 million random numbers starting at 10000000 because I didn't want to have any numbers that began with leading zeros. The other "bias" is only because the PARTITION clause sorts the final output. If we had sorted the final output using the whole of NEWID() for 1.6 million rows, the first 8 characters would form a similar "pattern" of ascent.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004432 More actions · Answer 6

Magoo... here's what I'm talking about... if I take out the partitioning that supports the dupe check, then you probably get more like what you're expecting...

WITH

cteFirstGen AS

( --=== Gen enough 8 character random codes to cover possible dupes for 1.6 million codes

SELECT TOP 2000000 LEFT(NEWID(),8) AS RandomCode

FROM Master.sys.All_Columns ac1

CROSS JOIN Master.sys.All_Columns ac2

)

,

cteNumberDupes AS

( --=== Number the codes so we can select unique random codes

SELECT --ROW_NUMBER() OVER (PARTITION BY RandomCode ORDER BY (SELECT NULL)) AS Occurance,

RandomCode

FROM cteFirstGen

WHERE RandomCode > '10000000'

) --=== Select 1.6 million unique random codes only

SELECT TOP 1600000 RandomCode

INTO #MyHead

FROM cteNumberDupes

-- WHERE Occurance = 1

SELECT * FROM #MyHead

The problem is that there's no guarantee that there's no dupe. You can use a DISTINCT if you don't mind waiting the extra time... it will double the amount of time it takes to execute this way....

WITH

cteFirstGen AS

( --=== Gen enough 8 character random codes to cover possible dupes for 1.6 million codes

SELECT TOP 2000000 LEFT(NEWID(),8) AS RandomCode

FROM Master.sys.All_Columns ac1

CROSS JOIN Master.sys.All_Columns ac2

)

,

cteNumberDupes AS

( --=== Number the codes so we can select unique random codes

SELECT --ROW_NUMBER() OVER (PARTITION BY RandomCode ORDER BY (SELECT NULL)) AS Occurance,

RandomCode

FROM cteFirstGen

WHERE RandomCode > '10000000'

) --=== Select 1.6 million unique random codes only

SELECT DISTINCT TOP 1600000 RandomCode

INTO #MyHead

FROM cteNumberDupes

-- WHERE Occurance = 1

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

scziege SSC Veteran Points: 247 More actions · Answer 7

Hey Jeff,

our customer want to make a promotion on a product package with this codes.

The customer can buy the product and insert the code onto a web page, after that

he will win or loose.

But do you think the codes are safe or is it possible to guess the code, on the web page

we use captchas.

Thanks

Thorsten

scziege SSC Veteran Points: 247 More actions · Answer 8

Hey,

another stupid question. Is it possible to check the codes against a table in which I store old codes ?

The main point is the performance we have to look.

We have a table with 100 Million rows and should insert 5 Million rows which not exists in the database.

Thanks in adavance

Thorsten

mister.magoo SSC-Forever Points: 47174 More actions · Answer 9

My 2p worth:

Jeff has shown you a framework for generating numbers.

I personally think you could consider generating more numbers and disallowing any with any repeating digits.

You should probably consider going case sensitive e.g. Ax9G3bgt4 to make it harder to guess.

You could (if you wanted) generate all possible combinations of 8 alphanumeric characters (about 2.4 trillion I think unless you want case sensitivity, then it's about 200 trillion ) which would take a couple of weeks to do on a desktop pc at least. (Previous experience shows you can generate a trillion rows in about 6 days) and then pick 1.6 million values from there, which gives you pretty good odds of not being guessed.

Or you could generate batches of random strings and add them to your pool, seeded to deliberately scatter the values so that you don't get silly ones like AAAAAAAA.

You have to define more clearly the parameters of the request to get a really solid answer.

Having just seen your other question, I say again - tell us the whole requirement.

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

spaghettidba SSC Guru Points: 105732 More actions · Answer 10

Great explanation, Jeff! That's one thing I wanted to ask.

It's strange to see that partition by seems to be faster than distinct. I would have expected to be equally fast (or slow).

Just out of curiosity, how did you come up with this? I suppose this is something you already had in your toolbox.

-- Gianluca Sartori

Jeff Moden SSC Guru Points: 1004432 More actions · Answer 11

Gianluca Sartori (5/12/2010)
Great explanation, Jeff! That's one thing I wanted to ask.
It's strange to see that partition by seems to be faster than distinct. I would have expected to be equally fast (or slow).
Just out of curiosity, how did you come up with this? I suppose this is something you already had in your toolbox.

I'm not 100% sure of the "internals" but it seemed logical to me that the partition would do a single sort whereas DISTINCT has to check each value against a range of other values and then, if within the range, check for the actual value. I'm no expert on the internals so rather than guess, I tried both. The partitioning won. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004432 More actions · Answer 12

scziege (5/12/2010)
Hey Jeff,
our customer want to make a promotion on a product package with this codes.
The customer can buy the product and insert the code onto a web page, after that
he will win or loose.
But do you think the codes are safe or is it possible to guess the code, on the web page
we use captchas.
Thanks
Thorsten

With random numbers with such a relatively small domain, someone could get lucky with a guess. If you were to use a full NEWID(), instead, it would be like someone guessing the exact linear mile of travel you were in across a diameter of something like 17 quadrillion Milkyway galaxies.

It would be a good test, as well. If you can't successfully and correctly enter a 36 character code, you don't deserve to win. 😛

Maggo is correct... if you're going to use these codes for such a contest, codes with repeating adjacent characters should be eliminated and to prevent "mistrakes", things like lower case "L" and "1" and upper case "0" and zero should be eliminated from the mix. That's actually a lot easier to do than what most people would think although it does take a little extra time to generate. I'll play with this tonight because it IS an interesting problem. I've got a "random password generator" bit of code that I can steal a bit of code from.

Someone will likely beat me to it, though... I've got to get to work. 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004432 More actions · Answer 13

scziege (5/12/2010)
Hey Jeff,
our customer want to make a promotion on a product package with this codes.
The customer can buy the product and insert the code onto a web page, after that
he will win or loose.
But do you think the codes are safe or is it possible to guess the code, on the web page
we use captchas.
Thanks
Thorsten

Heh... I guess there were no takers while I was at work. Ok, no problem. We've got this covered.

Thorsten, I don't know what your company makes or sells but I'm thinking they owe me a bit for this one. 😉 Ask them if they can send me a couple of boxes of whatever they make. I'm sure that I can find someone or a homeless shelter in the area that needs whatever they sell. Or, if they want, they can make a donation to their favorite charity in my name, please.

The whole thing takes 00:01:40 to run on my 8 year old single P4 CPU desktop and does a lot of things mentioned on this thread. Because random letters frequently spell out swear words, I added a "fix" for that little problem, as well. I also have it pick the "winning" code and store it in a table as well as storing the 1.6 million random codes in another table. If you need more than 1 winning code, then change the TOP 1 to TOP n where "n" is the number of winning codes you want the machine to pick. The winning code table also has a "Place" number in case of such an eventuality. The random code table also has an IDENTITY column so whoever prints up inserts can keep track of what they printed.

We could check against the previous random code list you have but I'm going to hold out for a couple of more boxes of whatever your company sells for a homeless shelter. 😉

Here's the code... as usual, the details are documented in the code itself...

--===== Conditionally drop the permanent random code and winner tables

-- Uncomment if you want to test for reruns

IF OBJECT_ID('dbo.jbmRandomCode','U') IS NOT NULL

DROP TABLE dbo.jbmRandomCode

;

IF OBJECT_ID('dbo.jbmWinnerCode','U') IS NOT NULL

DROP TABLE dbo.jbmWinnerCode

;

--===== Generate 1.6 Million random codes where no two adjacent characters

-- are alike. We also remove all vowels to keep from accidently

-- spelling swear words and we remove things that could be mistaken

-- for each other such as:

-- Capital "O" and "0" (zero)

-- Lower Case "L" and "1" (one)

-- Lower Case "o" removed just to avoid all confusion

-- Jeff Moden

WITH

cteBytes AS

( --=== Generate 2.2 Million random 8 byte numbers.

-- We need this many so that when we remove numbers that

-- have identical adjacent characters and any dupes,

-- we still have enough to gen 1.6 Million random codes.

-- This is done at the byte level because NEWID is expensive to run.

-- If we ran 8 individual NEWID()'s for each code, this code would

-- take a lot longer.

SELECT TOP 2200000 CAST(NEWID() AS BINARY(8)) AS Bytes

FROM Master.sys.All_Columns ac1

CROSS JOIN Master.sys.All_Columns ac2

)

,

cteCharacters AS

( --=== This is just the list of characters that we have to chose from.

-- Vowels and certain other characters have been removed to avoid confusion.

-- Note that both strings must be absolutely identical here.

SELECT LEN('BCDFGHJKLMNPQRSTVWXYZbcdfghjkmnpqrstvwxyz23456789') AS CharLen,

'BCDFGHJKLMNPQRSTVWXYZbcdfghjkmnpqrstvwxyz23456789' AS Characters

)

,

cteChars AS

( --=== This takes each byte from the random 8 byte binary number and converts each into

-- a human readable character by using the random byte to select from the allowed

-- character list. Note that "modulo" (%) plays an important role here.

SELECT SUBSTRING(c.Characters,CAST(SUBSTRING(b.Bytes,1,1) AS INT)%c.CharLen+1,1) AS Char1,

SUBSTRING(c.Characters,CAST(SUBSTRING(b.Bytes,2,1) AS INT)%c.CharLen+1,1) AS Char2,

SUBSTRING(c.Characters,CAST(SUBSTRING(b.Bytes,3,1) AS INT)%c.CharLen+1,1) AS Char3,

SUBSTRING(c.Characters,CAST(SUBSTRING(b.Bytes,4,1) AS INT)%c.CharLen+1,1) AS Char4,

SUBSTRING(c.Characters,CAST(SUBSTRING(b.Bytes,5,1) AS INT)%c.CharLen+1,1) AS Char5,

SUBSTRING(c.Characters,CAST(SUBSTRING(b.Bytes,6,1) AS INT)%c.CharLen+1,1) AS Char6,

SUBSTRING(c.Characters,CAST(SUBSTRING(b.Bytes,7,1) AS INT)%c.CharLen+1,1) AS Char7,

SUBSTRING(c.Characters,CAST(SUBSTRING(b.Bytes,8,1) AS INT)%c.CharLen+1,1) AS Char8

FROM cteBytes b

CROSS JOIN cteCharacters c

)

,

cteRandomCode AS

( --=== Now we take all the individual letters and slam them together to make a random code.

-- In the process, we make sure that no two adjacent characters are alike.

SELECT Char1+Char2+Char3+Char4+Char5+Char6+Char7+Char8 AS RandomCode

FROM cteChars

WHERE Char1<>Char2 AND Char2<>Char3 AND Char3<>Char4 AND Char4<>Char5 AND Char5<> Char6 AND Char6<>Char7 AND Char7<>Char8

)

,

cteNumberDupes AS

( --=== We need to remove dupes and this numbers one or more occurances of the same code

-- with an incrementing number to count occurances of that same code.

SELECT ROW_NUMBER() OVER (PARTITION BY RandomCode ORDER BY RandomCode) AS Occurance,

RandomCode

FROM cteRandomCode

) --=== This selects the top 1.6 million first occurances of all the codes we haven't

-- rejected so far. Since we only select the first occurance of any given code,

-- this effectively removes dupes and does so much faster than using DISTINCT.

-- The final result is stored in a new table called dbo.jbmRandomCode.

SELECT TOP 1600000

IDENTITY(INT,1,1) AS RowNum,

RandomCode COLLATE Latin1_General_BIN AS RandomCode --For case sensitivity

INTO dbo.jbmRandomCode

FROM cteNumberDupes

WHERE Occurance = 1

;

--===== And now, pick what will be the winning number and store it in a new table.

-- Change the TOP 1 to however many 1st, 2nd, 3rd, etc place winners you want.

SELECT TOP 1

IDENTITY(INT,1,1) AS Place,

RandomCode COLLATE Latin1_General_BIN AS RandomCode --For case sensitivity

INTO dbo.jbmWinnerCode

FROM dbo.jbmRandomCode

ORDER BY NEWID()

;

{edit} Added brute force case sensitivity using COLLATE Latin1_General_BIN just to make life a little easier...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004432 More actions · Answer 14

mister.magoo (5/12/2010)
My 2p worth:
Jeff has shown you a framework for generating numbers.
I personally think you could consider generating more numbers and disallowing any with any repeating digits.
You should probably consider going case sensitive e.g. Ax9G3bgt4 to make it harder to guess.
You could (if you wanted) generate all possible combinations of 8 alphanumeric characters (about 2.4 trillion I think unless you want case sensitivity, then it's about 200 trillion ) which would take a couple of weeks to do on a desktop pc at least. (Previous experience shows you can generate a trillion rows in about 6 days) and then pick 1.6 million values from there, which gives you pretty good odds of not being guessed.
Or you could generate batches of random strings and add them to your pool, seeded to deliberately scatter the values so that you don't get silly ones like AAAAAAAA.
You have to define more clearly the parameters of the request to get a really solid answer.
Having just seen your other question, I say again - tell us the whole requirement.

I've incorporated some of the good ideas from your post into the new code I posted above. Thanks, Magoo. Using the characters chosen for the codes (49 in all) and removing any codes having duplicate adjacent letters, it works out to about 33 trillion combinations. Heh... we should start a lottery so old DBA's can retire. 😛

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

mister.magoo SSC-Forever Points: 47174 More actions · Answer 15

Brilliant answer Jeff!

I like this a lot, but every time I see NEWID() used like this it just makes me yearn for a decent random number generator so that you could just

select top 1600000 Number from RandomNumbers

:unsure:

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);