May 11, 2010 at 2:35 pm
Absolutly Great job
May 11, 2010 at 2:41 pm
Lowell (5/11/2010)
Jeff Moden (5/11/2010)
1.6 million random 8 character codes that no one can guess and they're guaranteed to be unique within the set... takes about 25 seconds on my 8 year old desktop...you must have a monster "old" desktop; I've got a decent 2.8Gig developer machine and it took 5 times that long; 01:29 to run your code,01:28 on second run.
25 seconds....your old developer machine plugged into a SAN? or did you use DBCC TIMEWARP with this?
While it's true that I keep the flux capacitor on the hardware version of DBCC TIMEWARP loaded up with fresh hampster poo for the ultimate in "flash warping" when bit 77 is set, I forgot to mention that the SELECT * FROM #MyHead was not included in that time period.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 11, 2010 at 2:47 pm
scziege (5/11/2010)
Absolutly Great job
I love a satisfied customer. Thanks for the feedback. Most folks on this thread had the right idea to begin with...
Now... a favor from you, please. Could you tell us why and what you need these for and why a simple IDENTITY column wouldn't have been good enough? It would really be nice to know the details behind the business decisions for things like this. Thanks for your time.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 11, 2010 at 4:56 pm
Jeff, that is an interesting take on Random..... :w00t:
The first ten codes in the sample I tried...
10000180
10000F52
10001875
10001D17
1000221F
10003478
100035E7
10003763
10006415
100081B2
Is there some bias towards groups of characters in that solution ....
Or am I seeing it wrong?
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
May 11, 2010 at 6:26 pm
mister.magoo (5/11/2010)
Jeff, that is an interesting take on Random..... :w00t:The first ten codes in the sample I tried...
10000180
10000F52
10001875
10001D17
1000221F
10003478
100035E7
10003763
10006415
100081B2
Is there some bias towards groups of characters in that solution ....
Or am I seeing it wrong?
Heh... good question, actually. The "bias" isn't really "bias" It's because we have 1.6 million random numbers starting at 10000000 because I didn't want to have any numbers that began with leading zeros. The other "bias" is only because the PARTITION clause sorts the final output. If we had sorted the final output using the whole of NEWID() for 1.6 million rows, the first 8 characters would form a similar "pattern" of ascent.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 11, 2010 at 6:35 pm
Magoo... here's what I'm talking about... if I take out the partitioning that supports the dupe check, then you probably get more like what you're expecting...
WITH
cteFirstGen AS
( --=== Gen enough 8 character random codes to cover possible dupes for 1.6 million codes
SELECT TOP 2000000 LEFT(NEWID(),8) AS RandomCode
FROM Master.sys.All_Columns ac1
CROSS JOIN Master.sys.All_Columns ac2
)
,
cteNumberDupes AS
( --=== Number the codes so we can select unique random codes
SELECT --ROW_NUMBER() OVER (PARTITION BY RandomCode ORDER BY (SELECT NULL)) AS Occurance,
RandomCode
FROM cteFirstGen
WHERE RandomCode > '10000000'
) --=== Select 1.6 million unique random codes only
SELECT TOP 1600000 RandomCode
INTO #MyHead
FROM cteNumberDupes
-- WHERE Occurance = 1
SELECT * FROM #MyHead
The problem is that there's no guarantee that there's no dupe. You can use a DISTINCT if you don't mind waiting the extra time... it will double the amount of time it takes to execute this way....
WITH
cteFirstGen AS
( --=== Gen enough 8 character random codes to cover possible dupes for 1.6 million codes
SELECT TOP 2000000 LEFT(NEWID(),8) AS RandomCode
FROM Master.sys.All_Columns ac1
CROSS JOIN Master.sys.All_Columns ac2
)
,
cteNumberDupes AS
( --=== Number the codes so we can select unique random codes
SELECT --ROW_NUMBER() OVER (PARTITION BY RandomCode ORDER BY (SELECT NULL)) AS Occurance,
RandomCode
FROM cteFirstGen
WHERE RandomCode > '10000000'
) --=== Select 1.6 million unique random codes only
SELECT DISTINCT TOP 1600000 RandomCode
INTO #MyHead
FROM cteNumberDupes
-- WHERE Occurance = 1
--Jeff Moden
Change is inevitable... Change for the better is not.
May 12, 2010 at 12:12 am
Hey Jeff,
our customer want to make a promotion on a product package with this codes.
The customer can buy the product and insert the code onto a web page, after that
he will win or loose.
But do you think the codes are safe or is it possible to guess the code, on the web page
we use captchas.
Thanks
Thorsten
May 12, 2010 at 12:24 am
Hey,
another stupid question. Is it possible to check the codes against a table in which I store old codes ?
The main point is the performance we have to look.
We have a table with 100 Million rows and should insert 5 Million rows which not exists in the database.
Thanks in adavance
Thorsten
May 12, 2010 at 12:33 am
My 2p worth:
Jeff has shown you a framework for generating numbers.
I personally think you could consider generating more numbers and disallowing any with any repeating digits.
You should probably consider going case sensitive e.g. Ax9G3bgt4 to make it harder to guess.
You could (if you wanted) generate all possible combinations of 8 alphanumeric characters (about 2.4 trillion I think unless you want case sensitivity, then it's about 200 trillion ) which would take a couple of weeks to do on a desktop pc at least. (Previous experience shows you can generate a trillion rows in about 6 days) and then pick 1.6 million values from there, which gives you pretty good odds of not being guessed.
Or you could generate batches of random strings and add them to your pool, seeded to deliberately scatter the values so that you don't get silly ones like AAAAAAAA.
You have to define more clearly the parameters of the request to get a really solid answer.
Having just seen your other question, I say again - tell us the whole requirement.
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
May 12, 2010 at 1:34 am
Great explanation, Jeff! That's one thing I wanted to ask.
It's strange to see that partition by seems to be faster than distinct. I would have expected to be equally fast (or slow).
Just out of curiosity, how did you come up with this? I suppose this is something you already had in your toolbox.
-- Gianluca Sartori
May 12, 2010 at 8:02 am
Gianluca Sartori (5/12/2010)
Great explanation, Jeff! That's one thing I wanted to ask.It's strange to see that partition by seems to be faster than distinct. I would have expected to be equally fast (or slow).
Just out of curiosity, how did you come up with this? I suppose this is something you already had in your toolbox.
I'm not 100% sure of the "internals" but it seemed logical to me that the partition would do a single sort whereas DISTINCT has to check each value against a range of other values and then, if within the range, check for the actual value. I'm no expert on the internals so rather than guess, I tried both. The partitioning won. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
May 12, 2010 at 8:07 am
scziege (5/12/2010)
Hey Jeff,our customer want to make a promotion on a product package with this codes.
The customer can buy the product and insert the code onto a web page, after that
he will win or loose.
But do you think the codes are safe or is it possible to guess the code, on the web page
we use captchas.
Thanks
Thorsten
With random numbers with such a relatively small domain, someone could get lucky with a guess. If you were to use a full NEWID(), instead, it would be like someone guessing the exact linear mile of travel you were in across a diameter of something like 17 quadrillion Milkyway galaxies.
It would be a good test, as well. If you can't successfully and correctly enter a 36 character code, you don't deserve to win. 😛
Maggo is correct... if you're going to use these codes for such a contest, codes with repeating adjacent characters should be eliminated and to prevent "mistrakes", things like lower case "L" and "1" and upper case "0" and zero should be eliminated from the mix. That's actually a lot easier to do than what most people would think although it does take a little extra time to generate. I'll play with this tonight because it IS an interesting problem. I've got a "random password generator" bit of code that I can steal a bit of code from.
Someone will likely beat me to it, though... I've got to get to work. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
May 12, 2010 at 9:07 pm
scziege (5/12/2010)
Hey Jeff,our customer want to make a promotion on a product package with this codes.
The customer can buy the product and insert the code onto a web page, after that
he will win or loose.
But do you think the codes are safe or is it possible to guess the code, on the web page
we use captchas.
Thanks
Thorsten
Heh... I guess there were no takers while I was at work. Ok, no problem. We've got this covered.
Thorsten, I don't know what your company makes or sells but I'm thinking they owe me a bit for this one. 😉 Ask them if they can send me a couple of boxes of whatever they make. I'm sure that I can find someone or a homeless shelter in the area that needs whatever they sell. Or, if they want, they can make a donation to their favorite charity in my name, please.
The whole thing takes 00:01:40 to run on my 8 year old single P4 CPU desktop and does a lot of things mentioned on this thread. Because random letters frequently spell out swear words, I added a "fix" for that little problem, as well. I also have it pick the "winning" code and store it in a table as well as storing the 1.6 million random codes in another table. If you need more than 1 winning code, then change the TOP 1 to TOP n where "n" is the number of winning codes you want the machine to pick. The winning code table also has a "Place" number in case of such an eventuality. The random code table also has an IDENTITY column so whoever prints up inserts can keep track of what they printed.
We could check against the previous random code list you have but I'm going to hold out for a couple of more boxes of whatever your company sells for a homeless shelter. 😉
Here's the code... as usual, the details are documented in the code itself...
--===== Conditionally drop the permanent random code and winner tables
-- Uncomment if you want to test for reruns
IF OBJECT_ID('dbo.jbmRandomCode','U') IS NOT NULL
DROP TABLE dbo.jbmRandomCode
;
IF OBJECT_ID('dbo.jbmWinnerCode','U') IS NOT NULL
DROP TABLE dbo.jbmWinnerCode
;
--===== Generate 1.6 Million random codes where no two adjacent characters
-- are alike. We also remove all vowels to keep from accidently
-- spelling swear words and we remove things that could be mistaken
-- for each other such as:
-- Capital "O" and "0" (zero)
-- Lower Case "L" and "1" (one)
-- Lower Case "o" removed just to avoid all confusion
-- Jeff Moden
WITH
cteBytes AS
( --=== Generate 2.2 Million random 8 byte numbers.
-- We need this many so that when we remove numbers that
-- have identical adjacent characters and any dupes,
-- we still have enough to gen 1.6 Million random codes.
-- This is done at the byte level because NEWID is expensive to run.
-- If we ran 8 individual NEWID()'s for each code, this code would
-- take a lot longer.
SELECT TOP 2200000 CAST(NEWID() AS BINARY(8)) AS Bytes
FROM Master.sys.All_Columns ac1
CROSS JOIN Master.sys.All_Columns ac2
)
,
cteCharacters AS
( --=== This is just the list of characters that we have to chose from.
-- Vowels and certain other characters have been removed to avoid confusion.
-- Note that both strings must be absolutely identical here.
SELECT LEN('BCDFGHJKLMNPQRSTVWXYZbcdfghjkmnpqrstvwxyz23456789') AS CharLen,
'BCDFGHJKLMNPQRSTVWXYZbcdfghjkmnpqrstvwxyz23456789' AS Characters
)
,
cteChars AS
( --=== This takes each byte from the random 8 byte binary number and converts each into
-- a human readable character by using the random byte to select from the allowed
-- character list. Note that "modulo" (%) plays an important role here.
SELECT SUBSTRING(c.Characters,CAST(SUBSTRING(b.Bytes,1,1) AS INT)%c.CharLen+1,1) AS Char1,
SUBSTRING(c.Characters,CAST(SUBSTRING(b.Bytes,2,1) AS INT)%c.CharLen+1,1) AS Char2,
SUBSTRING(c.Characters,CAST(SUBSTRING(b.Bytes,3,1) AS INT)%c.CharLen+1,1) AS Char3,
SUBSTRING(c.Characters,CAST(SUBSTRING(b.Bytes,4,1) AS INT)%c.CharLen+1,1) AS Char4,
SUBSTRING(c.Characters,CAST(SUBSTRING(b.Bytes,5,1) AS INT)%c.CharLen+1,1) AS Char5,
SUBSTRING(c.Characters,CAST(SUBSTRING(b.Bytes,6,1) AS INT)%c.CharLen+1,1) AS Char6,
SUBSTRING(c.Characters,CAST(SUBSTRING(b.Bytes,7,1) AS INT)%c.CharLen+1,1) AS Char7,
SUBSTRING(c.Characters,CAST(SUBSTRING(b.Bytes,8,1) AS INT)%c.CharLen+1,1) AS Char8
FROM cteBytes b
CROSS JOIN cteCharacters c
)
,
cteRandomCode AS
( --=== Now we take all the individual letters and slam them together to make a random code.
-- In the process, we make sure that no two adjacent characters are alike.
SELECT Char1+Char2+Char3+Char4+Char5+Char6+Char7+Char8 AS RandomCode
FROM cteChars
WHERE Char1<>Char2 AND Char2<>Char3 AND Char3<>Char4 AND Char4<>Char5 AND Char5<> Char6 AND Char6<>Char7 AND Char7<>Char8
)
,
cteNumberDupes AS
( --=== We need to remove dupes and this numbers one or more occurances of the same code
-- with an incrementing number to count occurances of that same code.
SELECT ROW_NUMBER() OVER (PARTITION BY RandomCode ORDER BY RandomCode) AS Occurance,
RandomCode
FROM cteRandomCode
) --=== This selects the top 1.6 million first occurances of all the codes we haven't
-- rejected so far. Since we only select the first occurance of any given code,
-- this effectively removes dupes and does so much faster than using DISTINCT.
-- The final result is stored in a new table called dbo.jbmRandomCode.
SELECT TOP 1600000
IDENTITY(INT,1,1) AS RowNum,
RandomCode COLLATE Latin1_General_BIN AS RandomCode --For case sensitivity
INTO dbo.jbmRandomCode
FROM cteNumberDupes
WHERE Occurance = 1
;
--===== And now, pick what will be the winning number and store it in a new table.
-- Change the TOP 1 to however many 1st, 2nd, 3rd, etc place winners you want.
SELECT TOP 1
IDENTITY(INT,1,1) AS Place,
RandomCode COLLATE Latin1_General_BIN AS RandomCode --For case sensitivity
INTO dbo.jbmWinnerCode
FROM dbo.jbmRandomCode
ORDER BY NEWID()
;
{edit} Added brute force case sensitivity using COLLATE Latin1_General_BIN just to make life a little easier...
--Jeff Moden
Change is inevitable... Change for the better is not.
May 12, 2010 at 10:21 pm
mister.magoo (5/12/2010)
My 2p worth:Jeff has shown you a framework for generating numbers.
I personally think you could consider generating more numbers and disallowing any with any repeating digits.
You should probably consider going case sensitive e.g. Ax9G3bgt4 to make it harder to guess.
You could (if you wanted) generate all possible combinations of 8 alphanumeric characters (about 2.4 trillion I think unless you want case sensitivity, then it's about 200 trillion ) which would take a couple of weeks to do on a desktop pc at least. (Previous experience shows you can generate a trillion rows in about 6 days) and then pick 1.6 million values from there, which gives you pretty good odds of not being guessed.
Or you could generate batches of random strings and add them to your pool, seeded to deliberately scatter the values so that you don't get silly ones like AAAAAAAA.
You have to define more clearly the parameters of the request to get a really solid answer.
Having just seen your other question, I say again - tell us the whole requirement.
I've incorporated some of the good ideas from your post into the new code I posted above. Thanks, Magoo. Using the characters chosen for the codes (49 in all) and removing any codes having duplicate adjacent letters, it works out to about 33 trillion combinations. Heh... we should start a lottery so old DBA's can retire. 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
May 13, 2010 at 12:32 am
Brilliant answer Jeff!
I like this a lot, but every time I see NEWID() used like this it just makes me yearn for a decent random number generator so that you could just
select top 1600000 Number from RandomNumbers
:unsure:
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
Viewing 15 posts - 16 through 30 (of 36 total)
You must be logged in to reply to this topic. Login to reply