May 7, 2016 at 5:49 pm
Comments posted to this topic are about the item RANDom traps for the unwary
May 8, 2016 at 9:31 pm
Did you consider to create a view with Rand() column?
CREATE VIEW [Common].[vwRandomView]
AS
SELECT RAND() Result
and use CROSS APPLY with this view?
It worked for me ๐
May 9, 2016 at 1:18 am
SELECT *
FROM Person.StateProvince
WHERE (ABS(CAST((BINARY_CHECKSUM(*) * RAND()) AS INT)) % 100) < 10
What happens if CAST((BINARY_CHECKSUM(*) * RAND()) AS INT = - 2^31 ?
Int Range is <-2 147 483 648, +2 147 483 647>
If BINARY_CHECKSUM(*) * RAND() = -2 147 483 648
then ABS() function doesn't work properly and raises error: Arithmetic overflow.
It happens only once per 4 billions (2^32),
but this mean it will happen day after deploy on production.
May 9, 2016 at 3:23 am
Most built-in random number generators are "cheap" Linear Congruential Generators:
Xn+1 = (a Xn + c) Mod m
They produce a fixed output sequence. The sequence is started at a point determined by the seed. The seed is similar to saying "start the sequence at index n". Re-seeding the generator starts it at the SAME point.
In SQL:
-- Seed
SELECT RAND(12345)
GO
SELECT RAND()
GO 50
Will produce 50 reasonable random numbers. Running the whole batch again will produce the SAME sequence, because the seed is the same.
SELECT RAND(12345)
GO 50
will produce the SAME value 50 times because the generator was re-seeded (set back to the same initial index on every run).
The correct usage is seed ONCE, use MANY times.
However beware, when used in a select, it is only evaluated once, eg
SELECT RAND(),*
FROM sys.objects
GO
will output the SAME value.
-- Seed ONCE (based on the current millisecond)
SELECTRAND(DATEPART(MILLISECOND,GETDATE()))
GO
-- Use MANY
SELECTRAND()
GO50
will produce 50 reasonable random numbers, and every run of the whole batch will be different.
Ultimately, it is the quality of the INITIAL seed that governs the output sequence. There are numerous strategies for a good seed: current date, GUID, CHECKSUM etc. Develop one that works for your situation, but remember: SAME seed = SAME sequence.
May 9, 2016 at 5:32 am
...It happends only once per 4 bilions (2^32),
but this mean it will happen day after deploy on production.
This made my day! ๐
May 9, 2016 at 6:30 am
Thank you! I was planning on writing a similar response. The problem here is not with the Random function, but with the author's confusion about the proper use and limitations of the Random function.
May 9, 2016 at 6:57 am
Thanks for the article.
May 9, 2016 at 7:14 am
I believe I saw Jeff Moden of these forums post a simple, truly random generator.
SELECT (ABS(CHECKSUM(NEWID())) %20) + 1
This generates random numbers between 1 and 20. To have a different range, change the Modulo number.
Very elegant.
--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
Itโs unpleasantly like being drunk.
Whatโs so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams
May 9, 2016 at 7:27 am
Sioban Krzywicki (5/9/2016)
I believe I saw Jeff Moden of these forums post a simple, truly random generator.
SELECT (ABS(CHECKSUM(NEWID())) %20) + 1
This generates random numbers between 1 and 20. To have a different range, change the Modulo number.
Very elegant.
CHECKSUM(NEWID()) can be -2 147 483 648.
Select ABS( -2147483648) -> Msg 8115, LEVEL 16, State 2, Line 2 Arithmetic overflow error converting expression to data type int.
Better version:
SELECT (ABS(CHECKSUM(NEWID())/10) %20) + 1
louie1487 78804 (5/9/2016)
...It happens only once per 4 billions (2^32),
but this mean it will happen day after deploy on production.
This made my day! ๐
It's happend!
May 9, 2016 at 7:48 am
wojciech.muszynski (5/9/2016)
Sioban Krzywicki (5/9/2016)
I believe I saw Jeff Moden of these forums post a simple, truly random generator.
SELECT (ABS(CHECKSUM(NEWID())) %20) + 1
This generates random numbers between 1 and 20. To have a different range, change the Modulo number.
Very elegant.
CHECKSUM(NEWID()) can be -2 147 483 648.
Select ABS( -2147483648) -> Msg 8115, LEVEL 16, State 2, Line 2 Arithmetic overflow error converting expression to data type int.
Better version:
SELECT (ABS(CHECKSUM(NEWID())/10) %20) + 1
louie1487 78804 (5/9/2016)
...It happens only once per 4 billions (2^32),
but this mean it will happen day after deploy on production.
This made my day! ๐
It's happend!
I'd rather do this
SELECT (ABS(CAST(CHECKSUM(NewID()) AS bigint)) %20) + 1
than divide by 10. Matter of preference, really.
--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
Itโs unpleasantly like being drunk.
Whatโs so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams
May 9, 2016 at 8:29 am
I believe scientists rely on hardware solutions such as radioactive decay detection devices to get true randomness.
I stay away from pseudo random data generation and try to approach real-world distributions. There are likely 100,000 times more Bakers, Cooks, Millers, Smiths, and Tailors than there are of my last name. Most things in the world follow a power law distribution, natural distribution, or other mathematical deviation from linear. Scientists looking for life on other planets are guided by the rule that: God doesn't use straight lines. And the most common error made is in thinking that a bell curve will look anything like the standard one taught in statistics classes.
May 9, 2016 at 8:41 am
I use the
ABS(CONVERT(BIGINT, CHECKSUM(NEWID())))
method whenever I need random numbers. I've just tested with the method described by the author and found
1)
InOrder=499951
2)
High occurrence=10139
Both look pretty random to me.
May 9, 2016 at 12:08 pm
Given the ascendency of data science I think there is scope for some articles utilising the different statiscal techniques.
I did enjoy the data mining series on this site.. I've learned more about statistics in the past 5 years than I did at school. Now I have a real world context a dry subject has come to life
May 9, 2016 at 6:14 pm
robert.bruhin (5/9/2016)
Thank you! I was planning on writing a similar response. The problem here is not with the Random function, but with the author's confusion about the proper use and limitations of the Random function.
What confusion and what proper use?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 9, 2016 at 6:33 pm
Andrew Morgan (5/9/2016)
However beware, when used in a select, it is only evaluated once, egSELECT RAND(),*
FROM sys.objects
GO
will output the SAME value.
That and a workaround substantial enough for the generation of simple, non-critical test data was covered in the article.
-- Seed ONCE (based on the current millisecond)
SELECTRAND(DATEPART(MILLISECOND,GETDATE()))
GO
-- Use MANY
SELECTRAND()
GO50
will produce 50 reasonable random numbers, and every run of the whole batch will be different.
Use that for generating a million pseudo random values for simple, non-critical test data and see how interested you'll be in doing multiple repeated tests. ๐
Ultimately, it is the quality of the INITIAL seed that governs the output sequence. There are numerous strategies for a good seed: current date, GUID, CHECKSUM etc. Develop one that works for your situation, but remember: SAME seed = SAME sequence.
Agreed... that and the scale, or rather, the lack of scale of the values. The 15 decimal places offered by FLOAT in SQL Server and similar functions in other languages is quick to form "pockets" of data simply due to the lack of scale, as also demonstrated in the article. Of course, a lot of calculations done by even Statisticians have the same problem but the calculations are frequently "good enough".
For the generation of simple, non-critical test data and non-critical pseudo random selection, it's sufficient and I believe that's what the article was aimed at. If we're talking true randomness, obviously neither the methods in the article nor most programming-level functions will suffice.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 26 total)
You must be logged in to reply to this topic. Login to reply