January 9, 2015 at 7:20 am
Phil Parkin (1/9/2015)
Thank you Jeff, that is an outstandingly good explanation.Thanks also for the TOP (n) trick to force the generation of a query plan. I wish I'd known that before.
For fun, I validated your findings by calculating the likelihood of a NULL being generated in this case. Assuming RAND()'s absolute integrity, it's (5/6)^6 = 0.335 or so.
declare @a int = 1;
declare @b-2 int;
declare @T table (Vowel char(1) null);
while @a <= 10000
begin
set @b-2 = ceiling(rand() * 6)
insert @T
(Vowel
)
select choose(ceiling(rand() * 6), 'a', 'e', 'i', 'o', 'u', 'y')
end;
select count(*) from @T t where t.Vowel is null
You will find that the count is around 33nn every time.
Thanks for the feedback, Phil. And thanks for the testing! 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
January 9, 2015 at 7:48 am
Quick thought, an alternative way of getting an execution plan for this kind of simple statements is to use a CTE, has less inpact/noise in the plan as it's normally only one operator instead of three for the TOP generated plan.
😎
-- TOP
SELECT TOP(1) CHOOSE(CEILING(RAND()*6),'a','e','i','o','u','y') AS CHOICE
-- CTE
;WITH CHOOSE_CHOICE AS
(
SELECT CHOOSE(CEILING(RAND()*6),'a','e','i','o','u','y') AS CHOICE
)
SELECT
*
FROM CHOOSE_CHOICE CC;
TOP Plan
StmtText
-- TOP
SELECT TOP(1) CHOOSE(CEILING(RAND()*6),'a','e','i','o','u','y') AS CHOICE
|--Compute Scalar(DEFINE:([Expr1000]=CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(1) THEN 'a' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(2) THEN 'e' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(3) THEN 'i' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(4) THEN 'o' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(5) THEN 'u' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(6) THEN 'y' ELSE NULL END END END END END END))
|--Top(TOP EXPRESSION:((1)))
|--Constant Scan
CTE Plan
StmtText
-- CTE
;WITH CHOOSE_CHOICE AS
(
SELECT CHOOSE(CEILING(RAND()*6),'a','e','i','o','u','y') AS CHOICE
)
SELECT
*
FROM CHOOSE_CHOICE CC;
|--Constant Scan(VALUES:((CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(1) THEN 'a' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(2) THEN 'e' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(3) THEN 'i' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(4) THEN 'o' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(5) THEN 'u' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(6) THEN 'y' ELSE NULL END END END END END END)))
January 9, 2015 at 8:35 am
That's cool. Thanks, Eirikur. Since I usually don't look at anything but the Compute Scalar for such things, the TOP 1 trick is easier for me simply because it's less typing and I can test "insitu" without much work on my part.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 9, 2015 at 9:00 am
It's interesting that neither of these:
substring('aeiouy', cast(rand() * 6 + 1 as int), 1)
substring('aeiouy', cast(ceiling(rand() * 6) as int), 1)
exhibit the same problem.
Don Simpson
January 9, 2015 at 9:11 am
DonlSimpson (1/9/2015)
It's interesting that neither of these:
substring('aeiouy', cast(rand() * 6 + 1 as int), 1)
substring('aeiouy', cast(ceiling(rand() * 6) as int), 1)
exhibit the same problem.
There is no (implicit or explicit) CASE construction here, so the expected behaviour is different.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
January 9, 2015 at 10:04 am
My thought is that some of the random function implementations in SQL Server are more caused by the limitations of this particular function ( RAND() ) rather than anything else. To be more specific, each instance of the function within a statement produces a single (consistent) value, which par set definition is not very random. This is possibly the reason for the way the function is implemented as Jeff demonstrated in his excellent post.
😎
Consider this code
USE tempdb;
GO
SET NOCOUNT ON;
GO
SET SHOWPLAN_ALL ON;
GO
SELECT TOP (100)
RAND()
FROM sys.all_columns X1, sys.all_columns X2;
Plucking out the Compute Scalar operator from the execution plan shows a single RAND expression
DEFINE: ([Expr1120]=rand())
And not so surprisingly, the results
Output
---------------
0.532528081522471
0.532528081522471
0.532528081522471
0.532528081522471
0.532528081522471
0.532528081522471
0.532528081522471
....
January 9, 2015 at 11:36 am
Eirikur Eiriksson (1/9/2015)
My thought is that some of the random function implementations in SQL Server are more caused by the limitations of this particular function ( RAND() ) rather than anything else. To be more specific, each instance of the function within a statement produces a single (consistent) value, which par set definition is not very random. This is possibly the reason for the way the function is implemented as Jeff demonstrated in his excellent post.
In this case, it's because of the way CHOOSE and CASE work. As you can see in the first part of the code demo I wrote, @b-2 IS deterministic and it has the very same problem in that @b-2 is "recalculated" as instance of @b-2 every time. If @b-2 were indeterminate, it would have the same problem. Even if you were to use all constants (just as a demo, of course), I'm pretty sure that CHOOSE would still do the cascading CASE thing and list the constant(s) in every CASE just as it did with @b-2 and the indeterminate formula.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 9, 2015 at 12:07 pm
Jeff Moden (1/9/2015)
Eirikur Eiriksson (1/9/2015)
My thought is that some of the random function implementations in SQL Server are more caused by the limitations of this particular function ( RAND() ) rather than anything else. To be more specific, each instance of the function within a statement produces a single (consistent) value, which par set definition is not very random. This is possibly the reason for the way the function is implemented as Jeff demonstrated in his excellent post.In this case, it's because of the way CHOOSE and CASE work. As you can see in the first part of the code demo I wrote, @b-2 IS deterministic and it has the very same problem in that @b-2 is "recalculated" as instance of @b-2 every time. If @b-2 were indeterminate, it would have the same problem. Even if you were to use all constants (just as a demo, of course), I'm pretty sure that CHOOSE would still do the cascading CASE thing and list the constant(s) in every CASE just as it did with @b-2 and the indeterminate formula.
Think we are more or less saying the same thing here: to mitigate the non-(set based)-randomness of a single instance of the function (Compute Scalar RAND()), multiple instances of the function are used instead, otherwise the statement would only produce a single value for the expression. The downside of this implementation is of course that the CASE is evaluated sequentially which produces the misses (NULL).
"Slightly" off the topic, SQL Server lacks a true (or as close as possible) random number set generator function/method. The NEWID function is useful for a simple set generation but the produce cannot be considered a truly random set. BOL's description of the RAND function "Returns a pseudo-random float value from 0 through 1, exclusive" by itself disqualifies the function so obviously it is not fit for that purpose.
😎
January 9, 2015 at 12:14 pm
I personally don't know of any programming language that actually has a true random number generator function. That's not to say that there isn't one but I don't know of any.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 9, 2015 at 12:46 pm
Jeff Moden (1/9/2015)
I personally don't know of any programming language that actually has a true random number generator function. That's not to say that there isn't one but I don't know of any.
I should have said "random set generator" rather than "true random...", meaning more fit for purposes such as statistical calculations than the currently available methods. Realistically it would be more in the domain of unpredictability rather than being truly random.
😎
January 9, 2015 at 12:51 pm
Have you seen Dwain C's article on the generation of random sets?
--Jeff Moden
Change is inevitable... Change for the better is not.
January 9, 2015 at 1:05 pm
Got it printed in a folder along with your's (about three years back) and Phil Factor's last years RBARR / RBAR-RAND one;-)
😎
Viewing 12 posts - 16 through 26 (of 26 total)
You must be logged in to reply to this topic. Login to reply