CHOOSE() a RAND() vowel not working???

Question

CHOOSE() a RAND() vowel not working???

Viewing 12 posts - 16 through 26 (of 26 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004498 More actions · Answer 1

Phil Parkin (1/9/2015)
Thank you Jeff, that is an outstandingly good explanation.
Thanks also for the TOP (n) trick to force the generation of a query plan. I wish I'd known that before.
For fun, I validated your findings by calculating the likelihood of a NULL being generated in this case. Assuming RAND()'s absolute integrity, it's (5/6)^6 = 0.335 or so.
declare @a int = 1;
declare @b-2 int;
declare @T table (Vowel char(1) null);
while @a <= 10000
begin
set @b-2 = ceiling(rand() * 6)
insert @T
(Vowel
)
select choose(ceiling(rand() * 6), 'a', 'e', 'i', 'o', 'u', 'y')
set @a = @a + 1
end;
select count(*) from @T t where t.Vowel is null
You will find that the count is around 33nn every time.

Thanks for the feedback, Phil. And thanks for the testing! 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Eirikur Eiriksson SSC Guru Points: 182937 More actions · Answer 2

Quick thought, an alternative way of getting an execution plan for this kind of simple statements is to use a CTE, has less inpact/noise in the plan as it's normally only one operator instead of three for the TOP generated plan.

😎

-- TOP

SELECT TOP(1) CHOOSE(CEILING(RAND()*6),'a','e','i','o','u','y') AS CHOICE

-- CTE

;WITH CHOOSE_CHOICE AS

(

SELECT CHOOSE(CEILING(RAND()*6),'a','e','i','o','u','y') AS CHOICE

)

SELECT

*

FROM CHOOSE_CHOICE CC;

TOP Plan

StmtText

-- TOP

SELECT TOP(1) CHOOSE(CEILING(RAND()*6),'a','e','i','o','u','y') AS CHOICE

|--Compute Scalar(DEFINE:([Expr1000]=CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(1) THEN 'a' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(2) THEN 'e' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(3) THEN 'i' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(4) THEN 'o' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(5) THEN 'u' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(6) THEN 'y' ELSE NULL END END END END END END))

|--Top(TOP EXPRESSION:((1)))

|--Constant Scan

CTE Plan

StmtText

-- CTE

;WITH CHOOSE_CHOICE AS

(

SELECT CHOOSE(CEILING(RAND()*6),'a','e','i','o','u','y') AS CHOICE

)

SELECT

*

FROM CHOOSE_CHOICE CC;

|--Constant Scan(VALUES:((CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(1) THEN 'a' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(2) THEN 'e' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(3) THEN 'i' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(4) THEN 'o' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(5) THEN 'u' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(6) THEN 'y' ELSE NULL END END END END END END)))

Jeff Moden SSC Guru Points: 1004498 More actions · Answer 3

That's cool. Thanks, Eirikur. Since I usually don't look at anything but the Compute Scalar for such things, the TOP 1 trick is easier for me simply because it's less typing and I can test "insitu" without much work on my part.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

DonlSimpson SSCertifiable Points: 7043 More actions · Answer 4

It's interesting that neither of these:

substring('aeiouy', cast(rand() * 6 + 1 as int), 1)

substring('aeiouy', cast(ceiling(rand() * 6) as int), 1)

exhibit the same problem.

Don Simpson

I'm not sure about Heisenberg.

Phil Parkin SSC Guru Points: 247037 More actions · Answer 5

DonlSimpson (1/9/2015)
It's interesting that neither of these:
substring('aeiouy', cast(rand() * 6 + 1 as int), 1)
substring('aeiouy', cast(ceiling(rand() * 6) as int), 1)
exhibit the same problem.

There is no (implicit or explicit) CASE construction here, so the expected behaviour is different.

Eirikur Eiriksson SSC Guru Points: 182937 More actions · Answer 6

My thought is that some of the random function implementations in SQL Server are more caused by the limitations of this particular function ( RAND() ) rather than anything else. To be more specific, each instance of the function within a statement produces a single (consistent) value, which par set definition is not very random. This is possibly the reason for the way the function is implemented as Jeff demonstrated in his excellent post.

😎

Consider this code

USE tempdb;

GO

SET NOCOUNT ON;

GO

SET SHOWPLAN_ALL ON;

GO

SELECT TOP (100)

RAND()

FROM sys.all_columns X1, sys.all_columns X2;

Plucking out the Compute Scalar operator from the execution plan shows a single RAND expression

DEFINE: ([Expr1120]=rand())

And not so surprisingly, the results

Output

---------------

0.532528081522471

....

Jeff Moden SSC Guru Points: 1004498 More actions · Answer 7

Eirikur Eiriksson (1/9/2015)
My thought is that some of the random function implementations in SQL Server are more caused by the limitations of this particular function ( RAND() ) rather than anything else. To be more specific, each instance of the function within a statement produces a single (consistent) value, which par set definition is not very random. This is possibly the reason for the way the function is implemented as Jeff demonstrated in his excellent post.

In this case, it's because of the way CHOOSE and CASE work. As you can see in the first part of the code demo I wrote, @b-2 IS deterministic and it has the very same problem in that @b-2 is "recalculated" as instance of @b-2 every time. If @b-2 were indeterminate, it would have the same problem. Even if you were to use all constants (just as a demo, of course), I'm pretty sure that CHOOSE would still do the cascading CASE thing and list the constant(s) in every CASE just as it did with @b-2 and the indeterminate formula.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Eirikur Eiriksson SSC Guru Points: 182937 More actions · Answer 8

Jeff Moden (1/9/2015)
Eirikur Eiriksson (1/9/2015)
My thought is that some of the random function implementations in SQL Server are more caused by the limitations of this particular function ( RAND() ) rather than anything else. To be more specific, each instance of the function within a statement produces a single (consistent) value, which par set definition is not very random. This is possibly the reason for the way the function is implemented as Jeff demonstrated in his excellent post.
In this case, it's because of the way CHOOSE and CASE work. As you can see in the first part of the code demo I wrote, @b-2 IS deterministic and it has the very same problem in that @b-2 is "recalculated" as instance of @b-2 every time. If @b-2 were indeterminate, it would have the same problem. Even if you were to use all constants (just as a demo, of course), I'm pretty sure that CHOOSE would still do the cascading CASE thing and list the constant(s) in every CASE just as it did with @b-2 and the indeterminate formula.

Think we are more or less saying the same thing here: to mitigate the non-(set based)-randomness of a single instance of the function (Compute Scalar RAND()), multiple instances of the function are used instead, otherwise the statement would only produce a single value for the expression. The downside of this implementation is of course that the CASE is evaluated sequentially which produces the misses (NULL).

"Slightly" off the topic, SQL Server lacks a true (or as close as possible) random number set generator function/method. The NEWID function is useful for a simple set generation but the produce cannot be considered a truly random set. BOL's description of the RAND function "Returns a pseudo-random float value from 0 through 1, exclusive" by itself disqualifies the function so obviously it is not fit for that purpose.

😎

Jeff Moden SSC Guru Points: 1004498 More actions · Answer 9

I personally don't know of any programming language that actually has a true random number generator function. That's not to say that there isn't one but I don't know of any.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Eirikur Eiriksson SSC Guru Points: 182937 More actions · Answer 10

Jeff Moden (1/9/2015)
I personally don't know of any programming language that actually has a true random number generator function. That's not to say that there isn't one but I don't know of any.

I should have said "random set generator" rather than "true random...", meaning more fit for purposes such as statistical calculations than the currently available methods. Realistically it would be more in the domain of unpredictability rather than being truly random.

😎

Jeff Moden SSC Guru Points: 1004498 More actions · Answer 11

Have you seen Dwain C's article on the generation of random sets?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Eirikur Eiriksson SSC Guru Points: 182937 More actions · Answer 12

Got it printed in a folder along with your's (about three years back) and Phil Factor's last years RBARR / RBAR-RAND one;-)

😎