CHOOSE() a RAND() vowel not working???

  • Phil Parkin (1/9/2015)


    Thank you Jeff, that is an outstandingly good explanation.

    Thanks also for the TOP (n) trick to force the generation of a query plan. I wish I'd known that before.

    For fun, I validated your findings by calculating the likelihood of a NULL being generated in this case. Assuming RAND()'s absolute integrity, it's (5/6)^6 = 0.335 or so.

    declare @a int = 1;

    declare @b-2 int;

    declare @T table (Vowel char(1) null);

    while @a <= 10000

    begin

    set @b-2 = ceiling(rand() * 6)

    insert @T

    (Vowel

    )

    select choose(ceiling(rand() * 6), 'a', 'e', 'i', 'o', 'u', 'y')

    set @a = @a + 1

    end;

    select count(*) from @T t where t.Vowel is null

    You will find that the count is around 33nn every time.

    Thanks for the feedback, Phil. And thanks for the testing! 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Quick thought, an alternative way of getting an execution plan for this kind of simple statements is to use a CTE, has less inpact/noise in the plan as it's normally only one operator instead of three for the TOP generated plan.

    😎

    -- TOP

    SELECT TOP(1) CHOOSE(CEILING(RAND()*6),'a','e','i','o','u','y') AS CHOICE

    -- CTE

    ;WITH CHOOSE_CHOICE AS

    (

    SELECT CHOOSE(CEILING(RAND()*6),'a','e','i','o','u','y') AS CHOICE

    )

    SELECT

    *

    FROM CHOOSE_CHOICE CC;

    TOP Plan

    StmtText

    -- TOP

    SELECT TOP(1) CHOOSE(CEILING(RAND()*6),'a','e','i','o','u','y') AS CHOICE

    |--Compute Scalar(DEFINE:([Expr1000]=CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(1) THEN 'a' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(2) THEN 'e' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(3) THEN 'i' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(4) THEN 'o' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(5) THEN 'u' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(6) THEN 'y' ELSE NULL END END END END END END))

    |--Top(TOP EXPRESSION:((1)))

    |--Constant Scan

    CTE Plan

    StmtText

    -- CTE

    ;WITH CHOOSE_CHOICE AS

    (

    SELECT CHOOSE(CEILING(RAND()*6),'a','e','i','o','u','y') AS CHOICE

    )

    SELECT

    *

    FROM CHOOSE_CHOICE CC;

    |--Constant Scan(VALUES:((CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(1) THEN 'a' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(2) THEN 'e' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(3) THEN 'i' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(4) THEN 'o' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(5) THEN 'u' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(6) THEN 'y' ELSE NULL END END END END END END)))

  • That's cool. Thanks, Eirikur. Since I usually don't look at anything but the Compute Scalar for such things, the TOP 1 trick is easier for me simply because it's less typing and I can test "insitu" without much work on my part.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • It's interesting that neither of these:

    substring('aeiouy', cast(rand() * 6 + 1 as int), 1)

    substring('aeiouy', cast(ceiling(rand() * 6) as int), 1)

    exhibit the same problem.

    Don Simpson



    I'm not sure about Heisenberg.

  • DonlSimpson (1/9/2015)


    It's interesting that neither of these:

    substring('aeiouy', cast(rand() * 6 + 1 as int), 1)

    substring('aeiouy', cast(ceiling(rand() * 6) as int), 1)

    exhibit the same problem.

    There is no (implicit or explicit) CASE construction here, so the expected behaviour is different.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • My thought is that some of the random function implementations in SQL Server are more caused by the limitations of this particular function ( RAND() ) rather than anything else. To be more specific, each instance of the function within a statement produces a single (consistent) value, which par set definition is not very random. This is possibly the reason for the way the function is implemented as Jeff demonstrated in his excellent post.

    😎

    Consider this code

    USE tempdb;

    GO

    SET NOCOUNT ON;

    GO

    SET SHOWPLAN_ALL ON;

    GO

    SELECT TOP (100)

    RAND()

    FROM sys.all_columns X1, sys.all_columns X2;

    Plucking out the Compute Scalar operator from the execution plan shows a single RAND expression

    DEFINE: ([Expr1120]=rand())

    And not so surprisingly, the results

    Output

    ---------------

    0.532528081522471

    0.532528081522471

    0.532528081522471

    0.532528081522471

    0.532528081522471

    0.532528081522471

    0.532528081522471

    ....

  • Eirikur Eiriksson (1/9/2015)


    My thought is that some of the random function implementations in SQL Server are more caused by the limitations of this particular function ( RAND() ) rather than anything else. To be more specific, each instance of the function within a statement produces a single (consistent) value, which par set definition is not very random. This is possibly the reason for the way the function is implemented as Jeff demonstrated in his excellent post.

    In this case, it's because of the way CHOOSE and CASE work. As you can see in the first part of the code demo I wrote, @b-2 IS deterministic and it has the very same problem in that @b-2 is "recalculated" as instance of @b-2 every time. If @b-2 were indeterminate, it would have the same problem. Even if you were to use all constants (just as a demo, of course), I'm pretty sure that CHOOSE would still do the cascading CASE thing and list the constant(s) in every CASE just as it did with @b-2 and the indeterminate formula.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/9/2015)


    Eirikur Eiriksson (1/9/2015)


    My thought is that some of the random function implementations in SQL Server are more caused by the limitations of this particular function ( RAND() ) rather than anything else. To be more specific, each instance of the function within a statement produces a single (consistent) value, which par set definition is not very random. This is possibly the reason for the way the function is implemented as Jeff demonstrated in his excellent post.

    In this case, it's because of the way CHOOSE and CASE work. As you can see in the first part of the code demo I wrote, @b-2 IS deterministic and it has the very same problem in that @b-2 is "recalculated" as instance of @b-2 every time. If @b-2 were indeterminate, it would have the same problem. Even if you were to use all constants (just as a demo, of course), I'm pretty sure that CHOOSE would still do the cascading CASE thing and list the constant(s) in every CASE just as it did with @b-2 and the indeterminate formula.

    Think we are more or less saying the same thing here: to mitigate the non-(set based)-randomness of a single instance of the function (Compute Scalar RAND()), multiple instances of the function are used instead, otherwise the statement would only produce a single value for the expression. The downside of this implementation is of course that the CASE is evaluated sequentially which produces the misses (NULL).

    "Slightly" off the topic, SQL Server lacks a true (or as close as possible) random number set generator function/method. The NEWID function is useful for a simple set generation but the produce cannot be considered a truly random set. BOL's description of the RAND function "Returns a pseudo-random float value from 0 through 1, exclusive" by itself disqualifies the function so obviously it is not fit for that purpose.

    😎

  • I personally don't know of any programming language that actually has a true random number generator function. That's not to say that there isn't one but I don't know of any.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/9/2015)


    I personally don't know of any programming language that actually has a true random number generator function. That's not to say that there isn't one but I don't know of any.

    I should have said "random set generator" rather than "true random...", meaning more fit for purposes such as statistical calculations than the currently available methods. Realistically it would be more in the domain of unpredictability rather than being truly random.

    😎

  • Have you seen Dwain C's article on the generation of random sets?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Got it printed in a folder along with your's (about three years back) and Phil Factor's last years RBARR / RBAR-RAND one;-)

    😎

Viewing 12 posts - 16 through 26 (of 26 total)

You must be logged in to reply to this topic. Login to reply