January 5, 2015 at 7:07 pm
On my SQL Server 2012 SP2 installation, The following expression always returns an integer between 1 and 6:
[font="Courier New"]SELECT CEILING(RAND()*6)[/font]
...but this function only works half the time, returning a random vowel, but also gives random NULLs:
[font="Courier New"]SELECT CHOOSE(CEILING(RAND()*6),'a','e','i','o','u','y')[/font]
According to MSDN, no data type conversion should be necessary: "If the provided index value has a numeric data type other than int, then the value is implicitly converted to an integer." I tried casting it to an INT anyway, and it still didn't work.
So, I tried testing it by setting the value is a script variable:
[font="Courier New"]DECLARE @a INT = 1;
DECLARE @b-2 INT;
WHILE @a <= 100
BEGIN
SET @b-2 = CEILING(RAND()*6)
SELECT CHOOSE(@B,'a','e','i','o','u','y')
END[/font]
...and it works every time.
Any idea why the single-line version returns NULLs only some of the time?
Thanks!
January 6, 2015 at 12:42 am
jwiseh (1/5/2015)
On my SQL Server 2012 SP2 installation, The following expression always returns an integer between 1 and 6:[font="Courier New"]SELECT CEILING(RAND()*6)[/font]
...but this function only works half the time, returning a random vowel, but also gives random NULLs:
[font="Courier New"]SELECT CHOOSE(CEILING(RAND()*6),'a','e','i','o','u','y')[/font]
According to MSDN, no data type conversion should be necessary: "If the provided index value has a numeric data type other than int, then the value is implicitly converted to an integer." I tried casting it to an INT anyway, and it still didn't work.
So, I tried testing it by setting the value is a script variable:
[font="Courier New"]DECLARE @a INT = 1;
DECLARE @b-2 INT;
WHILE @a <= 100
BEGIN
SET @b-2 = CEILING(RAND()*6)
SELECT CHOOSE(@B,'a','e','i','o','u','y')
END[/font]
...and it works every time.
Any idea why the single-line version returns NULLs only some of the time?
Thanks!
This is kind of a "behavioural problem" as the server doesn't necessarily follow the code's implied order of execution but rather what it deems fit. Few similar issues around, none which have been marked as a bug.
😎
Under the hood, the CHOOSE and the IIF functions are implemented as CASE statement, in fact the CASE statement will produce the same results, that is some random NULL values even if both 0 and NULL are in the list. Consider this code sample
USE tempdb;
GO
SET NOCOUNT ON;
DECLARE @SAMPLE_SIZE INT = 100;
DECLARE @ITER INT = 0;
DECLARE @RESULT TABLE (RC CHAR(1) NULL,CC CHAR(1) NULL);
WHILE (@ITER < @SAMPLE_SIZE)
BEGIN
SET @ITER += 1;
INSERT INTO @RESULT(RC,CC)
SELECT
CHOOSE( CEILING( RAND() * 6 ),'a','e','i','o','u','y')
,CASE CEILING( RAND() * 6 )
WHEN NULL THEN 'N'
WHEN 0 THEN '!'
WHEN 1 THEN 'a'
WHEN 2 THEN 'e'
WHEN 3 THEN 'i'
WHEN 4 THEN 'o'
WHEN 5 THEN 'u'
WHEN 6 THEN 'y'
END;
END
SELECT
X.RC
,X.CC
FROM @RESULT X;
Results
RC CC
---- ----
a NULL
y NULL
o NULL
o e
a u
NULL e
e e
NULL NULL
NULL NULL
y i
a NULL
o NULL
NULL NULL
NULL y
NULL u
i a
NULL NULL
e NULL
y o
e e
a i
i a
NULL NULL
NULL i
i y
NULL o
i o
u i
e NULL
a a
a y
NULL NULL
y NULL
NULL i
i y
a y
NULL NULL
e u
i i
a NULL
a NULL
NULL NULL
NULL u
e a
NULL a
i a
NULL i
o NULL
y e
u NULL
y NULL
NULL y
NULL u
NULL o
NULL u
e u
a o
NULL NULL
NULL NULL
e NULL
e NULL
a i
a NULL
NULL u
i i
y e
NULL NULL
NULL e
a NULL
NULL i
NULL NULL
NULL NULL
NULL o
a a
NULL NULL
a o
a e
o o
NULL a
y i
a NULL
i NULL
o i
i a
e e
NULL NULL
i i
NULL NULL
NULL NULL
NULL NULL
a e
NULL o
e i
NULL NULL
NULL a
a i
i NULL
o NULL
NULL a
i NULL
The reason for the NULLs returned by the CHOOSE is by design, if the index number's item is not found in the list, the function returns NULL.
The workaround is to generate the index number outside the CHOOSE function, here is a quick example
USE tempdb;
GO
SET NOCOUNT ON;
DECLARE @SAMPLE_SIZE INT = 100;
;WITH T(N) AS ( SELECT N FROM ( VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))
, NUMS(N) AS ( SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3)
/* Generate random numbers */
,RANDOM_SET AS
(
SELECT
NM.N
,1 + (ABS(CHECKSUM(NEWID())) % 6 ) AS RNUM
FROM NUMS NM
)
/* Choose from the list */
SELECT
RS.N
,RS.RNUM
,CHOOSE(CEILING(RS.RNUM),'a','e','i','o','u','y') AS CHOICE
FROM RANDOM_SET RS;
Results
N RNUM CHOICE
------- ------ ------
1 1 a
2 6 y
3 4 o
4 2 e
5 6 y
6 4 o
7 6 y
8 1 a
9 2 e
10 3 i
11 6 y
12 5 u
13 5 u
14 2 e
15 1 a
16 6 y
17 1 a
18 3 i
19 3 i
20 4 o
21 6 y
22 6 y
23 5 u
24 3 i
25 6 y
26 5 u
27 2 e
28 5 u
29 6 y
30 3 i
31 4 o
32 5 u
33 2 e
34 6 y
35 5 u
36 1 a
37 4 o
38 4 o
39 1 a
40 6 y
41 6 y
42 3 i
43 3 i
44 4 o
45 1 a
46 2 e
47 6 y
48 3 i
49 6 y
50 2 e
51 2 e
52 5 u
53 4 o
54 6 y
55 5 u
56 5 u
57 3 i
58 4 o
59 1 a
60 4 o
61 3 i
62 3 i
63 1 a
64 4 o
65 2 e
66 3 i
67 5 u
68 4 o
69 5 u
70 4 o
71 4 o
72 4 o
73 5 u
74 3 i
75 5 u
76 2 e
77 3 i
78 5 u
79 1 a
80 2 e
81 6 y
82 3 i
83 6 y
84 3 i
85 3 i
86 2 e
87 2 e
88 4 o
89 6 y
90 4 o
91 6 y
92 5 u
93 2 e
94 3 i
95 4 o
96 1 a
97 4 o
98 5 u
99 4 o
100 2 e
January 6, 2015 at 3:25 am
Just played around with this a bit and it seems that the CHOOSE function does not deal well using functions directly in the 'index' argument.
This is not just a consequence of short-circuiting: I'd go so far as to say that it is a bug.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
January 6, 2015 at 8:24 am
jwiseh (1/5/2015)
On my SQL Server 2012 SP2 installation, The following expression always returns an integer between 1 and 6:[font="Courier New"]SELECT CEILING(RAND()*6)[/font]
...but this function only works half the time, returning a random vowel, but also gives random NULLs:
[font="Courier New"]SELECT CHOOSE(CEILING(RAND()*6),'a','e','i','o','u','y')[/font]
According to MSDN, no data type conversion should be necessary: "If the provided index value has a numeric data type other than int, then the value is implicitly converted to an integer." I tried casting it to an INT anyway, and it still didn't work.
So, I tried testing it by setting the value is a script variable:
[font="Courier New"]DECLARE @a INT = 1;
DECLARE @b-2 INT;
WHILE @a <= 100
BEGIN
SET @b-2 = CEILING(RAND()*6)
SELECT CHOOSE(@B,'a','e','i','o','u','y')
END[/font]
...and it works every time.
Any idea why the single-line version returns NULLs only some of the time?
Thanks!
Forget about CHOOSE here. Forget about WHILE loops here. The following will generate as many random vowels (up to about 16 million) as you need in two shakes of a stick and I guarantee no NULLs will come up. It uses the natural "pseudo-cursor" effect built into every SELECT.
DECLARE @NumberOfVowels INT;
SELECT @NumberOfVowels = 1000000;
SELECT TOP (@NumberOfVowels)
RandomVowel = SUBSTRING('aeiouy',ABS(CHECKSUM(NEWID()))%6+1,1)
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
;
--Jeff Moden
Change is inevitable... Change for the better is not.
January 6, 2015 at 8:46 am
Nice one Jeff 😎
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
January 6, 2015 at 9:02 am
Phil Parkin (1/6/2015)
Nice one Jeff 😎
Agreed - very nice. Use math in a set and watch it fly.
January 6, 2015 at 1:44 pm
Thank you , Jeff, that's a perfect workaround!
Thank you for the rest of the replies as well. It's nice to know it's probably a bug in CHOOSE, and not me missing something... 😉
January 6, 2015 at 2:17 pm
Jeff Moden (1/6/2015)
Forget about CHOOSE here. Forget about WHILE loops here. The following will generate as many random vowels (up to about 16 million) as you need in two shakes of a stick and I guarantee no NULLs will come up.
Plenty of alternatives here (nice one Jeff), my thought is why the extreme short-cutting of the CASE clan, as Phil mentioned, it's small, dark, many legs, gets stuck in a relay, qualifies as a bug to me.
😎
While you'r at it, save yourself around 15% using an inline tally
DECLARE @NumberOfVowels INT;
SELECT @NumberOfVowels = 1000000
;
DECLARE @CHAR_BUCKET CHAR(1) = '';
SET STATISTICS IO, TIME ON;
SELECT TOP (@NumberOfVowels)
@CHAR_BUCKET = SUBSTRING('aeiouy',ABS(CHECKSUM(NEWID()))%6+1,1)
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
;
SET STATISTICS IO, TIME OFF;
SET STATISTICS IO, TIME ON;
;WITH T(N) AS ( SELECT N FROM ( VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))
, NUMS(N) AS ( SELECT TOP(@NumberOfVowels) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3,T T4,T T5,T T6, T T7)
SELECT
@CHAR_BUCKET = SUBSTRING('aeiouy',ABS(CHECKSUM(NEWID()))%6+1,1)
FROM NUMS;
SET STATISTICS IO, TIME OFF;
Table 'syscolpars'. Scan count 1, logical reads 409, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 2, logical reads 10, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 358 ms, elapsed time = 349 ms.
SQL Server Execution Times:
CPU time = 312 ms, elapsed time = 300 ms.
January 6, 2015 at 11:27 pm
Eirikur Eiriksson (1/6/2015)
While you'r at it, save yourself around 15% using an inline tally
Not to mention reads. That's why I keep a Tally function at the ready in every database.
For the CASE flaw, I like to think of it as a built in bug because the object of the test is evaluated for every WHEN. If something is non-deterministic, then it's very possible for the re-evaluations of the object test to miss all of the WHEN's and produce a NULL if there is no ELSE.
Of course, I'm happy if they don't change it. To me, it's not a fault. It's a feature that I can use when generating test data that needs some NULLs in the result.
I don't know because I don't use CHOOSE but it seems to me that it's probably just a case statement with a pretty wrapper on it.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 6, 2015 at 11:31 pm
jwiseh (1/6/2015)
Thank you , Jeff, that's a perfect workaround!Thank you for the rest of the replies as well. It's nice to know it's probably a bug in CHOOSE, and not me missing something... 😉
MS wouldn't classify it as a bug. Like what I said about CASE just above, they'd tell you that it's only guaranteed to work in the presence of deterministic operands. RAND is not deterministic.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 7, 2015 at 12:52 am
If something is non-deterministic, then it's very possible for the re-evaluations of the object test to miss all of the WHEN's and produce a NULL if there is no ELSE.
Even though in this case there are no circumstances where the index argument can produce a result which would force logic flow down the ELSE path?
I do not understand your argument this time Jeff.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
January 7, 2015 at 8:37 am
Phil Parkin (1/7/2015)
If something is non-deterministic, then it's very possible for the re-evaluations of the object test to miss all of the WHEN's and produce a NULL if there is no ELSE.
Even though in this case there are no circumstances where the index argument can produce a result which would force logic flow down the ELSE path?
I do not understand your argument this time Jeff.
Oh, the whole thing is lunacy. The T-SQL team decided some time back that the optimiser was king, and if it wanted to attempt to apply a function to an argument before the argument had been computed that was fine, no problem, regardless of what nonsense might result. So it isn't a bug, it's by design. That the design says that functions can be given any old crap instead of their arguments is generally swept under the carpet.
When CASE or CHOOSE is involved, of course, there's an argument in some cases for evaluating multiple possible paths before one knows which path is needed; that's done in many languages (and often control-flow languages are implemented by systems which do the same with IF statements when the hardware suits that approach), but T-SQL is the only language I know where it's permissable to take a decision to use the result of a particular path before the it's been determined which path is the correct one, and also the only language I know where paths which are logically impossible are included in "multiple possible paths". So I regard it as a bug. MS said years ago (in response to a Connect item) that they would maybe do something about it sometime; but nothing has happened.
Tom
January 8, 2015 at 5:46 pm
CHOOSE your poison:
WITH Tally (n) AS
(
SELECT TOP 20 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns
)
SELECT CHOOSE(n, 'a', 'e', 'i', 'o', 'u', 'y')
,SUBSTRING('aeiouy', n, 1)
FROM
(
SELECT n=1+ABS(CHECKSUM(NEWID()))%6
FROM Tally
) a;
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
January 8, 2015 at 10:00 pm
Phil Parkin (1/7/2015)
If something is non-deterministic, then it's very possible for the re-evaluations of the object test to miss all of the WHEN's and produce a NULL if there is no ELSE.
Even though in this case there are no circumstances where the index argument can produce a result which would force logic flow down the ELSE path?
I do not understand your argument this time Jeff.
Let's take a part of the code from the While Loop in the original post so that we return just one value.
DECLARE @b-2 INT;
SET @b-2 = CEILING(RAND()*6);
SELECT CHOOSE(@B,'a','e','i','o','u','y');
The problem with that is we can't actually tell what it's doing because we can't get an execution plan from such a "memory only" query. We can, however, trick it into thinking it's not memory only using TOP.
DECLARE @b-2 INT;
SET @b-2 = CEILING(RAND()*6);
SELECT TOP 1 CHOOSE(@B,'a','e','i','o','u','y');
Now we can look at the Compute Scalar in the execution plan and see the following in the Properties window...
[Expr1000] = Scalar Operator(
CASE WHEN [@B]=(1) THEN 'a'
ELSE CASE WHEN [@B]=(2) THEN 'e'
ELSE CASE WHEN [@B]=(3) THEN 'i'
ELSE CASE WHEN [@B]=(4) THEN 'o'
ELSE CASE WHEN [@B]=(5) THEN 'u'
ELSE CASE WHEN [@B]=(6) THEN 'y'
ELSE NULL
END END END END END END)
If we look at that, what can the value of @b-2 be at any point in that CASE query? The answer is that @b-2 can only be (sorry, pun) a single given value throughout that entire query. The RAND() function in the code IS indeterminate but a fixed value was assigned to @b-2 and that fixed value remains the same value throughout the CASE that CHOOSE is behind the scenes. So, the code works as expected and, as you say, you simply cannot fall all the way through to the NULL (whether there's an ELSE NULL or not).
That, notwithstanding, we do begin to see the actual problem. You would think that there'd be a single CASE with the correct number of WHENs, THENs, and the final ELSE but that's not the case (sorry. 'nuther pun). Instead, it's a cascading CASE where @b-2 is re-evaluated (same as "recalculated") for every occurrence of @B.
Now, let's see what happens if we put an indeterminate value in the place of @b-2 by using the first query of the original post. We need to use the TOP trick again to get it to cough up an execution plan...
SELECT TOP 1 CHOOSE(CEILING(RAND()*6),'a','e','i','o','u','y');
Again, looking at the Compute Scalar properties in the execution plan, we see how it really renders out...
[Expr1000] = Scalar Operator(
CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(1) THEN 'a'
ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(2) THEN 'e'
ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(3) THEN 'i'
ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(4) THEN 'o'
ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(5) THEN 'u'
ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(6.000000000000000e+000)),0)=(6) THEN 'y'
ELSE NULL
END END END END END END)
Where the fixed value of @b-2 used to be, we now have a formula and that formula is recalculated for each and every CASE. So, as just one of many possibilities...
1. If we end up with a value of "2" in the first CASE, 'a' will [font="Arial Black"]not [/font]be chosen.
2. Then if we end up with a value of "6" in the second CASE, 'e' will [font="Arial Black"]not [/font]be chosen.
3. Then if we end up with a value of "1" in the third CASE, 'i' will [font="Arial Black"]not [/font]be chosen.
4. Then if we end up with a value of "5" in the fourth CASE, 'o' will [font="Arial Black"]not [/font]be chosen.
5. Then if we end up with a value of "4" in the fifth CASE, 'u' will [font="Arial Black"]not [/font]be chosen.
6. Then if we end up with a value of "3" in the sixth CASE, 'y' will [font="Arial Black"]not [/font]be chosen and that causes us to fall through to the ELSE NULL.
This is all because of the cascading CASE functions which produces multiple instances of RAND() and each instance of RAND() is almost guaranteed to have a different value than all the other RAND()'s in the same query. Put another way, MS didn't assign the result of the indeterminate function to a runtime variable and, instead, recalculates the formula for each CASE/WHEN. You can see what happens there in the following simple example.
SELECT RAND(),RAND(),RAND(),RAND(),RAND(),RAND();
Results: 6 different RAND()s in the same query, 6 different results.
---------------------- ---------------------- ---------------------- ---------------------- ---------------------- ----------------------
0.92526870926772 0.569286021553846 0.446306211000083 0.208165465940019 0.886579991734558 0.0297250034007641
(1 row(s) affected)
The following won't work either because what you see is NOT what you get...
SELECT TOP 1 CASE CEILING(RAND()*6)
WHEN 1 THEN 'a'
WHEN 2 THEN 'e'
WHEN 3 THEN 'i'
WHEN 4 THEN 'o'
WHEN 5 THEN 'u'
WHEN 6 THEN 'y'
ELSE NULL
END
... because, behind the scenes, this is what the Compute Scalar renders out to be. Looks familiar?
[Expr1000] = Scalar Operator(
CASE WHEN ceiling(rand()*(6.000000000000000e+000))=(1.000000000000000e+000) THEN 'a'
ELSE CASE WHEN ceiling(rand()*(6.000000000000000e+000))=(2.000000000000000e+000) THEN 'e'
ELSE CASE WHEN ceiling(rand()*(6.000000000000000e+000))=(3.000000000000000e+000) THEN 'i'
ELSE CASE WHEN ceiling(rand()*(6.000000000000000e+000))=(4.000000000000000e+000) THEN 'o'
ELSE CASE WHEN ceiling(rand()*(6.000000000000000e+000))=(5.000000000000000e+000) THEN 'u'
ELSE CASE WHEN ceiling(rand()*(6.000000000000000e+000))=(6.000000000000000e+000) THEN 'y'
ELSE NULL
END END END END END END)
Again, the problem is that they didn't assign the results of the equation to a runtime variable and CASE, in certain instances, explodes to multiple CASES where the formula needs to be recalculated. If it's an indeterminate formula, then it's very likely that the CASE will fall through to the ELSE NULL.
And, to be sure, CHOOSE is nothing but a CASE function behind the scenes.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 9, 2015 at 12:47 am
Thank you Jeff, that is an outstandingly good explanation.
Thanks also for the TOP (n) trick to force the generation of a query plan. I wish I'd known that before.
For fun, I validated your findings by calculating the likelihood of a NULL being generated in this case. Assuming RAND()'s absolute integrity, it's (5/6)^6 = 0.335 or so.
declare @a int = 1;
declare @b-2 int;
declare @T table (Vowel char(1) null);
while @a <= 10000
begin
set @b-2 = ceiling(rand() * 6)
insert @T
(Vowel
)
select choose(ceiling(rand() * 6), 'a', 'e', 'i', 'o', 'u', 'y')
end;
select count(*) from @T t where t.Vowel is null
You will find that the count is around 33nn every time.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
Viewing 15 posts - 1 through 15 (of 26 total)
You must be logged in to reply to this topic. Login to reply