In a previous blog entry (Inconsistent NullIf behaviour), Jeff Moden and Dan Halliday(twitter) both made a very similar observation.
When executing this code
- select top(10) abs(checksum(newid()))%4
- from sys.objects
the only possible values returned can be 0,1,2 or 3. Anything else is a mathematical impossibility. Abs(checksum(new())) will return a ‘random’ positive integer and the modulus (%) operator will returned the remainder after the division. So that being the case, how case this code return NULL values.
- select top(10)
- case abs(checksum(newid()))%4
- when 0 then 0
- when 1 then 1
- when 2 then 2
- when 3 then 3 end
- from sys.objects
Does that mean that SQLServer is mathematically flawed ? Should we avoid modulus ? Once again we need to see what SQLServer is ACTUALLY executing. Within the execution plan, the defined values for the ‘Compute Scalar’ operator is shown below.
Which when applied as sql code would be
- Select top (10)
- CASE WHEN abs(checksum(newid()))%(4)=(0) THEN (0) ELSE
- CASE WHEN abs(checksum(newid()))%(4)=(1) THEN (1) ELSE
- CASE WHEN abs(checksum(newid()))%(4)=(2) THEN (2) ELSE
- CASE WHEN abs(checksum(newid()))%(4)=(3) THEN (3) ELSE
- NULL END END END END)
- from sys.objects
Now its pretty clear where the NULL values are coming from. Some people have suggested that the real problem is that newid() is non-deterministic, and as such gives a different value on each execution. I would disagree with that point of view, newid() should be non-deterministic.
What is the real world value of this knowledge ? So what if you cant use a newid() in a case statement , does that really matter ? When put like that, I would have to agree that, no not really, but let us now expand upon this.
Let us create a simplistic function in AdventureWorks
- Create Function fnGetOrderDate(@SalesOrderId integer)
- returns date
- with schemabinding
- as
- begin
- declare @OrderDate smalldatetime
- select @OrderDate = OrderDate
- from sales.SalesOrderHeader
- where SalesOrderID = @SalesOrderId
- return @OrderDate
- end
Then execute the following code
- select case datepart(MONTH,dbo.fnGetOrderDate(45038))
- when 1 then 'Jan'
- when 2 then 'Feb'
- when 3 then 'Mar'
- when 4 then 'Apr'
- when 5 then 'May'
- when 6 then 'Jun'
- when 7 then 'Jul'
- when 8 then 'Aug'
- when 9 then 'Sept'
- when 10 then 'Oct'
- when 11 then 'Nov'
- when 12 then 'Dec' end
with a trace on SP:StatementCompleted and SQL:BatchCompleted , we will then see this.
But what will happen when we change the passed SalesOrderId to a different value , try with 43659.
Now, this is probably saying more about inappropriate use of a scalar user defined function than the case statement itself , but it highlights how performance can be damaged with the combination of the two.