I’ve read a lot of things lately pointing to scalar functions as if they were the devil. In this blog I’m going to explore if that’s the case. Let’s have a look.
It’s true that scalar functions are often a performance bottleneck; but is there a situation where they could be used responsibly?
What if you had a lookup table that almost never changed? Is it worth doing a join on the lookup to get the data you need?
Let’s examine a simple join between a customer address and a state lookup table.
We have the following tables:
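(The post shows the table definitions as images; here’s a minimal sketch of what they might look like, with the table and column names assumed.)

-- Assumed definitions; the exact schema in the post may differ
CREATE TABLE dbo.States
(
    State int NOT NULL PRIMARY KEY,  -- small surrogate key (4 bytes)
    Abbr  char(2) NOT NULL           -- two-letter state abbreviation
);

CREATE TABLE dbo.CustomerAddress
(
    CustomerAddressId int IDENTITY(1,1) PRIMARY KEY,
    Street            varchar(100) NOT NULL,
    City              varchar(60)  NOT NULL,
    State             int NOT NULL REFERENCES dbo.States (State)
);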
To join these you’d usually use a very simple inner join to get the state code, which could just as easily be the full state name, something longer than the 4 bytes of an int. You might also keep other attributes on the States table, like geo data or the capital. For now, let’s pretend Abbr is much larger than the int State column, so it makes sense to move it out to a lookup table.
Here’s what the join query may look like:
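Something along these lines (column names are assumptions, matching the sketch above):

SELECT ca.CustomerAddressId,
       ca.Street,
       ca.City,
       s.Abbr
FROM dbo.CustomerAddress AS ca
INNER JOIN dbo.States AS s
    ON s.State = ca.State;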
To test the value of scalar functions we created the following code (note that it cuts off, but you get the point).
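Since the function itself is cut off, here’s a minimal version that does the equivalent lookup (the function name and body are assumptions):

-- Assumed scalar function: returns the abbreviation for a given State key
CREATE OR ALTER FUNCTION dbo.fn_GetStateAbbr (@State int)
RETURNS char(2)
AS
BEGIN
    DECLARE @Abbr char(2);

    SELECT @Abbr = Abbr
    FROM dbo.States
    WHERE State = @State;

    RETURN @Abbr;
END;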
Now that we have a function to test with, we want to create a query that returns the same results as our join query. Let’s have a look.
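Assuming the sketches above, the function version of the query would look something like:

SELECT ca.CustomerAddressId,
       ca.Street,
       ca.City,
       dbo.fn_GetStateAbbr(ca.State) AS Abbr
FROM dbo.CustomerAddress AS ca;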
How do these compare with an actual plan?
Wow! The query with the function shows a much lower cost! Incredible results, right? Not so fast! Sometimes cost can be misleading. To perform a true test we’ll need to do a bit more. Let’s turn off the actual plan and turn on STATISTICS TIME.
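In SSMS that just means unchecking “Include Actual Execution Plan” and running each query with:

SET STATISTICS TIME ON;
-- run the join query, then the function query, and compare the Messages tab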
206 ms total elapsed time for the join query.
Only 92 ms for the function query. That’s good!
Let’s examine the query stats to see how these two compare.
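One way to do that is with sys.dm_exec_query_stats; the exact query used in the post isn’t shown, but something like this works:

SELECT TOP (10)
       SUBSTRING(st.text, 1, 200) AS query_text,
       qs.execution_count,
       qs.total_worker_time,    -- CPU time, in microseconds
       qs.total_elapsed_time,   -- elapsed time, in microseconds
       qs.total_logical_reads
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.last_execution_time DESC;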
Here we find some unexpected results. The function query’s elapsed time was lower for this single run; however, its CPU time was higher by a significant amount.
We should dig deeper. You may not have noticed that the function isn’t represented in the query plan at all. Note that this is SQL Server 2017 and the database is running at the latest compatibility level.
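If you want to verify that yourself, a quick check looks like this (SQL Server 2017’s latest compatibility level is 140):

SELECT name, compatibility_level
FROM sys.databases
WHERE name = DB_NAME();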
To find out more, we’re going to look at the function stats DMV:
-- sys.dm_exec_function_stats (SQL Server 2016+) tracks aggregate performance stats per function
SELECT db_name(database_id) AS DB,
       object_name(object_id) AS fn,
       type_desc,
       last_execution_time,
       execution_count,
       total_elapsed_time,
       total_worker_time,
       total_logical_reads,
       total_physical_reads,
       total_logical_writes
FROM sys.dm_exec_function_stats;
What we find is that the function was executed 192 times. This is because the function must run once for each row in the result set, and it’s primarily for this reason that scalar functions usually aren’t a great choice.
One final test is to run the queries 100 times. To do this, we’re going to use GO 100.
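GO followed by a number in SSMS (and sqlcmd) repeats the preceding batch that many times, so the test looks something like this (query shapes assumed from the sketches above):

-- join version
SELECT ca.CustomerAddressId, ca.Street, ca.City, s.Abbr
FROM dbo.CustomerAddress AS ca
INNER JOIN dbo.States AS s ON s.State = ca.State;
GO 100

-- scalar function version
SELECT ca.CustomerAddressId, ca.Street, ca.City, dbo.fn_GetStateAbbr(ca.State) AS Abbr
FROM dbo.CustomerAddress AS ca;
GO 100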
What have we learned? While the function may look great in the query plan, it doesn’t stack up when you measure actual runtime metrics. If there’s one thing a DBA should always say, it’s “It depends”, but in this case it doesn’t depend, and you should instead add this quote to your toolbox: “You should never fully trust cost in query plans”.