One new thing that SQL Server 2016 has added is the ability to natively compile user-defined functions. Previously, native compilation, part of the hekaton feature, was limited to stored procedures.
When I saw that, the first question that came to mind is whether natively compiling a scalar function reduces the overhead when calling that function within another query. I’m not talking about data-accessing scalar UDFs, since natively compiled functions can only access in-memory tables, but functions that do simple manipulation of the parameters passed in. String formatting, for example, or date manipulation.
While not as harmful as data-accessing scalar UDFs, there’s still overhead as these are not inline functions, they’re called for each row in the resultset (as a look at the Stored Procedure Completed XE event would show), and the call to the function takes time. Admittedly not a lot of time, but when it’s on each row of a large resultset the total can be noticeable.
I decided to do a simple test. A query against a table with ~600k rows, one query with a traditional scalar function, one with a natively compiled function and one with the function’s contents in the query as a column.
The two functions:
CREATE FUNCTION dbo.DateOnly (@Input DATETIME) RETURNS DATETIME AS BEGIN RETURN DATEADD(dd, DATEDIFF (dd, 0, @Input), 0); END GO
and
CREATE FUNCTION dbo.DateOnlyHekaton (@Input DATETIME) RETURNS DATETIME WITH NATIVE_COMPILATION, SCHEMABINDING AS BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'English') RETURN DATEADD(dd,DATEDIFF(dd,0,@Input),0); END GO
And the query without the function would be a simple
SELECT DATEADD(dd, DATEDIFF(dd, 0, SomeDate), 0) FROM SomeTable;
Durations and CPU usage were caught with Extended Events. I ran each query 25 times and aggregated the results.
Average CPU (ms) | Average Duration (ms) | |
In-line expression | 289 | 294 |
Normal function | 3555 | 3814 |
Natively Compiled Function | 3318 | 3352 |
Not quite what I hoped. While the overhead of the natively compiled function is lower, it’s lower only by about 10%, which really is not worth it, now when we’re talking about an order of magnitude difference from the query without the function call.
Looks like the guidance is still going to be to not use scalar UDFs within other queries.