October 20, 2011 at 1:29 pm
I'm currently using 2005 and hope to move to 2008 R2 soon. Having used UDFs (maybe inappropriately) I'm a little confused about the different types of UDF in BOL 2008 - terminology seems to have changed, perhaps other things too? I know people will say 'it depends', show me the tables/code, test alternatives, but there must be some general rules-of-thumb...
Scalar - simple: OK if just for some maths on parameter values, no referencing of tables.
Scalar - simple (single select): May perform poorly as will be in the calling SELECT list and hence called once per row. Similar to a correlated sub-query in the SELECT (with overhead?). Consider re-writing without the UDF (what I'm trying to do now) by 'pushing' the referenced tables from the UDF into the FROM.
Scalar - multi-line (multi-statement): As above, perhaps worse for optimizer?
Table-valued - in-line (single select): A parameterized view used in the calling FROM. OK.
Table-valued - multi-line (multi-statement): For when you would like to write
SELECT * FROM exec MyStoredProc(P1, P2). More difficult for the optimizer if used in a join in the calling FROM?
Am I on the right tracks here?
October 20, 2011 at 3:19 pm
Understand that the number of times the function is going to be called is dependent on whether or not it's correlated with the outer query (and some other more complex factors). If it is, then yes, it will be called once for each record in the outer table. If no, then it's basically a derived table encapsulated within a UDF for re-usability. Thus, in this case, the UDF is fired once only.
Also, remember that it's not necessarily bad to have a query (whether in a UDF or a subquery) execute for each record depending on the circumstance. If your outer query has a limited number of records and the records you need from the correlated query (your UDF) are spread over a large table it can be a great way to go.
Imagine that you have 20 rows in the outer query and you need an aggregation of 10-50 related records in another table (TableB) for each of the 20. If TableB is well indexed for what you need and has 50 million records, doing 20 separate seek operations to quickly grab the records you need will be better than solutions that would reduce the number of separate executions.
Where you really want to avoid this approach is where you're going to have a lot of rows in the outer query (thus you'll have a lot of executions) and/or if a simple join would be better in cases where you would end up scanning the same portions of the index (in which case you're just wasting time doing it multiple times) or if the index you need is small anyway (in which case a single scan of what you need could be faster than multiple smaller operations.)
**edit: This answer obviously isn't exhaustive and a lot of it depends on what exactly you're trying to do.**
October 21, 2011 at 2:09 am
Imagine that you have 20 rows in the outer query and you need an aggregation of 10-50 related records in another table (TableB) for each of the 20. If TableB is well indexed for what you need and has 50 million records, doing 20 separate seek operations to quickly grab the records you need will be better than solutions that would reduce the number of separate executions.
Thanks.
This is the scenario I've got, so will test it both ways - with and without UDF. I read somewhere that if your function is scalar and returning one value, there can be a performance gain from turning it into an inline table-valued function, so might try this too.
October 21, 2011 at 8:19 am
zapouk (10/21/2011)
This is the scenario I've got, so will test it both ways - with and without UDF. I read somewhere that if your function is scalar and returning one value, there can be a performance gain from turning it into an inline table-valued function, so might try this too.
True. If the function call is not correlated (using a column as parameter), it may be beneficial to execute it and assign the value to a variable, and use this variable instead of the function call in the rest of your code. The query optimizer is much better at making use of statistics when you compare against variables than against function results.
As for inline VS multi-statement, I've written a little blog post on it using system tables to demonstrate the difference in execution plans.
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply