Re: User defined functions

  • Are there any disadvantages to using t-sql user defined functions? From a coding/maintainability perspective, I prefer to use them. However, I wasn't sure if I was taking a performance hit by using UDFs versus "cutting and pasting" code.

  • UDFs, in the same way as sprocs, reduce the compilation cost in T-SQL by caching the execution plan and saving them for repeated executions. You take a performance hit whenever you first compile the UDF.

    Likewise, CLR functions have an even larger performance advantage over T-SQL functions for complex computation, string manipulation, and business logic.

    Jeremiah Peschka
    Microsoft SQL Server MVP
    Managing Director - Brent Ozar PLF, LLC

  • Jeremiah is right on the compilation of the INNER process. That being said - a lot of functions tend to negatively impact the performance of the OUTER process (meaning - the query that uses the function). In particular - the functions will have a tendency to impact how effectively the query optimizer can use indexes, so what might have been a seek can turn into a scan in order to serialize calling the function one row at a time. This can have a dramatic effect on overall execution time (where the version with the function is many many times slower than the "direct" query).

    Unfortunately, from what I've seen, in most cases the penalty will outweigh any performance gains you might have had from the function gettting its own execution plan.

    It comes down to testing. Test it and test wisely.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (10/3/2008)


    Jeremiah is right on the compilation of the INNER process. That being said - a lot of functions tend to negatively impact the performance of the OUTER process (meaning - the query that uses the function). In particular - the functions will have a tendency to impact how effectively the query optimizer can use indexes, so what might have been a seek can turn into a scan in order to serialize calling the function one row at a time. This can have a dramatic effect on overall execution time (where the version with the function is many many times slower than the "direct" query).

    Unfortunately, from what I've seen, in most cases the penalty will outweigh any performance gains you might have had from the function gettting its own execution plan.

    It comes down to testing. Test it and test wisely.

    Are you saying that results may vary? If so, are there any rules of thumb for determining UDF's use? Or do I just need to try the stored procedure using the UDF definition and the UDF function call and and see if the UDF takes a huge performance hit?

    Right now, I'm only using it for table-valued UDFs.

  • I am trying to include some code in a stored proc that will look to see if a user defined function exists in the database and if not that will create it as part of a larger stored procedure. The reason I want to do this is that this database belongs to a third-party vendor and when we go through upgrades, it is possible that my custom code may be removed. So, I thought adding this at the top of my proc would help me. However, I am doing something wrong syntax-wise because it won't let me use it. Can someone help? The function itself works great. It's the piece where I try to see if it exists or not that seems to be wrong. Here's the code I'm trying to use:

    IF NOT EXISTS

    (select * from sys.all_objects

    where type_desc='SQL_SCALAR_FUNCTION'

    and name='fnConcatTest2')

    begin

    CREATE FUNCTION [dbo].[fnConcatTest2] (@eventID varchar(12))

    RETURNS VARCHAR(255)

    AS

    BEGIN

    DECLARE @return VARCHAR(255),

    @CurrentYear varchar(4),

    @CurrentTerm varchar(12)

    set @CurrentYear=(select setting from abt_settings where label_name='current_year')

    set @CurrentTerm=(select setting from abt_settings where label_name='current_term')

    SELECT @Return = ISNULL(@Return+'; ','')+ [day]+' '+RIGHT(CONVERT(VARCHAR,start_time),7) + ' - '+ RIGHT(CONVERT(VARCHAR,end_time),7) + ' '+Building_Code+' '+Room_ID

    from sectionschedule

    where event_Id+section=@eventID

    and academic_year=@CurrentYear

    and academic_term=@CurrentTerm

    RETURN @Return END

    end

    This is the error:Msg 156, Level 15, State 1, Line 6

    Incorrect syntax near the keyword 'FUNCTION'.

    Msg 137, Level 15, State 2, Line 18

    Must declare the scalar variable "@eventID".

    Msg 178, Level 15, State 1, Line 21

    A RETURN statement with a return value cannot be used in this context.

  • Are you saying that results may vary? If so, are there any rules of thumb for determining UDF's use? Or do I just need to try the stored procedure using the UDF definition and the UDF function call and and see if the UDF takes a huge performance hit?

    Right now, I'm only using it for table-valued UDFs.

    Essentially, yes, your mileage may vary. It's best to evaluate how a query will perform using a UDF both as a standalone query, as a sproc, and under heavy load.

    One of the bloggers on sqlblog.com reported that he created a CLR UDF that performed admirably when tested in isolation, and even under test loads. But when they exposed the CLR UDF to production level loads, performance was miserable to do marshalling data types.

    It ultimately comes down to test, test, and test some more.

    Jeremiah Peschka
    Microsoft SQL Server MVP
    Managing Director - Brent Ozar PLF, LLC

  • In order to do what you're describing, you'll have to use sp_executesql to execute your statement at run time. You might run into binding issues using a function that may not exist when the sproc is created.

    When a sproc is executed for the first time, the processor will read the entire text of the sproc and check that the name of the objects are present. If an object isn't present before the sproc is executed, the query processor will throw an error.

    Take a look a the BOL topic Deferred Name Resolution and Compilation .

    lduvall (10/3/2008)


    I am trying to include some code in a stored proc that will look to see if a user defined function exists in the database and if not that will create it as part of a larger stored procedure. The reason I want to do this is that this database belongs to a third-party vendor and when we go through upgrades, it is possible that my custom code may be removed. So, I thought adding this at the top of my proc would help me. However, I am doing something wrong syntax-wise because it won't let me use it. Can someone help? The function itself works great. It's the piece where I try to see if it exists or not that seems to be wrong. Here's the code I'm trying to use:

    IF NOT EXISTS

    (select * from sys.all_objects

    where type_desc='SQL_SCALAR_FUNCTION'

    and name='fnConcatTest2')

    begin

    CREATE FUNCTION [dbo].[fnConcatTest2] (@eventID varchar(12))

    RETURNS VARCHAR(255)

    AS

    BEGIN

    DECLARE @return VARCHAR(255),

    @CurrentYear varchar(4),

    @CurrentTerm varchar(12)

    set @CurrentYear=(select setting from abt_settings where label_name='current_year')

    set @CurrentTerm=(select setting from abt_settings where label_name='current_term')

    SELECT @Return = ISNULL(@Return+'; ','')+ [day]+' '+RIGHT(CONVERT(VARCHAR,start_time),7) + ' - '+ RIGHT(CONVERT(VARCHAR,end_time),7) + ' '+Building_Code+' '+Room_ID

    from sectionschedule

    where event_Id+section=@eventID

    and academic_year=@CurrentYear

    and academic_term=@CurrentTerm

    RETURN @Return END

    end

    This is the error:Msg 156, Level 15, State 1, Line 6

    Incorrect syntax near the keyword 'FUNCTION'.

    Msg 137, Level 15, State 2, Line 18

    Must declare the scalar variable "@eventID".

    Msg 178, Level 15, State 1, Line 21

    A RETURN statement with a return value cannot be used in this context.

    Jeremiah Peschka
    Microsoft SQL Server MVP
    Managing Director - Brent Ozar PLF, LLC

  • jlp3630 (10/3/2008)


    Matt Miller (10/3/2008)


    Jeremiah is right on the compilation of the INNER process. That being said - a lot of functions tend to negatively impact the performance of the OUTER process (meaning - the query that uses the function). In particular - the functions will have a tendency to impact how effectively the query optimizer can use indexes, so what might have been a seek can turn into a scan in order to serialize calling the function one row at a time. This can have a dramatic effect on overall execution time (where the version with the function is many many times slower than the "direct" query).

    Unfortunately, from what I've seen, in most cases the penalty will outweigh any performance gains you might have had from the function gettting its own execution plan.

    It comes down to testing. Test it and test wisely.

    Are you saying that results may vary? If so, are there any rules of thumb for determining UDF's use? Or do I just need to try the stored procedure using the UDF definition and the UDF function call and and see if the UDF takes a huge performance hit?

    Right now, I'm only using it for table-valued UDFs.

    Be VERY wary of TVF's. Table-valued Functions return Table Variables, which are very harsh on the query optimizer when you use them in JOIN or WHERE clauses. For one - table variable don't acquire statistics like "regular" tables, so the cardinality analysis will almost always be bad, so the optimizer will have a tendency to make bad decisions.

    Again - test it a lot. At full size, and under realistic prod conditions (like Jeremiah pointed out).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • jeremiah.peschka (10/3/2008)


    Essentially, yes, your mileage may vary. It's best to evaluate how a query will perform using a UDF both as a standalone query, as a sproc, and under heavy load.

    One of the bloggers on sqlblog.com reported that he created a CLR UDF that performed admirably when tested in isolation, and even under test loads. But when they exposed the CLR UDF to production level loads, performance was miserable to do marshalling data types.

    It ultimately comes down to test, test, and test some more.

    What criteria do you use to evaluate the best option? Examine the query execution plan and select the one with the lowest overall subtree cost? Or something else?

  • jlp3630 (10/3/2008)


    What criteria do you use to evaluate the best option? Examine the query execution plan and select the one with the lowest overall subtree cost? Or something else?

    Typically I look at a combination of subtree cost and, if I have a trace, disk I/O and CPU utilization for both the UDF and any calling sprocs as a whole. I suspect that I am less scientific than most data folks when it comes to this.

    Jeremiah Peschka
    Microsoft SQL Server MVP
    Managing Director - Brent Ozar PLF, LLC

  • What criteria do you use to evaluate the best option? Examine the query execution plan and select the one with the lowest overall subtree cost? Or something else?

    I personally avoid inline UDF's like the plague. What I mean by inline UDF's, is anything that would be called like:

    Select EmployeeID,

    dbo.fn_HoursWorked

    From tblEmployee

    Before I do anything like this, I ask myself if there is any way at all to write the query with set based logic instead of using that function, even if it requires a bit more time to create the stored procedure.

    In the above example, you could simply do:

    Select EmployeeID,

    SUM(Hours_worked) HoursWorked

    FROM tblEmployee

    WHERE (all the criteria you put in your function)

    Inline functions are not set based programming, and in my experience, can exponentially slow down your query. Joins based on UDF's can also get out of control extremely fast (ie. FROM tablea A INNER JOIN tableb B ON A.somevalue = dbo.fn_somecalculation(B.value) ).

    One of the biggest problems with these aforementioned methods, is that they can appear (even after a ton of testing) to run just fine. The query plan often doesn't show anything out of whack(because it's not recursively running that function the 300,000 times it's going to when you actually run it). The trace may not look horrible (depending on what you're viewing). They can even run fine for years in production. Then one day, your data store or server load may increase to the point that the function can no longer be run entirely in memory, and BAM... there goes your query. A query that used to take 2 minutes now takes 6 hours.

    If you have a function to pass a variable into... especially one that only manipulates the variable without selecting rows from a table, I find that they are usually just about as efficient, can save you time and make maintenance easier.

    For example:

    SET @Morning = dbo.fn_GetMorningOf(GETDATE())

    While this function is by no means necessary, and is possibly too simplistic to really warrant a function at all, it is only going to be run once, and will typically have a negligible impact on overall query time. At that point it's figuring out where you personally draw the line between ease of coding and optimization. If it takes you an extra half hour to improve something by 1ms, is it worth it? That's up to you and your particular situation.

    The biggest thing you want to ask yourself (in my opinion anyways, and I'm by no means as knowledgeable on this subject as a lot of other people on this board), is how many times this function could potentially run. If it's a very limited number of times, you're probably OK. If it's going to run once for every line in a query that returns 300,000 lines... it's time to figure out another way.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Garadin (10/3/2008)


    What criteria do you use to evaluate the best option? Examine the query execution plan and select the one with the lowest overall subtree cost? Or something else?

    I personally avoid inline UDF's like the plague. ...

    This is sage advice. I typically only use UDFs for things like GetFirstDayOfMonth or GetFirstDayOfFiscalYear. Frequently I do this to increase the readability of my code. GetFirstDayOfMonth looks a lot cleaner than DATEADD(mm,DATEDIFF(mm,0,GETDATE()),0)

    Jeremiah Peschka
    Microsoft SQL Server MVP
    Managing Director - Brent Ozar PLF, LLC

  • Garadin (10/3/2008)


    I personally avoid inline UDF's like the plague. What I mean by inline UDF's, is anything that would be called like:

    Select EmployeeID,

    dbo.fn_HoursWorked

    From tblEmployee

    Before I do anything like this, I ask myself if there is any way at all to write the query with set based logic instead of using that function, even if it requires a bit more time to create the stored procedure.

    In the above example, you could simply do:

    Select EmployeeID,

    SUM(Hours_worked) HoursWorked

    FROM tblEmployee

    WHERE (all the criteria you put in your function)

    Inline functions are not set based programming

    I was planning to use inline table functions to reduce the amount of code required for complicated joins.

    For example, I use the following query as part of a larger query in multiple stored procedures:

    SELECTset_id

    FROMtbl_pa_label_sets AS ls WITH (NOLOCK)

    WHEREfk_label_id IN (SELECT fk_label_id

    FROMtbl_pa_classification_labels AS cl WITH (NOLOCK)

    WHEREfk_classification_id = @classification_id)

    GROUP BY set_id

    HAVING COUNT(distinct fk_label_id) = (SELECTCOUNT(distinct fk_label_id)

    FROMtbl_pa_classification_labels AS cl WITH (NOLOCK)

    WHEREfk_classification_id = @classification_id))

    I was hoping to just replace the code with a table-valued function call. However, I wasn't sure what the implications were.

    If you have a function to pass a variable into... especially one that only manipulates the variable without selecting rows from a table, I find that they are usually just about as efficient, can save you time and make maintenance easier.

    For example:

    SET @Morning = dbo.fn_GetMorningOf(GETDATE())

    Would you use user defined functions for manipulations not involving a database table (i.e., string manipulation, input validation, etc.)

    The biggest thing you want to ask yourself (in my opinion anyways, and I'm by no means as knowledgeable on this subject as a lot of other people on this board), is how many times this function could potentially run. If it's a very limited number of times, you're probably OK. If it's going to run once for every line in a query that returns 300,000 lines... it's time to figure out another way.

    I don't understand what you mean. For table-valued user defined functions, would you only use it if the function would be called rarely in production?

  • Would you use user defined functions for manipulations not involving a database table (i.e., string manipulation, input validation, etc.)

    If I wanted to manipulate a single value, or a very small set of values each time I ran the SP I was putting it in, possibly. If I wanted to manipulate 2000 values every time I ran an SP (ie. formatting a phone number for output for all customer addresses), then No, I would not, as you have to call that function 2000 times, rather than just applying one logic path to 2000 rows.

    I was hoping to just replace the code with a table-valued function call. However, I wasn't sure what the implications were.

    If the database operations you are performing tend to target a single row at a time(can't tell if that query targets one row or many), whether or not you use functions will likely make little difference, because if you ever got to that tipping point I spoke of, it's probably going to break whether you are using functions or just using direct subselects.

    I don't understand what you mean. For table-valued user defined functions, would you only use it if the function would be called rarely in production?

    The key is how many times the function will have to be called for each call to the stored procedure. If you call a stored procedure once, and it calls the function 500 times... that's no good. If you call the stored procedure once, and it calls the function once, then the function vs. hardcoding likely won't have any major impact on you.

    If you're talking about using a function with RETURNS TABLE, then I'm not really sure how that will affect you, as I've never really utilized that functionality.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • jeremiah.peschka (10/3/2008)


    Likewise, CLR functions have an even larger performance advantage over T-SQL functions for complex computation, string manipulation, and business logic.

    With the exception of RegExReplace, many of us have been able to beat the tar out of every CLR that people think they need. CLR's have the same and more overhead than a T-SQL function and both are frequently abused. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply