User Defined Functions Investigation

  • "It wasn't by the 17 times that Carl claimed (I never do trust CPU times, just elapsed times)"

    Actually, the the test results were:

    In a two minute duration:

    1. Execution counts was 3,160 for the Inline SQL versus 2,417 for the UDF. This equates to at 24% drop in thruput.

    2. Average Cpu usage in milliseconds was 1.72 for the Inline SQL was versus 29.642 for the UDF. This equates to a 17 times more CPU resource needed.

    This means that the average duration is based how many SQL statements can be executed in 2 minutes (120000 milliseconds). The average duration is then:

    Inline SQL 37.89 milliseconds = 120000 / 3160

    UDF was 49.648 milliseconds = 120000 / 2417

    Based on 830 rows, the additional increase in duration due to this UDF is .0141662 milliseconds per row ( 49.648 - 37.89 ) / 830

    This does NOT mean that if you use UDF, there will be about a 24% drop in performance.

    Take a case where a SQL statement returns 10 rows and runs in 100 milliseconds. After adding the UDF, this adds about 1/10 of a milliseconds to the duration, which is an almost undetectable amount.

    What can happen is the use of UDFs can cause a CPU bottleneck. As the load on the system increases, SQL statements will start to wait for CPU resources, the wait time will start to exceed the execution times and the final result in a short term increase in the duration.

    A typical senario is that the users complain of poor response time. By the time you logon and look at the work-load, the problem is gone. A few days later, the problem occurs again, and you cannot find the cause.

    Here is what can happen:

    You have 100 users and they each execute a SQL statement on the average of once per minute. Using Gaussian distribution (a bell curve), you could calculate the probability that all users will execute some SQL statement within the same one second period. When this happens, some resource, such as CPU, becomes a bottleneck, the sql statments backup, and there is a short term decreae in performance.

    Another case is were there a SQL statement that uses significant resources but has a low execution frequency. When the SQL statement runs, some resource becomes a bottleneck and there is a short term increase in duration for all of the other SQL statements running.

    SQL = Scarcely Qualifies as a Language

  • Ok, so does that short-term bottleneck stuff explain why my latest test consistently showed that the UDF code was twice as slow as the clear code regardless of the number of rows in the table? On a single cpu with only 1 user and no network connections?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • BWAAAA-HAAAAA-HAAAAAA-HAAAAA!   I just ran the same set of tests on a 4 processor box "real server" with minor traffic for 2,000,000 records using my previous test code and the differences between the UDF code and the clear code are hardly worth talking about (less than 300 milliseconds difference on each 2Meg run) but the interesting thing is that sometimes the UDF code beat the clear code by that 300 milliseconds.  I can't wait to see what it does does on the big 16 processor box!

    Soooooo, kids.... I'm thinking that UDF's aren't that bad at all!  Certainly the statement that they are "always harmful" is has been refuted once again.  And, the development time saved by using well written UDF's and the readability gained for troubleshooting purposes certainly makes them well worth using.

    Have fun... I'm all done with this thread.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • 2 Carl.

    another test for your article.

    select O.ID, O.name

    from dbo.sysobjects  O, sysobjects O1

    GROUP BY O.ID, O.name

    -- run time < 1 sec.

    select object_id(O.name) as ID_, O.name

    from dbo.sysobjects O, sysobjects O1

    GROUP BY object_id(O.name), O.name

    -run time > 1 min

    Consideration: never use function OBJECT_ID().

    Smart enough?

    _____________
    Code for TallyGenerator

  • Sergiy, I'm beginning to think that there may be something wrong with the system you're running these tests on.

    I ran each of these statements against the master database on two seperate servers. Both servers are dual 3.4Ghz with 2GB Ram.

    Both statements returned in under 1 second.

     

    --------------------
    Colt 45 - the original point and click interface

  • Seriously???? This function uses a select to get the data. It CANNOT be faster or even the same speed as the query that fetch the info without doing ANYTHING extra.

  • Are you sure?

    Probably my server has only 1GB of RAM but it should not make such a big difference.

    Did you join second sysobjects?

     

    _____________
    Code for TallyGenerator

  • Another typical task:

    WEB application supplies comma delimited list of IDs.

    You need to select rows with IDs mentioned in this list.

    Sounds familiar?

    IF EXISTS (SELECT * FROM sysobjects WHERE name = N'ListNumericShort')

     DROP FUNCTION ListNumericShort

    GO

    CREATE FUNCTION dbo.ListNumericShort

     (@List varchar(8000),

      @Delim char(1))

    RETURNS @Values TABLE

     (NumValue numeric)

    AS

    /**

     * This function parses a string with large set of IDs and returns them

     * as the table.  Duplicate values are eliminated by this function.

     */

    BEGIN

    DECLARE @Value varchar(50)

    DECLARE @LeftPointer smallint

    DECLARE @RightPointer smallint

    SET @LeftPointer = 1

    WHILE @RightPointer < LEN(@List) OR @RightPointer IS NULL

     BEGIN

      SELECT @RightPointer = CHARINDEX(@Delim, @List, @LeftPointer)

     

      IF @RightPointer = 0

       SET @RightPointer = LEN(@List)+1

     

      SET @Value = SUBSTRING(@List, @LeftPointer, @RightPointer-@LeftPointer)

      SELECT @Value = REPLACE(REPLACE(REPLACE(@Value, char(13), ''), char(10), ''), char(9), '')

      IF ISNUMERIC(@Value) = 1

      IF CHARINDEX('.', @Value) = 0 and CHARINDEX('$', @Value) = 0

       IF not exists (select 1 from @Values where NumValue = @Value)

        INSERT INTO @Values (NumValue)

        SELECT @Value

     

      SET @LeftPointer = @RightPointer + 1

     END

    RETURN

    END

    GO

    Now for testing I use system tables again:

    declare @OBJIDList varchar(8000)

    -- creating string with set of IDs

    select @OBJIDList = ISNULL(@OBJIDList + ',', '') + cast(id as varchar(10))

    from sysobjects

    where id%5 = 0

    -- just to pick some subset of IDs from the table

    --print @OBJIDList

    SELECT id

    FROM sysobjects O

    INNER JOIN dbo.ListNumericShort (@OBJIDList, ',') L on O.Id = L.NumValue

    Can you provide any competitive solution?

    If you don't have big enough number of objects in DB, use syscolumns instead of sysobject.

    _____________
    Code for TallyGenerator

  • This is the function I use for splitting values. It takes advantage of a "Numbers" table to provide a set-based query. I suppose the actual query could be used as a derived table and thus do away with the function, but in this instance I like the function for code maintainability.

    CREATE FUNCTION dbo.udf_SplitChar (
     @vcrArray varchar(8000)
     , @chrDelim char(1)
    )
    RETURNS @Results TABLE (
     Items varchar(8000)
    )
    AS
    BEGIN
     DECLARE @Array TABLE (Items varchar(8000))
     INSERT INTO @Array VALUES ( @vcrArray )
     INSERT INTO @Results(
      Items
      )
     SELECT
      SUBSTRING(Items, n, CHARINDEX(@chrDelim, Items + @chrDelim, n) - n) AS [val]
     FROM @Array 
      JOIN DBA.dbo.Nums ON n <= LEN(Items)
       AND SUBSTRING(@chrDelim + Items, n, 1) = @chrDelim
     ORDER BY 
      n - LEN(REPLACE(LEFT(Items, n), @chrDelim, '')) + 1  
     RETURN
    END

     

    --------------------
    Colt 45 - the original point and click interface

  • Why do you use the table function version instead of the inline version??

  • 2 Philicart

    I cannot understand why you use @Array table in your UDF. I changed it this way:

    IF EXISTS (SELECT * FROM sysobjects WHERE name = N'udf_SplitChar')

     DROP FUNCTION udf_SplitChar

    GO

    CREATE FUNCTION dbo.udf_SplitChar  

     ( @vcrArray varchar(8000) , @chrDelim char(1))

    RETURNS @Results TABLE

     (Items varchar(8000))

    AS

    BEGIN

     INSERT INTO @Results (Items)

     SELECT  SUBSTRING(@vcrArray, n, CHARINDEX(@chrDelim, @vcrArray + @chrDelim, n) - n) AS [val]

     FROM dbo.Nums

     WHERE n <= LEN(@vcrArray) AND SUBSTRING(@chrDelim + @vcrArray, n, 1) = @chrDelim

     ORDER BY n - LEN(REPLACE(LEFT(@vcrArray, n), @chrDelim, '')) + 1

     RETURN

    END

     Works as good as yours option but a little faster.

    The point I use my option is cleaning up the string before processing.

    But actually me and you are standing on the same side: we are using UDF.

    Carl Federl declared that UDF is always bad for performance. So I asked for "non-UDF" competitive solution for this task.

    _____________
    Code for TallyGenerator

  • Sergiy, Hmmm ... the @Array table could have been some bits I didn't cleanup from an earlier revision, the function definitely doesn't need it.

    "But actually me and you are standing on the same side: we are using UDF.

    Carl Federl declared that UDF is always bad for performance. So I asked for "non-UDF" competitive solution for this task."

    Actually, I agree with Carl, in some cases UDF's are very bad for performance. So as a rule you should avoid using functions unless the complexity of what you're trying to achieve out-weighs the performance impact.

    The function I provided does show that using a function isn't really necessary. As I mentioned in the previous post, "the actual query could be used as a derived table and thus do away with the function".

    RGR'us, I use the table function to return a table Basically the only reason I have the function at all is because it's called in a few dozen places across 5-6 different databases. If I only had a couple of instances where this was required I'd include the query as a derived table.

    --------------------
    Colt 45 - the original point and click interface

  • Key words "some cases".

    That make a difference between your statement and the one from Carl.

    _____________
    Code for TallyGenerator

  • But I think more in line with what Carl is trying to produce,

    "So as a rule you should avoid using functions"

     

    --------------------
    Colt 45 - the original point and click interface

  • As well as views.

    _____________
    Code for TallyGenerator

Viewing 15 posts - 31 through 45 (of 52 total)

You must be logged in to reply to this topic. Login to reply