Functions with internal Sorting - is it cached?

  • I have to update values from table with the returned value from a user function.  Within the user function, it does an ORDER BY. 

    Question:  The function sorts a very large table, so does the sort get re-executed for each update?

    Here's the paraphrased versions of my UPDATE statement and function:

    --------

    UPDATE dbo.myOtherTable

    SET myValue = foo(x.ID1, x.ID2)

    FROM myOtherTable x

    WHERE x.ID2 IN (SELECT someID FROM myThirdTable)

    --------

    CREATE FUNCTION foo(@a int, @b-2 int)

    RETURNS VARCHAR(10) AS

    BEGIN

     DECLARE @value VARCHAR(10)

     SELECT @value = COALESCE(@value + ';', '') + CONVERT(VARCHAR(10), myTable.ID)

     FROM myTable

     ORDER BY myTable.ID

     RETURN @value

    END

     

     

    Thanks in advance!

    Thuan

  • Yes, it does.


    N 56°04'39.16"
    E 12°55'05.25"

  • You could create a covering index on the columns used by the UDF and have the ordered column as the first in the index - this would at least speed things up.  Look at the execution plan.

  • It seems that the function returns the same string for every row. If that is the case, store the function string before and update later, as

    DECLARE @result VARCHAR(10)

    SELECT @result = foo(x.ID1, x.ID2)

    UPDATE dbo.myOtherTable

    SET myValue = @result

    FROM myOtherTable x

    WHERE x.ID2 IN (SELECT someID FROM myThirdTable)

     


    N 56°04'39.16"
    E 12°55'05.25"

  • Presumably in the actual function, the input params limit the records? You can't have more than 4000 records per call to the function - if you ever did, it would break.

    So you won't be sorting the whole table each time, just the relevant subset of records. For indexing, the column(s) used in the where clause is most important, and should be the leading column.

    But I would avoid storing this value anyway (it's not even 1nf). Why do you want to do it? If you really must (and I don't see why you would need to), you could consider using a trigger to populate the column.

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant

  • Guess I'm being a bit picky here... where does @a or @b-2 parameters of the function get used in the function?  I'm thinking the function posted has some other problems... unless it was just a really bad copy/paste...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yeah, the code I posted wasn't exact, and only included the "meat" of the code.

    Thanks for comments. I didn't know about the 4000 record limitation. I'll have to take a closer look at the data. I'm jumping into a new DBA job, and looking at a long running nightly process.

    So would you recommend presorting the data into a staging table, and then use that to update from? The execution does show the sort, but I didn't know if the optimizer was smart enough to keep it sorted in memory.

  • You are right that each time the function is called, it will undertake a sort operation.

    I'll reiterate. The sort process involves only a small number of 'child' records (as restricted by the input params in the real function). Even if each data value were 1 char in length, when commas are taken into account, that only allows 4000 values. It's not sorting the whole table. 

    The more central issue is that because you want an ordered list, you can't do this process in a single UPDATE statement, and instead have to use a function call per row. I would expect this to cause repeated passes through the target index or heap, searching for the child records before sorting them and reading them into a string. Of these operations the most expensive is likely to be searching for the child records. So your covering index on the target table should include the search column(s) as leading columns, then the sort column. Ordering by the sort column without taking into account which subsets of records are actually going to be sorted would not be particularly useful in any case. I don't think SQL Server necessarily preserves row order when fetching a subset of rows from an index/indexed table.

    Using a staging table isn't necessary.

    I don't like nightly processes for maintaining denormalised data, but as long as it is clear to everyone involved that the data may not be current, and if there are compelling reasons (a) for storing this data at all, and (b) for not using a trigger to maintain it transactionally in real (ordinal) time, then I suppose I'll have to try and forget that it's out there, struggling away every night, always lagging behind and never finished, like a digital Sisyphus...

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply