Functions with internal Sorting - is it cached?

Question

Post reply

Functions with internal Sorting - is it cached?

caffeinated

Ten Centuries

Points: 1271
More actions
July 3, 2006 at 6:50 pm

#153068

I have to update values from table with the returned value from a user function. Within the user function, it does an ORDER BY.
Question: The function sorts a very large table, so does the sort get re-executed for each update?
Here's the paraphrased versions of my UPDATE statement and function:
--------
UPDATE dbo.myOtherTable
SET myValue = foo(x.ID1, x.ID2)
FROM myOtherTable x
WHERE x.ID2 IN (SELECT someID FROM myThirdTable)
--------
CREATE FUNCTION foo(@a int, @b-2 int)
RETURNS VARCHAR(10) AS
BEGIN
DECLARE @value VARCHAR(10)
SELECT @value = COALESCE(@value + ';', '') + CONVERT(VARCHAR(10), myTable.ID)
FROM myTable
ORDER BY myTable.ID
RETURN @value
END

Thanks in advance!
Thuan

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 1

SwePeso

SSC-Dedicated

Points: 39757

July 4, 2006 at 12:18 am

#647311

Yes, it does.

N 56°04'39.16"
E 12°55'05.25"

Ian Yates SSCoach Points: 19738 More actions · Answer 2

You could create a covering index on the columns used by the UDF and have the ordered column as the first in the index - this would at least speed things up. Look at the execution plan.

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 3

It seems that the function returns the same string for every row. If that is the case, store the function string before and update later, as

DECLARE @result VARCHAR(10)

SELECT @result = foo(x.ID1, x.ID2)

UPDATE dbo.myOtherTable

SET myValue = @result

FROM myOtherTable x

WHERE x.ID2 IN (SELECT someID FROM myThirdTable)

N 56°04'39.16"
E 12°55'05.25"

stax68 SSChampion Points: 11711 More actions · Answer 4

Presumably in the actual function, the input params limit the records? You can't have more than 4000 records per call to the function - if you ever did, it would break.

So you won't be sorting the whole table each time, just the relevant subset of records. For indexing, the column(s) used in the where clause is most important, and should be the leading column.

But I would avoid storing this value anyway (it's not even 1nf). Why do you want to do it? If you really must (and I don't see why you would need to), you could consider using a trigger to populate the column.

Tim Wilkinson

"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant

Jeff Moden SSC Guru Points: 1004365 More actions · Answer 5

Guess I'm being a bit picky here... where does @a or @b-2 parameters of the function get used in the function? I'm thinking the function posted has some other problems... unless it was just a really bad copy/paste...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

caffeinated Ten Centuries Points: 1271 More actions · Answer 6

Yeah, the code I posted wasn't exact, and only included the "meat" of the code.

Thanks for comments. I didn't know about the 4000 record limitation. I'll have to take a closer look at the data. I'm jumping into a new DBA job, and looking at a long running nightly process.

So would you recommend presorting the data into a staging table, and then use that to update from? The execution does show the sort, but I didn't know if the optimizer was smart enough to keep it sorted in memory.

stax68 SSChampion Points: 11711 More actions · Answer 7

You are right that each time the function is called, it will undertake a sort operation.

I'll reiterate. The sort process involves only a small number of 'child' records (as restricted by the input params in the real function). Even if each data value were 1 char in length, when commas are taken into account, that only allows 4000 values. It's not sorting the whole table.

The more central issue is that because you want an ordered list, you can't do this process in a single UPDATE statement, and instead have to use a function call per row. I would expect this to cause repeated passes through the target index or heap, searching for the child records before sorting them and reading them into a string. Of these operations the most expensive is likely to be searching for the child records. So your covering index on the target table should include the search column(s) as leading columns, then the sort column. Ordering by the sort column without taking into account which subsets of records are actually going to be sorted would not be particularly useful in any case. I don't think SQL Server necessarily preserves row order when fetching a subset of rows from an index/indexed table.

Using a staging table isn't necessary.

I don't like nightly processes for maintaining denormalised data, but as long as it is clear to everyone involved that the data may not be current, and if there are compelling reasons (a) for storing this data at all, and (b) for not using a trigger to maintain it transactionally in real (ordinal) time, then I suppose I'll have to try and forget that it's out there, struggling away every night, always lagging behind and never finished, like a digital Sisyphus...

Tim Wilkinson

"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant