User Defined Functions Investigation

Question

Post reply

User Defined Functions Investigation

Viewing 15 posts - 31 through 45 (of 52 total)

You must be logged in to reply to this topic. Login to reply

Carl Federl One Orange Chip Points: 25384 More actions · Answer 1

"It wasn't by the 17 times that Carl claimed (I never do trust CPU times, just elapsed times)"

Actually, the the test results were:

In a two minute duration:

1. Execution counts was 3,160 for the Inline SQL versus 2,417 for the UDF. This equates to at 24% drop in thruput.

2. Average Cpu usage in milliseconds was 1.72 for the Inline SQL was versus 29.642 for the UDF. This equates to a 17 times more CPU resource needed.

This means that the average duration is based how many SQL statements can be executed in 2 minutes (120000 milliseconds). The average duration is then:

Inline SQL 37.89 milliseconds = 120000 / 3160

UDF was 49.648 milliseconds = 120000 / 2417

Based on 830 rows, the additional increase in duration due to this UDF is .0141662 milliseconds per row ( 49.648 - 37.89 ) / 830

This does NOT mean that if you use UDF, there will be about a 24% drop in performance.

Take a case where a SQL statement returns 10 rows and runs in 100 milliseconds. After adding the UDF, this adds about 1/10 of a milliseconds to the duration, which is an almost undetectable amount.

What can happen is the use of UDFs can cause a CPU bottleneck. As the load on the system increases, SQL statements will start to wait for CPU resources, the wait time will start to exceed the execution times and the final result in a short term increase in the duration.

A typical senario is that the users complain of poor response time. By the time you logon and look at the work-load, the problem is gone. A few days later, the problem occurs again, and you cannot find the cause.

Here is what can happen:

You have 100 users and they each execute a SQL statement on the average of once per minute. Using Gaussian distribution (a bell curve), you could calculate the probability that all users will execute some SQL statement within the same one second period. When this happens, some resource, such as CPU, becomes a bottleneck, the sql statments backup, and there is a short term decreae in performance.

Another case is were there a SQL statement that uses significant resources but has a low execution frequency. When the SQL statement runs, some resource becomes a bottleneck and there is a short term increase in duration for all of the other SQL statements running.

SQL = Scarcely Qualifies as a Language

Jeff Moden SSC Guru Points: 1004433 More actions · Answer 2

Ok, so does that short-term bottleneck stuff explain why my latest test consistently showed that the UDF code was twice as slow as the clear code regardless of the number of rows in the table? On a single cpu with only 1 user and no network connections?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004433 More actions · Answer 3

BWAAAA-HAAAAA-HAAAAAA-HAAAAA! I just ran the same set of tests on a 4 processor box "real server" with minor traffic for 2,000,000 records using my previous test code and the differences between the UDF code and the clear code are hardly worth talking about (less than 300 milliseconds difference on each 2Meg run) but the interesting thing is that sometimes the UDF code beat the clear code by that 300 milliseconds. I can't wait to see what it does does on the big 16 processor box!

Soooooo, kids.... I'm thinking that UDF's aren't that bad at all! Certainly the statement that they are "always harmful" is has been refuted once again. And, the development time saved by using well written UDF's and the readability gained for troubleshooting purposes certainly makes them well worth using.

Have fun... I'm all done with this thread.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Sergiy SSC Guru Points: 110208 More actions · Answer 4

2 Carl.

another test for your article.

select O.ID, O.name

from dbo.sysobjects O, sysobjects O1

GROUP BY O.ID, O.name

-- run time < 1 sec.

select object_id(O.name) as ID_, O.name

from dbo.sysobjects O, sysobjects O1

GROUP BY object_id(O.name), O.name

-run time > 1 min

Consideration: never use function OBJECT_ID().

Smart enough?

_____________
Code for TallyGenerator

philcart SSC-Forever Points: 47796 More actions · Answer 5

Sergiy, I'm beginning to think that there may be something wrong with the system you're running these tests on.

I ran each of these statements against the master database on two seperate servers. Both servers are dual 3.4Ghz with 2GB Ram.

Both statements returned in under 1 second.

--------------------
Colt 45 - the original point and click interface

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 6

Seriously???? This function uses a select to get the data. It CANNOT be faster or even the same speed as the query that fetch the info without doing ANYTHING extra.

Sergiy SSC Guru Points: 110208 More actions · Answer 7

Are you sure?

Probably my server has only 1GB of RAM but it should not make such a big difference.

Did you join second sysobjects?

_____________
Code for TallyGenerator

Sergiy SSC Guru Points: 110208 More actions · Answer 8

Another typical task:

WEB application supplies comma delimited list of IDs.

You need to select rows with IDs mentioned in this list.

Sounds familiar?

IF EXISTS (SELECT * FROM sysobjects WHERE name = N'ListNumericShort')

DROP FUNCTION ListNumericShort

GO

CREATE FUNCTION dbo.ListNumericShort

(@List varchar(8000),

@Delim char(1))

RETURNS @Values TABLE

(NumValue numeric)

AS

/**

* This function parses a string with large set of IDs and returns them

* as the table. Duplicate values are eliminated by this function.

*/

BEGIN

DECLARE @Value varchar(50)

DECLARE @LeftPointer smallint

DECLARE @RightPointer smallint

SET @LeftPointer = 1

WHILE @RightPointer < LEN(@List) OR @RightPointer IS NULL

BEGIN

SELECT @RightPointer = CHARINDEX(@Delim, @List, @LeftPointer)

IF @RightPointer = 0

SET @RightPointer = LEN(@List)+1

SET @Value = SUBSTRING(@List, @LeftPointer, @RightPointer-@LeftPointer)

SELECT @Value = REPLACE(REPLACE(REPLACE(@Value, char(13), ''), char(10), ''), char(9), '')

IF ISNUMERIC(@Value) = 1

IF CHARINDEX('.', @Value) = 0 and CHARINDEX('$', @Value) = 0

IF not exists (select 1 from @Values where NumValue = @Value)

INSERT INTO @Values (NumValue)

SELECT @Value

SET @LeftPointer = @RightPointer + 1

END

RETURN

END

GO

Now for testing I use system tables again:

declare @OBJIDList varchar(8000)

-- creating string with set of IDs

select @OBJIDList = ISNULL(@OBJIDList + ',', '') + cast(id as varchar(10))

from sysobjects

where id%5 = 0

-- just to pick some subset of IDs from the table

--print @OBJIDList

SELECT id

FROM sysobjects O

INNER JOIN dbo.ListNumericShort (@OBJIDList, ',') L on O.Id = L.NumValue

Can you provide any competitive solution?

If you don't have big enough number of objects in DB, use syscolumns instead of sysobject.

_____________
Code for TallyGenerator

philcart SSC-Forever Points: 47796 More actions · Answer 9

This is the function I use for splitting values. It takes advantage of a "Numbers" table to provide a set-based query. I suppose the actual query could be used as a derived table and thus do away with the function, but in this instance I like the function for code maintainability.

CREATE FUNCTION dbo.udf_SplitChar (
 @vcrArray varchar(8000)
 , @chrDelim char(1)
)
RETURNS @Results TABLE (
 Items varchar(8000)
)
AS

BEGIN

 DECLARE @Array TABLE (Items varchar(8000))
 INSERT INTO @Array VALUES ( @vcrArray )

 INSERT INTO @Results(
  Items
  )
 SELECT
  SUBSTRING(Items, n, CHARINDEX(@chrDelim, Items + @chrDelim, n) - n) AS [val]
 FROM @Array 
  JOIN DBA.dbo.Nums ON n <= LEN(Items)
   AND SUBSTRING(@chrDelim + Items, n, 1) = @chrDelim
 ORDER BY 
  n - LEN(REPLACE(LEFT(Items, n), @chrDelim, '')) + 1

 RETURN

END

--------------------
Colt 45 - the original point and click interface

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 10

Why do you use the table function version instead of the inline version??

Sergiy SSC Guru Points: 110208 More actions · Answer 11

2 Philicart

I cannot understand why you use @Array table in your UDF. I changed it this way:

IF EXISTS (SELECT * FROM sysobjects WHERE name = N'udf_SplitChar')

DROP FUNCTION udf_SplitChar

GO

CREATE FUNCTION dbo.udf_SplitChar

( @vcrArray varchar(8000) , @chrDelim char(1))

RETURNS @Results TABLE

(Items varchar(8000))

AS

BEGIN

INSERT INTO @Results (Items)

SELECT SUBSTRING(@vcrArray, n, CHARINDEX(@chrDelim, @vcrArray + @chrDelim, n) - n) AS [val]

FROM dbo.Nums

WHERE n <= LEN(@vcrArray) AND SUBSTRING(@chrDelim + @vcrArray, n, 1) = @chrDelim

ORDER BY n - LEN(REPLACE(LEFT(@vcrArray, n), @chrDelim, '')) + 1

RETURN

END

Works as good as yours option but a little faster.

The point I use my option is cleaning up the string before processing.

But actually me and you are standing on the same side: we are using UDF.

Carl Federl declared that UDF is always bad for performance. So I asked for "non-UDF" competitive solution for this task.

_____________
Code for TallyGenerator

philcart SSC-Forever Points: 47796 More actions · Answer 12

Sergiy, Hmmm ... the @Array table could have been some bits I didn't cleanup from an earlier revision, the function definitely doesn't need it.

"But actually me and you are standing on the same side: we are using UDF.

Carl Federl declared that UDF is always bad for performance. So I asked for "non-UDF" competitive solution for this task."

Actually, I agree with Carl, in some cases UDF's are very bad for performance. So as a rule you should avoid using functions unless the complexity of what you're trying to achieve out-weighs the performance impact.

The function I provided does show that using a function isn't really necessary. As I mentioned in the previous post, "the actual query could be used as a derived table and thus do away with the function".

RGR'us, I use the table function to return a table Basically the only reason I have the function at all is because it's called in a few dozen places across 5-6 different databases. If I only had a couple of instances where this was required I'd include the query as a derived table.

--------------------
Colt 45 - the original point and click interface

Sergiy SSC Guru Points: 110208 More actions · Answer 13

Key words "some cases".

That make a difference between your statement and the one from Carl.

_____________
Code for TallyGenerator

philcart SSC-Forever Points: 47796 More actions · Answer 14

But I think more in line with what Carl is trying to produce,

"So as a rule you should avoid using functions"

--------------------
Colt 45 - the original point and click interface

Sergiy SSC Guru Points: 110208 More actions · Answer 15

As well as views.

_____________
Code for TallyGenerator