July 31, 2003 at 2:29 am
User Defined Functions are a new feature in SQL2000. I'm interested to know how and people use them or whether they avoid them like the plague.
In another thread ( http://www.sqlservercentral.com/forum/link.asp?TOPIC_ID=14689 ), I discovered that UDFs were killing performance even when the UDF was very simple. This is the UDF:
CREATE FUNCTION [dbo].[fn_DateCheck] (@date smalldatetime)
RETURNS varchar(25) AS
BEGIN
declare @output_date varchar(25)
if @date > '1950-01-01' and isDate(@date) = 1 set @output_date = @date
else set @output_date = null
return @output_date
END
I had 8 date fields to validate and using the UDF it halved the processing rate of an update statement.
The question is if UDFs kill performance so much, why should anyone use them? Are they any guidelines as to when to use them or when not to use them? If there anything that should not go into a UDF or is anything valid?
All thoughts and ideas welcome.
Jeremy
July 31, 2003 at 2:52 am
I tend to keep away from UDF's if at all possible. It is a great feature to reuse code, but it can kill performance as you noticed.
One place we 'often' use them is as default values in a table. And only if chances are rather slim that someone will be inserting 100's of rows in a batch that all need the default value.
July 31, 2003 at 3:28 am
sorry to interrupt, but the actual question still remains un answered. I am aware of the fact that UDF's were one of the new feature added in the SQL2000 and were absent in SQL 7.0. The answere i am looking out is for the question, 'How do I implement UDf on SQL 7.0' - may be some round about way is present. Does anyone tell me please .
Thanks
Harriet.
July 31, 2003 at 5:06 am
I generally use UDFs but only in the capacity of the columns output for the select statement. So
SELECT col, col, udf(col) as x, col... FROM tbl
I tend to avoid them in WHERE clauses like this.
SELECT cols... FROM tbl WHERE [dbo].[fn_DateCheck](dtcol) IS NOT NULL
The reason is this. For the value to be evaluated the UDF must run against all the records of the column to it's output then evaluated.
What happens is a simple statement like WHERE DTCOL > '1950-01-01' which has an index on the DTCOL column will have to perform an index scan to read all the rows, perform function then evaluate.
But running WHERE DTCOL > '1950-01-01' will most likely perform a seek instead. This means a very large table will take an enormous performance hit.
Now understanding why the performance hit occurrs also keep in mind if there is no index on that column and that is the only piece to you WHERE clause then you should only see a slight performance hit as the UDF has to be performed. The reason is you would have a table or clustered index scan occurr anyway.
Ok then let us complicate things a bit. Suppose you have an index on that column and are going to also have another item in the WHERE clause that is hitting a Clustered Index. This is where you have to test. Most all cases will find the UDF will add minimal overhead (due to it's processing) as long as it is not removing a large number of records. It will in this case most likely shoe a filter in the execution plan for the UDF step.
Then finaly make sure you keep your datatypes as close to original as possible and try not to move them between types within you UDF if possible.
The reason is each (especially implicit) conversion adds to CPU cycles and IO depending on the type of conversion. So with your item I personally would have done this instead
CREATE FUNCTION [dbo].[fn_DateCheck] (@date smalldatetime)
RETURNS smalldatetime AS
BEGIN
return (case when @date > '19500101' then
@date
else
null
end)
END
Since @date is a date there is no reason to perform the IsDate check and the output can be either a date or null so leaving return same as incoming datatype unless you will evaluate otherwise.
I got rid of the @outputdate as the CASE within the return provides the same results without the creation of the extra variable in memory.
One thing about this I still haven't verified is whether or not the variable is created and destroyed each time the UDF processes a column, if so then that adds also to the CPU cycles and the overall performance hit.
But even with all that said, UDFs should be use to wrap and reuse complex logic more so than anything else.
Hoep my observations are helpfull.
July 31, 2003 at 5:31 am
H,
I wasn't trying to steal your thread. I was trying to find out how other people use UDFs - there are a lot of people with a lot more experience than me and I was trying to tap into their skills.
As far as I am aware, I don't know of a work around for UDFs in SQL 7. Someone else might, but if you can give an example of what you are trying to achieve then someone might have an idea.
Jeremy
July 31, 2003 at 5:47 am
Antares,
I can agree with what you are saying in your post. A 'simple' processing UDF should not incur a large overhead.
However, I'm not clear on how the performance is affected if you start using lookup queries in the UDF, e.g. to translate one currency to another one using a conversion rate table. Do you have any pointers on that?
July 31, 2003 at 5:58 am
Can you give me an example of how the query is structured? And exactly how you are working it.
July 31, 2003 at 6:23 am
Antares,
I certainly agree to avoid UDFs in a where clause - I've been caught out a few times by this.
I made the changes to my date UDF as you suggested and it did improve the performance. I had the original checks in there in case the date field was corrupt. With your UDF the whole update statement would fail whereas with the original I would get a null value for that row but the update would complete. As I have been running this update for over a year without a date problem I guess its a chance I am prepared to take for the sake of speed.
Thanks for the insight.
Jeremy
July 31, 2003 at 6:41 am
Ah, I understand. However if the columns datatype to begin with is a datetime type you will not have to worry about corruption. Only if you are storing in a varchar type which if that were the case you would get an error on your UDF because the input value is required to be of a datetime type (in this case smalldatetime).
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply