December 4, 2005 at 5:45 am
I am in the process of investigating User Defined Functions, including developing an alternative single-SQL-statement solution for each case and then comparing the performance of the two in terms of duration, logical reads, and CPU used. I hope to write an article about this investigation if any meaningful conclusions are reached.
Matthew L. Wigdahl wrote an article titled "Are UDFs Harmful to SQL Server Performance?" which performs such a comparison. See http://www.sql-server-performance.com/mw_sql_server_udfs.asp
Your assistance is needed to ensure that as many solutions using UDFs as possible are investigated. If you know of a situation where a UDF is superior, please describe it, preferably in a form reproducible using the Northwind schema.
One requirement that can only be met with UDFs, at least under SQL Server 2000, is concatenating column values from multiple rows into a single column in a single row, as described at http://www.aspfaq.com/show.asp?id=2529
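For illustration, here is a minimal sketch of that kind of concatenation UDF against the Northwind schema (the function name Customer_OrderList is invented for this example):
-- Sketch only: concatenate all OrderIDs for one customer into a comma-separated string
CREATE FUNCTION dbo.Customer_OrderList (@CustomerID nchar(5))
RETURNS varchar(8000)
AS
BEGIN
    DECLARE @List varchar(8000)
    SELECT @List = COALESCE(@List + ', ', '') + CONVERT(varchar(10), OrderID)
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID
    RETURN @List   -- NULL if the customer has no orders
END
GO
-- Usage: SELECT CustomerID, dbo.Customer_OrderList(CustomerID) FROM dbo.Customers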
The conclusion I am reaching is that you should never, ever, under any circumstances use UDFs; e.g., DENY CREATE FUNCTION TO public should be standard.
Below is an example of a UDF and the alternative, with performance statistics captured through a trace. Note that using SET STATISTICS for IO or TIME does not capture the resources used by the UDF, hence the use of a trace.
            UDF      SQL
Duration  15665     4106
Reads     56133    20934
CPU        4806     1002
create function dbo.Customers_OrderCount
( @CustomerID nchar (5)
)
RETURNS INTEGER
AS
BEGIN
RETURN (SELECT COUNT(*) from dbo.Orders where CustomerId = @CustomerID)
END
go
declare @loop int
set @loop = 1
while @loop < 100
begin
set @loop = @loop + 1
select Customers.CustomerID
, SUM(CASE when Orders.CustomerID is null then 0 else 1 end )
from dbo.Customers
LEFT OUTER JOIN dbo.Orders
on Orders.CustomerID = Customers.CustomerID
GROUP BY Customers.CustomerID
end
declare @loop int
set @loop = 1
while @loop < 100
begin
set @loop = @loop + 1
select Customers.CustomerID
,dbo.Customers_OrderCount( Customers.CustomerID )
from dbo.Customers
end
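For anyone wanting to reproduce the numbers, a server-side trace along these lines could be used (a sketch only; the file path is a placeholder and the original measurements may have been captured with Profiler instead; event 12 is SQL:BatchCompleted, columns 1/13/16/18 are TextData/Duration/Reads/CPU):
DECLARE @TraceID int, @MaxFileSize bigint, @On bit
SET @MaxFileSize = 5
SET @On = 1
EXEC sp_trace_create @TraceID OUTPUT, 0, N'C:\Temp\udf_benchmark', @MaxFileSize, NULL
EXEC sp_trace_setevent @TraceID, 12, 1,  @On   -- TextData
EXEC sp_trace_setevent @TraceID, 12, 13, @On   -- Duration
EXEC sp_trace_setevent @TraceID, 12, 16, @On   -- Reads
EXEC sp_trace_setevent @TraceID, 12, 18, @On   -- CPU
EXEC sp_trace_setstatus @TraceID, 1            -- start the trace
-- ... run the UDF batch and the alternative SQL batch here ...
EXEC sp_trace_setstatus @TraceID, 0            -- stop the trace
EXEC sp_trace_setstatus @TraceID, 2            -- close and delete the trace definition
SELECT TextData, Duration, Reads, CPU
FROM ::fn_trace_gettable(N'C:\Temp\udf_benchmark.trc', DEFAULT)
WHERE EventClass = 12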
SQL = Scarcely Qualifies as a Language
December 4, 2005 at 3:25 pm
You've run into one of the "BIG NO"s - don't access tables within a scalar UDF.
It creates a hidden cursor.
Your sample really will hurt performance. But that happens not because of the UDF, but because you applied the right tool in the wrong place.
Your example is one of the classic "bad practice" examples found in any SQL book.
If you try to move a 40-foot container with a Toyota Corolla, it will probably die within the first 100 meters.
But if, from this experience, you suggest that nobody should ever use a Corolla, people will just consider you freaky.
_____________
Code for TallyGenerator
December 4, 2005 at 3:30 pm
>The conclusion I am reaching is that you should never, ever, under any circumstances use UDFs; e.g., DENY CREATE FUNCTION TO public should be standard.
My experience is that UDFs are very useful (especially if a presentation layer is not present) but, like any other code, they must be properly written with things like accuracy, scalability, and performance in mind at all times. I'd rather spend the time discussing the value of peer reviews, DBA code reviews, mentoring, production code submittal controls, and the merits of spending a bit of extra time to find set-based solutions for tasks where it seems as if no method other than looping (in any form) will work.
I do have to agree with Sergiy... if one of my Developers were to write a UDF (or any other code, for that matter) in the manner of your example, I'd have to shoot him out of a cannon butt first.
Regardless of my personal feelings about the usefulness of UDFs, I do wish you well in your investigation. It will be very interesting to see what you come up with on what I'm sure will be a highly controversial study.
Sincerely,
--Jeff Moden
Change is inevitable... Change for the better is not.
December 4, 2005 at 3:31 pm
As another example try this function:
IF EXISTS (SELECT * FROM sysobjects WHERE name = N'DateOnly')
DROP FUNCTION DateOnly
GO
CREATE FUNCTION dbo.DateOnly
(@DT datetime)
RETURNS datetime
AS
BEGIN
RETURN convert(datetime, convert(int, @dt - 0.5))
END
GO
And compare the performance of this function with any other method you use to cut the time part off a datetime value.
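For reference, a few of the common alternatives to compare it against (a sketch; benchmark them on your own data):
DECLARE @dt datetime
SET @dt = GETDATE()
SELECT dbo.DateOnly(@dt)                                AS via_udf,
       CONVERT(datetime, CONVERT(int, @dt - 0.5))       AS via_int_rounding,
       DATEADD(dd, DATEDIFF(dd, 0, @dt), 0)             AS via_dateadd,
       CONVERT(datetime, CONVERT(varchar(8), @dt, 112)) AS via_varchar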
_____________
Code for TallyGenerator
December 4, 2005 at 4:21 pm
Outstanding, Sergiy!
I ran the following and the proc that uses the DateOnly function doesn't take any longer than the other...
USE NORTHWIND
GO
SET STATISTICS TIME ON
GO
CREATE PROCEDURE dbo.ProcTest1 AS
SELECT dbo.DateONLY(ShippedDate)
FROM Orders
GO
CREATE PROCEDURE dbo.ProcTest2 AS
SELECT convert(datetime, convert(int, ShippedDate - 0.5))
FROM Orders
GO
EXEC dbo.ProcTest1
EXEC dbo.ProcTest2
I'm thinking "End of study" and "Myth Busted".
--Jeff Moden
Change is inevitable... Change for the better is not.
December 4, 2005 at 5:06 pm
And the funny part is that on a multi-CPU server a UDF is sometimes faster than a set of built-in functions in the SQL statement.
Check this out:
http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=8&messageid=238981&p=3
_____________
Code for TallyGenerator
December 4, 2005 at 7:37 pm
Thanks for the feedback, especially the point that the benchmark needs to be run on a variety of hardware. I will make sure to run the benchmark on hardware ranging from a small 2-CPU P400 to a brand new 16-CPU Data Center cluster.
The conditions I have identified so far:
1. Deterministic UDFs vs. in-line SQL in the SELECT clause of a SQL statement
2. Non-deterministic UDFs vs. in-line SQL in the SELECT clause of a SQL statement
3. Deterministic UDFs vs. in-line SQL in the WHERE clause of a SQL statement
4. Non-deterministic UDFs vs. in-line SQL in the WHERE clause of a SQL statement
5. Deterministic UDFs vs. in-line SQL as a CHECK constraint (a sketch follows after this list)
6. Non-deterministic UDFs as a CHECK constraint vs. in-line SQL in a trigger
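To make condition 5 concrete, here is a rough sketch against the Northwind schema of the same rule expressed both ways (the function name IsValidDiscount is invented for illustration; Northwind already enforces a similar rule on [Order Details] with its own constraint):
-- Sketch only: a deterministic scalar UDF wrapping a simple validation rule
CREATE FUNCTION dbo.IsValidDiscount (@Discount real)
RETURNS bit
AS
BEGIN
    RETURN CASE WHEN @Discount BETWEEN 0 AND 1 THEN 1 ELSE 0 END
END
GO
-- UDF-based version of the rule:
ALTER TABLE dbo.[Order Details]
    ADD CONSTRAINT CK_Discount_UDF CHECK (dbo.IsValidDiscount(Discount) = 1)
-- Equivalent in-line version for comparison:
-- ALTER TABLE dbo.[Order Details]
--     ADD CONSTRAINT CK_Discount_Inline CHECK (Discount BETWEEN 0 AND 1)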
For references, I have found:
Ken Henderson's "The Guru's Guide to SQL Server Stored Procedures, XML, and HTML", which offers some interesting test cases. Since all of the source is on a CD, this is very convenient.
Joseph Gama's "TSQL functions" at http://www.planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=502&lngWId=5 These UDFs all appear to be deterministic.
Andrew Novick's UDF library at http://www.novicksoftware.com/
Sergiy's UDF for dbo.DateOnly is interesting and I will be sure to include various date manipulation alternatives such as first-day-of-month, last-day-of-month.
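For reference, the usual in-line formulas for those two cases (which any UDF wrapper would be compared against) look like this:
SELECT DATEADD(mm, DATEDIFF(mm, 0, GETDATE()), 0)                       AS FirstOfMonth,
       DATEADD(dd, -1, DATEADD(mm, DATEDIFF(mm, 0, GETDATE()) + 1, 0))  AS LastOfMonth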
SQL = Scarcely Qualifies as a Language
December 4, 2005 at 7:59 pm
Sergiy, that was my finding as well. I was being conservative considering the volatile subject.
My thought is that properly written UDFs are much too valuable to simply write off even if they were to run a bit slower, and, clearly, they do not run slower. The fact that they will many times run faster than clear code, not to mention that, when properly named, they increase the readability and simplicity of code, seems to make them more valuable still. The myth that UDFs cause poor performance truly seems busted to me; as with everything else in SQL, it depends on the code itself, not on the fact that a UDF was used.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 5, 2005 at 10:38 am
My two cents:
Over the years working with UDFs, I have found that they have a time and a place. I have rewritten code in a SELECT statement to NOT use UDFs and found that the performance was substantially faster. On the other hand, if your query is only working with a small set of data, then a UDF could be useful. A few of my recommendations are to use Scalar UDFs strategically: they can and will degrade performance due to the nature of the internal cursor. So far, I am not opposed to Inline Table Functions (Table UDFs). Based on my analysis of the execution plans, Table UDFs perform similarly to views and can also help with performance. Table UDFs do not have an internal cursor like the Scalar UDF does and can perform like a searchable view.
Overall:
Strategically use Scalar UDFs - the internal cursor exists and DOES affect performance with large datasets.
Strategically use Inline Table UDFs - these seem to be treated like VIEWs and can help performance; at the same time, many joins can degrade performance.
I have also noticed that Scalar UDFs disable parallelism, so you get the degradation of an internal cursor and no parallelism. I spoke to Delaney and Tripp about this during a PASS conference, and they didn't expect it to be fixed in 2000.
Scalar UDFs are probably ALWAYS harmful, but if the harm is negligible and no growth is expected for the data set involved, then why not? On the other hand, if you cannot predict the growth and are concerned with scalability, then be extremely cautious.
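To illustrate what I mean by a Table UDF behaving like a parameterized view, here is a minimal sketch against the Northwind schema (the function name OrdersForCustomer is invented for this example):
-- Sketch only: an inline table-valued UDF, expanded by the optimizer like a view
CREATE FUNCTION dbo.OrdersForCustomer (@CustomerID nchar(5))
RETURNS TABLE
AS
RETURN
( SELECT OrderID, OrderDate, ShippedDate
  FROM dbo.Orders
  WHERE CustomerID = @CustomerID
)
GO
-- Used like a view:
SELECT OrderID, OrderDate FROM dbo.OrdersForCustomer(N'ALFKI')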
Greg
December 5, 2005 at 8:25 pm
Well, they say one measurement is worth a thousand speculations... so, I did one. I'll have to eat a little crow on the "clearly they are not slower" thing, but not by much... Admittedly, the test I made did not do any joins... just simple SELECTs, because my purpose was to find out whether UDFs really are that harmful...
As I said before, UDFs can be just too darn useful to summarily dismiss, but my findings are that, even on simple SELECTs, UDFs are, as Greg suggested, slower on large datasets. Greg's idea of strategic use of Scalar UDFs does in fact have some merit. Carl's supposition is that there are few things that can't be done without a Scalar UDF (he's mostly correct) and he pointed one of those out (table-returned UDFs appear to be the exception).
I used Sergiy's function in the tests because it's nice and simple. Here's what I found according to the tests I ran...
1. UDFs in the SELECT statement take 100% longer than a formula in the SELECT statement.
2. UDFs in a simple WHERE clause take 31% longer than a formula in the WHERE clause.
EVERYBODY is happy, right? Don't celebrate just yet because there are other factors at stake. The difference between the two methods on item (1) above was only 16 seconds (32 to 16, respectively). The difference between the two methods on item (2) above was only 19 seconds (1:19 to 1:00, respectively). AND, each test was on 2 MILLION records using the same table.
What does all that mean? For Scalar UDFs...
1. It means Carl is partially right... there are few things that can't be done without a Scalar UDF. If mindbending speed is all you're after, don't use a UDF if you don't have to, even if one exists.
2. It means that Sergiy is right... on smaller record sets, you won't see much difference and the UDF may run a bit faster on the smaller sets (we found that out earlier).
3. It means that Greg is right... you have to use them Strategically.
4. And, I think it means I'm right, but for reasons that have nothing at all to do with the performance of code. It has to do with the performance of the Developers writing the code. Although it certainly appears from my tests that Scalar UDFs are slower than clear code on large data sets, reinventing the wheel or copying equivalent functionality from a code library will certainly slow development down. And, since the functionality may be fairly complex, UDFs can make the code much more readable and easier to troubleshoot, which also saves expensive time.
I'll have to side with Greg on this one... carefully consider the advantages and disadvantages of using Scalar UDFs in every instance. You shouldn't summarily dismiss their usefulness, nor should you write everything as a UDF just for the advantages of code reuse. Balance coding convenience with performance, readability, etc.
Here's the results from my test runs on a stand-alone single-user non-networked server on my little ol' 1.8Ghz, 2GB ram, IDE disk, desktop server running SQL Server 2000 Developer's Edition with SP3a on an XP-sp2 box... I ran it a couple of times to be sure... very similar results on all runs...
Formula in SELECT
(2000000 row(s) affected)
00:00:16:813
------------------------------
Function in SELECT
(2000000 row(s) affected)
00:00:32:827
------------------------------
Formula in WHERE
(2000000 row(s) affected)
00:01:00:140
------------------------------
Function in WHERE
(2000000 row(s) affected)
00:01:18:923
------------------------------
==============================
Function in SELECT
(2000000 row(s) affected)
00:00:32:810
------------------------------
Formula in SELECT
(2000000 row(s) affected)
00:00:16:533
------------------------------
Function in WHERE
(2000000 row(s) affected)
00:01:19:017
------------------------------
Formula in WHERE
(2000000 row(s) affected)
00:01:00:280
------------------------------
And... here's the test code I used including the 2 million row test table "generator"... do be careful because the test code does drop a table and function and then recreates them... run the code in the GRID mode or the test will take about a week to complete! It took just over 7 minutes on my box including the table generation.
--===================================================================
-- Create the test Table (Careful!!! Has a drop!!!!)
--===================================================================
--===== If the test table exists, drop it (CAREFUL!!!!)
IF OBJECT_ID('dbo.LargeTest') IS NOT NULL
DROP TABLE dbo.LargeTest
--===== Create and populate the test table on the fly
SELECT TOP 2000000 --That's 2 million rows
IDENTITY(INT,1,1) AS ID,
GETDATE() AS Date1,
GETDATE() AS Date2,
GETDATE() AS Date3,
'THEY WILL BOTH PERFORM THE SAME' AS SomeString,
'TURNS OUT NOT TRUE!!!' AS SomeOtherString,
NEWID() AS GUID
INTO dbo.LargeTest
FROM dbo.SYSCOLUMNS sc1,
dbo.SYSCOLUMNS sc2,
dbo.SYSCOLUMNS sc3
--===== Add a primary key to the test table
ALTER TABLE dbo.LargeTest
ADD PRIMARY KEY CLUSTERED (ID)
GO
--===================================================================
-- Create the test function (Careful!!! Has a drop!!!!)
--===================================================================
--===== If the test function exists, drop it (CAREFUL!!!!)
IF OBJECT_ID('dbo.TestDateOnly') IS NOT NULL
DROP FUNCTION dbo.TestDateOnly
GO
--===== Create the test function
CREATE FUNCTION dbo.TestDateOnly
(@DT datetime)
RETURNS DATETIME AS BEGIN
RETURN CONVERT(DATETIME, CONVERT(INT, @dt - 0.5))
END
GO
--===================================================================
-- Do some tests....
--===================================================================
--===== Create some local performance variables
DECLARE @MyStart DATETIME
DECLARE @MyEnd DATETIME
PRINT 'Formula in SELECT'
SET @MyStart = GETDATE()
SELECT CONVERT(DATETIME, CONVERT(INT, Date2 - 0.5)) AS DateNoTime
FROM dbo.LargeTest
SET @MyEnd = GETDATE()
PRINT CONVERT(VARCHAR(12),@MyEnd-@MyStart,114)
PRINT REPLICATE('-',30)
PRINT 'Function in SELECT'
SET @MyStart = GETDATE()
SELECT dbo.TestDateOnly(Date2)
FROM dbo.LargeTest
SET @MyEnd = GETDATE()
PRINT CONVERT(VARCHAR(12),@MyEnd-@MyStart,114)
PRINT REPLICATE('-',30)
PRINT 'Formula in WHERE'
SET @MyStart = GETDATE()
SELECT *
FROM dbo.LargeTest
WHERE CONVERT(DATETIME, CONVERT(INT, Date2 - 0.5)) <= Date3
SET @MyEnd = GETDATE()
PRINT CONVERT(VARCHAR(12),@MyEnd-@MyStart,114)
PRINT REPLICATE('-',30)
PRINT 'Function in WHERE'
SET @MyStart = GETDATE()
SELECT *
FROM dbo.LargeTest
WHERE dbo.TestDateOnly(Date2) <= Date3
SET @MyEnd = GETDATE()
PRINT CONVERT(VARCHAR(12),@MyEnd-@MyStart,114)
PRINT REPLICATE('-',30)
--=================================================================
PRINT REPLICATE ('=',30) -- Test pairs reversed
--=================================================================
PRINT 'Function in SELECT'
SET @MyStart = GETDATE()
SELECT dbo.TestDateOnly(Date2)
FROM dbo.LargeTest
SET @MyEnd = GETDATE()
PRINT CONVERT(VARCHAR(12),@MyEnd-@MyStart,114)
PRINT REPLICATE('-',30)
PRINT 'Formula in SELECT'
SET @MyStart = GETDATE()
SELECT CONVERT(DATETIME, CONVERT(INT, Date2 - 0.5)) AS DateNoTime
FROM dbo.LargeTest
SET @MyEnd = GETDATE()
PRINT CONVERT(VARCHAR(12),@MyEnd-@MyStart,114)
PRINT REPLICATE('-',30)
PRINT 'Function in WHERE'
SET @MyStart = GETDATE()
SELECT *
FROM dbo.LargeTest
WHERE dbo.TestDateOnly(Date2) <= Date3
SET @MyEnd = GETDATE()
PRINT CONVERT(VARCHAR(12),@MyEnd-@MyStart,114)
PRINT REPLICATE('-',30)
PRINT 'Formula in WHERE'
SET @MyStart = GETDATE()
SELECT *
FROM dbo.LargeTest
WHERE CONVERT(DATETIME, CONVERT(INT, Date2 - 0.5)) <= Date3
SET @MyEnd = GETDATE()
PRINT CONVERT(VARCHAR(12),@MyEnd-@MyStart,114)
PRINT REPLICATE('-',30)
p.s. I don't think I'd ever create a UDF just to drop the time but it certainly suited this test well. Thanks Sergiy.
p.p.s. Carl, is THAT kinda what you had in mind? Thanks for asking the question and posing the challenge... I don't agree with summarily dismissing the use of Scalar UDFs, but you were mostly correct about the speed thing.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 5, 2005 at 9:27 pm
There are some options (as always)
I tried to minimise the influence of network traffic. Feel the difference:
PRINT 'Formula in WHERE'
SET @MyStart = GETDATE()
SELECT *
FROM dbo.LargeTest
WHERE CONVERT(DATETIME, CONVERT(INT, Date2 - 0.5)) <= Date3
SET @MyEnd = GETDATE()
PRINT CONVERT(VARCHAR(12),@MyEnd-@MyStart,114)
PRINT REPLICATE('-',30)
PRINT 'Function in WHERE'
SET @MyStart = GETDATE()
SELECT *
FROM dbo.LargeTest
WHERE dbo.TestDateOnly(Date2) <= Date3
SET @MyEnd = GETDATE()
PRINT CONVERT(VARCHAR(12),@MyEnd-@MyStart,114)
PRINT REPLICATE('-',30)
PRINT 'Formula in WHERE'
SET @MyStart = GETDATE()
SELECT count(*)
FROM dbo.LargeTest
WHERE CONVERT(DATETIME, CONVERT(INT, Date2 - 0.5)) <= Date3
SET @MyEnd = GETDATE()
PRINT CONVERT(VARCHAR(12),@MyEnd-@MyStart,114)
PRINT REPLICATE('-',30)
PRINT 'Function in WHERE'
SET @MyStart = GETDATE()
SELECT count(*)
FROM dbo.LargeTest
WHERE dbo.TestDateOnly(Date2) <= Date3
SET @MyEnd = GETDATE()
PRINT CONVERT(VARCHAR(12),@MyEnd-@MyStart,114)
PRINT REPLICATE('-',30)
------------------------------
Formula in WHERE
(20000 row(s) affected)
00:00:09:580
------------------------------
Function in WHERE
(20000 row(s) affected)
00:00:08:687
------------------------------
Formula in WHERE
(1 row(s) affected)
00:00:00:030
------------------------------
Function in WHERE
(1 row(s) affected)
00:00:00:313
------------------------------
_____________
Code for TallyGenerator
December 5, 2005 at 9:35 pm
Another option.
I replaced the real values with NULLs (the unset @DummyDate variable makes every returned value NULL, so almost nothing goes over the network).
---------------
DECLARE @MyStart DATETIME
DECLARE @MyEnd DATETIME
Declare @DummyDate datetime
PRINT 'Formula in SELECT, returns NULLS'
SET @MyStart = GETDATE()
SELECT CONVERT(DATETIME, CONVERT(INT, Date2 - 0.5)) + @DummyDate AS DateNoTime
FROM dbo.LargeTest
SET @MyEnd = GETDATE()
PRINT CONVERT(VARCHAR(12),@MyEnd-@MyStart,114)
PRINT REPLICATE('-',30)
PRINT 'Function in SELECT, returns NULLS'
SET @MyStart = GETDATE()
SELECT dbo.TestDateOnly(Date2) + @DummyDate AS DateNoTime
FROM dbo.LargeTest
SET @MyEnd = GETDATE()
PRINT CONVERT(VARCHAR(12),@MyEnd-@MyStart,114)
PRINT REPLICATE('-',30)
PRINT 'Formula in SELECT'
SET @MyStart = GETDATE()
SELECT CONVERT(DATETIME, CONVERT(INT, Date2 - 0.5))AS DateNoTime
FROM dbo.LargeTest
SET @MyEnd = GETDATE()
PRINT CONVERT(VARCHAR(12),@MyEnd-@MyStart,114)
PRINT REPLICATE('-',30)
PRINT 'Function in SELECT'
SET @MyStart = GETDATE()
SELECT dbo.TestDateOnly(Date2) AS DateNoTime
FROM dbo.LargeTest
SET @MyEnd = GETDATE()
PRINT CONVERT(VARCHAR(12),@MyEnd-@MyStart,114)
PRINT REPLICATE('-',30)
Here is the result:
----------------------
Formula in SELECT, returns NULLS
(200000 row(s) affected)
00:00:01:640
------------------------------
Function in SELECT, returns NULLS
(200000 row(s) affected)
00:00:02:093
------------------------------
Formula in SELECT
(200000 row(s) affected)
00:00:08:547
------------------------------
Function in SELECT
(200000 row(s) affected)
00:00:09:627
------------------------------
Same set of queries against another server:
Formula in SELECT, returns NULLS
(200000 row(s) affected)
00:00:01:440
------------------------------
Function in SELECT, returns NULLS
(200000 row(s) affected)
00:00:03:500
------------------------------
Formula in SELECT
(200000 row(s) affected)
00:00:06:453
------------------------------
Function in SELECT
(200000 row(s) affected)
00:00:04:390
------------------------------
_____________
Code for TallyGenerator
December 6, 2005 at 7:35 pm
No fair! You used a real server with the Enterprise Edition... I used a pc with the Developer's Edition!
Ok, I just ran this on one of the 4-processor servers at work and still didn't get times nearly as good as yours... what did you do?
--Jeff Moden
Change is inevitable... Change for the better is not.
December 6, 2005 at 8:19 pm
Copy - Paste.
Nothing special. I only reduced the number of rows to 500k or 200k, depending on the capability of the server I used.
You can see it in the results. I did not post the 500k results from the 4-CPU server because they are generally the same.
Sometimes the function is even faster by 5%, but the next time I run the same set of queries it's slower by 5%. It's clearly not only me playing on the server.
But the main idea remains the same: a properly designed function does not affect performance in any significant way.
_____________
Code for TallyGenerator
December 6, 2005 at 9:56 pm
Ah... now I see ... it was 200k rows, not 2M rows... too late at night... It does support what we said earlier... for lesser numbers of rows, the Scalar Functions may actually be advantageous...
Lemme try it with only 200k rows on both boxes...
--Jeff Moden
Change is inevitable... Change for the better is not.