October 31, 2013 at 3:53 am
I created a User Defined Function that performs some arithmetic calculations on a few columns and returns an amount.
The UDF does not do any lookup on the database, just uses the parameters passed (and calls another UDF with similar characteristics).
I organised my underlying table so that all required columns are available in that table.
I expected the UDF performance to be extremely quick but it happens to be VERY slow.
Retrieving ~300,000 rows with the UDF takes about 10mns.
Just removing the function makes it about 15 seconds (scanning the table and displaying in SSMS).
There is no index at all on my table and I don't specify any criteria so both Tests involve just a table scan.
I was under the impression that I would have no performance problem as long as there was no database access within the UDF but this seems wrong.
I suppose I could get the required performance using a CLR function but I would rather avoid that because it is way beyond the technical skills of my customer (I am a consultant).
My bottleneck is entirely CPU
Any idea how to improve this?
October 31, 2013 at 4:34 am
If you have 300,000 rows, the UDF will be called 300,000 times if you defined a scalar UDF.
More info:
User Defined Functions and Performance
Using a table valued function could improve performance.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
October 31, 2013 at 4:37 am
I know but I was hoping it would be fast...
Just talked to my customer and I'll try using CLR!
Yipee!!! 😀
October 31, 2013 at 4:39 am
Eric Mamet (10/31/2013)
I know but I was hoping it would be fast...Just talked to my customer and I'll try using CLR!
Yipee!!! 😀
Never seen someone so excited about CLR 🙂
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
October 31, 2013 at 8:31 am
It is much faster to apply a table function.
CREATE TABLE Test1 (ID INT);
GO
INSERT INTO Test1
SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY a.object_id)
FROM sys.all_columns a
CROSS JOIN sys.all_columns b
GO
CREATE FUNCTION UDF_INLINE (@Input INT)
RETURNS INT
AS
BEGIN
DECLARE @I INT;
SET @I = @Input * 0.14;
RETURN @I;
END;
GO
CREATE FUNCTION UDF_APPLY (@Input INT)
RETURNS TABLE
AS
RETURN (SELECT @Input * 0.14 AS Result);
GO
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- Return the column with no function
SELECT
ID
INTO
#test1
FROM
Test1;
/*
Table 'Test1'. Scan count 1, logical reads 3345
CPU time = 437 ms, elapsed time = 533 ms.
*/
-- Return the inline function
SELECT
dbo.UDF_INLINE(ID) AS RowName
INTO
#test2
FROM
Test1;
/*
Table '#test2'. Scan count 0, logical reads 1001607
Table 'Test1'. Scan count 1, logical reads 3345
CPU time = 10389 ms, elapsed time = 13965 ms.
*/
-- Apply the function
SELECT
b.Result
INTO
#test3
FROM
Test1 t
CROSS APPLY
dbo.UDF_APPLY(t.ID) AS b;
/*
Table 'Test1'. Scan count 1, logical reads 3345
CPU time = 577 ms, elapsed time = 578 ms.
*/
October 31, 2013 at 11:49 am
I know but I was hoping it would be fast...
Just talked to my customer and I'll try using CLR!
Yipee!!!
I don't know if a CLR is going to be the way to go. I have been a big proponent of CLRs for some things because they do have their place but, in this case, there is no reason to believe that a CLR is going to perform aggregations faster than a T-SQL function. Remember, creating and Implementing a CLR is not a trivial task and introduces new risks and overhead for your SQL environment. You may just be adding more work without any added benefits.
Take a look at this article How to Make Scalar UDFs Run Faster (SQL Spackle)[/url]. As Sean was saying and demonstrated: you should get much better results turning your function into an inline table valued function. You will have to change your query logic to include a cross apply but that is something numerous people on this site can help you with if you get stuck.
P.S. I believe there is a new SQL Server Central Stairway on CLRs coming soon. I expect that it will be a good read. 😉
-- Itzik Ben-Gan 2001
October 31, 2013 at 11:58 am
Actually, I am not too sure to understand what the table function is doing in all this...
As for the speed, it's shockingly different!
To retrieve about 1.5 million rows using a simple TSQL UDF took about 4 minutes.
Replacing the TSQL UDF by a CLR function shrinks the 4 minutes to 18"
Removing any function still gives me about the same (17").
In other words, the CLR function is practically invisible in terms of performance! :w00t:
In fact, I should have remembered because this is an experiment that Itzik Ben Gan had already demonstrated in his wonderful book "Inside SQL Server 2005 TSQL Querying"
Silly me...
October 31, 2013 at 12:16 pm
CLR might perform very well, but it seems to me that you're cracking nuts with a sledgehammer (or as said in spanish, killing flies with cannonballs).
Your bottleneck was CPU using UDFs because it will limit your CPU use to one (in other words, you're not using parallelism).
Good T-SQL should be enough for your problem, but it's all up to you.
October 31, 2013 at 1:42 pm
What's the nature of the function you're trying to apply to the data? As Koen says, using functions against large sets of rows can have serious performance implications. If you can do it all inline, you may consider a computed column on the table, which SQL might be able to optimize better than calling a function for each row.
October 31, 2013 at 3:27 pm
The machine I used was a dual processor and yes CPU was limited to 1 processor.
So I had one processor flat out for 4 mns.
I doubt that using 2 processors instead of one would improve the performance to a couple of seconds...
I don't have access to the actual procedure now but I'll try to post it tomorrow.
October 31, 2013 at 4:19 pm
The lack of parallelism is just a part of the reasons why udfs slow down the code performance. I'll be glad to help (and I'm sure others will be as well) when we know what is intended with the function.
October 31, 2013 at 4:38 pm
Sean Pearce (10/31/2013)
It is much faster to apply a table function.
CREATE TABLE Test1 (ID INT);
GO
INSERT INTO Test1
SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY a.object_id)
FROM sys.all_columns a
CROSS JOIN sys.all_columns b
GO
CREATE FUNCTION UDF_INLINE (@Input INT)
RETURNS INT
AS
BEGIN
DECLARE @I INT;
SET @I = @Input * 0.14;
RETURN @I;
END;
GO
CREATE FUNCTION UDF_APPLY (@Input INT)
RETURNS TABLE
AS
RETURN (SELECT @Input * 0.14 AS Result);
GO
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- Return the column with no function
SELECT
ID
INTO
#test1
FROM
Test1;
/*
Table 'Test1'. Scan count 1, logical reads 3345
CPU time = 437 ms, elapsed time = 533 ms.
*/
-- Return the inline function
SELECT
dbo.UDF_INLINE(ID) AS RowName
INTO
#test2
FROM
Test1;
/*
Table '#test2'. Scan count 0, logical reads 1001607
Table 'Test1'. Scan count 1, logical reads 3345
CPU time = 10389 ms, elapsed time = 13965 ms.
*/
-- Apply the function
SELECT
b.Result
INTO
#test3
FROM
Test1 t
CROSS APPLY
dbo.UDF_APPLY(t.ID) AS b;
/*
Table 'Test1'. Scan count 1, logical reads 3345
CPU time = 577 ms, elapsed time = 578 ms.
*/
+1. I don't know what others call the type of function you wrote for the "Apply" function but I call them "iSF" or "Inline Scalar Function". Of course, they're really just an Inline Table Valued Function (iTVF) that returns a single element.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 31, 2013 at 4:43 pm
Eric Mamet (10/31/2013)
I know but I was hoping it would be fast...Just talked to my customer and I'll try using CLR!
Yipee!!! 😀
Heh... hate to throw a wet blanket on your fire but you don't need to resort to an SQLCLR function for something so simple (although if you're comfortable with that, then fire away! It can be a great solution when done properly.:-D). Please see the example that Sean Pierce provided in his post above and please see the following article that demonstrates the problem and the fix (the "iSF").
How to Make Scalar UDFs Run Faster (SQL Spackle)
[/url]
--Jeff Moden
Change is inevitable... Change for the better is not.
November 1, 2013 at 12:41 am
Jeff Moden (10/31/2013)
+1. I don't know what others call the type of function you wrote for the "Apply" function but I call them "iSF" or "Inline Scalar Function". Of course, they're really just an Inline Table Valued Function (iTVF) that returns a single element.
I was really struggling with terminology when I wrote that :blush:
November 1, 2013 at 4:28 am
Indeed this is what I have done and the result is very impressive.
I don't think I can detect any load due to the SQLCLR function.
I have even been able to convince the DBA in charge to CLR enable the database 😛
Perfect!
Viewing 15 posts - 1 through 15 (of 21 total)
You must be logged in to reply to this topic. Login to reply