September 28, 2015 at 10:51 pm
Kim Crosser (9/28/2015)
Jeff Moden (9/28/2015)
Kim Crosser (9/28/2015)
On the other hand, UDFs used to do some fancy formatting of an output result value aren't likely to have any significant impact on performance. Let's not guess... Post a couple such functions and let's find out.
Sometimes - "good enough" is...
...until someone uses it somewhere else and that IS not only the nature of functions, but their primary purpose.
Example where I would use a UDF for formatting - Phone numbers.
Given a table where the phone numbers are stored as:
...
CC int, -- Country Code
NPA int, -- Numbering plan area/region code
NXX varchar(9), -- Exchange code
SNUM varchar(9) not null, -- Subscriber number
EXT varchar(9), -- Extension
...
(Yes - they could be INT values, except that a lot of people and companies like to use the keyboard alpha strings instead.)
Customer wants phone numbers to display in results as:
if CC is not null and CC <> 1 ("US"), display as "+CC"
if NPA is not null, display as "(NPA)"
if NXX is not null, display as "NXX-"
display SNUM
if EXT is not null, display as " xEXT"
Yes - you can write an inline Isnull/Coalesce/Case expression to format this, and repeat it in every column in every query that accesses the data, or you can write one simple UDF.
So - which is easier to maintain?
select
...
case isnull(CC,1) when 1 then '' else '+' + convert(varchar(5),CC) + ' ' end
+ coalesce('(' + convert(varchar(5),NPA) + ') ','')
+ coalesce(NXX + '-','')
+ coalesce(SNUM,'') -- yeah, I know it is a not null column, but I am paranoid...
+ coalesce(' x' + EXT,'') as nicenum
...
or
select
...
dbo.myUDFNicePhone(CC,NPA,NXX,SNUM,EXT) as nicenum
I loaded a table with 800,000 records (with different field values) and queried it both ways multiple times. The function version took 3.4-4.5 seconds to process the 800,000 records, while the inline expression took 1.1-1.8 seconds.
Thus, the function averaged 1.6-3.4 seconds slower over 800,000 records - roughly 2 to 4.25 microseconds per record.
In real life, I have had to implement functions like this to handle multiple foreign telephone formats, where some country formats have dashes and some just spaces, and some have spaces at fixed intervals, while others can vary.
A similar issue arises with Postal (Zip) codes and formatting of address lines, which vary in interesting ways in different countries. Yes, you could write a big complex Case/Coalesce expression and then copy/paste it wherever you wanted to output these values, or you can write (and fully debug) one UDF and use it where needed. I know which of those customer systems I would rather support when some country decides to change its postal code format (like when the US went to Zip+4).
Code should be efficient and clean, but IMO replicating complex expressions in multiple locations should only be done when necessary - not just because you can squeeze a few more microseconds out of a query that is already performing well.
Ok, let's do an 800,000 row test. Keep in mind - I don't know exactly how what you posted is supposed to work so I just created some random data. Feel free to modify my test so that it's more accurate.
First the scalar and inline version of the function:
USE tempdb -- a db we all have
GO
-- Create a Scalar UDF
IF OBJECT_ID('dbo.SVF_FormatNbr') IS NOT NULL DROP FUNCTION dbo.SVF_FormatNbr
GO
CREATE FUNCTION dbo.SVF_FormatNbr
(
@CC varchar(5),
@NPA varchar(5),
@EXCH varchar(10),
@SNUM varchar(10),
@EXT varchar(10)
)
RETURNS varchar(50) AS
BEGIN
RETURN
(
SELECT
CASE isnull(@CC,1) WHEN 1 THEN '' ELSE '+' + convert(varchar(5),@CC) + ' ' END
+ coalesce('(' + convert(varchar(5),@NPA) + ') ','')
+ coalesce(@EXCH + '-','')
+ coalesce(@SNUM,'')
+ coalesce(' x' + @EXT,'') AS nicenum
);
END
GO
-- Create an iTVF
IF OBJECT_ID('dbo.iTVF_FormatNbr') IS NOT NULL DROP FUNCTION dbo.iTVF_FormatNbr
GO
CREATE FUNCTION dbo.iTVF_FormatNbr
(
@CC varchar(5),
@NPA varchar(5),
@EXCH varchar(10),
@SNUM varchar(10),
@EXT varchar(10)
)
RETURNS TABLE AS RETURN
(
SELECT
CASE isnull(@CC,1) WHEN 1 THEN '' ELSE '+' + convert(varchar(5),@CC) + ' ' END
+ coalesce('(' + convert(varchar(5),@NPA) + ') ','')
+ coalesce(@EXCH + '-','')
+ coalesce(@SNUM,'')
+ coalesce(' x' + @EXT,'') AS nicenum
);
GO
Now for some sample data:
IF OBJECT_ID('dbo.sometable') IS NOT NULL DROP TABLE dbo.sometable
GO
CREATE TABLE dbo.sometable
(
someid int identity primary key,
CC INT,
NPA INT,
EXCH varchar(10),
SNUM varchar(10),
EXT varchar(10)
)
-- Insert sample data
-- will take 5-10 seconds depending on your system (more if it stinks)
INSERT dbo.sometable(CC,NPA,EXCH,SNUM,EXT)
SELECT TOP (800000)
1000+ABS(CHECKSUM(newid())%9995),
1000+ABS(CHECKSUM(newid())%9995),
REPLACE(LEFT(CONVERT(varchar(36),newid()),6),'-',''),
REPLACE(LEFT(CONVERT(varchar(36),newid()),6),'-',''),
REPLACE(LEFT(CONVERT(varchar(36),newid()),6),'-','')
FROM sys.all_columns a, sys.all_columns b;
GO
And now the tests. For the iTVF version I tested it with a serial query plan and then with a parallel plan using Adam Machanic's make_parallel function. I obviously did not test the scalar function with a parallel plan because, as we all know, scalar functions don't do parallel.
We're using a throw-away variable because I don't care how long it takes to return the query in the GUI. (I learned this here). We ran each test 5 times....
PRINT '=== Scalar test ==='
DECLARE @st datetime = getdate(), @x varchar(50);
SELECT @x = dbo.SVF_FormatNbr(CC,NPA,EXCH,SNUM,EXT)
FROM sometable
PRINT DATEDIFF(MS,@st,getdate());
GO 5
PRINT '=== iTVF test with serial plan ==='
DECLARE @st datetime = getdate(), @x varchar(50);
SELECT @x = nicenum
FROM sometable
CROSS APPLY dbo.iTVF_FormatNbr(CC,NPA,EXCH,SNUM,EXT)
OPTION (MAXDOP 1) -- ensure that it gets a serial plan
PRINT DATEDIFF(MS,@st,getdate());
GO 5
PRINT '=== iTVF test with parallel plan ==='
DECLARE @st datetime = getdate(), @x varchar(50);
SELECT @x = nicenum
FROM sometable
CROSS APPLY dbo.iTVF_FormatNbr(CC,NPA,EXCH,SNUM,EXT)
CROSS APPLY dbo.make_parallel() -- force a parallel plan
PRINT DATEDIFF(MS,@st,getdate());
GO 5
I knew what was going to happen before I ran the test....
Beginning execution loop
=== Scalar test ===
1860
=== Scalar test ===
1880
=== Scalar test ===
1853
=== Scalar test ===
1883
=== Scalar test ===
1873
Batch execution completed 5 times.
Beginning execution loop
=== iTVF test with serial plan ===
666
=== iTVF test with serial plan ===
653
=== iTVF test with serial plan ===
666
=== iTVF test with serial plan ===
696
=== iTVF test with serial plan ===
660
Batch execution completed 5 times.
Beginning execution loop
=== iTVF test with parallel plan ===
216
=== iTVF test with parallel plan ===
210
=== iTVF test with parallel plan ===
203
=== iTVF test with parallel plan ===
206
=== iTVF test with parallel plan ===
220
Batch execution completed 5 times.
The iTVF was roughly three times faster when it got a serial plan and nine times faster with a parallel plan. :w00t:
Edit: Copy/pasted wrong test results (same results, different test titles)
-- Itzik Ben-Gan 2001
September 29, 2015 at 5:57 am
Kim Crosser (9/28/2015)
The real question is whether the UDFs are causing any performance problems. 10x sounds horrible, but in your system does that result in a query that runs 10 minutes vs. 1 minute or is it (more likely) a run-time difference measured in micro- or milliseconds?
At the very least it's a habit that I need to break. In doing so I need to decide the best alternative coding method to adopt - I'm reluctant to repeat code snippets throughout my code where I previously used a UDF to keep the code centralised.
Are the functions used in the Where clause and thus may be executed against a high percentage of the source data, or are they in the Select clause and only applied to the final result set? Or - as you noted on one of them - are they only used once per query to prep a parameter value?
I've done some analysis and the ones that are in a SELECT (e.g. to modify the data for some desirable presentation effect) are not many in number. Some of them have awful performance though; although in the context of the overall query it is "tolerable", my inclination is to program them out. What I'd really like is to modify the UDF so that I can log the fact that it ran, and thus figure out which ones are actually used. I can't think of a way to do that though ... perhaps something using SQL Profiler might catch them (at the Statement rather than the Batch level).
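(Thinking out loud: perhaps an Extended Events session could do the logging without touching the UDFs at all. This is only a sketch - the session and file names are invented, and I'd want to confirm that module_end actually fires for scalar UDF calls before trusting it.)
-- Capture each execution of a named scalar UDF.
-- NB: for a UDF called once per row this fires once per row, so keep the predicate
-- tight and don't leave the session running longer than necessary.
CREATE EVENT SESSION TrackUdfUsage ON SERVER
ADD EVENT sqlserver.module_end
    (WHERE ([object_name] = N'fnDATEROUND'))
ADD TARGET package0.event_file (SET filename = N'C:\Temp\TrackUdfUsage.xel');
GO
ALTER EVENT SESSION TrackUdfUsage ON SERVER STATE = START;
GO
-- ... run the workload, then:
ALTER EVENT SESSION TrackUdfUsage ON SERVER STATE = STOP;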
I have things like
CREATE PROCEDURE MyProc
@EndDateTime datetime
AS
...
SELECT @EndDateTime = dbo.fnDATEROUND(@EndDateTime, 1) -- dbo.fnDATEROUND(@Datetime, @AddDays)
...
SELECT Col1, Col2, ...
FROM MyTable
WHERE MyDateTime < @EndDateTime
I am happy with the performance of that and I don't think I have any/many alternative styles of:
...
WHERE MyDateTime < dbo.fnDATEROUND(@EndDateTime, 1)
and definitely none with
...
WHERE dbo.fnDATEROUND(MyDateTime, 1) <= @EndDateTime
😎
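If fnDATEROUND amounts to stripping the time portion and adding @AddDays days (an assumption - the function body isn't shown in this thread), the inline equivalent is short:
-- Inline equivalent of dbo.fnDATEROUND(@EndDateTime, 1), assuming it rounds the
-- datetime down to midnight and then adds the requested number of days:
DECLARE @EndDateTime datetime = '20150929 14:35';
SELECT DATEADD(dd, DATEDIFF(dd, 0, @EndDateTime) + 1, 0) AS RoundedEnd;  -- 2015-09-30 00:00:00.000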
As an aside I'd like to prove that by searching for
WHERE ... AnyFunction(ColumnName)
but it feels like I would need a parser, rather than just the RegEx find in my Programmer's Editor to narrow that down.
Maybe searching for rubbish query plans would help find them (but that would require that they were executed ... which feels like a sledgehammer-and-nut approach).
Hmmm ... always helpful having to type and describe stuff as it triggers thought processes ... all my column names are of the style "xxx_yyy_ColumnName" so I might actually be able to RegEx find "(xxx_yyy_". I'll report back 😉
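A rough server-side version of that search is also possible (a sketch only - it will match any "(xxx_yyy_" pattern, ISNULL and CONVERT included, not just UDF calls, but it narrows down which modules are worth opening in the editor):
-- Modules whose definition contains "(" immediately followed by the xxx_yyy_ column
-- prefix; the underscores are escaped because _ is a wildcard in LIKE.
SELECT OBJECT_SCHEMA_NAME(m.object_id) AS SchemaName,
       OBJECT_NAME(m.object_id)        AS ModuleName
FROM   sys.sql_modules AS m
WHERE  m.definition LIKE '%(xxx\_yyy\_%' ESCAPE '\'
ORDER  BY SchemaName, ModuleName;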
Eric M Russell (9/28/2015)
computed table columns can be persisted ... referenced anywhere else without ... having the expense of re-computation.
I have tended to avoid computed columns (I have VIEWs with manipulated data) but it's a very good point so I will revisit using computed columns. One of my slow-running presentation functions is
dbo.fnFullName(@Title, @FirstName, @LastName, @intStyle)
where the @intStyle indicates if the output should be Lastname-Firstname or the other way round, and which deals with concatenating them if any part is NULL. It might even be sufficient to have computed columns for FirstNameBlankIfNull, LastNameBlankIfNull ... so that I can use a simple concatenation rather than a herd of ISNULL()s ...
That said, a preprocess could also take care of fnFullName(@Title, @FirstName, @LastName, @intStyle) provided that the @intStyle was a CONST (which I expect it is 99% of the time)
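The sort of thing I have in mind (a sketch - the table and column names are just my usual placeholders):
-- Computed columns so that a simple concatenation never goes NULL:
ALTER TABLE dbo.MyTable
    ADD FirstNameBlankIfNull AS ISNULL(MyFirstname, ''),
        LastNameBlankIfNull  AS ISNULL(MyLastname, '');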
Ed Wagner (9/28/2015)
the task will be a whole lot better now than having to do it when people are screaming about performance problems in production.
Yeah, I do take that view. When we see something that can be improved we take the time to make the change; the longer we leave it downstream, the more costly the fix becomes. We have slack built into schedules to allow for that, and we can always ditch that strategy when time is tight near the release date.
One approach we have is to add a comment to the start of a script file (e.g. for a Procedure / Function) and start the line with "zzzz" to deliberately trigger a syntax error.
To create a release script we concatenate all (relevant) scripts together into a single file and we then change all "zzzz" to "--zzzz" in order to allow it to run (we do that one-by-one in case any of them say "Don't whatever you do release this procedure in this state"!!)
So my First Step would be to put a "zzzz inline code required for fnSlowFunction" in every script that uses that function. Easy to do that globally with my programmers editor 🙂 and then we can tackle them, when time permits either sooner or later, by just searching for anything that has "zzzz inline code required". If we revisit a procedure with a ZZZZ comment sooner, for code maintenance, we will fix that as part of the other changes.
Jeff Moden (9/28/2015)
Ed Wagner (9/28/2015)
I hope it doesn't use FORMAT inside the scalar function. 😛 Now THAT would be funny!
That would be REALLY funny as most of my functions are 10 years old 😛
... of course I have loads of scalar functions calling dbo.fnKristensFormat :hehe:
patrickmcginnis59 10839 (9/28/2015)
A preprocessor would be awesome!
Donkey's years ago, when we had to do a fair bit of ETL using sequential editing of CSVs and the like, I used to use AWK (the text-file processing tool co-authored by Brian Kernighan of C fame) and SED (the Stream EDitor). I'll revisit SED as it would easily do this job, and it would mean that I could build a list of Find & Replace RegExs that could be applied to a series of files.
Way-back-then SED was available for free as part of the GNU tools; I expect it still is, and probably something more spivy exists on GitHub.
Something that SQL itself applied as part of just executing a script would be much better of course, because I'd be using the final-form code during DEV rather than only after a deliberate step in the DEV/Release process
Kim Crosser (9/28/2015)
I loaded a table with 800,000 records (with different field values) and queried it both ways multiple times. The function version took 3.4-4.5 seconds to process the 800,000 records, while the inline expression took 1.1-1.8 seconds. Thus, the function averaged 1.6-3.4 seconds slower over 800,000 records - roughly 2 to 4.25 microseconds per record.
I look at that statistic through different eyes I'm afraid:
"Using a scalar function added 2 seconds to a 3 second task", rather than microseconds-per-row.
However, I do need to test it in a complex query. e.g. where I have
SELECT ... lots of columns ...
, dbo.fnFullName(MyTitle, MyFirstname, MyLastname, 1)
FROM MyTable
... lots of JOINS ...
to see what the actual increase in runtime is.
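I'll time it the same way Alan did above, assigning to a throw-away variable so the SSMS rendering time doesn't muddy the numbers - something like this (placeholder names again, and minus the JOINs for brevity):
DECLARE @st datetime = GETDATE(), @x varchar(200);
SELECT @x = dbo.fnFullName(MyTitle, MyFirstname, MyLastname, 1)
FROM   dbo.MyTable;
PRINT  DATEDIFF(ms, @st, GETDATE());
GO 5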
But I still think that if I can easily figure out how to change such a function call to inline using a pre-processor batch process then that will be my solution. No code changes 😎
Alan.B (9/28/2015)
Ok, let's do an 800,000 row test
Very interesting, thanks.
September 29, 2015 at 6:19 am
Alan.B (9/28/2015)
Ok, let's do an 800,000 row test.
Nice test, Alan. I kinda knew how it would turn out, but it was nice to see it done.
Alan.B (9/28/2015)
And now the tests. For the iTVF version I tested it with a serial query plan and then with a parallel plan using Adam Machanic's make_parallel function.
That's something new to me, so thank you for it. I'm going to have to investigate and play around with this.
September 29, 2015 at 6:22 am
Kristen-173977 (9/29/2015)
Ed Wagner (9/28/2015)
the task will be a whole lot better now than having to do it when people are screaming about performance problems in production. Yeah, I do take that view. When we see something that can be improved we take the time to make the change; the longer we leave it downstream, the more costly the fix becomes. We have slack built into schedules to allow for that, and we can always ditch that strategy when time is tight near the release date.
Kristen, I can't tell you how relieved I am to hear that. I've seen people bitten by that before and am glad you've learned the lesson without having to endure the pain. The attitude of "hey, it works and it has to be released today; so what if it's slow" comes up. Sadly, the author isn't the one who has to deal with the performance problem - I am. :crazy:
September 29, 2015 at 11:00 am
Alan.B (9/28/2015)
Alan - minor issue, but the first two fields (CC, NPA) are actually Int values. Making them Varchar(5) introduces some additional overhead in the function, where it then needs to do some implicit conversions.
I think everyone missed the point of the 800,000 records. It isn't that you can save a couple of seconds on an 800,000 record test. It is that it took 800,000 iterations to make enough of a difference to be noticeable.
In real life, functions like this one, which apply formatting to values in the output of a SELECT statement, are applied to much smaller result sets. If that query returned 1000 records, the difference in the output times wouldn't be 2-3 seconds, it would be 3.75 milliseconds or less!
Also, that function was a very simplified version of a real one I did years ago, where it not only handled U.S. NANP formats, but also several selected foreign formats.
BTW - I do try to write all my SQL without using UDFs - except where a UDF is appropriate and makes the system more maintainable without unduly slowing the system. And I try to minimize/avoid functions of any kind in a Where (or Join) clause.
September 29, 2015 at 2:19 pm
Ed Wagner (9/29/2015)
Alan.B (9/28/2015)
Ok, let's do an 800,000 row test.
Nice test, Alan. I kinda knew how it would turn out, but it was nice to see it done.
Alan.B (9/28/2015)
And now the tests. For the iTVF version I tested it with a serial query plan and then with a parallel plan using Adam Machanic's make_parallel function.
That's something new to me, so thank you for it. I'm going to have to investigate and play around with this.
I love make_parallel, it's absolutely genius. I have found it super helpful for iTVF's that use a tally table for some kind of string manipulation. Really good stuff.
-- Itzik Ben-Gan 2001
September 29, 2015 at 2:39 pm
Kim Crosser (9/29/2015)
Alan.B (9/28/2015)
Alan - minor issue, but the first two fields (CC, NPA) are actually Int values. Making them Varchar(5) introduces some additional overhead in the function, where it then needs to do some implicit conversions.
I think everyone missed the point of the 800,000 records. It isn't that you can save a couple of seconds on an 800,000 record test. It is that it took 800,000 iterations to make enough of a difference to be noticeable.
In real life, functions like this one, which apply formatting to values in the output of a SELECT statement, are applied to much smaller result sets. If that query returned 1000 records, the difference in the output times wouldn't be 2-3 seconds, it would be 3.75 milliseconds or less!
It was a couple of seconds in that test, but hopefully you see the lesson. I just showed you how to make a query run 3-9 times faster. I aver that the performance gains will be similar regardless of how complex the scalar function is. 2 seconds is not a big deal unless it's a function/query that runs hundreds or thousands of times a day. 3-9 times faster is not a big deal until we're talking about queries that take minutes or hours. Not a week has gone by where I haven't said to my boss, PM, co-worker or client, "... that query that was running for 2 hours you wanted me to look at - I got it down to 4 seconds." What I demonstrated is one technique for accomplishing this.
Also, that function was a very simplified version of a real one I did years ago, where it not only handled U.S. NANP formats, but also several selected foreign formats.
BTW - I do try to write all my SQL without using UDFs - except where a UDF is appropriate and makes the system more maintainable without unduly slowing the system. And I try to minimize/avoid functions of any kind in a Where (or Join) clause.
What we both posted is surely a simplified version of what we all deal with. The performance gains from scalar to inline, in my experience, are more profound as the query gets more complex. I create functions often to enforce best practices and provide clients/colleagues with high-performing re-usable code so that work is not duplicated.
-- Itzik Ben-Gan 2001
September 29, 2015 at 4:22 pm
Kim Crosser (9/29/2015)
I think everyone missed the point of the 800,000 records. It isn't that you can save a couple of seconds on an 800,000 record test. It is that it took 800,000 iterations to make enough of a difference to be noticeable.
Correct. Now just imagine everything in your database running more than twice as fast. You just about can't buy a machine with such a performance improvement. As for 800,000 rows, it's been a long time since I've had to work with so few.
It takes so little time and effort to get a 100 to 60,000% (not a misprint) improvement out of code. Don't do it slow when you know how to do it fast. "Mind the pennies and the dollars will take care of themselves."
--Jeff Moden
Change is inevitable... Change for the better is not.
September 29, 2015 at 8:24 pm
Ah. My apologies, Kim... I missed what you said here...
Kim Crosser (9/29/2015)
Also, that function was a very simplified version of a real one I did years ago, where it not only handled U.S. NANP formats, but also several selected foreign formats.
BTW - I do try to write all my SQL without using UDFs - except where a UDF is appropriate and makes the system more maintainable without unduly slowing the system. And I try to minimize/avoid functions of any kind in a Where (or Join) clause.
Same here, and it sounds like we're actually cut from the same cloth. There have been places where I've been forced into making a UDF, like when people insist that they need a function for a persisted computed column. Of course, I'd rather have something like that than people using SFs or mTVFs in a JOIN or WHERE clause.
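For anyone who hasn't been down that road, the shape of it is something like this (names invented, and I've borrowed the NPA column from Alan's test table above just to have something concrete) - the function has to be deterministic and schema-bound before the computed column can be persisted:
CREATE FUNCTION dbo.fnAreaCode (@NPA int)
RETURNS varchar(12) WITH SCHEMABINDING
AS
BEGIN
    RETURN '(' + CONVERT(varchar(5), @NPA) + ')';
END
GO
ALTER TABLE dbo.sometable
    ADD NiceNPA AS dbo.fnAreaCode(NPA) PERSISTED;
-- Worth remembering: a scalar UDF in a computed column also tends to force serial
-- plans for queries against the table, which is another reason I'd rather avoid it.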
Still, whether it's a function or not, I do try to make things ultimately scalable especially if it's some form of function because such encapsulations of logic are bound to be used in places where a thousand rows would look like the head of a pin in a timber-spike factory. 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
September 29, 2015 at 8:57 pm
Jeff Moden (9/29/2015)
Kim Crosser (9/29/2015)
I think everyone missed the point of the 800,000 records. It isn't that you can save a couple of seconds on an 800,000 record test. It is that it took 800,000 iterations to make enough of a difference to be noticeable. Correct. Now just imagine everything in your database running more than twice as fast. You just about can't buy a machine with such a performance improvement. As for 800,000 rows, it's been a long time since I've had to work with so few.
It takes so little time and effort to get a 100 to 60,000% (not a misprint) improvement out of code. Don't do it slow when you know how to do it fast. "Mind the pennies and the dollars will take care of themselves."
Amen. I know everyone's heard about queries that run too slowly and need to be tuned. However, I've never heard of anyone being told that their queries run too fast and they need to be slowed down.
Once you know how to make the database walk, figure out how to make it run. Next, learn how to make it fly. Then you'll be working on different speeds of flight, but you'll never want to go back to walking again.
September 30, 2015 at 12:41 am
Jeff Moden (9/29/2015)
Don't do it slow when you know how to do it fast.
+1000 (to quote your good self 😉 )
I don't want to have to revisit the code, once deployed, if I can avoid it. Put differently: if we ever have to visit code that is in production to either fix a bug or fix performance then there is blood on the carpet and a post mortem about how it happened, and what changes are needed to procedures and / or methods to prevent it happening again.
Clearly Scalar UDFs have not brought my APPs to their knees or I would have known, already, that this was a problem that needed avoiding. I hang out in forums plenty, so I'm a bit surprised that I haven't clocked this before - probably due to my earlier comment that as a C/C++ programmer turned SQL-er I just assumed that Scalar Functions (when not applied to a column in WHERE / JOIN) were trivial compared to inline code. Perhaps MS can make them trivial in the future ...
... but now I know I want to fix that because it is a smoking gun; perhaps there is only a 5% gain to be had, perhaps on certain queries it is nX - I'll take both of those, thank you very much 😉 - but more importantly, now that I know, I don't want to be wasting the 5% or the nX on any new code.
One of my #1 priorities is that we don't have a DEV schedule wrecked by having to fire-fight Production Code, particularly when data-size ramps up.
Fixing Production is hugely expensive for us (I imagine it is for everyone ...) - the time to fix is not in my schedule, the urgency of business managers saying "Company is losing money" leads to a rushed job, the fix can cause side effects elsewhere in the system, the Business Client is reluctant to do any proper testing (compared to when we are rolling out new features for them), and of course there is the actual cost of our (DEV/QA department's) testing cycle - which is only in my budget for new releases, not an emergency fix.
We have close to zero failures in Production and I'd like to keep it that way 😎
September 30, 2015 at 10:25 am
OK OK - I give!!!
I hereby take the No-UDF pledge.
From now on, I will never use a UDF. Instead, I will only write Select statements that include in-line complex expressions that may run for 20-30 lines per field until they resemble paragraphs from James Joyce's Ulysses. Maintainers may feel like they are navigating through the Colossal Cave when traversing my syntax, but I will get that last microsecond or else!
:hehe:
September 30, 2015 at 10:42 am
Kim Crosser (9/30/2015)
OK OK - I give!!! I hereby take the No-UDF pledge.
From now on, I will never use a UDF. Instead, I will only write Select statements that include in-line complex expressions that may run for 20-30 lines per field until they resemble paragraphs from James Joyce's Ulysses. Maintainers may feel like they are navigating through the Colossal Cave when traversing my syntax, but I will get that last microsecond or else!
:hehe:
You could also use iTVFs, that way you keep performance and simplicity in the code. 😀
September 30, 2015 at 10:58 am
Sometimes folks will use a UDF to abstract or reuse the implementation details of some transformation, when adding a computed column on a view or table works better for performance. A good example is a legacy table with date+time contained in a varchar column, and the application developers "don't have a problem with it". You can add a persisted computed column to the table containing the value properly converted to a DateTime type. Not only does it allow you to circumvent repeated conversions, but it also serves as a check constraint to block insertion of invalid values that can't be converted.
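A sketch of what that looks like (the table, column, and CONVERT style are made up - use the style that matches the legacy string format):
-- Persisted computed column: the varchar value is converted once and stored;
-- an INSERT/UPDATE with text that won't convert fails, giving the check-constraint effect.
-- (The ALTER itself will fail if existing rows contain unconvertible text.)
ALTER TABLE dbo.LegacyOrders
    ADD OrderDateTime AS CONVERT(datetime, OrderDateTimeText, 120) PERSISTED;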
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
September 30, 2015 at 2:06 pm
BTW - that demonstration of the iTVF approach (much earlier above) was very helpful and interesting. 🙂
Honestly, I never considered using an iTVF in that type of situation, but I will now.
It does sort of raise the question of why SQL Server is so much slower to return one scalar value than to do a table join, when the logic (and SQL expression) is identical in both.
It also means that one *could* actually write a procedure that would automatically refactor queries with scalar UDFs to iTVFs (automatically convert the UDF to an iTVF, then replace the UDF references in the queries with table references).
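Mechanically the rewrite is just this transformation (using the function names from Alan's test earlier in the thread):
-- Before: scalar UDF, evaluated row by row
SELECT dbo.SVF_FormatNbr(CC, NPA, EXCH, SNUM, EXT) AS nicenum
FROM dbo.sometable;

-- After: the same expression as an iTVF, inlined into the plan via CROSS APPLY
SELECT f.nicenum
FROM dbo.sometable
CROSS APPLY dbo.iTVF_FormatNbr(CC, NPA, EXCH, SNUM, EXT) AS f;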
Hmm... maybe a fun project for a slow day? 😉