May 19, 2009 at 11:39 am
Hi Everyone,
We are running SQL2005 with the latest updates etc. We do have a separate drive RAID 0 from the TEMP DB. We routinely create joins linking 2-5 tables with 50-250mln records and apply all the proper indexing, make tall skinny tables to improve performance.
We also have a couple of relatively simple functions that either verify parts of the data of create keys out of multiple columns. When we employ the functions on the same query to assemble results tables queries that normally took ~20hrs suddenly takes 2-3 days or never finish. We tried both SELECT INTO as well as INSERT INTO SELECT type queries and there is no difference.
Why could that be? I can run a top 100 on the queries with or without the SQL-functions.
Thanks!
May 19, 2009 at 12:20 pm
Once I had a a very similar issue after applying a service pack. After rebuilding indexes and updating statistics all returned to the habitual rates...
May 19, 2009 at 12:33 pm
By including functions in a query add overhead.
Picture the scenario below for a 10M rows myTable table...
select column-a
from MyTable
Query does a full table scan on 10M rows MyTable, isn't it?
Now lets add a custom function to the very same query...
select do_something(column-a)
from MyTable
What happens now?
On top of the full table scan on 10M rows MyTable SQL Server has to execute do_something() function 10M times -that's overhead.
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.May 19, 2009 at 2:15 pm
I will not dispute the overhead, but is there a way to quantify the overhead?
May 19, 2009 at 2:18 pm
Can you post the function, maybe making it Inline might help performance, if it's not already an Inline function. At least, it will get inserted in the execution plan, and the optimiser will be able to calculate statistics on it.
Post the code for the said function, and I'll try to help.
Cheers,
J-F
May 19, 2009 at 3:20 pm
Here you go:
CREATE FUNCTION [dbo].[Valid_Phone]
(@inputstring varchar(8000) )
RETURNS varchar(8000)
AS
BEGIN
/*return valid phone or no space blank*/
declare
@outputstring as varchar(8000)
set @outputstring = ''
-- Take out punctuations ()-.
set @inputstring = replace(@inputstring,'(','')
set @inputstring = replace(@inputstring,')','')
set @inputstring = replace(@inputstring,'-','')
set @inputstring = replace(@inputstring,'.','')
set @inputstring = replace(@inputstring,' ','')
--if length = 11 and start with 1 then trim the 1 at the 1st byte
if len(@inputstring) = 11 and left(@inputstring,1) = '1'
begin
set @inputstring = substring(@inputstring,2,10)
end
--if length = 10, is numeric and valid area code then output phone number
if len(@inputstring) = 10
begin
if isnumeric(@inputstring)=1
and left(@inputstring,1) not in ('0','1')
and substring(@inputstring,2,1) != '9'
and substring(@inputstring,2,2) != '11'
and substring(@inputstring,4,1) not in ('0','1')
and substring(@inputstring,5,2) != '11'
and left(@inputstring,3) not in
--(select AreaCode from dbo.FPDW_AreaCode where Filter = 'Y')
('456', '500', '555', '600', '700', '710', '800', '822', '833', '844', '855', '866', '877', '888', '898', '900', '976', '999')
-- check last 7 bytes
and right(@inputstring,7) not in ('2222222','3333333','4444444','5555555','6666666','7777777','8888888','9999999')
begin
set @outputstring = @inputstring
end
end
return @outputstring
END
May 19, 2009 at 3:38 pm
Came across this article that wants me to believe Scalar functions are not a great idea to begin with, not sure I am buying into it completely.
http://sqlblogcasts.com/blogs/simons/archive/2008/11/03/TSQL-Scalar-functions-are-evil-.aspx
May 19, 2009 at 3:51 pm
Hi Pieter
Some questions:
1.) Do you call these functions in your WHERE clause?
2.) If yes, are the initial replacements "()-. " needed? They may be a very huge problem for another solution I'm currently thinking about.
3.) Could CLR be an option?
4.) Can you provide some pseudo sample data which should work and which should not?
Greets
Flo
Edited: Typo
May 19, 2009 at 4:07 pm
If you use functions in the where clause, whether user defined or system functions, then the optimizer is unaware of the value and has to determine it on each row, if you need to use functions to massage the data used in the where clause, could you not create computed columns using the functions and then join on these?
Good article explaining what I beleieve you are describing:
May 19, 2009 at 4:15 pm
Please post your query that uses this function.
May 20, 2009 at 7:03 am
You can quantify the differences. Get the execution plans first. That will show you where the changes are occurring between the execution without the function and the one with it. You can also collect STATISTICS IO to see the scans & reads change between the two executions. That should be enough to quantify exactly what's going on. And yes, scalar functions can (not necessarily will) cause your queries to perform badly, especially if those scalar functions also have data access within them (which your example did not). It's my understanding, I haven't tested this, so I'm just repeating stuff I've read, that string manipulation like this is better done using a CLR function.
If the functions are being used in the WHERE clause of the query, the function will prevent index use.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
May 20, 2009 at 7:50 am
The TSQL functions are in the select portion not the Where clause or Join conditions.
CLR is an option and I tried to convert that phone function. However, as unexpected as it was, there was no performance improvement.
May 20, 2009 at 7:55 am
Would you please post your query.
May 20, 2009 at 8:01 am
Pieter (5/20/2009)
The TSQL functions are in the select portion not the Where clause or Join conditions.CLR is an option and I tried to convert that phone function. However, as unexpected as it was, there was no performance improvement.
Hi Pieter,
I've built this query to replace your scalar function, would you mind testing it on your data, and post the results/Query plan?
Hope it will help,
CREATE FUNCTION [dbo].[Valid_Phone_Inline]
(@inputstring VARCHAR(8000))
RETURNS Table
AS
RETURN(
with PhoneToValidate(TrimmedPhone)
as (SELECT CASE
WHEN Len(TrimmedPhone) = 11
AND left(TrimmedPhone,1) = '1'
THEN Substring(TrimmedPhone,2,10)
ELSE TrimmedPhone
END AS TrimmedPhone
FROM (SELECT replace(replace(replace(replace(replace(@inputstring,'(',''),')',''),
'-',''),'.',''),' ','') AS TrimmedPhone) AS a)
Select case when IsNumeric(TrimmedPhone) = 1
AND left(TrimmedPhone,1) not IN ('0','1')
AND substring(TrimmedPhone,2,1) != '9'
AND substring(TrimmedPhone,2,2) != '11'
AND substring(TrimmedPhone,4,1) NOT IN ('0','1')
AND substring(TrimmedPhone,5,2) != '11'
AND left(TrimmedPhone,3) NOT IN -- (select AreaCode from dbo.FPDW_AreaCode where Filter = 'Y')
('456','500','555','600',
'700','710','800','822',
'833','844','855','866',
'877','888','898','900',
'976','999')
AND right(TrimmedPhone,7) NOT IN ('2222222','3333333','4444444','5555555',
'6666666','7777777','8888888','9999999')
THEN TrimmedPhone
ELSE '' END as OutputPhone
FROM PhoneToValidate);
GO
SELECT dbo.Valid_Phone('1(700) 226-2354')
SELECT OutputPhone
FROM dbo.Valid_Phone_Inline('1(700) 226-2354')
SELECT OutputPhone
FROM dbo.Valid_Phone_Inline('1(226) 226-2354')
SELECT OutputPhone
FROM dbo.Valid_Phone_Inline('0(534) 226-2354')
SELECT OutputPhone
FROM dbo.Valid_Phone_Inline('1(534) 226-2354')
Cheers,
J-F
May 20, 2009 at 4:08 pm
Hi Pieter
I just tried some CLR stuff with 1 million rows.
Currently your function takes 17s
CLR takes 7s
I think this can be faster. Give me a little bit more time.
Questions:
1.)
Are you using C# or VB.Net?
2.)
Why do you use this as on-demand function? I don't think you get 250mio new phone numbers every day. What about a new column in your table which contains the cleaned phone numbers?
@J-F:
I stopped your function at two minutes.
Greets
Flo
Viewing 15 posts - 1 through 15 (of 41 total)
You must be logged in to reply to this topic. Login to reply