February 1, 2010 at 9:22 am
The following is our T-SQL function that is being called for all VarChar,Char,NChar and NVarChar fields in our SELECT statement in a stored proc. There is a lot of fields selected over a lot of rows on a lot of tables which all use the FOR XML statement. The users are cutting and pasting duff chars into fields.
We cannot stop this at source, hence the conversion Function.
Does anyone have any thoughts or advice on speeding up / making more efficient this Function.
Any input appreciated.
ALTER FUNCTION dbo.fn_RemoveInvalidCharacters(@Temp NVarChar(max))
RETURNS NVarChar(max)
AS
BEGIN
DECLARE @regex VarChar(100)
SET @regex = '%[^a-zA-Z0-9 !"#%&,/:;<=>@_`{}~%\$\(\)\*\+\.\?\[\^\|\u0009\u000A\u000D\u20AC\u005D-]%'
-- Remove any character in the input parameter if it is ASCII code
-- that is not compatible with XML i.e. a non-displable char
WHILE PatIndex(@regex, @Temp) > 0
SET @Temp = Stuff(@Temp, PatIndex(@regex, @Temp), 1, '')
RETURN @Temp
END
February 1, 2010 at 9:28 am
You could always do this on the application side after the user input and prior to the db call.
February 1, 2010 at 9:32 am
The user data is input using a different application over which I have no jurisdiction. This is the reason we have gone this way. I could do inline code in the Stored Procedure but the SP code would get uber-bloated so that isn't going to happen either.
February 1, 2010 at 9:43 am
mark.veitch (2/1/2010)
The user data is input using a different application over which I have no jurisdiction. This is the reason we have gone this way. I could do inline code in the Stored Procedure but the SP code would get uber-bloated so that isn't going to happen either.
Is this a 3rd party application, or in house? If it's "in house" you may be able to address this issue to the application side and let them know it's causing performance problems. If not, it seems the choices are limited.
Function that you're currently doing. This is nice in that it's all in one place so mainting the code is easy.
or
Snippets of TSQL in each sproc, but as you alluded to, can really get ugly.
February 1, 2010 at 9:49 am
I can't say for sure but I'm fairly certain that even if it's in-house then getting it fixed at source won't be an option. :angry:
There is the possibility that it might even be a mixture of in-house and third party. Looks like the clients may have to accept bad response times even though they are a vast improvement on what the previously had.
February 1, 2010 at 9:49 am
mark i believe you can also change the function to return a table value function, and use CROSS APPLY to apply it against your data for a big performance gain.
Lynn Pettis gave a great example of it in this thread a little while ago:
http://www.sqlservercentral.com/Forums/Topic842953-360-1.aspx?Highlight=fnGICSPFParseNumericForCobol
He(Lynn) took the inline function, changed it to a table value, and the original poster saw a huge performance gain:
Lowell
February 1, 2010 at 9:54 am
If you do not have access to the application code, you could look at using a trigger to clean the data as it was being saved.
February 1, 2010 at 9:59 am
Thank you very much Lowell. Just had a quick scan and although this is new to me I'm thinking it might be a possibility. About to sign off so I'll look more indepth tomorrow.
Once again, thanks for the input guys.
February 1, 2010 at 12:23 pm
Just to add a little more, you may want to take a look my blog post Comparing Hardcoded functions, In-Line TVF's, and Scalar Functions
Lowell, thank you for reference. Much appreciated! 🙂
February 1, 2010 at 12:44 pm
Converting your existing function into an Inline TVF is not going to be easy and it may not solve your problem. One reason I say that, NVARCHAR(MAX)/VARCHAR(MAX) fields. I am curious, how large is the longest field you have currently encountered? This may actually be one of those cases where a CLR function may be benefit, but don't ask me how to write it as I haven't had the chance to dive into that area yet.
The biggest problem you are facing right now in your current function is the RBAR, the WHILE loop. This is what is keeping the function from scaling well at the moment.
February 1, 2010 at 1:22 pm
I have some code for you, it requires that you have a Tally table to work. You should look at the fourth article I reference below in my signature block for more about Tally tables.
create function dbo.ValidString (
@pInputString nvarchar(max),
@pValidChars varchar(128) = '[^a-zA-Z0-9 !"#%&,/:;<=>@_`{}~%\$\(\)\*\+\.\?\[\^\|\u0009\u000A\u000D\u20AC\u005D-]'
)
returns table
as
return(
with ValidChars as (
select
substring(@pInputString, N, 1) as ValidChar
from
dbo.Tally
where
N <= len(@pInputString)
and substring(@pInputString, N, 1) not like '%' + @pValidChars + '%' -- '%[^a-zA-Z0-9 !"#%&,/:;<=>@_`{}~%\$\(\)\*\+\.\?\[\^\|\u0009\u000A\u000D\u20AC\u005D-]%'
)
select
replace((select ',' + ValidChar from ValidChars for xml path('')),',','') as ValidString
);
select * from dbo.ValidString('This is a Test String 1.',default)
select * from dbo.ValidString('This is a Test String 1.','[^a-zA-Z .]')
February 1, 2010 at 9:12 pm
mark.veitch (2/1/2010)
The following is our T-SQL function that is being called for all VarChar,Char,NChar and NVarChar fields in our SELECT statement in a stored proc. There is a lot of fields selected over a lot of rows on a lot of tables which all use the FOR XML statement. The users are cutting and pasting duff chars into fields.We cannot stop this at source, hence the conversion Function.
Does anyone have any thoughts or advice on speeding up / making more efficient this Function.
Any input appreciated.
ALTER FUNCTION dbo.fn_RemoveInvalidCharacters(@Temp NVarChar(max))
RETURNS NVarChar(max)
AS
BEGIN
DECLARE @regex VarChar(100)
SET @regex = '%[^a-zA-Z0-9 !"#%&,/:;<=>@_`{}~%\$\(\)\*\+\.\?\[\^\|\u0009\u000A\u000D\u20AC\u005D-]%'
-- Remove any character in the input parameter if it is ASCII code
-- that is not compatible with XML i.e. a non-displable char
WHILE PatIndex(@regex, @Temp) > 0
SET @Temp = Stuff(@Temp, PatIndex(@regex, @Temp), 1, '')
RETURN @Temp
END
A part of the performance problem is that you're using PATINDEX twice for each character removal. Try something like the following...
CREATE FUNCTION dbo.AlphaNumeric
(@String VARCHAR(MAX))
RETURNS VARCHAR(MAX) AS
BEGIN
DECLARE @IncorrectCharLoc SMALLINT
SELECT @IncorrectCharLoc = PATINDEX('%[^0-9A-Za-z]%', @String)
WHILE @IncorrectCharLoc > 0
SELECT @string = STUFF(@String, @IncorrectCharLoc, 1, ''),
@IncorrectCharLoc = PATINDEX('%[^0-9A-Za-z]%', @String)
RETURN @string
END
Of course, you'll need to use your own pattern string...
--Jeff Moden
Change is inevitable... Change for the better is not.
February 3, 2010 at 9:04 pm
mark.veitch (2/1/2010)
Thank you very much Lowell. Just had a quick scan and although this is new to me I'm thinking it might be a possibility. About to sign off so I'll look more indepth tomorrow.Once again, thanks for the input guys.
Okay, it has been a couple of days. What's happening?
February 3, 2010 at 9:40 pm
Lynn Pettis (2/3/2010)
mark.veitch (2/1/2010)
Thank you very much Lowell. Just had a quick scan and although this is new to me I'm thinking it might be a possibility. About to sign off so I'll look more indepth tomorrow.Once again, thanks for the input guys.
Okay, it has been a couple of days. What's happening?
Heh... it snowed here today. 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
February 3, 2010 at 9:52 pm
Jeff Moden (2/3/2010)
Lynn Pettis (2/3/2010)
mark.veitch (2/1/2010)
Thank you very much Lowell. Just had a quick scan and although this is new to me I'm thinking it might be a possibility. About to sign off so I'll look more indepth tomorrow.Once again, thanks for the input guys.
Okay, it has been a couple of days. What's happening?
Heh... it snowed here today. 😛
Good. It's trying to here at the moment, but I know we won't get a snow day tomorrow.
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply