August 15, 2012 at 8:33 am
Good afternoon.
I have the following function:
IF EXISTS (SELECT * FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].[fn_QBRemoveNonAlphaChars]')
AND type in (N'FN', N'IF', N'TF', N'FS', N'FT'))
DROP FUNCTION [dbo].[fn_QBRemoveNonAlphaChars]
GO
CREATE FUNCTION [dbo].[fn_QBRemoveNonAlphaChars](@String VARCHAR(1000))
RETURNS VARCHAR(1000)
AS
BEGIN
WHILE PATINDEX('%[^a-z ''^-]%', @String) > 0
SET @String = STUFF(@String, PATINDEX('%[^a-z ''^-]%', @String), 1, '')
RETURN @String
END
GO
This works great most of the time, for example:
select dbo.[fn_QBRemoveNonAlphaChars]('The~@ CA:$%&*$()T - SA0099878778747T ~~~;;;!||o*&")|n the ma@@@@@t')
returns: The CAT - SAT on the mat which is perfect
However, if I try it with:
select dbo.[fn_QBRemoveNonAlphaChars]('The^~@ CA:$%&*$()T - SA0099878778747T ~~~;;;!||o*&")|n the ma@@@@@t')
It returns: The^ CAT - SAT on the mat
I've tried to escape it but I can't get it to work.
Any suggestions?
Thanks.
---------------------------------------------------------
It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens[/url]
Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
August 15, 2012 at 9:00 am
The fast way of working around it will be to modify the function's last line, so insead of the the line
RETURN @String
it would be
RETURN REPLACE(@String, '^','')
Adi
--------------------------------------------------------------
To know how to ask questions and increase the chances of getting asnwers:
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
August 15, 2012 at 9:01 am
Total guess, no testing has gone into this: -
CREATE FUNCTION [dbo].[fn_QBRemoveNonAlphaChars] (@String VARCHAR(1000))
RETURNS VARCHAR(1000)
AS
BEGIN
WHILE PATINDEX('%[^a-z ''-]%', @String) > 0
SET @String = STUFF(@String, PATINDEX('%[^a-z ''-]%', @String), 1, '')
RETURN @String
END
You had the ^ character in your "allowed" pattern.
August 15, 2012 at 9:10 am
Cadavre, the ^ at the end is there to exclude the - from being removed.
Adi, I'm happy with your solution!
Thanks to you both.
---------------------------------------------------------
It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens[/url]
Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
August 15, 2012 at 9:24 am
Abu Dina (8/15/2012)
Cadavre, the ^ at the end is there to exclude the - from being removed.Adi, I'm happy with your solution!
Thanks to you both.
Test mine 😉
Your code: -
IF EXISTS (
SELECT *
FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].[fn_QBRemoveNonAlphaChars]')
AND type IN (N'FN', N'IF', N'TF', N'FS', N'FT')
)
DROP FUNCTION [dbo].[fn_QBRemoveNonAlphaChars]
GO
CREATE FUNCTION [dbo].[fn_QBRemoveNonAlphaChars] (@String VARCHAR(1000))
RETURNS VARCHAR(1000)
AS
BEGIN
WHILE PATINDEX('%[^a-z ''^-]%', @String) > 0
SET @String = STUFF(@String, PATINDEX('%[^a-z ''^-]%', @String), 1, '')
RETURN @String
END
GO
SELECT dbo.[fn_QBRemoveNonAlphaChars](
'The^~@ CA:$%&*$()T - SA0099878778747T ~~~;;;!||o*&")|n the ma@@@@@t')
Returns: -
----------------------------
The^ CAT - SAT on the mat
My change: -
IF EXISTS (
SELECT *
FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].[fn_QBRemoveNonAlphaChars]')
AND type IN (N'FN', N'IF', N'TF', N'FS', N'FT')
)
DROP FUNCTION [dbo].[fn_QBRemoveNonAlphaChars]
GO
CREATE FUNCTION [dbo].[fn_QBRemoveNonAlphaChars] (@String VARCHAR(1000))
RETURNS VARCHAR(1000)
AS
BEGIN
WHILE PATINDEX('%[^a-z ''-]%', @String) > 0
SET @String = STUFF(@String, PATINDEX('%[^a-z ''-]%', @String), 1, '')
RETURN @String
END
GO
SELECT dbo.[fn_QBRemoveNonAlphaChars](
'The^~@ CA:$%&*$()T - SA0099878778747T ~~~;;;!||o*&")|n the ma@@@@@t')
Returns: -
---------------------------
The CAT - SAT on the mat
August 15, 2012 at 9:32 am
Cadavre, my sincerest apologies! :blush:
---------------------------------------------------------
It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens[/url]
Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
August 15, 2012 at 9:35 am
Abu Dina (8/15/2012)
Cadavre, my sincerest apologies! :blush:
No problem. We're all here to learn and it's better to make a mistake on a forum than on production code, right? 😀
August 15, 2012 at 9:40 am
Hear hear
---------------------------------------------------------
It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens[/url]
Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
August 15, 2012 at 10:52 am
would you expect your function to remove high ascii characters as well?
i had posted a solution in your other thread on a similar strip function;
what would you expect for the results of this?
SELECT dbo.[fn_QBRemoveNonAlphaChars](
'ÀAlbèert ËEîinstêeiìn ÌInstìitúutëe - MPG')
Lowell
August 15, 2012 at 12:28 pm
Hi Lowell,
Just got home to see your reply.
I don't have access to my work right now but your method is just easier to understand and adapt.
Will try again tomorrow and report back.
Thanks for your time, much appreciated.
Regards.
---------------------------------------------------------
It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens[/url]
Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
August 15, 2012 at 7:59 pm
Abu Dina (8/15/2012)
Hi Lowell,Just got home to see your reply.
I don't have access to my work right now but your method is just easier to understand and adapt.
Will try again tomorrow and report back.
Thanks for your time, much appreciated.
Regards.
I'd like to suggest replacing your While loop driven scalar UDF with a cCTE (Cascading CTE) driven iSF (Inline Scalar Function) for the sake of performance. Here's the function I suggest... you can, of course, change the name if you decide to adopt it.
CREATE FUNCTION dbo.isfRemoveNonAlphaChars
(@pString VARCHAR(8000))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
)
SELECT CleanString =
(
SELECT SUBSTRING(@pString,t.N,1)
FROM cteTally t --No WHERE clause needed because of TOP above
WHERE SUBSTRING(@pString,t.N,1) LIKE '[a-z ''-]'
ORDER BY t.N
FOR XML PATH(''), TYPE
).value('text()[1]', 'varchar(8000)')
;
Now... let's do 2 slightly different tests. This first test "cleans" just 36 characters across just 10,000 rows.
--===== Conditionallly drop the test table to make reruns in SSMS easier.
IF OBJECT_ID('tempdb..#TestTable','U') IS NOT NULL DROP TABLE #TestTable;
--===== Create and populate the test table on-the-fly.
SELECT TOP 10000
SomeMixedString = REPLICATE(CAST(NEWID() AS VARCHAR(50)),1)
INTO #TestTable
FROM sys.All_Columns ac1
CROSS JOIN sys.All_Columns ac2
;
GO
-------------------------------------------------------------
--===== Declare a variable to take display and disk times out of the picture
-- and another to measure duration with. You cannot use SET STATISTICS
-- on Scalar UDF's to measure performance because it will artificially
-- increase the duration by many times.
DBCC FREEPROCCACHE;
DECLARE @Bitbucket VARCHAR(50),
@StartTime DATETIME;
RAISERROR('========== Original Function ==========',0,1)
SELECT @StartTime = GETDATE();
SELECT @Bitbucket = dbo.fn_QBRemoveNonAlphaChars(SomeMixedString)
FROM #TestTable;
PRINT DATEDIFF(ms,@StartTime,GETDATE());
GO
--===== Declare a variable to take display and disk times out of the picture
-- and another to measure duration with. You cannot use SET STATISTICS
-- on Scalar UDF's to measure performance because it will artificially
-- increase the duration by many times.
DBCC FREEPROCCACHE;
DECLARE @Bitbucket VARCHAR(50),
@StartTime DATETIME;
RAISERROR('========== cCTE Function ==========',0,1)
SELECT @StartTime = GETDATE();
SELECT @Bitbucket = (SELECT CleanString FROM dbo.isfRemoveNonAlphaChars(SomeMixedString))
FROM #TestTable
--CROSS APPLY dbo.isfRemoveNonAlphaChars(SomeMixedString) c;
PRINT DATEDIFF(ms,@StartTime,GETDATE());
GO
Here's what I get on my 10 year old, single 1.8GHz desktop box for durations...
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
========== Original Function ==========
1383
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
========== cCTE Function ==========
1180
Now, let's try an identical test but with 4 times the width or just 144 characters wide.
--===== Conditionallly drop the test table to make reruns in SSMS easier.
IF OBJECT_ID('tempdb..#TestTable','U') IS NOT NULL DROP TABLE #TestTable;
--===== Create and populate the test table on-the-fly.
SELECT TOP 10000
SomeMixedString = REPLICATE(CAST(NEWID() AS VARCHAR(50)),4)
INTO #TestTable
FROM sys.All_Columns ac1
CROSS JOIN sys.All_Columns ac2
;
GO
-------------------------------------------------------------
--===== Declare a variable to take display and disk times out of the picture
-- and another to measure duration with. You cannot use SET STATISTICS
-- on Scalar UDF's to measure performance because it will artificially
-- increase the duration by many times.
DBCC FREEPROCCACHE;
DECLARE @Bitbucket VARCHAR(50),
@StartTime DATETIME;
RAISERROR('========== Original Function ==========',0,1)
SELECT @StartTime = GETDATE();
SELECT @Bitbucket = dbo.fn_QBRemoveNonAlphaChars(SomeMixedString)
FROM #TestTable;
PRINT DATEDIFF(ms,@StartTime,GETDATE());
GO
--===== Declare a variable to take display and disk times out of the picture
-- and another to measure duration with. You cannot use SET STATISTICS
-- on Scalar UDF's to measure performance because it will artificially
-- increase the duration by many times.
DBCC FREEPROCCACHE;
DECLARE @Bitbucket VARCHAR(50),
@StartTime DATETIME;
RAISERROR('========== cCTE Function ==========',0,1)
SELECT @StartTime = GETDATE();
SELECT @Bitbucket = (SELECT CleanString FROM dbo.isfRemoveNonAlphaChars(SomeMixedString))
FROM #TestTable
--CROSS APPLY dbo.isfRemoveNonAlphaChars(SomeMixedString) c;
PRINT DATEDIFF(ms,@StartTime,GETDATE());
GO
Here are the duration results for that run...
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
========== Original Function ==========
11366
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
========== cCTE Function ==========
3236
The Scalar UDF with the While loop in it increased duration by about 7.2 times even though it was only given 3 times the amount of work.
The cCTE Function only increased duration by about 1.7 times even though it was given 3 times the amount of work.
If we run the same code again but 8 times as wide (288 characters) as the original, here are the run results ...
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
========== Original Function ==========
40303
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
========== cCTE Function ==========
5830
If you've never worked with a "Tally CTE" (made up of Cascading CTEs) or don't know how it works to replace certain while loops, please see the following article.
http://www.sqlservercentral.com/articles/T-SQL/62867/
--Jeff Moden
Change is inevitable... Change for the better is not.
August 16, 2012 at 4:11 am
Jeff, I'm in awe!
Respect my good man!
So we're rebuilding the string one character at a time and keeping only characters (LIKE '[a-z ''-]'). Maybe I'm just stupid but how do we go about replacing other characters with a blank space instead
SELECT CleanString =
(
SELECT SUBSTRING(@pString,t.N,1)
FROM cteTally t
WHERE SUBSTRING(@pString,t.N,1) LIKE '[a-z ''-]'
ORDER BY t.N
FOR XML PATH(''), TYPE
).value('text()[1]', 'varchar(8000)')
---------------------------------------------------------
It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens[/url]
Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
August 16, 2012 at 7:17 am
Abu Dina (8/16/2012)
Jeff, I'm in awe!Respect my good man!
So we're rebuilding the string one character at a time and keeping only characters (LIKE '[a-z ''-]'). Maybe I'm just stupid but how do we go about replacing other characters with a blank space instead
SELECT CleanString =
(
SELECT SUBSTRING(@pString,t.N,1)
FROM cteTally t
WHERE SUBSTRING(@pString,t.N,1) LIKE '[a-z ''-]'
ORDER BY t.N
FOR XML PATH(''), TYPE
).value('text()[1]', 'varchar(8000)')
This maybe?
CREATE FUNCTION dbo.isfRemoveNonAlphaChars
(@pString VARCHAR(8000))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
)
SELECT CleanString =
(
SELECT CASE WHEN SUBSTRING(@pString,t.N,1) LIKE '[a-z ''-]' THEN SUBSTRING(@pString,t.N,1) ELSE ' ' END
FROM cteTally t --No WHERE clause needed because of TOP above
ORDER BY t.N
FOR XML PATH(''), TYPE
).value('text()[1]', 'varchar(8000)')
;
August 16, 2012 at 7:41 am
Sorry
select cleanstring from dbo.isfn_QBRemoveNonAlphaChars('Abu???////Dina')
Gives back AbuDina so no spaces insrted. 🙁
---------------------------------------------------------
It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens[/url]
Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
August 16, 2012 at 8:01 am
Abu Dina (8/16/2012)
Sorry
select cleanstring from dbo.isfn_QBRemoveNonAlphaChars('Abu???////Dina')
Gives back AbuDina so no spaces insrted. 🙁
We've had a couple of different names for this function. There may be some confusion because the modification that Cadavre made works just fine. Make sure that you're executing the correct function.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply