March 17, 2008 at 11:30 am
Hi,
I know the following code will replace single defined character (*)
REPLACE(Phone, '*', '')
but I need to replace (remove) any non-numeric character found within a string. Can anyone help please?
Thanks in advance,
Neal
March 17, 2008 at 12:29 pm
USE [Common]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER function [dbo].[AlphaRemove]
(@String_in varchar(max))
returns varchar(max)
as
begin
/*
*****Takes a string variable and turns it into a set of
*****numbers separated by spaces.
*****Despite the name, it also removes punctuation, not
*****just letters.
*****Input string must be simple ASCII, not Unicode.
*****(No accented letters, etc.)
*/
declare @sub char(1)
--Letters
while patindex('%[a-z]%', @string_in) > 0
begin
set @sub = substring(@string_in, patindex('%[a-z]%', @string_in), 1)
set @string_in = replace(@string_in, @sub, '')
end
--Punctuation
while patindex('%[!-)]%', @string_in) > 0
begin
set @sub = substring(@string_in, patindex('%[!-/]%', @string_in), 1)
set @string_in = replace(@string_in, @sub, '')
end
while patindex('%[+-/]%', @string_in) > 0
begin
set @sub = substring(@string_in, patindex('%[!-/]%', @string_in), 1)
set @string_in = replace(@string_in, @sub, '')
end
while patindex('%[:-=]%', @string_in) > 0
begin
set @sub = substring(@string_in, patindex('%[:-@]%', @string_in), 1)
set @string_in = replace(@string_in, @sub, '')
end
while patindex('%[?-@]%', @string_in) > 0
begin
set @sub = substring(@string_in, patindex('%[:-@]%', @string_in), 1)
set @string_in = replace(@string_in, @sub, '')
end
set @string_in = replace(@string_in, '[', '')
while patindex('%[\-`]%', @string_in) > 0
begin
set @sub = substring(@string_in, patindex('%[\-`]%', @string_in), 1)
set @string_in = replace(@string_in, @sub, '')
end
while patindex('%[{-~]%', @string_in) > 0
begin
set @sub = substring(@string_in, patindex('%[{-~]%', @string_in), 1)
set @string_in = replace(@string_in, @sub, '')
end
while charindex(' ', @string_in, 0) > 0
set @string_in = replace(@string_in, ' ', ' ')
return @string_in
end
It's pretty RBAR, but it does work. One of these days, I'll rewrite it to be set-based. (Originally wrote this thing years ago. Was clueless on SQL.)
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 17, 2008 at 12:32 pm
Change the final While loop to get rid of any spaces (right now, it gets rid of double-spaces), and it should do what you need.
If you have extended characters, you'll need to add the ranges for those to the thing.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 17, 2008 at 12:42 pm
I haven't tested this enough to be sure, but it should work:
create function dbo.NumberClean
(@String_in nvarchar(100))
returns bigint
as
begin
declare @NumberStr nvarchar(100), @Number_out bigint
;with Chars (Seq, Chr) as
(select number, substring(@string_in, number, 1)
from dbo.Numbers
where number between 0 and len(@string_in))
select @numberstr = coalesce(@numberstr + chr, chr)
from chars
where chr like '[0-9]'
select @number_out = @numberstr
return @number_out
end
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 17, 2008 at 1:03 pm
You can also use regular expression functions.
regular expressions function
CREATE FUNCTION dbo.regexReplace
(
@source varchar(5000),
@regexp varchar(1000),
@replace varchar(1000),
@globalReplace bit = 0,
@ignoreCase bit = 0
)
RETURNS varchar(1000) AS
BEGIN
DECLARE @hr integer
DECLARE @objRegExp integer
DECLARE @result varchar(5000)
EXECUTE @hr = sp_OACreate 'VBScript.RegExp', @objRegExp OUTPUT
IF @hr <> 0 BEGIN
EXEC @hr = sp_OADestroy @objRegExp
RETURN NULL
END
EXECUTE @hr = sp_OASetProperty @objRegExp, 'Pattern', @regexp
IF @hr <> 0 BEGIN
EXEC @hr = sp_OADestroy @objRegExp
RETURN NULL
END
EXECUTE @hr = sp_OASetProperty @objRegExp, 'Global', @globalReplace
IF @hr <> 0 BEGIN
EXEC @hr = sp_OADestroy @objRegExp
RETURN NULL
END
EXECUTE @hr = sp_OASetProperty @objRegExp, 'IgnoreCase', @ignoreCase
IF @hr <> 0 BEGIN
EXEC @hr = sp_OADestroy @objRegExp
RETURN NULL
END
EXECUTE @hr = sp_OAMethod @objRegExp, 'Replace', @result OUTPUT, @source, @replace
IF @hr <> 0 BEGIN
EXEC @hr = sp_OADestroy @objRegExp
RETURN NULL
END
EXECUTE @hr = sp_OADestroy @objRegExp
IF @hr <> 0 BEGIN
RETURN NULL
END
RETURN @result
END
GO
the code to get the result:
select dbo.regexReplace(MyColumn, '[^0-9]', '', 1, 1 )
from MyTable
Link with a list of regular expression functions:
March 17, 2008 at 1:12 pm
How about this?
declare @pos smallint
declare @string varchar(100)
set @string = '1109A><":{$%^&*4DSE2@!~$%^&567KJHGT'
while isnumeric(@string+'e0') = 0
begin
set @pos = (select patindex('%[^0-9]%',@string))
set @string = (select replace(@string,substring(@string,@pos,1),''))
end
select @string
there is a reson to add 'e0' to the string. check this out,
select IsNumeric('1002e0')
in this case IsNumeric returns 1. So in the above example @String never get replaced with ''. Adding 'e0' again would resolve that.
read more about this:
March 17, 2008 at 6:07 pm
GSquared (3/17/2008)
I haven't tested this enough to be sure, but it should work:
Absolutely the right idea with the numbers table, Gus 🙂 but you don't need the CTE or a derived table.
CREATE FUNCTION dbo.fnDigitsOnly (@pString VARCHAR(8000))
-- Tally table can be found at http://www.sqlservercentral.com/scripts/Advanced+SQL/62486/
RETURNS VARCHAR(8000) AS
BEGIN
DECLARE @CleanString VARCHAR(8000)
SELECT @CleanString = ISNULL(@CleanString,'')+SUBSTRING(@pString,N,1)
FROM dbo.Tally WITH (NOLOCK)
WHERE N<=LEN(@pString)
AND SUBSTRING(@pString,N,1) LIKE ('[0-9]')
RETURN @CleanString
END
--Jeff Moden
Change is inevitable... Change for the better is not.
March 18, 2008 at 2:25 am
Thanks to all concerned.
Sorry I'm late getting back, but I'll try them out now.
Very much appreciated!
March 18, 2008 at 8:01 am
Thanks Jeff. That's definitely better. (As mentioned, mine was thrown together and tested against one case.)
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 18, 2008 at 8:03 am
Sorry Gus... I actually didn't mean it that way (better)... I just meant it's different and you don't need the CTE if you don't want it. They both run pretty darned fast compared to the While loop methods.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 19, 2008 at 11:49 am
Jeff Moden (3/18/2008)
Sorry Gus... I actually didn't mean it that way (better)... I just meant it's different and you don't need the CTE if you don't want it. They both run pretty darned fast compared to the While loop methods.
Whether that's what you meant or not, when I said, "better", I meant "better". Here's why:
alter FUNCTION dbo.NumberClean1 (@pString VARCHAR(8000))
-- Tally table can be found at http://www.sqlservercentral.com/scripts/Advanced+SQL/62486/
RETURNS VARCHAR(8000) AS
BEGIN
DECLARE @CleanString VARCHAR(8000)
SELECT @CleanString = ISNULL(@CleanString,'')+SUBSTRING(@pString,Number,1)
FROM dbo.Numbers WITH (NOLOCK)
WHERE Number<=LEN(@pString)
AND SUBSTRING(@pString,Number,1) LIKE ('[0-9]')
RETURN @CleanString
END;
go
create function dbo.NumberClean2
(@String_in nvarchar(100))
returns bigint
as
begin
declare @NumberStr nvarchar(100), @Number_out bigint
;with Chars (Seq, Chr) as
(select number, substring(@string_in, number, 1)
from dbo.Numbers
where number between 0 and len(@string_in))
select @numberstr = coalesce(@numberstr + chr, chr)
from chars
where chr like '[0-9]'
select @number_out = @numberstr
return @number_out
end
Creates both functions. I did this on my test box in my ProofOfConcept database.
Here's the test set-up:
create table NumberClean (
Number bigint,
Clean varchar(100))
go
set nocount on
insert into dbo.numberclean(number, clean)
select 100, '100 ' union all
select 5, 'number 5' union all
select 15, '643' union all
select 1234567890123456, '1234567890123456 pounds'
go 1000000
The idea is the Number column is what it should return, based on the string in the Clean column. Each row was repeated 1-million times, giving 4-million rows of data. I deliberately included one where the "expected result" (15) was not at all what should be returned (643), since I strong believe that any test that doesn't check for wrong results from bad input is an incomplete test.
The test:
set statistics io on
set statistics time on
select count(*)
from dbo.numberclean
where dbo.numberclean1(clean) != number
Deliberately forcing a RBAR situation, to stress-test the function at its worst. Again, this is intentional. The second test was identical, just changed the 1 to a 2 at the end of the function name.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 19, 2008 at 11:51 am
(Had to split this post in half to avoid an error on the boards. Not sure why.)
Results:
===================================================
Proc 1 Run 1
===================================================
SQL Server parse and compile time:
CPU time = 1266 ms, elapsed time = 1445 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
(1 row(s) affected)
Table 'NumberClean'. Scan count 1, logical reads 15626, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 620422 ms, elapsed time = 1278261 ms.
===================================================
Proc 1 Run 2
===================================================
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
(1 row(s) affected)
Table 'NumberClean'. Scan count 1, logical reads 15626, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 619547 ms, elapsed time = 1261285 ms.
===================================================
Proc 2 Run 1
===================================================
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 6 ms.
(1 row(s) affected)
Table 'NumberClean'. Scan count 1, logical reads 15626, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 748844 ms, elapsed time = 1552001 ms.
===================================================
Proc 2 Run 2
===================================================
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 6 ms.
(1 row(s) affected)
Table 'NumberClean'. Scan count 1, logical reads 15626, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 757657 ms, elapsed time = 1548412 ms.
===================================================
As you can see, the IO stats are the same for all four runs, but the second function (my version with the CTE) takes a little more than 10% longer in CPU time. Yours is, indeed, slightly better.
The difference is minor, but definite.
So, when I said it was "better", don't appologize, just realize I mean it. 🙂
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 19, 2008 at 11:56 am
For my next test, to get rid of some of the RBAR (though not eliminating it completely), I added an ID column to the table, and then ran:
;with
CTE1 (ID, Position, Val) as
(select id, numbers.number, substring(reverse(rtrim(clean)), numbers.number, 1)
from dbo.numbercleanb
inner join dbo.numbers
on numbers.number between 1 and len(clean)),
CTE2 (ID, Pos, Val) as
(select id, row_number() over (partition by id order by position), val
from cte1
where val like '[0-9]'),
CTE3 (ID, Number) as
(select id, sum(power(cast(10 as bigint), cast(pos-1 as bigint)) * cast(val as bigint))
from cte2
group by id)
select count(*)
from cte3
inner join dbo.numbercleanb
on cte3.id = numbercleanb.id
and cte3.number != numbercleanb.number
And got:
================================
Script 3 Run 1
================================
SQL Server parse and compile time:
CPU time = 32 ms, elapsed time = 32 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
(1 row(s) affected)
Table 'Numbers'. Scan count 4000014, logical reads 12500078, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'NumberCleanB'. Scan count 3, logical reads 12018874, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 174547 ms, elapsed time = 136324 ms.
================================
Script 3 Run 2
================================
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
(1 row(s) affected)
Table 'Numbers'. Scan count 4000014, logical reads 12500078, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'NumberCleanB'. Scan count 3, logical reads 12018874, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 175969 ms, elapsed time = 138154 ms.
=====================================
As you can see, a LOT more IO, but it also finishes in a little over 2 minutes instead of a little over 21 minutes (which was the better of the two inline functions; the other inline function took over 25 minutes). The CPU time is approximately 1/4th of the better of the two inline functions.
Assuming the Numbers table is in memory, even that many scans and reads shouldn't hurt too much.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 19, 2008 at 12:25 pm
don't MAKE me whip out a CLR UDF for regex....:)
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
March 19, 2008 at 12:34 pm
Heh... bring it on... you and I have both been bragging about how your Regex stuff is the only CLR that we've seen so far that could actually beat T-SQL solutions. I think this is a great place to prove it.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 81 total)
You must be logged in to reply to this topic. Login to reply