December 2, 2005 at 3:39 am
I am writing a UDF which takes tel numbers and converts them to the appropriate format. The first part of this process needs to strip out all non numeric characters.
I have tried
select replace( '+1-310-414-0909',[^0-9], '')
but it look as though you cant use wild cards with a replace function.
Anyone got a solution?
Many Thanks,
Jules
December 2, 2005 at 5:25 am
Jules,
I had a similar problem and dealt with it using the code below. If there is a better way to do it though I would also be interested to see the answer
declare @string_orig varchar(50)
declare @string_new varchar(50)
set @string_orig = '+1-310-414-0909'
set @string_new = ''
while @string_orig <> ''
begin
if left(@string_orig, 1) in ('0','1','2','3','4','5','6','7','8','9')
begin
set @string_new = @string_new + left(@string_orig, 1)
print @string_new
end
set @string_orig = right(@string_orig, len(@string_orig) -1)
end
print @string_new
dc
December 2, 2005 at 5:32 am
Thanks for the reply
I came up with a similair method
declare @tel
set @tel = '+1-310-414-0909'
declare @len int
declare @position int
declare @character varchar(1)
declare @number varchar (25)
select @len = len(@tel), @position = 1 , @number = ''
while @position <= @len
begin
set @character = case when (substring(@tel, @position, 1)) like '[0-9]'then substring(@tel, @position, 1) else '' end
select @number = @number + @character ,@position = @position +1
end
select @number
Annoying that it cant be dont with replace that it has to be done with a loop you would have thought there was a way to reurn just the numeric values from a string with nested functions?
Jules
December 2, 2005 at 8:45 am
I don't think you can avoid looping in one way or other in this case, and as noted, REPLACE doesn't deal well with wildcards... but PATINDEX does. So, you can use PATINDEX (in a loop) to look for unwanted charachters (which btw can be any combination defined by the wc). And once you get the position - instead of replace, use STUFF to replace the offending charachter with an empty string. Repeat as long as PATINDEX > 0.
If you wrap this idea into a function, it could look like this;
if object_id('dbo.stripAlpha') is not null drop function dbo.stripAlpha
go
create function dbo.stripAlpha( @s-2 varchar(8000) )
returns varchar(8000)
as
begin
while patindex('%[^0-9]%', @s-2) > 0
begin
set @s-2 = stuff(@s, patindex('%[^0-9]%', @s-2), 1, '')
end
return @s-2
end
go
Try it out on some testdata...
create table #x ( junkstring varchar(20) not null )
insert #x select '123 456 789abc0'
insert #x select '123/456,7890'
insert #x select '12-3456.78-9 0'
insert #x select 'xb123!456!7890'
go
select dbo.stripAlpha(junkstring) from #x
go
drop table #x
go
----------------------
1234567890
1234567890
1234567890
1234567890
(4 row(s) affected)
/Kenneth
December 3, 2005 at 12:27 am
Those are all great methods and I applaud the innovations of all... but as Remi G and Adam Mechanic would remind us, you might just want to think about finally creating a numbers table to do these sorts of things...
A numbers table (I call it a "Tally" table 'cause it's easier to say ) is nothing more than a table that contains a list of sequential numbers in a very well indexed format to enhance performance. I use the numbers 1-9999 because it covers the number 8000 for working on VARCHARS and also gives me the capability to generate 4 digit random numbers with ease (more on that later).
The Tally table can be used for a great number of things that would ordinarilly require a loop. What that means is that a lot of things can be done in a very near setbased manner without ever coming close to the performance drain that loops can sometimes cause.
So trust me for a minute... here's how to make a Tally table and notice, I didn't use a loop to make it, either...
--===== Create a tally table of numbers
SELECT TOP 9999 IDENTITY(INT,1,1) AS N --Trust me, leave as INT
INTO dbo.Tally
FROM dbo.SYSCOLUMNS sc1,
dbo.SYSCOLUMNS sc2
--===== Give the tally table a clustered primary key for speed
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N)
--===== Grant SELECT privs to the public
GRANT SELECT ON dbo.Tally TO PUBLIC
GO
Just a quick note on SYSCOLUMNS... even a brand spankin' new database will have at least 256 rows in SYSCOLUMNS. Since theres no join clause or a WHERE clause, it creates a cross join capable of generating at least 256 x 256 or 65536 rows. The TOP clause limits that, of course. If you need a bigger Tally table, just add more instances of SYSCOLUMNS. You don't need many... 4 instances will create a minimum of 256 x 256 x 256 x 256 or over 4,294,967,296 rows! That's twice as big as the max value for an INT or twice the number of bytes a TEXT field can hold. If you need something bigger, you might just be doing something wrong. Like I said, I keep mine down to 9999.
Ok, so how are we gonna use this stupid little Tally table to solve all our looping problems for this task? Well, let's create a function that will accept a garbage filled monster string and return a numbers only cleaned up version as Jules requested... we'll use the Tally table to "step through" the characters in the original string...
--===== Create a function to clean telephone and other numbers
CREATE FUNCTION dbo.CleanNumber (@StringNumber VARCHAR(8000))
RETURNS VARCHAR(8000)
AS
BEGIN
--===== Declare local variables
DECLARE @NewString VARCHAR(8000)
SET @NewString = ''
--===== Strip out ALL non-digit characters using the Tally table
-- to control the position being worked in the SubStrings
SELECT @NewString = @NewString + SUBSTRING(@StringNumber,N,1)
FROM dbo.Tally WITH (NOLOCK)
WHERE N <= LEN(@StringNumber)
AND SUBSTRING(@StringNumber,N,1) LIKE '%[0-9]%'
ORDER BY N
--===== Return the cleaned string (NULL if no numbers present)
RETURN NULLIF(@NewString,'')
END
GO
--===== Grant EXECUTE privs to the public
GRANT EXECUTE ON dbo.CleanNumber TO PUBLIC
GO
"N" is the name of the numbers column in the Tally table... notice how it is present as the "starting position" in the SUBSTRING references. THAT'S WHAT DOES THE "LOOPING" except it's not a loop... it's a nasty fast near set based alternative to a loop.
Ok, I sense skeptics in the crowd... I used to be one, too, so I understand... but ya just gotta try this... here's some code to try the function out... then do some timing tests...
--===== If the temporary test table exists, drop it
IF OBJECT_ID('TempDB..#MyHead') IS NOT NULL
DROP TABLE #MyHead
--===== Create a test table full of mixed numbers
SELECT '+1-310-414-0909' AS MixedNumber
INTO #MyHead UNION ALL
SELECT 'k3j4j5h6h7h5kj4h4khk343&%@^)&#*(&$%&#%$!@12' UNION ALL
SELECT '^(%$%)(DHFHSDFHOD)3(& )(F^)YFGFYP(DFY_(DYFYF' UNION ALL
SELECT '1234567890' UNION ALL
SELECT 'ABCDEF'
--===== Demo the function
SELECT MixedNumber AS OriginalNumber,
dbo.CleanNumber(MixedNumber) AS CleanedNumber
FROM #MyHead
I'm thinkin' that's pretty swift and I thank Adam and Remi everytime I use the Tally table to solve these types of problems and others. And, yeah, I know... I could have done the COALESCE thing in the SELECT instead of presetting @NewString to an empty string but I'm thinking that removing one function for each set based interation can only improve the performance... and, it does when you're talking about using the function on thousands of records.
Oh yeah... almost forgot... let's say that you want to generate 6 random whole numbers from, say, 10 to 99. (What? I dunno! Maybe you want to run your own lottery or something). We all know what a pain it would be to do that using other methods. How can you use the Tally table to do that? Check THIS out...
SELECT TOP 6 N
FROM dbo.Tally WITH (NOLOCK)
WHERE N BETWEEN 10 AND 99
ORDER BY NEWID()
That's it! That's all the code it takes! Works like a charm, too.
One more wierd one just for yuks... how about we find the first 1000 multiples of 5?
SELECT TOP 1000 (t1.N * t2.N)-1 AS Count5
FROM dbo.Tally t1,
dbo.Tally t2
WHERE (t1.N-1)%5 = 0
Because the Tally table has a clustered primary key on "N", we don't even need to sort that pig! (Trust me, don't put an ORDER BY on it unless you limit the values of "N" on both Tally table instances in the WHERE clause).
You can use the Tally table to parse CSV's, do Proper Casing, remove special characters, and tons more... all without the headache or performance drain of a loop.
Hope this helps...
--Jeff Moden
Change is inevitable... Change for the better is not.
December 6, 2005 at 3:38 am
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply