using Replace with wild cards

  • I am writing a UDF which takes tel numbers and converts them to the appropriate format. The first part of this process needs to strip out all non numeric characters.

    I have tried

    select replace( '+1-310-414-0909',[^0-9], '')

    but it look as though you cant use wild cards with a replace function.

    Anyone got a solution?

    Many Thanks,

    Jules

     

    www.sql-library.com[/url]

  • Jules,

    I had a similar problem and dealt with it using the code below. If there is a better way to do it though I would also be interested to see the answer

    declare @string_orig varchar(50)

    declare @string_new varchar(50)

    set @string_orig = '+1-310-414-0909'

    set @string_new = ''

    while @string_orig <> ''

     begin

      if left(@string_orig, 1) in ('0','1','2','3','4','5','6','7','8','9')

       begin 

        set @string_new = @string_new + left(@string_orig, 1)

        print @string_new

       end

      set @string_orig = right(@string_orig, len(@string_orig) -1)

     end

    print @string_new

    dc

     

     

  • Thanks for the reply

    I came up with a similair method

     

    declare @tel

    set @tel = '+1-310-414-0909'

    declare @len int

    declare @position int

    declare @character varchar(1)

    declare @number varchar (25)

    select @len = len(@tel), @position = 1 , @number = ''

    while @position <= @len 

    begin

    set @character = case when (substring(@tel, @position, 1)) like '[0-9]'then substring(@tel, @position, 1) else '' end

    select @number = @number + @character ,@position  = @position +1

    end

    select @number

     

    Annoying that it cant be dont with replace that it has to be done with a loop you would have thought there was a way to reurn just the numeric values from a string with nested functions?

     

    Jules

    www.sql-library.com[/url]

  • I don't think you can avoid looping in one way or other in this case, and as noted, REPLACE doesn't deal well with wildcards... but PATINDEX does. So, you can use PATINDEX (in a loop) to look for unwanted charachters (which btw can be any combination defined by the wc). And once you get the position - instead of replace, use STUFF to replace the offending charachter with an empty string. Repeat as long as PATINDEX > 0.

    If you wrap this idea into a function, it could look like this;

    if object_id('dbo.stripAlpha') is not null drop function dbo.stripAlpha

    go

    create function dbo.stripAlpha( @s-2 varchar(8000) )

    returns varchar(8000)

    as

    begin

     while patindex('%[^0-9]%', @s-2) > 0

      begin

       set @s-2 = stuff(@s, patindex('%[^0-9]%', @s-2), 1, '')

      end

      return @s-2

    end

    go

    Try it out on some testdata...

    create table #x ( junkstring varchar(20) not null )

    insert #x select '123 456 789abc0'

    insert #x select '123/456,7890'

    insert #x select '12-3456.78-9 0'

    insert #x select 'xb123!456!7890'

    go

    select dbo.stripAlpha(junkstring) from #x

    go

    drop table #x

    go

    ----------------------

    1234567890

    1234567890

    1234567890

    1234567890

    (4 row(s) affected)

    /Kenneth

     

  • Those are all great methods and I applaud the innovations of all... but as Remi G and Adam Mechanic would remind us, you might just want to think about finally creating a numbers table to do these sorts of things...

    A numbers table (I call it a "Tally" table 'cause it's easier to say ) is nothing more than a table that contains a list of sequential numbers in a very well indexed format to enhance performance.  I use the numbers 1-9999 because it covers the number 8000 for working on VARCHARS and also gives me the capability to generate 4 digit random numbers with ease (more on that later).

    The Tally table can be used for a great number of things that would ordinarilly require a loop.  What that means is that a lot of things can be done in a very near setbased manner without ever coming close to the performance drain that loops can sometimes cause.

    So trust me for a minute... here's how to make a Tally table and notice, I didn't use a loop to make it, either...

    --===== Create a tally table of numbers

     SELECT TOP 9999 IDENTITY(INT,1,1) AS N --Trust me, leave as INT

       INTO dbo.Tally

       FROM dbo.SYSCOLUMNS sc1,

            dbo.SYSCOLUMNS sc2

    --===== Give the tally table a clustered primary key for speed

      ALTER TABLE dbo.Tally

            ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N)

    --===== Grant SELECT privs to the public

      GRANT SELECT ON dbo.Tally TO PUBLIC

    GO

    Just a quick note on SYSCOLUMNS... even a brand spankin' new database will have at least 256 rows in SYSCOLUMNS.  Since theres no join clause or a WHERE clause, it creates a cross join capable of generating at least 256 x 256 or 65536 rows.  The TOP clause limits that, of course.  If you need a bigger Tally table, just add more instances of SYSCOLUMNS.  You don't need many... 4 instances will create a minimum of 256 x 256 x 256 x 256 or over 4,294,967,296 rows!  That's twice as big as the max value for an INT or twice the number of bytes a TEXT field can hold.  If you need something bigger, you might just be doing something wrong.  Like I said, I keep mine down to 9999.

    Ok, so how are we gonna use this stupid little Tally table to solve all our looping problems for this task?  Well, let's create a function that will accept a garbage filled monster string and return a numbers only cleaned up version as Jules requested... we'll use the Tally table to "step through" the characters in the original string...

    --===== Create a function to clean telephone and other numbers

     CREATE FUNCTION dbo.CleanNumber (@StringNumber VARCHAR(8000))

    RETURNS VARCHAR(8000)

         AS

      BEGIN

            --===== Declare local variables

            DECLARE @NewString VARCHAR(8000)

                SET @NewString = ''

            --===== Strip out ALL non-digit characters using the Tally table

                 -- to control the position being worked in the SubStrings

             SELECT @NewString = @NewString + SUBSTRING(@StringNumber,N,1)

               FROM dbo.Tally WITH (NOLOCK)

              WHERE N <= LEN(@StringNumber)

                AND SUBSTRING(@StringNumber,N,1) LIKE '%[0-9]%'

              ORDER BY N

            --===== Return the cleaned string (NULL if no numbers present)

             RETURN NULLIF(@NewString,'')

        END

    GO

    --===== Grant EXECUTE privs to the public

      GRANT EXECUTE ON dbo.CleanNumber TO PUBLIC

    GO

    "N" is the name of the numbers column in the Tally table... notice how it is present as the "starting position" in the SUBSTRING references.  THAT'S WHAT DOES THE "LOOPING" except it's not a loop... it's a nasty fast near set based alternative to a loop.

    Ok, I sense skeptics in the crowd... I used to be one, too, so I understand... but ya just gotta try this... here's some code to try the function out... then do some timing tests...

    --===== If the temporary test table exists, drop it

         IF OBJECT_ID('TempDB..#MyHead') IS NOT NULL

            DROP TABLE #MyHead

    --===== Create a test table full of mixed numbers

    SELECT '+1-310-414-0909' AS MixedNumber

      INTO #MyHead UNION ALL

    SELECT 'k3j4j5h6h7h5kj4h4khk343&%@^)&#*(&$%&#%$!@12' UNION ALL

    SELECT '^(%$%)(DHFHSDFHOD)3(&   )(F^)YFGFYP(DFY_(DYFYF' UNION ALL

    SELECT '1234567890' UNION ALL

    SELECT 'ABCDEF'

    --===== Demo the function

     SELECT MixedNumber AS OriginalNumber,

            dbo.CleanNumber(MixedNumber) AS CleanedNumber

       FROM #MyHead

    I'm thinkin' that's pretty swift and I thank Adam and Remi everytime I use the Tally table to solve these types of problems and others.  And, yeah, I know... I could have done the COALESCE thing in the SELECT instead of presetting @NewString to an empty string but I'm thinking that removing one function for each set based interation can only improve the performance... and, it does when you're talking about using the function on thousands of records.

    Oh yeah... almost forgot... let's say that you want to generate 6 random whole numbers from, say, 10 to 99.  (What? I dunno! Maybe you want to run your own lottery or something).  We all know what a pain it would be to do that using other methods.   How can you use the Tally table to do that?   Check THIS out...

     SELECT TOP 6 N

       FROM dbo.Tally WITH (NOLOCK)

      WHERE N BETWEEN 10 AND 99

      ORDER BY NEWID()

    That's it!  That's all the code it takes!  Works like a charm, too.

    One more wierd one just for yuks... how about we find the first 1000 multiples of 5?

     SELECT TOP 1000 (t1.N * t2.N)-1 AS Count5

       FROM dbo.Tally t1,

            dbo.Tally t2

      WHERE (t1.N-1)%5 = 0

    Because the Tally table has a clustered primary key on "N", we don't even need to sort that pig!  (Trust me, don't put an ORDER BY   on it unless you limit the values of "N" on both Tally table instances in the WHERE clause).

    You can use the Tally table to parse CSV's, do Proper Casing, remove special characters, and tons more... all without the headache or performance drain of a loop.

    Hope this helps...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks alot that great. I hadnt come accross this ieda before. Interesting that it uses set operations but relies on the fact that the record sets will be processed row by row. The implementation is actually procedual behide the scenes...

    www.sql-library.com[/url]

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply