Replace non numeric characters in string

  • Peter E. Kierstead (7/3/2008)


    I do agree... at the end of the day performance generally decides the issue. Its just that even a simple language like TSQL allows for a suprising number of ways to git-er-dun. I was just contibuting to the diversity!

    Jeff, nice examples.

    I know and I sure appreciate that. I'm just contributing to performance! I just wanted folks to know that recurrsion is a form of RBAR, that it runs terribly slow compared to other methods, and that a different method other than recurrsion should be used. Driving on the left side of the road in the U.S. is a form of diversity... that's dangerous, too! 😉

    Good luck with that torn page problem, Peter. Like I said, I've not ever had to deal with that. If you find the fix or the steps necessary to correct it, you might want to consider writing an article about it! It would help lots of folks...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I know I'm late to the party, but can anyone point me in the direction of Matt's CLR UDF for regex? I could use it for my Postcode validation project.

    And Peter, did you sort out your torn page problem without resorting to a restore?

    Ta

    Dave J


    http://glossopian.co.uk/
    "I don't know what I don't know."

  • David Jackson (8/12/2008)


    I know I'm late to the party, but can anyone point me in the direction of Matt's CLR UDF for regex? I could use it for my Postcode validation project.

    And Peter, did you sort out your torn page problem without resorting to a restore?

    Ta

    Dave J

    I think Matt might agree with me on this one (maybe not :hehe:)... you don't need the overhead of a RegEx CLR for simple post code validation... just use what's available with LIKE...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Check-out the following SQL function.. this may help you!

    --------------------------------------------------------------

    SET QUOTED_IDENTIFIER ON

    GO

    SET ANSI_NULLS ON

    GO

    CREATE function fn_ExtractNumericValues(@pstrCharValue nvarchar(30))

    returns bigint

    as

    begin

    declare @intLoopCount as int

    declare @intIncrementLoopCount as int

    declare @strReadChar as varchar(1)

    declare @strInputString as nvarchar(30)

    declare @strNumericValue as nvarchar(30)

    declare @intFinalValue as bigint

    select @strNumericValue='',@intLoopCount=len(ltrim(rtrim(@pstrCharValue))),@intIncrementLoopCount=0,@strInputString=ltrim(rtrim(@pstrCharValue))

    if @intLoopCount=0

    begin

    return 1

    end

    while @intLoopCount >= @intIncrementLoopCount

    begin

    select @strReadChar= substring(@strInputString,@intIncrementLoopCount,1)

    if ascii(@strReadChar)>=48 and ascii(@strReadChar)<=57

    begin

    set @strNumericValue=@strNumericValue+@strReadChar

    end

    set @intIncrementLoopCount=@intIncrementLoopCount+1

    end

    set @intFinalValue = cast(@strNumericValue as bigint)

    return @intFinalValue

    end

    GO

    SET QUOTED_IDENTIFIER OFF

    GO

    SET ANSI_NULLS ON

    GO

  • Jeff Moden (8/12/2008)


    I think Matt might agree with me on this one ( maybe not :hehe: )... you don't need the overhead of a RegEx CLR for simple post code validation... just use what's available with LIKE...

    "simple post code validation?" (ducks) 😛

    Methinks you are not aware of the rules over here in the UK 😉

    Format Example Postcode

    AN NAA M1 1AA

    ANN NAA M60 1NW

    AAN NAA CR2 6XH

    AANN NAA DN55 1PT

    ANA NAA W1A 1HQ

    AANA NAA EC1A 1BB

    Please note the following:-

    The letters Q, V and X are not used in the first position.

    The letters I, J and Z are not used in the second position.

    The only letters to appear in the third position are A, B, C, D, E, F, G, H, J, K, S, T, U and W.

    The only letters to appear in the fourth position are A, B, E, H, M, N, P, R, V, W, X and Y.

    The second half of the Postcode is always consistent numeric, alpha, alpha format and the letters C, I, K, M, O and V are never used.

    These conventions may change in the future if operationally required.

    GIR 0AA is a Postcode that was issued historically and does not confirm to current rules on valid Postcode formats, It is however, still in use.

    All of the above from http://www.govtalk.gov.uk/gdsc/html/frames/PostCode.htm

    So my regex looks like

    select @regExStr = '([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) {0,1}[0-9][A-Za-z]{2})'

    Which I'm sure is not quite there yet, 😉

    And here's a little test case

    declare @regExStr varchar(255)

    select @regExStr = '([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) {0,1}[0-9][A-Za-z]{2})'

    declare @TestTab Table (postcode varchar(12) not null)

    Insert @TestTab values(M1 1AA') --Valid

    Insert @TestTab values('M60 1NW') --Valid

    Insert @TestTab values('CR2 6XH') --Valid

    Insert @TestTab values('DN55 1PT') --Valid

    Insert @TestTab values('W1A 1HQ') --Valid

    Insert @TestTab values('EC1A 1BB') --Valid

    Insert @TestTab values('GIR 0AA') --Valid

    Insert @TestTab values('UK') --Invalid

    Insert @TestTab values('INDIA') --Invalid

    Insert @TestTab values('12345') --Invalid

    select * from @TestTab

    where postcode like @regExStr --This doesn't work!

    Where as a Regex function does. I have one, but it is slow as I am looping round my table as with my current solution I can only test whether it is a match, where as what I would like is to pass a string and extract any valid post code out of it.

    I could then say something like

    Update AddressTable

    Set Postcode = GetPostCodeFunction(Other Column)

    where postCode is Null and GetPostCodeFunction(Other Column) is NOT Null

    But I need the mythical GetPostCodeFunction...

    Cheers

    Dave J


    http://glossopian.co.uk/
    "I don't know what I don't know."

  • Not really a regular expression but instead an inline function

    CREATE FUNCTIONdbo.fnPostCodeUK

    (

    @PostCode VARCHAR(8)

    )

    RETURNS BIT

    AS

    BEGIN

    RETURNCASE

    WHEN @PostCode LIKE '[ABCDEFGHIJKLMNOPRSTUWYZ][ABCDEFGHKLMNOPQRSTUVWXY][0123456789][ABEHMNPRVWXY0123456789] [0123456789][ABDEFGHJLNPQRSTUWXYZ][ABDEFGHJLNPQRSTUWXYZ]' THEN 1

    WHEN @PostCode LIKE '[ABCDEFGHIJKLMNOPRSTUWYZ][ABCDEFGHKLMNOPQRSTUVWXY0123456789][ABCDEFGHJKSTUW0123456789] [0123456789][ABDEFGHJLNPQRSTUWXYZ][ABDEFGHJLNPQRSTUWXYZ]' THEN 1

    WHEN @PostCode LIKE '[ABCDEFGHIJKLMNOPRSTUWYZ][0123456789] [0123456789][ABDEFGHJLNPQRSTUWXYZ][ABDEFGHJLNPQRSTUWXYZ]' THEN 1

    WHEN @PostCode = 'GIR 0AA' THEN 1

    ELSE 0

    END

    END


    N 56°04'39.16"
    E 12°55'05.25"

  • I have been looking at this

    set NoCount ON

    declare @TestTab Table (postcode varchar(50) not null)

    Insert @TestTab values('SK13 8LY') --Valid

    Insert @TestTab values('M1 1AA') --Valid

    Insert @TestTab values('M60 1NW') --Valid

    Insert @TestTab values('GIR 0AA') --Valid

    Insert @TestTab values('CR2 6XH') --Valid

    Insert @TestTab values('DN55 1PT') --Valid

    Insert @TestTab values('W1A 1HQ') --Valid

    Insert @TestTab values('EC1A 1BB') --Valid

    Insert @TestTab values('India') --Invalid

    Insert @TestTab values('12345') --Invalid

    Insert @TestTab values('Glossop SK13 8LY') --Invalid but contains a valid postcode

    select postcode

    from @TestTab

    where (postcode like '%GIR 0AA%'

    or postcode like '%[ABCDEFGHIJKLMNOPRSTUWYZ][0-9][0-9] [0-9][ABDEFGHJLNPQRSTUWXYZ][ABDEFGHJLNPQRSTUWXYZ]%'

    or postcode like '%[ABCDEFGHIJKLMNOPRSTUWYZ][0-9] [0-9][ABDEFGHJLNPQRSTUWXYZ][ABDEFGHJLNPQRSTUWXYZ]%'

    or postcode like '%[ABCDEFGHIJKLMNOPRSTUWYZ][ABCDEFGHIJKLMNOPRSTUWYZ][0-9] [0-9][ABDEFGHJLNPQRSTUWXYZ][ABDEFGHJLNPQRSTUWXYZ]%'

    or postcode like '%[ABCDEFGHIJKLMNOPRSTUWYZ][ABCDEFGHKLMNOPQRSTUVWXY][0-9][0-9] [0-9][ABDEFGHJLNPQRSTUWXYZ][ABDEFGHJLNPQRSTUWXYZ]%'

    or postcode like '%[ABCDEFGHIJKLMNOPRSTUWYZ][0-9][A-HJKSTUW] [0-9][ABDEFGHJLNPQRSTUWXYZ][ABDEFGHJLNPQRSTUWXYZ]%'

    or postcode like '%[ABCDEFGHIJKLMNOPRSTUWYZ][ABCDEFGHKLMNOPQRSTUVWXY][0-9][ABEHMNPRVWXY] [0-9][ABDEFGHJLNPQRSTUWXYZ][ABDEFGHJLNPQRSTUWXYZ]%')

    What I want to do is extract the postcode out of the address field that incorrectly holds it, and only the postcode bit, and update the postcode column. Need something like patindex to get the start of the string.

    Good grief, I'm heading back into RBAR territory... :w00t:

    Dave J


    http://glossopian.co.uk/
    "I don't know what I don't know."

  • PATINDEX approach as an inline function

    CREATE FUNCTIONdbo.fnExtractPostCodeUK

    (

    @data VARCHAR(8000)

    )

    RETURNS VARCHAR(8)

    AS

    BEGIN

    RETURNCOALESCE(

    SUBSTRING(@Data, NULLIF(PATINDEX('%[ABCDEFGHIJKLMNOPRSTUWYZ][ABCDEFGHKLMNOPQRSTUVWXY][0123456789][ABEHMNPRVWXY0123456789] [0123456789][ABDEFGHJLNPQRSTUWXYZ][ABDEFGHJLNPQRSTUWXYZ]%', @data), 0), 8),

    SUBSTRING(@Data, NULLIF(PATINDEX('%[ABCDEFGHIJKLMNOPRSTUWYZ][ABCDEFGHKLMNOPQRSTUVWXY0123456789][ABCDEFGHJKSTUW0123456789] [0123456789][ABDEFGHJLNPQRSTUWXYZ][ABDEFGHJLNPQRSTUWXYZ]%', @data), 0), 7),

    SUBSTRING(@Data, NULLIF(PATINDEX('%[ABCDEFGHIJKLMNOPRSTUWYZ][0123456789] [0123456789][ABDEFGHJLNPQRSTUWXYZ][ABDEFGHJLNPQRSTUWXYZ]%', @data), 0), 6),

    SUBSTRING(@Data, NULLIF(PATINDEX('%GIR 0AA%', @data), 0), 7)

    )

    END


    N 56°04'39.16"
    E 12°55'05.25"

  • Peso

    many, many thanks. I owe you a beer. 😀

    declare @TestTab Table (add1 varchar(50) null, postcode varchar(50) null)

    Insert @TestTab values('Glossop', 'SK13 8LY') --Valid

    Insert @TestTab values('M1 1AA',NULL) --Valid but wrong field

    Insert @TestTab values('Manchester', 'M60 1NW') --Valid

    Insert @TestTab values('PostOffice', 'GIR 0AA') --Valid

    Insert @TestTab values('CR2 6XH',NULL) --Valid but wrong field

    Insert @TestTab values('DN55 1PT',NULL) --Valid but wrong field

    Insert @TestTab values('London','W1A 1HQ') --Valid

    Insert @TestTab values('EC1A 1BB',Null) --Valid but wrong field

    Insert @TestTab values(NULL,'India') --Invalid

    Insert @TestTab values(NULL,'12345') --Invalid

    Insert @TestTab values('Glossop SK13 8LY',NULL) --Invalid but contains a valid postcode in wrong field

    select *

    from @TestTab

    Update @TestTab

    Set Postcode = dbo.fnExtractPostCodeUK(add1)

    where postCode is Null and dbo.fnExtractPostCodeUK(add1) is NOT Null

    select * from @TestTab

    Dave J


    http://glossopian.co.uk/
    "I don't know what I don't know."

  • I don't think you need to use function twice in query.

    Update @TestTab

    Set Postcode = coalesce(postcode, dbo.fnExtractPostCodeUK(add1))


    N 56°04'39.16"
    E 12°55'05.25"

  • Try...

    Declare @i Int,@Str VarChar(max);Set @STR=' '+Char(9)+'a1b2c3'+char(0)+'?5.'

    Print '|'+@Str+'|';Print Len(@Str);

    Select @i=PatIndex('%[^0-9]%',@Str)

    While @i>0 Select @STR=Stuff(@Str,@i,1,''),@i=PatIndex('%[^0-9]%',@Str)

    Print '|'+@Str+'|';Print Len(@Str);



    PeteK
    I have CDO. It's like OCD but all the letters are in alphabetical order... as they should be.

  • David Jackson (8/13/2008)


    Peso

    many, many thanks. I owe you a beer. 😀

    declare @TestTab Table (add1 varchar(50) null, postcode varchar(50) null)

    Insert @TestTab values('Glossop', 'SK13 8LY') --Valid

    Insert @TestTab values('M1 1AA',NULL) --Valid but wrong field

    Insert @TestTab values('Manchester', 'M60 1NW') --Valid

    Insert @TestTab values('PostOffice', 'GIR 0AA') --Valid

    Insert @TestTab values('CR2 6XH',NULL) --Valid but wrong field

    Insert @TestTab values('DN55 1PT',NULL) --Valid but wrong field

    Insert @TestTab values('London','W1A 1HQ') --Valid

    Insert @TestTab values('EC1A 1BB',Null) --Valid but wrong field

    Insert @TestTab values(NULL,'India') --Invalid

    Insert @TestTab values(NULL,'12345') --Invalid

    Insert @TestTab values('Glossop SK13 8LY',NULL) --Invalid but contains a valid postcode in wrong field

    select *

    from @TestTab

    Update @TestTab

    Set Postcode = dbo.fnExtractPostCodeUK(add1)

    where postCode is Null and dbo.fnExtractPostCodeUK(add1) is NOT Null

    select * from @TestTab

    Dave J

    Ooooohh! Be careful! Extraction in this manner can create some undesired results...

    SET NOCOUNT ON

    declare @TestTab Table (postcode varchar(50) not null)

    Insert @TestTab values('SK13 8LY') --Valid

    Insert @TestTab values('M1 1AA') --Valid

    Insert @TestTab values('M60 1NW') --Valid

    Insert @TestTab values('GIR 0AA') --Valid

    Insert @TestTab values('CR2 6XH') --Valid

    Insert @TestTab values('DN55 1PT') --Valid

    Insert @TestTab values('W1A 1HQ') --Valid

    Insert @TestTab values('EC1A 1BB') --Valid

    Insert @TestTab values('India') --Invalid

    Insert @TestTab values('12345') --Invalid

    Insert @TestTab values('Glossop SK13 8LY') --Invalid but contains a valid postcode

    INSERT @TestTab VALUES('XA1 1AA') --WHAT WILL THE FUNCTION SAY ABOUT THIS??? WAS JUST A PHAT-PHINGER on "X"

    INSERT @TestTab VALUES('AAA 1AA') --OR HOW ABOUT THIS???

    SELECT PostCode, dbo.fnExtractPostCodeUK(PostCode)

    FROM @TestTab

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • David -

    I just noticed your post. Here's the RegexMatch code:

    Imports System

    Imports System.Data

    Imports System.Data.SqlClient

    Imports System.Data.SqlTypes

    Imports Microsoft.SqlServer.Server

    Imports System.Text.RegularExpressions

    Partial Public Class UserDefinedFunctions

    Private Const optionS As RegexOptions = RegexOptions.CultureInvariant

    <Microsoft.SqlServer.Server.SqlFunction(IsDeterministic:=True, IsPrecise:=True)> _

    Public Shared Function RegexMatch(ByVal input As SqlChars, ByVal pattern As SqlString) As SqlString

    ' Add your code here

    Dim rex As System.Text.RegularExpressions.Regex = New System.Text.RegularExpressions.Regex(pattern.Value, optionS)

    Return New SqlString(CType(rex.IsMatch(New String(input.Value)), String))

    End Function

    <Microsoft.SqlServer.Server.SqlFunction(IsDeterministic:=True, IsPrecise:=True)> _

    Public Shared Function RegexMatchString(ByVal input As SqlChars, ByVal pattern As SqlString) As SqlString

    ' Add your code here

    Dim rex As System.Text.RegularExpressions.Regex = New System.Text.RegularExpressions.Regex(pattern.Value, optionS)

    Return New SqlString(rex.Matches(input.Value)(0).ToString)

    End Function

    <Microsoft.SqlServer.Server.SqlFunction(IsDeterministic:=True, IsPrecise:=True)> _

    Public Shared Function RegexMatchGroup(ByVal input As SqlChars, ByVal pattern As SqlString, byval groupnum as SqlInt32) As SqlString

    Dim rex As System.Text.RegularExpressions.Regex = New System.Text.RegularExpressions.Regex(pattern.Value, optionS)

    Return New SqlString(CType(rex.Matches(New String(input.Value))(0).Groups(groupnum.Value).ToString, String))

    End Function

    End Class

    Should get you where you need to be.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • What do you get for the following value?

    'XA1 1AA'

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Here is an updated version of fnValidatePostCodeUK function.

    Thank you Jeff for spotting the errors.

    //Peso

    CREATE FUNCTION dbo.fnValidatePostCodeUK

    (

    @PostCode VARCHAR(8)

    )

    RETURNS BIT

    AS

    BEGIN

    RETURNCASE

    -- AANN NAA

    WHEN @PostCode LIKE '[ABCDEFGHIJKLMNOPRSTUWYZ][ABCDEFGHKLMNOPQRSTUVWXY][0123456789][0123456789] [0123456789][ABDEFGHJLNPQRSTUWXYZ][ABDEFGHJLNPQRSTUWXYZ]' THEN 1

    -- AANA NAA

    WHEN @PostCode LIKE '[ABCDEFGHIJKLMNOPRSTUWYZ][ABCDEFGHKLMNOPQRSTUVWXY][0123456789][ABEHMNPRVWXY] [0123456789][ABDEFGHJLNPQRSTUWXYZ][ABDEFGHJLNPQRSTUWXYZ]' THEN 1

    -- ANN NAA

    WHEN @PostCode LIKE '[ABCDEFGHIJKLMNOPRSTUWYZ][0123456789][0123456789] [0123456789][ABDEFGHJLNPQRSTUWXYZ][ABDEFGHJLNPQRSTUWXYZ]' THEN 1

    -- AAN NAA

    WHEN @PostCode LIKE '[ABCDEFGHIJKLMNOPRSTUWYZ][ABCDEFGHKLMNOPQRSTUVWXY][0123456789] [0123456789][ABDEFGHJLNPQRSTUWXYZ][ABDEFGHJLNPQRSTUWXYZ]' THEN 1

    -- ANA NAA

    WHEN @PostCode LIKE '[ABCDEFGHIJKLMNOPRSTUWYZ][0123456789][ABCDEFGHJKSTUW] [0123456789][ABDEFGHJLNPQRSTUWXYZ][ABDEFGHJLNPQRSTUWXYZ]' THEN 1

    -- AN NAA

    WHEN @PostCode LIKE '[ABCDEFGHIJKLMNOPRSTUWYZ][0123456789] [0123456789][ABDEFGHJLNPQRSTUWXYZ][ABDEFGHJLNPQRSTUWXYZ]' THEN 1

    -- Special case GIR 0AA

    WHEN @PostCode LIKE 'GIR 0AA' THEN 1

    --

    ELSE 0

    END

    END


    N 56°04'39.16"
    E 12°55'05.25"

Viewing 15 posts - 61 through 75 (of 81 total)

You must be logged in to reply to this topic. Login to reply