Replace non numeric characters in string

  • one of the ways to do this kind of replacements is to go one level deeper in regex matching using Capturing Groups of a Match:

    consider this input string;

    string MySrtToClean = @"this number sh be intact: 3.4E-10; the rest of stuff d^&*( should go";

    our goal is : to retain number 3.4E-10 AND get rid of the rest of non-digits [\D]

    we do matching using this OR-ed regex:

    ([1-9]\d*(?:\.\d*)?[Ee]-?\d+)|(\D+)

    [capturing groups are in parentheses embracing both parts or the OR-ed *piped* regex]

    we get these 3 matches and CAPTURING GROUPS $1 and $2

    Match $1 $2

    this number sh be intact: this number sh be intact:

    3.4E-10 3.4E-10

    ; the rest of stuff d^&* ( should go ; the rest of stuff d^&*( should go

    now the fun part: we are going to REPLACE with an empty string ONLY GROUP $2 i.e rest of garbage, resulting in this cleaned output:

    3.4E-10

    to code this in C# u'll have to associate .NET Match Evaluator delegate with your CLR regexfunction

    dbo.regexreplace(MySrtToClean, '([1-9]\d*(?:\.\d*)?[Ee]-?\d+)|(\D+)', '')

    Match Evaluator will take both Capturing Groups and replace only the second one with empty string. You can use logic of any depth there.

    Match Evaluator will return modified match back to function dbo.regexreplace.

  • try and see how the above regex works in this online tester:

    http://regexlib.com/RETester.aspx

    it will not do replace, just will show u matches and groups

  • Hi Friends,

    What I am looking for is to replace all non-us special characters with a ' ' (Space)

    This can be for a column or a whole table.

    Help please...

  • Please define what the US non-special characters are...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • CREATE function dbo.udfGetNumVal

    (

    @strInput nvarchar(50)

    )

    Returns DECIMAL(18,0)

    AS

    BEGIN

    DECLARE @iCtr INT

    DECLARE @AscVal INT

    DECLARE @NumVal NVARCHAR(100)

    SET @iCtr = 1

    SET @NumVal = ''

    WHILE @iCtr <= Len(@strInput)

    BEGIN

    SET @AscVal = Ascii(substring(@strInput,@iCtr,1))

    IF (@AscVal >= 48) and (@AscVal <= 57)

    BEGIN

    SET @NumVal = @NumVal + char(@AscVal)

    END

    SET @iCtr = @iCtr + 1

    END

    Return Convert(decimal,Coalesce(@NumVal,0))

    END

    BEST OF LUCK...!!!

  • gaurava16fc (7/2/2008)


    CREATE function dbo.udfGetNumVal

    (

    @strInput nvarchar(50)

    )

    Returns DECIMAL(18,0)

    AS

    BEGIN

    DECLARE @iCtr INT

    DECLARE @AscVal INT

    DECLARE @NumVal NVARCHAR(100)

    SET @iCtr = 1

    SET @NumVal = ''

    WHILE @iCtr <= Len(@strInput)

    BEGIN

    SET @AscVal = Ascii(substring(@strInput,@iCtr,1))

    IF (@AscVal >= 48) and (@AscVal <= 57)

    BEGIN

    SET @NumVal = @NumVal + char(@AscVal)

    END

    SET @iCtr = @iCtr + 1

    END

    Return Convert(decimal,Coalesce(@NumVal,0))

    END

    BEST OF LUCK...!!!

    Good solution but you don't need a WHILE loop for this...

    CREATE FUNCTION dbo.udfGetNumVal

    (

    @StrInput NVARCHAR(50)

    )

    RETURNS DECIMAL(18,0)

    AS

    BEGIN

    --===== Return Variable

    DECLARE @NumVal NVARCHAR(50)

    --===== Keep only digits

    SELECT @NumVal = COALESCE(@NumVal,'') + SUBSTRING(@StrInput,N,1)

    FROM dbo.Tally

    WHERE SUBSTRING(@StrInput,N,1) LIKE '[0-9]'

    AND N <= LEN(@StrInput)

    RETURN @NumVal

    END

    If you don't already know what a Tally table is, take a look at the following article...

    http://www.sqlservercentral.com/articles/TSQL/62867/

    Change the operand of the LIKE to suit your needs ...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ok... no WHILE loop, no TALLY table...

    Declare @OldStr VarChar(max),@NewStr VarChar(max);

    Select @NewStr='',@OldStr='ab123c';

    With NumOnly as

    (

    Select Case when SubString(@OldStr,1,1) like '[0-9]' then SubString(@OldStr,1,1) else '' End[Chr],1[Idx]

    Union All

    Select Case when SubString(@OldStr,Idx+1,1) like '[0-9]' then SubString(@OldStr,Idx+1,1) else '' End,Idx+1

    from NumOnly

    where Idx<Len(@OldStr)

    )

    Select @NewStr=@NewStr+Chr from NumOnly Option (MaxRecursion 32767);

    Print @NewStr;

    And this can be extended into many other areas! 🙂



    PeteK
    I have CDO. It's like OCD but all the letters are in alphabetical order... as they should be.

  • Ups, had to changed a couple of things I missed when I constructed this from a couple of different functions...



    PeteK
    I have CDO. It's like OCD but all the letters are in alphabetical order... as they should be.

  • And here's an OCCURS function using a CTE...

    Declare @STR VarChar(max);

    Select @STR='This is a test...';

    With Occurs as

    (

    Select SubString(@Str,1,1)[Chr],1[Idx]

    Union All

    Select SubString(@Str,Idx+1,1),Idx+1

    from Occurs

    where Idx<Len(@Str)

    )

    Select Count(*) from Occurs where Chr='t' Option (MaxRecursion 32767)



    PeteK
    I have CDO. It's like OCD but all the letters are in alphabetical order... as they should be.

  • Peter E. Kierstead (7/2/2008)


    Ok... no WHILE loop, no TALLY table...

    Declare @OldStr VarChar(max),@NewStr VarChar(max);

    Select @NewStr='',@OldStr='ab123c';

    With NumOnly as

    (

    Select Case when SubString(@OldStr,1,1) like '[0-9]' then SubString(@OldStr,1,1) else '' End[Chr],1[Idx]

    Union All

    Select Case when SubString(@OldStr,Idx+1,1) like '[0-9]' then SubString(@OldStr,Idx+1,1) else '' End,Idx+1

    from NumOnly

    where Idx<Len(@OldStr)

    )

    Select @NewStr=@NewStr+Chr from NumOnly Option (MaxRecursion 32767);

    Print @NewStr;

    And this can be extended into many other areas! 🙂

    Many have come up with a similar solution... try it on a million rows and then try the Tally table... see what happens for yourself. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yes, every technique has its achilles heel... one of the CTE's is the recursion level...

    But this works great for most things I used to use WHILE loops or TALLY tables for. I like things that are self-contained with no external components (TALLY tables).

    Hey :cool:, I'm just trying to present alternatives 🙂



    PeteK
    I have CDO. It's like OCD but all the letters are in alphabetical order... as they should be.

  • Peter E. Kierstead (7/2/2008)


    Yes, every technique has its achilles heel... one of the CTE's is the recursion level...

    But this works great for most things I used to use WHILE loops or TALLY tables for. I like things that are self-contained with no external components (TALLY tables).

    Hey :cool:, I'm just trying to present alternatives 🙂

    Heh... no problem... I've just found recurrsion to be as slow or slower than a loop and wanted to make sure that folks understood the performance risks of using, as you say, the achilles heel of CTE's.

    Speaking of "alternatives"... this will blow most recursive methods away for creating sequenced sets of numbers... will actually beat some of the larger Tally tables... IN FACT, here's some test code for just some of the methods ... this one shows just how slow recurrsion really is...

    SET NOCOUNT ON

    --=============================================================================

    -- Recursive CTE does the count

    --=============================================================================

    PRINT '========== Recursive CTE =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    DECLARE @Top INT

    SET @Top = 100000

    ;WITH cteTally

    AS (

    SELECT 1 AS N

    UNION ALL

    SELECT N+1 FROM cteTally WHERE N<@Top

    )

    SELECT N

    INTO #Test1

    FROM cteTally

    OPTION (MAXRECURSION 0)

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT REPLICATE('=',100)

    GO

    --=============================================================================

    -- ROW_NUMBER CTE does the count

    --=============================================================================

    PRINT '========== ROW_NUMBER CTE =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    DECLARE @Top INT

    SET @Top = 100000

    ;WITH cteTally

    AS (

    SELECT TOP(@Top) ROW_NUMBER() OVER(ORDER BY sc1.Type) AS N

    FROM Master.dbo.spt_Values sc1,

    Master.dbo.spt_Values sc2

    )

    SELECT *

    INTO #Test2

    FROM cteTally

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT REPLICATE('=',100)

    GO

    --=============================================================================

    -- ROW_NUMBER query does the job directly

    --=============================================================================

    PRINT '========== ROW_NUMBER Query =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    DECLARE @Top INT

    SET @Top = 100000

    SELECT TOP(@Top) ROW_NUMBER() OVER(ORDER BY sc1.Type) AS N

    INTO #Test3

    FROM Master.dbo.spt_Values sc1,

    Master.dbo.spt_Values sc2

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT REPLICATE('=',100)

    GO

    --=============================================================================

    -- IDENTITY query does the job directly

    --=============================================================================

    PRINT '========== IDENTITY =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    DECLARE @Top INT

    SET @Top = 100000

    SELECT TOP(@Top) IDENTITY(INT,1,1) AS N

    INTO #Test4

    FROM Master.dbo.spt_Values sc1,

    Master.dbo.spt_Values sc2

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT REPLICATE('=',100)

    GO

    DROP TABLE #Test1,#Test2,#Test3,#Test4

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I do agree... at the end of the day performance generally decides the issue. Its just that even a simple language like TSQL allows for a suprising number of ways to git-er-dun. I was just contibuting to the diversity!

    Jeff, nice examples.



    PeteK
    I have CDO. It's like OCD but all the letters are in alphabetical order... as they should be.

  • and... CRAP :angry:

    while running Jeff's examples I discovered torn pages in my master database...

    Msg 824, Level 24, State 2, Line 12

    SQL Server detected a logical consistency-based I/O error: torn page (expected signature: 0xffffffff; actual signature: 0xdd88ffff). It occurred during a read of page (1:353) in database ID 1 at offset 0x000000002c2000 in file 'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\master.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.



    PeteK
    I have CDO. It's like OCD but all the letters are in alphabetical order... as they should be.

  • Hooo-boy! I've not had to deal with that one but I know it's serious and not so easy to fix. Good luck, Peter.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 46 through 60 (of 81 total)

You must be logged in to reply to this topic. Login to reply