PATINDEX and Regular Expression

  • Hello there,

    I have this table having a varchar column which contains

    numeric data + all kind of junk data (non-numeric chars) - not something in our environment control.

    For some data processing I want to convert these values to numeric whenever possible -

    so need to filter records which contain numeric values only.

    As numeric value rules, the value should *only* have

    - one or more digits [0-9]*

    - zero or more comma before decimal point [,]+

    - zero or one decimal point [.]?

    - an optional(zero or one) minus/plus sign *in the beginning* [\+\-]?

    I tried to use PATINDEX() as

    WHERE PATINDEX('[\+\-]?[0-9,]*[.]?[0-9]*', varcharValue) > 0

    But this doesn't seem to work 🙁

    It seems PATINDEX() doesn't allow this kind of pattern matching and I need to go via CLR route.

    Can anyone please confirm.

  • The following will determine which rows have valid numerics. If you need to convert the string to a number you may need another routine such as my BigInt converter.

    declare @t table (strval varchar(10))

    insert into @t values ('+123.45');

    insert into @t values ('-12,345.67');

    insert into @t values ('-1-2');

    select isnumeric(strval) from @t;

    /*

    -- Tests pass isnumeric AND fail IsBigInt AND fail cast(vc as bigint)

    -- range

    SELECT IsNumeric('-9223372036854775809'), dbo.IsBigInt('-9223372036854775809')

    SELECT IsNumeric('9223372036854775808'), dbo.IsBigInt('9223372036854775808')

    -- invalid chars

    SELECT IsNumeric('-5d2'), dbo.IsBigInt('-5d2')

    SELECT IsNumeric('-5e2'), dbo.IsBigInt('-5e2')

    SELECT IsNumeric('+3,4'), dbo.IsBigInt('+3,4')

    SELECT IsNumeric('+3.4'), dbo.IsBigInt('+3.4')

    -- pass this strange case

    SELECT IsNumeric('00000000000000000000000000001'), dbo.IsBigInt('00000000000000000000000000001')

    */

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'dbo.IsBigInt') AND type IN (N'FN', N'IF', N'TF', N'FS', N'FT'))

    DROP FUNCTION dbo.IsBigInt

    GO

    CREATE FUNCTION dbo.IsBigInt (@a varchar(30))

    returns bit

    AS

    BEGIN

    -- Submitted to SqlServerCentral by William Talada

    DECLARE

    @s-2 varchar(30),

    @i int,

    @IsNeg bit,

    @valid int

    -- assume the best

    SET @valid = 1

    SET @IsNeg=0

    SET @s-2 = ltrim(rtrim(@a))

    -- strip OFF negative sign

    IF len(@s) > 0

    AND LEFT(@s, 1) = '-'

    BEGIN

    SET @IsNeg=1

    SET @s-2 = RIGHT(@s, len(@s) - 1)

    END

    -- strip OFF positive sign

    IF len(@s) > 0

    AND LEFT(@s, 1) = '+'

    BEGIN

    SET @s-2 = RIGHT(@a, len(@a) - 1)

    END

    -- strip leading zeros

    while len(@s) > 1 and left(@s,1) = '0'

    set @s-2 = right(@s, len(@s) - 1)

    -- 19 digits max

    IF len(@s) > 19 SET @valid = 0

    -- the rest must be numbers only

    SET @i = len(@s)

    WHILE @i >= 1

    BEGIN

    IF charindex(substring(@s, @i, 1), '0123456789') = 0 SET @valid = 0

    SET @i = @i - 1

    END

    -- check range

    IF @valid = 1

    AND len(@s) = 19

    BEGIN

    IF @isNeg = 1 AND @s-2 > '9223372036854775808' SET @valid = 0

    IF @IsNeg = 0 AND @s-2 > '9223372036854775807' SET @valid = 0

    END

    RETURN @valid

    END

    go

  • This was removed by the editor as SPAM

  • Yes, that's what we are currentlly using, but it is giving unexpected result sometimes -

    for instance -

    - IsNumeric(x) returns true - when varchar value has both comma and a tab character (Char(9)),

    - IsNumeric(x) returns false - when varchar value has only tab character and no comma

  • This was removed by the editor as SPAM

  • SELECT *

    FROM (SELECT ID = 1, somecolumn = '399,243.01' UNION ALL -- YES

    SELECT ID = 2, '£399,243.01' UNION ALL -- NO

    SELECT ID = 3, '39924301' UNION ALL -- YES

    SELECT ID = 4, '399243.01' UNION ALL -- YES

    SELECT ID = 5, '399,24301' UNION ALL -- YES

    SELECT ID = 6, '399.243,01' UNION ALL -- YES

    SELECT ID = 7, '399 243,01' UNION ALL -- NO

    SELECT ID = 8, '-399.243,01' UNION ALL -- YES

    SELECT ID = 9, '+399.243,01' UNION ALL -- YES

    SELECT ID = 10, '_399.243,01' -- NO

    ) a

    WHERE somecolumn NOT LIKE '%[^,.0-9+-]%'

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • If you go via a CLR regex function, you can try this regex to see if it satisfies your spec:

    (?xm)^[+-]?\d{1,3}(,\d{3})+(\.\d+)?$

    |

    ^[+-]?\d{1,3}(\.\d+)?$

    tested in RegexBuddy vs this input:

    -4.012

    6berry

    +8

    -234,345,234.012

    7

    pbo.PoBox34

    got mathes:

    -4.012

    +8

    -234,345,234.012

    7

    without seeing your real input strings it's hard to write a regex that would fit it all though.

  • I think this gets you pretty close, w/o having to resort to CLR:

    WHERE

    --one +/- optional, but must be first if used

    PATINDEX('%[+-]%', varcharValue) IN (0, 1) AND

    varcharValue NOT LIKE '%[+-]%[+-]%' AND

    --only one decimal allowed

    varcharValue NOT LIKE '%.%.%' AND

    --only chars shown allowed; '-' must be LAST, or it will be taken as a range indicator, not a char

    varcharValue NOT LIKE '%[^0-9,.+-]%'

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply