How to cast bad data

  • I am querying a table that has a columnA with character data in it. I am using CAST(columnA AS int) in my query. This works up until the data is bad. Like the value '000w'. Is there a query syntax that can attempt the cast and when it fails on a row just give me the number 0 instead? Kind of like using COALESCE() ?

    Any hints greatly apreciated

  • There is not a function that does that, but you can use the isnumeric function and a case like this:

    Select

    Case

    When IsNumeric(column) = 1 Then Cast(column as int)

    Else 0

    End

    From

    table

  • That worked Thanks a bunch

  • Really? Try this and see if ISNUMERIC is actually what you want...

    SELECT ISNUMERIC('3D2'), ISNUMERIC('3E2')

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Mmmm.. I see what you mean so what would you do instead?

  • I was going to ask Jeff the same question. I knew there were issues with IsNumeric, try IsNumeric('$'), but I've usually been able to deal with those issues in a separate manner. I had never thought about the Hex numbers. I would venture to guess that happening is rare, but Jeff is right it could be an issue. I don't know what Jeff would suggest for taking care of this maybe something like:

    [font="Courier New"]DECLARE @test TABLE(data VARCHAR(5))

    INSERT INTO @test

       SELECT

           '10'

       UNION ALL

       SELECT

           '12'

       UNION ALL

       SELECT

           '$'

       UNION ALL

       SELECT

           '3E2'

       UNION ALL

       SELECT

           '3.2'

       UNION ALL

       SELECT

           '3,000'

    SELECT

       *,

       ISNUMERIC(data) ,

       CASE

           -- contains anything but a number

           WHEN data LIKE '%[^0-9]%' THEN 0

           ELSE CAST(data AS INT)

       END

    FROM

       @test

    [/font]

  • Jack Corbett (10/24/2008)


    I was going to ask Jeff the same question. I knew there were issues with IsNumeric, try IsNumeric('$'), but I've usually been able to deal with those issues in a separate manner. I had never thought about the Hex numbers. I would venture to guess that happening is rare, but Jeff is right it could be an issue. I don't know what Jeff would suggest for taking care of this maybe something like:

    [font="Courier New"]DECLARE @test TABLE(data VARCHAR(5))

    INSERT INTO @test

       SELECT

           '10'

       UNION ALL

       SELECT

           '12'

       UNION ALL

       SELECT

           '$'

       UNION ALL

       SELECT

           '3E2'

       UNION ALL

       SELECT

           '3.2'

       UNION ALL

       SELECT

           '3,000'

    SELECT

       *,

       ISNUMERIC(data) ,

       CASE

           -- contains anything but a number

           WHEN data LIKE '%[^0-9]%' THEN 0

           ELSE CAST(data AS INT)

       END

    FROM

       @test

    [/font]

    Use ISNUMERIC(data+'.d0') in place of ISNUMERIC(data) to get correct results


    Madhivanan

    Failing to plan is Planning to fail

  • Madhivan,

    Can you explain where you found this and why it works?

    I did not find anything about this is BOL.

    Thanks,

  • Sorry... didn't mean to do a stab'n'run...

    Madhivanan's method works for checking for integers or, if you will, is an "IsAllDigits" check. Adding the .d0 to an integer still produces an integer of the same value with d0 as a label. Try it...

    SELECT 1.d0

    If it's a decimal number, then there will be more than 1 decimal point and it can't be numeric with that. Just about any other stray character including any hex characters will be rejected because of the "decimal place" format

    The "d" and "e" in the examples I gave aren't "hex"... they're two different engineering notations. "d" stands for "decimal point" and "e" stands for "exponent of 10".

    Does this mean that ISNUMERIC has bugs in it? Absolutely not... it works exactly as advertised. I didn't include the method Madhivanan showed, but here's part of an article I wrote on it on a different forum...


    ISNUMERIC is not “ALL DIGITS”

    Submitted by Jeff Moden, 03 Jun 2006

    All rights reserved.

    Introduction:

    There are many cases where you need to ensure that the string data you are working with includes only numeric digits. Most Developers will use the built in ISNUMERIC function to make such a check. Here’s why that’s a bad idea and what to do about it.

    What is ISNUMERIC?

    “Books OnLine” summarizes the description of the ISNUMERIC function as:

    “Determines whether an expression is a valid numeric type.”

    and that’s a 100% accurate description that leaves much to be desired. Just what is a “valid numeric type”? Reading further in BOL (short for “Books OnLine), we find additional information:

    “ISNUMERIC returns 1 when the input expression evaluates to a valid integer, floating point number, money or decimal type; otherwise it returns 0. A return value of 1 guarantees that expression can be converted to one of these numeric types.”

    Again, read the wording… “when the input expression evaluates to a valid integer”, etc, etc. And, that’s the catch. There are many different things that you may not expect that will evaluate to one of the data types listed in the description of ISNUMERIC and a lot of them are NOT the digits 0-9. ISNUMERIC will return a “1” for all of them.

    Let’s consider the most obvious… what will ISNUMERIC(‘-10’) return? What will ISNUMERIC(‘1,000’) return? And how about the not-so-obvious… what will ISNUMERIC('0d1234') or ISNUMERIC('13e20') return? There are many different combinations of letters, numbers, and symbols that can actually be converted to numeric data types and ISNUMERIC will return a “1” for all of them. It’s not a flaw… that’s the way it’s supposed to work!

    What IS Actually Considered “Numeric” by ISNUMERIC?

    This code will show all of the single characters that ISNUMERIC thinks of as “Numeric”…

    --===== Return all characters that ISNUMERIC thinks is numeric

    -- (uses values 0-255 from the undocumented spt_Values table

    -- instead of a loop from 0-255)

    SELECT [Ascii Code] = STR(Number),

    [Ascii Character] = CHAR(Number),

    [ISNUMERIC Returns] = ISNUMERIC(CHAR(Number))

    FROM Master.dbo.spt_Values

    WHERE Name IS NULL

    AND ISNUMERIC(CHAR(Number)) = 1

    That code produces the following list of characters…

    Ascii Code Ascii Character ISNUMERIC Returns

    ---------- --------------- -----------------

    9 1

    10

    1

    11

    1

    12 1

    13

    1

    36 $ 1

    43 + 1

    44 , 1

    45 - 1

    46 . 1

    48 0 1

    49 1 1

    50 2 1

    51 3 1

    52 4 1

    53 5 1

    54 6 1

    55 7 1

    56 8 1

    57 9 1

    128 € 1

    160 1

    163 £ 1

    164 ¤ 1

    165 ¥ 1

    What are these characters?

    Ascii 9 is a TAB character and is included because a column of numbers is frequently delimited by a TAB.

    Ascii 10 is a Line Feed character and is included because the last column of numbers is frequently terminated by a Line Feed character.

    Ascii 11 is a Vertical Tab character and is included because the last column of numbers is frequently terminated by a Vertical Tab character.

    Ascii 12 is a Form Feed character and is included because the last column numbers of the last row is sometimes terminated by a Form Feed character.

    Ascii 13 is a Carriage Return character and is included because the last column of numbers is frequently terminated by a Carriage Return character.

    Ascii 36 (Dollar sign), 128 (Euro sign), 163 (British Pound sign), and 164 (Yen sign) are included because they are frequently used as enumerators to identify the type of number or, in this case, the currency type the number is meant to represent.

    Ascii 43 (Plus sign), 44 (Comma), 45 (Minus sign), and 46 (Decimal place) are included because they are frequently included in numeric columns to mark where on the number line the number appears and for simple formatting.

    Ascii 160 is a special "hard space" and is included because it is frequently used to left pad numeric columns so the column of numbers appears to be right justified.

    Ascii 32 is a "soft space" and is not included because a single space does not usually represent a column of numbers. Ascii 32 is, however, a valid numeric character when used to create right justified numbers as is Ascii 160 but a single Ascii 32 character is NOT numeric. In fact, a string of Ascii 32 spaces is not considered to be numeric but a string of spaces with even a single digit in it is considered to be numeric.

    Ascii 164 is a special character and is included because it is frequently used by accountants and some software to indicate a total or subtotal of some type. It is also used by some to indicate they don't know what the enumerator is.

    Ascii 48-59 are included because they represent the digits 0 through 9

    Set of Characters Treated as “Numeric” by ISNUMERIC

    Do notice that "e" and "d" (everybody forgets about this) are not included as numeric in the results because a single "e" or "d is NOT considered to be numeric. HOWEVER, these letters are for two different forms of scientific notation. So, if you have anything that looks like the following, ISNUMERIC will identify them as “Numeric”…

    SELECT ISNUMERIC('0d2345')

    SELECT ISNUMERIC('12e34')

    The “Rational” Solution

    Hopefully, I’ve proven that ISNUMERIC is NOT the way to determine if a value or a column of values IS ALL DIGITS. So, what to do? We could write something really complex that loops through each character to see if it’s a digit… or … we can use a very simple rational expression to do the dirty work for us. The formula is…

    NOT LIKE '%[^0-9]%'

    … and it can be used directly (preferred method for performance reasons)…

    SELECT *

    FROM sometable

    WHERE somecolumn NOT LIKE '%[^0-9]%'

    … or, if you don’t mind the performance hit, you can create your own “IsAllDigits” function…

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Interesting, I'd never seen the .d0 trick before. At least I came up with the LIKE solution on my own. I always seem to forget the %[range]% functionality available with LIKE.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply