stripping numeric text out of a field

  • Hi, all, I hope you can help.  I need to be able to strip numeric data out of a column.  For example, in one column, I have all of the following occurrences of data:

    *CA 1000

    CA 5000

    *CA4000

    I basically need only the 1000, 5000 and 4000 from each of these.  Any ideas on how to do this?  I didn't think I could use SUBSTRING since it needed a certain position...

    Thanks!

    Amy

     

  • There may be a better way to do this, but here is a method using substring that will work:

    declare @mychar varchar(20),

     @Index int,

     @NumericText varchar(20)

    set @mychar = '*CA 4000'

    Set @NumericText = ''

    set @index = 1

    while @NumericText = ''

    begin

    if isnumeric(substring(@mychar,@index,len(@mychar))) = 1

      set @numerictext = ltrim(substring(@mychar,@index,len(@mychar)))

    else

      set @index = @index + 1

    end

    print @numerictext

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Is it just prefixes you need to consider ?

    If so:

    Declare @TestData varchar(20)

    Set @TestData = 'CA 5000'

    Select Substring(@TestData, patindex('%[0-9]%', @TestData), Len(@TestData))

  • See, I knew there was a better way!!

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Great!  Both work perfectly and are exactly what I needed... thanks!

     

  • Hmmm... well, I know this is going to come up, too.  What about if I need the alpha characters only?

    For example, starting in the 4th position, I need to pick up everything until I hit a number:

    U2-GED 34039 (I need GED)

    U2-GE 00638 (I need GE)

    U2-GESE38027 (I need GESE)

    I tried changing this to charindex, but couldn't get it to return what I needed.  Any other ideas?  Thanks for the help!

  • You can do the same kind of thing:

    --select between position 4 and the first number after position 4

    Declare @TestData varchar(20)

    Set @TestData = 'U2-GED 34039'

    Select Substring(@TestData, 4, patindex('%[0-9]%', Substring(@TestData, 4, Len(@TestData)))-1)

     

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • I ran into this often enough to create a general-purpose function.  The @type parameter should have a bit set for every class of character you want returned; 1=alphabetic, 2=numeric, 3=alphanumeric, 7=alphanumeric and spaces, 15=all standard ASCII chars, etc.

    CREATE

    FUNCTION [dbo].[udf_extract] (

        @input varchar(500),

        @type tinyint

    )

    RETURNS varchar(500)

    -- Removes unwanted characters from a string

    -- Set bits in @type to indicate which types of characters to keep

    -- 1=Alpha, 2=Numeric, 4=Space, 8=Other Printable Characters, 16=Control & Graphic characters

    -- Example: SELECT master.dbo.udf_extract(' ABC def 321 ', 1|2|4|8)

    AS

    BEGIN

        DECLARE @output varchar(500), @i smallint, @C tinyint

        SET @output = ''

        SET @i = DATALENGTH(@input) -- Use DATALENGTH instead of LEN to handle @input of all spaces

        WHILE @i > 0 BEGIN

            SET @C = ASCII(SUBSTRING(@input, @i, 1))

            IF 0 < ( @type & 

                CASE 

                WHEN @C BETWEEN 65 AND 90 OR @C BETWEEN 97 AND 122 THEN

                WHEN @C BETWEEN 48 AND 57 THEN

                WHEN @C = 32 THEN

                WHEN @C BETWEEN 33 AND 126 THEN

                ELSE 16 END)

                SET @output = CHAR(@C) + @output

            SET @i = @i - 1

        END

        RETURN @output

    END

  • I have a bunch of functions for string cleaning. As far as execution plans go, is it more optimal to use different functions?

  • I guess it would depend on how much the specific functions optimize the operation compared to a general-purpose function.  Run some tests and see if there is a noticable time difference.  Typically the time spent executing a small function is much less than the time spent reading and writing data, but there may be room for improvement.

    String-manipulation functions can be a performance issue because using a lot of SUBSTRING, CHAR, and other text functions requires memory allocation & cleanup for all the string objects.  It's still much faster than disk I/O but much slower than manipulating integers.

    These are the kind of functions that I'm anxious to recode in VB.NET in SQL 2005 CLR.

  • Hello,

    I am working on a similar problem, however, this solution is not working right for me. I am hoping someone might be able to see why. When I use this statement, the value that is returned is 419, the first three digits of the string. I need to return the whole number, as it is the ID that I need to join on. The value that is returned in the @TestData select statement is iFuturePriceID=N'4194582', that how it is in the table, and I need to be able to select only the number. Thanks in Advance, and FYI, the other solution in this thread also returns the same, 419.

     

    Declare @TestData varchar(20)

    Set @TestData = (SELECT DISTINCT(PRIMARY_KEY_DATA) FROM AUDIT_LOG_DATA WHERE PRIMARY_KEY_DATA = ("iFuturePriceID=N'4194582'"))

    Select Substring(@TestData, patindex('%[0-9]%', @TestData), Len(@TestData))

  • What size is the column PRIMARY_KEY_DATA. It's more than 20 characters, right, but you're trying to stuff it into @Testdata which is a varchar(20). The reason you only see '419' is that '4582' got truncated before you even got to the Substring() function.

     

  • yeah, similarly easy problem, it was the variable length I was giving it. Now I have another problem. It is returning the single quote at the end of the string, i.e. 4194582'

     

    Any ideas why it would return the single quote on the end? since it's not a numeric seems weird. Thanks again.

    Declare @TestData varchar(29)

    Set @TestData = (SELECT DISTINCT(PRIMARYKEY) FROM vAUDITTableChanges WHERE PRIMARYKEY = ("iFuturePriceID=N'4194582'"))

    Select Substring(@TestData, patindex('%[0-9]%', @TestData), Len(@TestData))

  • Not wierd, just working as per the original requirements - which were to strip non-numerics from the beginning of the string.

    There was nothing in the original poster's question or the posted solutions that dealt with trailing non-numerics.

     

  • ah, I see. how do you remove the non-numerics from the beginning and the end of a string? i.e. iFuturePriceID=N'4194582'

    Thanks again.

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply