Extracting Numeric values from addresses

  • Hello all.

    I'm working with address data and I need to extract the numeric values from an address field.

    For example:

    1313 Mockingbird Lane would yield 1313

    1600 State St #5 would yield 16005.

    I've tried various combinations of instr and substr and isnumeric, but my code has gotten hairy and I'm resorting to loops.

    Does anybody out there have any good user defined functions to get me started?

    Thanks.

  • Here's a start:

    declare @test-2 varchar(20),

        @C char(1),

        @numstr varchar(20)

    select @test-2 = '123 Add7ress 56',

        @numstr = ''

    while len(@test) > 0

    begin

      select @C = Left(@test,1)

      if ISNUMERIC(@c) = 1

        select @numstr = @numstr + @C

      if Len(@test) > 1

        select @test-2 = Right(@test,Len(@test) - 1)

      else

        select @test-2 = ''

     

    end

    select @numstr

     

    a word of caution from BOL:

    ISNUMERIC returns 1 for some characters that are not numbers, such as dollar sign ($), plus (+), and minus (-).

    so you might want to change that to:

    if CHARINDEX(@numstr, @numbers) > 0

    where you have @numbers = '0123456789'

    or perhaps use a regular expression...

  • Pam is on the right track.  You will have to loop through the string, but it doesn't have to be 'hairy'

    CREATE FUNCTION ExtractNumbers ( @String varchar(4000))

     RETURNS varchar(4000)

    AS

    BEGIN

     DECLARE @ReturnString varchar(4000),

      @SubString char(1),

      @index int,

      @Numbers char(10)

     SELECT @SubString = '', @ReturnString = '', @index = 1, @Numbers = '0123456789'

     WHILE @index <= LEN(@String)

     BEGIN

      SET @SubString = substring(@string,@Index,1) 

      IF CHARINDEX(@SubString, @numbers) > 0

       SET @ReturnString = @ReturnString + @SubString

      SET @index = @index + 1 

     END

     RETURN @ReturnString

    END

    PRINT dbo.ExtractNumbers('1600 State St #5')

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Thanks guys!  I'll give these a try.

  • You could try something like this to avoid looping:

    --Build a table of numbers, you may want to make it a real table because it is a good tool to use

    select top 8001 Identity(int, 1,1) As ID

    into #Number

    from master.dbo.syscomments a

    cross join master.dbo.syscomments b

    declare @address varchar(255)

    select @address = '1300 teseter 234'

    declare @numbers varchar(255)

    select @numbers = ''

    select @numbers = @numbers + substring(@address, n.id, 1)

    from #number n

    where n.ID <= len(@address)

    and isnumeric(substring(@address, n.id, 1)) = 1

    select @numbers

    drop table #Number

  • Thanks for your help. John & Pam.

    Your right it doesn't have to be hairy. I got a bit carried away with the while loop.

    Have a good one.

  • Jeff,

    Great use of a Tally/Numbers table... is more than twice as fast as the WHILE LOOP method...

    Shifting gears.... in your code, change this...

    select @address = '1300 teseter 234'

    ...to this...

    select @address = '1300 $t,e.s-e+ter 234'

    ... and run it again... see anything strange?  That's because of what other folks have said about ISNUMERIC... it is NOT the same as "IS ALL DIGITS".

    Instead, change the line with ISNUMERIC to the following and run the code again...

    and substring(@address, n.id, 1) NOT LIKE '%[^0-9]%'

    ...absolutely guaranteed to always be just the digits 0-9 and still provides nasty fast performance.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I agree, great use of numbers table.  I have seen this concept explained before, but never put to use and because of that, I have not tried to use it before.  It's funny that I have just recently begun teaching our programmers alternate methods to row-by-row processing and looping.  Thanks for a great practical example....

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Hi all,

    I have a function which is uncannily like JeffB's with Jeff Moden's modification. I won't post it because it's pretty much the same.

    But, in that function I have a comment to investigate the approach below for speed - as an alternative. I've never got around to it, but if someone out there does, please let me know!

    Here's the alternative...

    declare @s-2 varchar(500)

    set @s-2 = '1300 $t,e.s-e+ter 234'

    while @s-2 like '%[^0-9]%'

      set @s-2 = replace(@s, substring(@s, patindex('%[^0-9]%', @s-2), 1), '')

    select @s-2

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • Very unscientific, quick and dirty on my desktop, but there seems to be consistent anyway...

    Out of a table with ca 3.7 million rows, do the 'extract only digits' task on a varchar(35) column containing addresses.

    First run where the address like 'AB%' (result returned was ~2.700 rows), second run where address like 'A%' (result returned was ~114.000 rows)

    Method 1 was with the numbers (tally) table approach, method 2 with the while loop and replace approach, and... if found a variant on the latter, a while loop using stuff instead of replace.

    Three functions looking like this:

    -- 1) nums table variant function

    create function dbo.parse1(@s as varchar(50))

    returns varchar(50)

    as

    begin

    declare @r varchar(50)

    select  @r = ''

    select @r = @r + substring(@s, n.n, 1)

    from dbo.nums n

    where n.n <= len(@s)

    and substring(@s, n.n, 1) not like '%[^0-9]%'

    return @r

    end

    go

    -- 2) while loop w replace 

    create function dbo.parse2(@s as varchar(50))

    returns varchar(50)

    as

    begin

    while @s-2 like '%[^0-9]%'

      set @s-2 = replace(@s, substring(@s, patindex('%[^0-9]%', @s-2), 1), '')

    return @s-2

    end

    go

    -- 3) while loop w stuff 

    create function dbo.parse3(@s as varchar(50))

    returns varchar(50)

    as

    begin

    while @s-2 like '%[^0-9]%'

      set @s-2 = stuff( @s-2, patindex('%[^0-9]%', @s-2), 1, '' )

    return @s-2

    end

    go

    The results was maybe a bit surprising. The only thing measured was the time to complete an insert into a table, so time is the only unit looked at.

    Anyway, the best performing was #2) - while loop with replace.

    For the small result, it took one second all three times.

    For the larger result, the average was 14 seconds.

    Tightly behind was #3) - while loop with stuff

    For the small result, it also clocked one second all three times.

    For the larger result, the average was 20 seconds.

    Last, and thusly also least, came the numbers/tally table variant.

    For the small result, it took 4 seconds

    For the larger result, the average was a stunning(?) 2 minutes 20 seconds.

    /Kenneth

     

  • That's great - thanks Kenneth!

    I think I must've suspected as much, which is why I had a note to change my function (or at least look into it). I'll go for it now (well, when I get around to it!).

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • ... and a question for you now jgljgl ...

    "Who lives at 1313 Mockingbird Lane "

    RegardsRudy KomacsarSenior Database Administrator"Ave Caesar! - Morituri te salutamus."

  • You think that was a random choice of address?

    The Munsters of course! Oh that Lily!

  • I had to respond since no one else did ... I guess that we have both dated ourselves pretty well now ... maybe we ought to talk to 'Grandpa' about a potion for the issue ...

    RegardsRudy KomacsarSenior Database Administrator"Ave Caesar! - Morituri te salutamus."

  • Well, speak for yourself, but I'm not that dated.

    I caught the Munsters on re-runs in the 80's.

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply