Parse street

  • Hi Guys,

    First post for me here, so i dont know if i am at the right place.

    I have a street that i need to dissect a streetname in 3 columns

    adres=molenweg 9a

    streetname = molenweg

    housenumber = 9

    suffix = a

    I have found the following on the internet

    SELECT top 1 LEFT(Adres,PATINDEX('% [0-9]%',Adres)-1)'Street'

    , SUBSTRING(Adres,PATINDEX('% [0-9]%',Adres)+1,PATINDEX('%[0-9],%',Adres+ ',')-PATINDEX('% [0-9]%',Adres))'House Number'

    FROM t

    it works when i dont have a suffix.

    Can someone point me in the right direction ?

  • maikel01 (10/4/2016)


    Hi Guys,

    First post for me here, so i dont know if i am at the right place.

    I have a street that i need to dissect a streetname in 3 columns

    adres=molenweg 9a

    streetname = molenweg

    housenumber = 9

    suffix = a

    I have found the following on the internet

    SELECT top 1 LEFT(Adres,PATINDEX('% [0-9]%',Adres)-1)'Street'

    , SUBSTRING(Adres,PATINDEX('% [0-9]%',Adres)+1,PATINDEX('%[0-9],%',Adres+ ',')-PATINDEX('% [0-9]%',Adres))'House Number'

    FROM t

    it works when i dont have a suffix.

    Can someone point me in the right direction ?

    Can you provide some specific parsing rules, please? Eg,

    * Streetname = the first word after the equals sign

    * House number = the penultimate character in the string

    * Suffix = the final character in the string

    Note: I do not think that these rules are sophisticated enough, though they work for your example. I presented them in order to make you think about other cases.

    Once we have the detailed rules, we can get you up and running.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • The following partial solution is dependant on PatExclude8K available here ... http://www.sqlservercentral.com/scripts/T-SQL/117890/

    If your data is "adres=molenweg 9a" then you will need to strip out the "adres=" first.

    DECLARE @SourceTable TABLE (

    ID INT IDENTITY(1,1)

    , adres VARCHAR(255)

    );

    INSERT INTO @SourceTable (adres)

    VALUES (' molenweg 9a '), (' molenweg 9 ')

    ;WITH cteSplitOne AS (

    SELECT ID, adres

    -- Find the first number with a preceding space, and use that to extract the street name.

    , StreetName = LTRIM(RTRIM(LEFT(adres,PATINDEX('% [0-9]%',adres)-1)))

    -- The rest of the adress is the housenumber and suffix

    , HouseAndSuffix = LTRIM(RTRIM(RIGHT(adres, LEN(adres) - PATINDEX('% [0-9]%',adres) +1)))

    FROM @SourceTable

    )

    SELECT cte.ID, cte.adres

    , cte.StreetName

    , HouseNumber = num.NewString

    , Suffix = suff.NewString

    FROM cteSplitOne AS cte

    CROSS APPLY dbo.PatExclude8K(HouseAndSuffix, '%[^0-9]%') AS num

    CROSS APPLY dbo.PatExclude8K(HouseAndSuffix, '%[^A-Za-z]%') AS suff

  • Here's an option that doesn't need additional functions.

    This will fail without warning if the street name has a number.

    DECLARE @SourceTable TABLE (

    ID INT IDENTITY(1,1)

    , adres VARCHAR(255)

    );

    INSERT INTO @SourceTable (adres)

    VALUES (' molenweg 9a '), (' molenweg 9 '), ('Two words 5423 X')

    SELECT ID,

    adres,

    streetname,

    LEFT( CompoNum, PATINDEX( '%[^0-9]%', CompoNum+'.')-1) AS housenumber,

    STUFF( CompoNum, 1, PATINDEX( '%[^0-9]%', CompoNum+'.')-1, '') AS suffix

    FROM @SourceTable

    CROSS APPLY (SELECT LEFT( adres, PATINDEX( '%[0-9]%', adres)-1) AS streetname,

    STUFF( adres, 1, PATINDEX( '%[0-9]%', adres)-1, '') CompoNum)x;

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Hi Phil,

    First of thanks for your reply i have read through the links u provided and it helped me understand what you mean.

    The rules are the following:

    * Streetname = the first word in the string there are a few rules it can have special chars in it e.g. '&', '.', ',','-','/','`'

    * Housenumber = the first numeric character in the string it can look like '48' or '48-50'

    * Suffix = the final character in the string

    I would like to thank you already for giving me the pointer of giving more information.

  • Thanks for your reply DesNorton and Luis Cazares. It worked for a lot of addresses but i did not give not enough information how it could be build up that it does not work for all.

  • based on experience, you won't find a one-function-to-rule-them-all address cleanup. break it up into smaller pieces that complement each other.

    there's too many exceptions to the patterns;

    what I've done previously is create a separate table with "cleaned" addresses.

    and changed the logic of the join criteria to use cleaned addresses if found, else original address.

    with that in place, create a waterfall type of cleanup of addresses...clean records for like 70% with one method, then a second method works on what was left to clean up maybe 10%, with as many as ten methods to finally complete the "cleaned" addresses initiative until the diminishing returns made me spend time on other projects.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Following the rules you posted, here's an alternative.

    DECLARE @SourceTable TABLE (

    ID INT IDENTITY(1,1)

    , adres VARCHAR(255)

    );

    INSERT INTO @SourceTable (adres)

    VALUES (' molenweg 9a '), (' molenweg 9 '), ('Two words 48-50')

    SELECT ID,

    adres,

    LEFT( adres, numberstart - 1) AS streetname,

    SUBSTRING( adres, numberstart, LEN(adres) - CASE WHEN RTRIM(adres) LIKE '%[^0-9]' THEN 1 ELSE 0 END) AS housenumber,

    CASE WHEN RTRIM(adres) LIKE '%[^0-9]' THEN RIGHT(RTRIM(adres), 1) ELSE '' END AS suffix

    FROM @SourceTable

    CROSS APPLY (SELECT PATINDEX( '%[0-9]%', adres) AS numberstart)x;

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • I think you are right Lowell wil need to do it that way. Thanks luis this is gonna help me a lot.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply