String extraction

  • Hello all,

    I am new to sql server so please bare with me.

    I have a table call GEO

    create table GEO

    (supplied_Adrressline nvarchar(100),

    Clean_country nvarchar(100),

    Clean_city nvarchar(100),

    clean_region nvarchar(100),

    clean_postcode nvarchar(100),

    clean_addressline nvarchar(100)

    );

    insert into geo(supplied_addressline,clean_country,clean_city,clean_region,clean_postcode)

    values ( '17 churchstreet 7000','Australia','Melbourne','ME','7000')

    ('10 Downing London HA2 9DR Street','United Kingdom','London','COF','HA2 9DR')

    ('NG6 ILP Middlesex 29 Minehead Road Sao Paulo','Brazil','Sao Paulo','Middlesex','NG6 1LP');

    I want to update GEO with clean_addresslines set to

    17 churchstreet

    10 Downing Street

    29 MInehead Road.

    so want clean_country,clean_city,clean_region,clean_postcode columns to look in supplied_addressline and clean the addressline by removing the bits which are already in clean_columns and set to clean_addressline.

    Any help is really appreciated.

    Thanks

  • Deepthy (11/12/2010)


    create table GEO

    (supplied_Adrressline nvarchar(100),

    Clean_country nvarchar(100),

    Clean_city nvarchar(100),

    clean_region nvarchar(100),

    clean_postcode nvarchar(100),

    clean_addressline nvarchar(100)

    );

    insert into geo(supplied_addressline,clean_country,clean_city,clean_region,clean_postcode)

    values ( '17 churchstreet 7000','Australia','Melbourne','ME','7000')

    ('10 Downing London HA2 9DR Street','United Kingdom','London','COF','HA2 9DR')

    ('NG6 ILP Middlesex 29 Minehead Road Sao Paulo','Brazil','Sao Paulo','Middlesex','NG6 1LP');

    I want to update GEO with clean_addresslines set to

    17 churchstreet

    10 Downing Street

    29 MInehead Road.

    so want clean_country,clean_city,clean_region,clean_postcode columns to look in supplied_addressline and clean the addressline by removing the bits which are already in clean_columns and set to clean_addressline.

    First thing first, provided insert script is not functional - it doesn't parse and doesn't show what the original data is supposed to look like.

    Secondly... you have to come up with a set of rules to slice the original data, don't think UK and Brazilian address can be processed by the same algorithm.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • Hello Paul,

    Thanks for the reply.

    insert into geo(addressline,clean_country,clean_city,clean_region,clean_postcode)

    values ( '17 churchstreet 7000','Australia','Melbourne','ME','7000')

    insert into geo(addressline,clean_country,clean_city,clean_region,clean_postcode)

    values ('10 Downing London HA2 9DR Street','United Kingdom','London','COF','HA2 9DR')

    insert into geo(addressline,clean_country,clean_city,clean_region,clean_postcode)

    values('NG6 ILP Middlesex 29 Minehead Road Sao Paulo','Brazil','Sao Paulo','Middlesex','NG6 1LP')

    I result table should look like

    create table clean_geo

    (Adrressline nvarchar(100),

    Clean_country nvarchar(100),

    Clean_city nvarchar(100),

    clean_region nvarchar(100),

    clean_postcode nvarchar(100),

    clean_addressline nvarchar(100)

    );

    insert into geo(addressline,clean_country,clean_city,clean_region,clean_postcode,clean_addressline)

    values ( '17 churchstreet 7000','Australia','Melbourne','ME','7000','17 Church Street')

    insert into geo(addressline,clean_country,clean_city,clean_region,clean_postcode,clean_addressline)

    values ('10 Downing London HA2 9DR Street','United Kingdom','London','COF','HA2 9DR','10 Downing Street')

    insert into geo(addressline,clean_country,clean_city,clean_region,clean_postcode,Clean_addressline)

    values('NG6 ILP Middlesex 29 Minehead Road Sao Paulo','Brazil','Sao Paulo','Middlesex','NG6 1LP','29 Minehead Road').

    What sort of rules should I set up to slice the clean columns from the address line?

    Thanks

  • I am trying with something like this

    this slices the right side part

    update geo

    set clean_addressline=

    substring(addressline,len(clean_country)+1,len(addressline))

    where charindex(clean_country,addressline) = 1 and cleaned_addressline is null

    to slice the left side

    update geo

    set cleaned_addressline = substring(addressline1,0,charindex(clean_country,addressline)-1)

    where charindex(clean_country,addressline1)>1 and cleaned_addressline is null

    the concatenate both left and right parts.

    similarly want to scan with clean_city,clean_region and clean_postcode.

    But no idea how to add right and left substrings, with out adding a new column.

    Thanks

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply