Parse City and State

  • Hi guys,

    Need help, Below are sampl data that i have in my source data.

    ADDRESS

    215 EAST 5TH ST DULUTH MN 55805

    6080 CANT ROAD DULUTH MN 55804

    C/O KURT BARTELL WEICHERT REALTORS TWIN PORTS 11 E SUPEIOR ST #290 DULUTH, MN 55802

    13657 ELKWOOD DR APPLE VALLEY MN 55124 *SEND ALL MAIL TO OWNER*

    2516 N 28TH ST SUPERIOR WI 54880

    2110 JEFFERSON ST DULUTH MN 55812

    7333 GALLAGHER DRIVE EDINA MN 55435

    PO BOX 3144 DULUTH MN 55803

    2231 E 2ND ST DULUTH MN 55812

    Note:- I want to parse City/State and zip

    Total i have 7k records in my source table i would say 5% are bad data, i have no problem if those 5% come out bad. I am using for zip code expression in deried column

    ZIP = RIGHT([ADDRESS],@[User::ZIP]) *In variable i am using value of ZIP = 5*

    I am getting 96% good zip and 5% those have bad data getting value

    But i need help to parse city and state.

    Please help me out in expression to get CITY AD STATE from the address (sample data listed above). Thanks for Help.

  • As you can see, unless you know the names of the cities, it's going to be just about impossible. My recommendation would be to buy a CASS certification program to do the splits for you.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for your prompt reply, I know my 95% data is good and 5% is bad, and i am sure i can use expressions to get what i want but i am not very good in expression.

  • You don't understand what I'm saying. There is no way to know if a word is a part of the city or part of the address for any of the strings you've posted. How many words is a city? 1? 2? 3? More? The only thing you can do with this is to get a list of States, find the State in the string, and maybe the word before the state will be a one word city. That's the best you're going to be able to do with this unless you can find a list of city names by State.

    A CASS certification program to do this split will cost a whole lot less than what you're getting ready to spend in time and errors. And I can almost guarantee that if you try to do it yourself without a list of city names by State, addresses with more than a 1 word city name are going to be close to 100% wrong.

    I'm not trying to be mean or difficult here... I'm just stating facts because I've been through this so many times.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • rocky_498 (3/3/2011)


    Thanks for your prompt reply, I know my 95% data is good and 5% is bad, and i am sure i can use expressions to get what i want but i am not very good in expression.

    As Jeff said, you must know all the names of cities and states and then perform fuzzy matching, as you must also take into account that some names may be misspelled.

    If you're going to implement it yourself: good luck.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • As other knowledgeable people have said before me, parsing address information is not easy task and it is not a matter of one or two "smarty pants" expressions.

    If you can use third-party solutions, check the commercial CozyRoc Address Parse component. It allows advanced address parsing functionality and you can extract the individual elements. In addition to that we also offer integration with USPS Address Database, so you can check if specific address exists in the official post office data.

    ---
    SSIS Tasks Components Scripts Services | http://www.cozyroc.com/

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply