Cleaning city name

  • Hi,

    I would like to know what is the best way to clean the addresses/city name in our data warehouse. In many case we have many different entries for a city name and this is causing problem in our reporting tools. I also would like to validate the zip code/address combination to be sure of the validity of the address. Do someone has some information here?

    Which database provider should I get to have the zip code?

    Thanks

  • Search for "CASS CERTIFICATION" software... it does the whole thing from womb-to-tomb and provides updates as new address information comes available from the US Postal Service. I believe "ZP4" is a pretty good one according to the folks at my previous job.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Check out Melissa Data - they are fairly well known and seem to be reliable.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Thanks for all your hints.

    What about USPS CASS CERTIFICATION, any experiences with it?

  • Rem (9/22/2008)


    Thanks for all your hints.

    What about USPS CASS CERTIFICATION, any experiences with it?

    None here... but there's nothing like going to the "source". I'm thinking that it should be pretty darned good.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • If this is anything like my experience up here in Canada, it should be pretty straightforward.

    We get a monthly CD from CanadaPost with a basic flat ASCII file that has postal codes, street names, and a whole bunch of other fields. Prior to CanadaPost's scheduled deployment date when new postal codes become live, we upload it to one of our QA databases via a DTS package (it's an older SQL 2000 server) that maps those columns to our address table in our postalcode database, test it a bit, then upload it at night to our production server a day or two after the postal codes become "live".

    I assume USPS has something equivalent. Just make sure you create a clustered index on the postalcodes after the data is uploaded to a table, and in our case, we created non-clustered indices on municipality and some other area indicators.

    If you create your own way of uploading data, I recommend upload to a preliminary table such as newAddressTable, then rename currentAddressTable AND it's indices to either _old or something with a date. Remember to rename the indices, including when you change newAddressTable to currentAddressTable.

    Gaby________________________________________________________________"In theory, theory and practice are the same. In practice, they are not." - Albert Einstein

  • There are a number of products from the USPS to handle this type of thing. I think they even have an API that you can just pass them your address/zip info and get back if it's a good address or not.

    -Luke.

    To help us help you read this[/url]For better help with performance problems please read this[/url]

  • A pretty good product to work with for cleaning up/certifying Canadian addresses is a product called "Street Sweeper". I forget who it's made by because the boys in Ops always did the updates for us.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • We just came across this question at work. While the USPS would be the 'authority' on US zip codes, whether it is correct for your business depends on how you're using them. ( I can see everyone scratching their heads and calling me an idiot, let me explain)

    We have business within the state of Ohio, and only within certain regions of Ohio (defined by county boundaries). USPS doesn't care about county or state boundaries when assigning zip codes, and consequently there are zip codes in West Virginia and Kentucky (according to address assignment) that are entirely within the physical boundaries of Ohio. I'm sure vice versa, but I don't have to deal with those.

    Melissa Data (as mentioned above in the thread) is being recommended to us, because they have pre-cleaned the data to make sense for those of us who use county/state boundaries, breaking out the addresses and correcting cities, etc. Oddly enough, the postal service delivers much more efficiently using the 'corrected' addresses rather than the ones they came up with themselves. Go figure.

    So I would suggest if you're blanketing the US, great, go with USPS, otherwise consider Melissa Data.

    ---------------------------------------------------------
    How best to post your question[/url]
    How to post performance problems[/url]
    Tally Table:What it is and how it replaces a loop[/url]

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

  • We use http://www.satorisoftware.com/default.aspx. The quality of the cleansed address is pretty good.We receive an update cd every 2 months once or sometimes every month which we push to our servers.We use it in our SSIS cleansing packages by calling it in the Execute Process task [it support command line execution].

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply