About cleaning data (address normalization)

  • Hi, could you give me some hints about address normalization with a SQL Server 2005 database?

    strategies, algorithms, tools... anything!

    thanks.

  • I assume that you mean "postal" addresses. Your best and least expensive bet is to buy what is known as "CASS Certification Software"; the good ones have postal service approval. MyMailManager and ZP4 are a couple of good ones, but you may want to spend a bit of time doing a search for "CASS Certification".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Oh... I forgot to mention: if you're actually doing physical bulk mailings, there's a pretty good postage savings to be had if you show that you've used only CASS Certified addresses and presort the mail according to the program.

    --Jeff Moden



  • If you want to roll your own, you'll potentially need to make multiple passes through the table to clean things up. You'll need to set rules for how you handle multi-part names, various abbreviations, zip codes (5 or 5+4), etc.

    In general, using a service is a better idea if you can. If not, you'll need to work through your rules and then implement them in data cleansing routines. Use sets, and build rules to handle a whole set of data at a time; don't work through the table row by row.
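
    To make the "multiple passes, one set at a time" idea concrete, here is a rough sketch. Everything in it is an assumption for illustration: the addr and abbrev tables, their columns, and the three rules are made up, and it runs against an in-memory SQLite database from Python so it can be tried anywhere. On SQL Server 2005 the same passes would be written in T-SQL (for example LTRIM/RTRIM, LEN and + instead of TRIM, LENGTH and ||, and probably UPDATE ... FROM with a join to the rules table), so treat it as a shape to adapt, not a finished routine.

```python
import sqlite3


def build_demo():
    """Create a throwaway in-memory database with hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE addr (id INTEGER PRIMARY KEY, street TEXT, zip TEXT);
        CREATE TABLE abbrev (long_form TEXT PRIMARY KEY, short_form TEXT);
    """)
    # Abbreviation rules live in a table, so adding a rule is an INSERT,
    # not a code change. These three rows are illustrative only.
    conn.executemany(
        "INSERT INTO abbrev VALUES (?, ?)",
        [("STREET", "ST"), ("AVENUE", "AVE"), ("ROAD", "RD")],
    )
    conn.executemany(
        "INSERT INTO addr (street, zip) VALUES (?, ?)",
        [
            ("  123 Main Street ", "12345 6789"),
            ("45 oak avenue", "90210"),
            ("9 Pine Road", "123456789"),
            ("7 Elm", "9021"),
        ],
    )
    return conn


def normalize_addresses(conn):
    """Run each cleansing rule as one statement over the whole table."""
    cur = conn.cursor()

    # Pass 1: trim and upper-case streets; strip spaces and hyphens from zips.
    cur.execute("""
        UPDATE addr
        SET street = UPPER(TRIM(street)),
            zip = REPLACE(REPLACE(TRIM(zip), '-', ''), ' ', '')
    """)

    # Pass 2: replace a trailing long-form suffix with its abbreviation,
    # driven by the rules table. Rows with no matching rule are untouched.
    cur.execute("""
        UPDATE addr
        SET street = (
            SELECT SUBSTR(addr.street, 1,
                          LENGTH(addr.street) - LENGTH(r.long_form))
                   || r.short_form
            FROM abbrev AS r
            WHERE addr.street LIKE '% ' || r.long_form
        )
        WHERE EXISTS (
            SELECT 1 FROM abbrev AS r
            WHERE addr.street LIKE '% ' || r.long_form
        )
    """)

    # Pass 3: rewrite nine-digit zips as 5+4. Five-digit zips stay as they
    # are; anything else is left alone for manual review.
    cur.execute("""
        UPDATE addr
        SET zip = SUBSTR(zip, 1, 5) || '-' || SUBSTR(zip, 6, 4)
        WHERE LENGTH(zip) = 9 AND zip NOT GLOB '*[^0-9]*'
    """)
    conn.commit()


if __name__ == "__main__":
    demo = build_demo()
    normalize_addresses(demo)
    for row in demo.execute("SELECT street, zip FROM addr ORDER BY id"):
        print(row)
```

    Each pass touches every qualifying row in a single UPDATE, and no loop or cursor walks the table. Real address data needs far more rules than this (directionals, unit numbers, multi-part names), which is a large part of why a CASS-certified product is usually the better route.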
