December 3, 2008 at 2:56 pm
Hi,
I have a table with lots of addresses and some are obvious duplicates, but with differences, i.e.
1486 SOUTH PAGOSA STREET
1486 S. Pagosa St.
1486 So. Pagosa Street
Is there any method that can be used to know that these are duplicates?
December 3, 2008 at 3:18 pm
One way is to process them in SSIS using the FUZZY Lookup component. I don't think it works all that well, but it does get some. Depending on how many addresses you have you may have to do a lot manually.
What I have done is used the Fuzzy lookup to load a table I then use a cross reference and then manually go through the table, fixing the ones that aren't matches.
I don't know of a real good way.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
December 3, 2008 at 4:31 pm
Use the Google Maps API and see if they return the same location?
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
December 4, 2008 at 6:17 am
Thanks, I will check that out. I am trying to do as much as I can programmatically.
December 4, 2008 at 6:36 am
Please (please!) report back on this. I think that this would be the coolest thing ever if you get it working. 🙂
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
December 4, 2008 at 9:31 am
I've never tried the Fuzzy logic or Google API approach, but I did make a fairly decent tool to standardize addresses using the US Post office abbreviations (there should be simmilar standards for other countries):
http://pe.usps.gov/text/pub28/28apc_001.html#NL508_2
And then strip out periods, commas, etc, and compare the standardized addresses. Attached is a script that creates the word-abbreviation table I used.
December 4, 2008 at 12:35 pm
Thanks so much for all of the responses. We are looking into how we can use this information to accomplish this.
December 4, 2008 at 1:35 pm
Your best option would be to use a tool that is built specifically for this. Melissa data (http://www.melissadata.com) is one of the better recognized providers.
They have the ability to integrate with multiple programming languages as well as with SQL Server.
This kind of software pays for itself in no time at all.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
December 4, 2008 at 3:48 pm
I have to do this kind of thing all day every day (actually, I have automated procedures that do exactly this). Since I work for a direct mail company, that makes sense.
What I use is Satori Software's Mailroom Toolkit. It standardizes the addresses, verifies they are mailable, and then it's easy to dedupe the list.
That, of course, costs money. I don't remember how much, since the company I work for paid for it.
It's a great product for handling mailing lists. Has all the stuff for getting reduced postage rates (what used to be called bulk rates), etc. They even have a product for checking if people have moved, based on the Post Office National Change of Address registry.
It's fully automatable with SQL and .NET.
http://www.satorisoftware.com is their web address.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
December 5, 2008 at 7:24 am
We maintain seperate fields for our Property Addresses and then assemble them when we wish to use them. It probably doesn't help for your situation but it helps us to keep reasonably clean addresses. Some of the fields use a lookup table for validation.
-------------------- ---------- ------ ----- ---- --------------------------------------
house_no Int 10 0 YES House Number.
-------------------- ---------- ------ ----- ---- --------------------------------------
house_fra Varchar 3 YES House number fraction, ie '1/2'.
-------------------- ---------- ------ ----- ---- --------------------------------------
st_dir_pre Char 2 YES Prefix Direction.
-------------------- ---------- ------ ----- ---- --------------------------------------
st_name Varchar 40 YES Street Name.
-------------------- ---------- ------ ----- ---- --------------------------------------
st_type Char 4 YES Street Type or Suffix (in the
str_type_lu table).
-------------------- ---------- ------ ----- ---- --------------------------------------
st_dir_suf Char 2 YES Suffix Direction.
-------------------- ---------- ------ ----- ---- --------------------------------------
unit_type Varchar 4 YES Type of unit.
APT = Apartment.
STE = Suite.
UNIT = Unit.
-------------------- ---------- ------ ----- ---- --------------------------------------
unit_num Varchar 9 YES Unit (Apartment, Suite) Number.
-------------------- ---------- ------ ----- ---- --------------------------------------
bldg Varchar 6 YES Building Number.
-------------------- ---------- ------ ----- ---- --------------------------------------
city Varchar 30 YES City Name (in the cityzip_lu view).
-------------------- ---------- ------ ----- ---- --------------------------------------
state Char 2 YES State code (IA).
-------------------- ---------- ------ ----- ---- --------------------------------------
zip Char 5 YES Five digit zip code (in the cityzip_lu
view).
-------------------- ---------- ------ ----- ---- --------------------------------------
zip4 Char 4 YES Four digit Zip Code Extension.
-------------------- ---------- ------ ----- ---- --------------------------------------
Steve
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply