July 21, 2009 at 3:07 pm
A colleague needs to develop a query that identifies "similar" mailing addresses. I realize "similar" is subjective but hey, let's give this a shot. The address table is pretty basic and consists of columns (ADDRESS_ID, ADDR1, ADDR2, ADDR3, CITY, STATE, ZIP).
The requierment calls for listing groups of mailing addresses where there are 6 or more occurrences of a "similar" address. I'm not particulalry knowledgeable of the requirement but I assume a scenario like "123 Easy Street" and "123 Easy St." and "123 EasyStreet" would be a reasonable if not simplistic example.
I have some ideas but defintely don't want to reinvent the wheel. I haven't used full-text search and I'm wondering if that would be useful in this situation. Ideally, I'm hoping someone may have a sample query/procedure/udf they would be willing to share.
Thanks in advance,
Chris.
July 21, 2009 at 3:17 pm
Something quick and dirty might be the SOUNDX or DIFFERENCE functions.
If you are looking for something better do a search for "sql fuzzy match" There are bunches of articles on the topic.
July 21, 2009 at 3:31 pm
Have a list company run the list through a CASS process. That'll get you the best results with the least effort.
CASS standardizes addresses. It'll turn "123 Easy Street" into "123 Easy St". It'll find ones that don't match Post Office standards and either correct them to match the standards, or give you an error code that tells you what's wrong with them if it can't fix them itself. (You'd get an error code on "123EasyStreet".)
It's usually pretty cheap to get a list processed. Like fractions of a penny per name on a big enough list.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 21, 2009 at 3:46 pm
there was a similar thread, where someone wanted to cleanup misspellings int eh database to get teh addresses normalized;
take a look at this thread:http://www.sqlservercentral.com/Forums/Topic743750-338-1.aspx
we threw together some interesting scripts to do that, if you care to play with cleaning up the data.
Lowell
July 21, 2009 at 4:31 pm
You could also use the Fuzzy Grouping Transformation in Integration Services for this.
[font="Arial Narrow"]bc[/font]
July 21, 2009 at 4:35 pm
Lamprey13 (7/21/2009)
Something quick and dirty might be the SOUNDX or DIFFERENCE functions.If you are looking for something better do a search for "sql fuzzy match" There are bunches of articles on the topic.
I have heard that this is also available as a commercial service via as a web service call so that you can clean up data as it is entered into a front end application.
I don't have any specific information on vendors, but Google is your friend.
July 21, 2009 at 4:49 pm
Michael Valentine Jones (7/21/2009)
Lamprey13 (7/21/2009)
Something quick and dirty might be the SOUNDX or DIFFERENCE functions.If you are looking for something better do a search for "sql fuzzy match" There are bunches of articles on the topic.
I have heard that this is also available as a commercial service via as a web service call so that you can clean up data as it is entered into a front end application.
I don't have any specific information on vendors, but Google is your friend.
That is true.
I've worked with Dun and Bradstreet to do some data cleansing in the past. The, potential, downside is that they cost money. But, they offer more than just address cleansing data..
July 22, 2009 at 7:33 am
Thanks to all for your responses and suggestions, I really appreciate it.
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply