Finding "similar" customer addresses

Question

Post reply

Finding "similar" customer addresses

Chris W

SSChasing Mays

Points: 625
More actions
July 21, 2009 at 3:07 pm

#134452

A colleague needs to develop a query that identifies "similar" mailing addresses. I realize "similar" is subjective but hey, let's give this a shot. The address table is pretty basic and consists of columns (ADDRESS_ID, ADDR1, ADDR2, ADDR3, CITY, STATE, ZIP).
The requierment calls for listing groups of mailing addresses where there are 6 or more occurrences of a "similar" address. I'm not particulalry knowledgeable of the requirement but I assume a scenario like "123 Easy Street" and "123 Easy St." and "123 EasyStreet" would be a reasonable if not simplistic example.
I have some ideas but defintely don't want to reinvent the wheel. I haven't used full-text search and I'm wondering if that would be useful in this situation. Ideally, I'm hoping someone may have a sample query/procedure/udf they would be willing to share.
Thanks in advance,
Chris.

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Lamprey13 SSCertifiable Points: 6210 More actions · Answer 1

Something quick and dirty might be the SOUNDX or DIFFERENCE functions.

If you are looking for something better do a search for "sql fuzzy match" There are bunches of articles on the topic.

GSquared SSC Guru Points: 260824 More actions · Answer 2

Have a list company run the list through a CASS process. That'll get you the best results with the least effort.

CASS standardizes addresses. It'll turn "123 Easy Street" into "123 Easy St". It'll find ones that don't match Post Office standards and either correct them to match the standards, or give you an error code that tells you what's wrong with them if it can't fix them itself. (You'd get an error code on "123EasyStreet".)

It's usually pretty cheap to get a list processed. Like fractions of a penny per name on a big enough list.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Lowell SSC Guru Points: 323755 More actions · Answer 3

there was a similar thread, where someone wanted to cleanup misspellings int eh database to get teh addresses normalized;

take a look at this thread:http://www.sqlservercentral.com/Forums/Topic743750-338-1.aspx

we threw together some interesting scripts to do that, if you care to play with cleaning up the data.

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

bc_ Hall of Fame Points: 3175 More actions · Answer 4

You could also use the Fuzzy Grouping Transformation in Integration Services for this.

[font="Arial Narrow"]bc[/font]

Michael Valentine Jones SSC Guru Points: 64818 More actions · Answer 5

Lamprey13 (7/21/2009)
Something quick and dirty might be the SOUNDX or DIFFERENCE functions.
If you are looking for something better do a search for "sql fuzzy match" There are bunches of articles on the topic.

I have heard that this is also available as a commercial service via as a web service call so that you can clean up data as it is entered into a front end application.

I don't have any specific information on vendors, but Google is your friend.

Lamprey13 SSCertifiable Points: 6210 More actions · Answer 6

Michael Valentine Jones (7/21/2009)
Lamprey13 (7/21/2009)
Something quick and dirty might be the SOUNDX or DIFFERENCE functions.
If you are looking for something better do a search for "sql fuzzy match" There are bunches of articles on the topic.
I have heard that this is also available as a commercial service via as a web service call so that you can clean up data as it is entered into a front end application.
I don't have any specific information on vendors, but Google is your friend.

That is true.

I've worked with Dun and Bradstreet to do some data cleansing in the past. The, potential, downside is that they cost money. But, they offer more than just address cleansing data..

Chris W SSChasing Mays Points: 625 More actions · Answer 7

Thanks to all for your responses and suggestions, I really appreciate it.