May 20, 2004 at 3:42 pm
I have a client list of more than 175,000 records. It is not my own. We are sure there are repeated clients (e.g. A & A Plumbing, A and A Plumbing, A and A Plumbing Inc., A & A Plumbing, Incorporated, etc.).
I have written a routine that parses out each word in each record and runs SoundEx on it. I then take each character of each SoundEx code and use its ASCII value to get a number. I add those numbers up until all of the words in the record are done, which gives one total per record.
Note: I have a list of REPLACE calls for things like Inc, Incorp, Incorporated, etc., since leaving those in would only complicate matters.
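For readers who want to follow the steps, here is a minimal sketch of the routine as described, written in Python rather than SQL so it can be run on its own. It is an illustration under assumptions, not the actual routine: the names `soundex`, `record_key` and `NOISE_WORDS` are hypothetical, the noise list contains only the three suffixes mentioned above, and the Soundex here is the standard American variant, which may differ in edge cases from SQL Server's SOUNDEX.

```python
import re

# Standard American Soundex digit groups (an approximation of SQL SOUNDEX).
CODES = {}
for digit, letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)

# Hypothetical stand-in for the REPLACE list; only the suffixes named in the post.
NOISE_WORDS = {"INC", "INCORP", "INCORPORATED"}


def soundex(word):
    """Return the 4-character Soundex code of a word ('' if it has no letters)."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
        if c not in "HW":  # H and W do not separate letters with the same code
            prev = code
    return (result + "000")[:4]


def record_key(name):
    """Sum the ASCII values of every Soundex character of every kept word."""
    words = re.findall(r"[A-Za-z]+", name)
    words = [w for w in words if w.upper() not in NOISE_WORDS]
    return sum(ord(ch) for w in words for ch in soundex(w))


# Same key once the suffix is stripped.
print(record_key("A & A Plumbing") == record_key("A & A Plumbing, Incorporated"))  # True
# Different codes (B620 vs C610) can still add up to the same total.
print(record_key("Burke") == record_key("Carp"))  # True
```

Because the per-record number is a plain sum, it ignores word order and lets unrelated SoundEx codes add up to the same total, so equal totals do not guarantee similar names.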
I am getting a return set of about 50,000 records; some look similar, some are not close at all.
Has anyone written a SQL routine that tries to capture these kinds of occurrences and, if so, what approach did you take? If not, does anyone have a suggestion as to how I could further refine these 50,000 records?
Thank you.
I wasn't born stupid - I had to study.
May 20, 2004 at 9:42 pm
I have written a SQL routine that tries to match names such as Malyutin Slava and Malyutin Salava. In it I can set the number of errors allowed. The script is not small. If you want, I can e-mail it.
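The script itself was not posted in the thread. As a rough illustration of what "a number of errors" could mean, here is a minimal Python sketch that assumes the errors are counted as edit distance (Levenshtein: insertions, deletions and substitutions). That choice is an assumption, and the names `edit_distance`, `is_similar` and `max_errors` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def is_similar(a, b, max_errors=1):
    """True when the names differ by at most max_errors edits (case-insensitive)."""
    return edit_distance(a.lower(), b.lower()) <= max_errors


# One inserted letter separates the two spellings from the post.
print(edit_distance("Malyutin Slava", "Malyutin Salava"))  # 1
print(is_similar("Malyutin Slava", "Malyutin Salava"))     # True
```

Comparing every pair this way is expensive on a large list, so a sketch like this would normally be run only within groups of records that already share some cheaper key.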
August 30, 2004 at 8:20 am
Can you send me a copy of your SQL script that uses SoundEx to find similar client names? My email address is naimsyed@hotmail.com
Thanks a bundle