March 1, 2007 at 3:42 am
Hi I am running searches on a table that I have.
What is the best function or method to perform fuzzy matching. I have experimented with full-text search, soundex etc.. but none of them provide exactly what I need.
For example I want to search a table where column name like '%company%'
Now I also want to pull out any variations or mispelt characters of company i.e. cmpany, campany, compony etc...
Is there a way to do this without requiring another look up table with different variations of the word company as a look up?
The reason is, that it could be any word to search for.
Thanks!
March 1, 2007 at 5:52 am
I can't think of any way using T-SQL which achieves exactly what you want. If you have SQL 2005, the fuzzy lookup transformation in SSIS should deliver what you want. See also http://www.sqlteam.com/item.asp?ItemID=26541
Markus
[font="Verdana"]Markus Bohse[/font]
March 1, 2007 at 6:40 am
You might investigate Regular Expressions (called "wildcards" in BOL) for use in the WHERE. See "LIKE" & "Wildcard Characters" in BOL.
March 2, 2007 at 1:20 am
Don't know if this helps but it's an excellent article on a related subject
http://www.sqlservercentral.com/columnists/mcoles/soundmatchingandaphonetictoolkit.asp
March 2, 2007 at 2:31 am
Actually, I have already done this in Oracle. The trick consists of "reducing" the names that you are looking for to a value that is the same for all variations.
Judging from your case:
- there may be a variation on the vowels
----> eliminate them
- the order of characters may differ
----> use a function that does not depend on the order of characters. You could for example add up the ASCII-values of the consonants.
Some further refinement could consist of mapping "similar" consonants to the same value (M->N, B->P...), reducing digraphs and trigraphs (CK->K, SH->S...).
Note that you could get false positives.
March 2, 2007 at 3:23 am
1. Have a look at http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=13574
2. Build you own dictionary of "related" words....including legit variations and misspellings
a) there's a finite number of words of interest to you
b) by recursively processing your search queries, after the initial population exercise, the effort of improving the list will be minimal.
March 2, 2007 at 4:12 am
Searching for '%company%' you'll get all records having 'accompany', 'accompanying', etc.
Is it what are you ready to accept?
Spaces before and after would not help because you probably need options '"Company"', ',company;', etc. to be included.
So, as Andrew suggested, build a dictionary.
_____________
Code for TallyGenerator
March 2, 2007 at 9:32 am
Thanks for all your help.
Was just wondering if there was a 'magic' way to do fuzzy searches without the use of dictionaries. To an extent I guess not.
Cheers
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply