Fuzzy Searching...

  • Hi I am running searches on a table that I have.

    What is the best function or method to perform fuzzy matching.  I have experimented with full-text search, soundex etc.. but none of them provide exactly what I need.

    For example I want to search a table where column name like  '%company%'

    Now I also want to pull out any variations or mispelt characters of company i.e.  cmpany, campany,  compony   etc...

    Is there a way to do this without requiring another look up table with different variations of the word company as a look up?

    The reason is, that it could be any word to search for.

    Thanks!

     

  • I can't think of any way using T-SQL which achieves exactly what you want. If you have SQL 2005, the fuzzy lookup transformation in SSIS should deliver what you want. See also http://www.sqlteam.com/item.asp?ItemID=26541

    Markus

    [font="Verdana"]Markus Bohse[/font]

  • You might investigate Regular Expressions (called "wildcards" in BOL) for use in the WHERE.  See "LIKE" & "Wildcard Characters" in BOL.

  • Don't know if this helps but it's an excellent article on a related subject

    http://www.sqlservercentral.com/columnists/mcoles/soundmatchingandaphonetictoolkit.asp

  • Actually, I have already done this in Oracle. The trick consists of "reducing" the names that you are looking for to a value that is the same for all variations.

    Judging from your case:

    - there may be a variation on the vowels

    ----> eliminate them

    - the order of characters may differ

    ----> use a function that does not depend on the order of characters. You could for example add up the ASCII-values of the consonants.

    Some further refinement could consist of mapping "similar" consonants to the same value (M->N, B->P...), reducing digraphs and trigraphs (CK->K, SH->S...).

    Note that you could get false positives.

  • 1. Have a look at http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=13574

    2. Build you own dictionary of "related" words....including legit variations and misspellings

     a) there's a finite number of words of interest to you

     b) by recursively processing your search queries, after the initial population exercise, the effort of improving the list will be minimal.

  • Searching for '%company%' you'll get all records having 'accompany', 'accompanying', etc.

    Is it what are you ready to accept?

    Spaces before and after would not help because you probably need options '"Company"', ',company;', etc. to be included.

    So, as Andrew suggested, build a dictionary.

    _____________
    Code for TallyGenerator

  • Thanks for all your help.

    Was just wondering if there was a 'magic' way to do fuzzy searches without the use of dictionaries.  To an extent I guess not.

    Cheers

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply