Sound like matching

  • I have data in text field of a table, now I want to have sound like matching with that field. Any suggestion.

  • Try full text indexing

  • I wasnt sure exactly what you mean, David Poole has one suggestion, I read your request from a different point of view, and took it as meaning a sounds like match such as

    Pool sounds like Poole, etc.

    If you want that type of sound-matching, there is a function called Soundex that creates (i think 4 char) 'sound code' of the word and you can use that to match against the soundex of the other text field

    select soundex('Robbie') as Text1

    select soundex('Robby') as Text2

    you can then say if text1 = text2 them match...

    ...

    HTH

    Alex Wilson

     

     

     

  • I'm really not a fan of soundex for a couple of reasons:

    1.  You don't always get what you expect - even where you'd think it was simple.  Try

    select soundex('knight')     -- K523

    select soundex('night')      -- N230

    Anything with a silent first letter will bring you undone.  Likewise any homophonic first letters - 'Kook' and 'Cook'

    2.  It's a bit skewed to Anglo pronunciation - if you're trying to do soundex comparisons in a mixed-origin database you may be out of luck.

    It probably depends what effect you're after, and what sort of data is in this field (i.e. is it name data, descriptive text, what?), but full-text indexing may give a better result.

     

  • Yes, I agree - there are better versions on the same principle available, for example double-metaphone works pretty well (atleast its better) and it handles ethnic names alot more gracefully.

     

     

  • I am also a fan of Metaphone.  The guy who wrote it (Lawrence Phillips) went on to work for Verity, the search engine people so that should tell you something.

    Unfortunately it is not part of the SQL toolset so you will either have to buy an external code library or write it yourself.

    http://umlslex.nlm.nih.gov/lvg/2002/docs/designDoc/metaphone/

     

  • yes, i should have mentioned that.

    i do believe there is already quite a few SQL udf source available on the web, but whether they violates any implied licenses from Lawrence Phillips im not sure.

     

    -- Alex Wilson

  • There are a bunch of SQL scripts that are better than the soundex functions.  For example, the script at this site:

    http://www.tek-tips.com/gfaqs.cfm/lev2/4/lev3/27/pid/183/fid/4110

    correctly equates "Knight" to "night".

    select dbo._SoundexAplhaFunction ('knight'),

        dbo._SoundexAplhaFunction ('night')

    returns "NAGHT" for each

    There are many other options, including one I saw on this site.  Unfortunately I can't find it though (is that because their search is powered by Microsoft?

     

    Signature is NULL

  • I did consider writing some matching functions for SQL Server having written a number of deduplication programs.

    In the end I shyed away from it because it involves heavy string manipulation and can be very resource hungry.

    One possibility is to have a specific searching table that simply holds your SOUNDEX or metaphone keys.  That means updates/inserts are done on a record by record basis rather than trying to generate the keys on the fly.

  • Absolutely, David.

    That would probably be the most efficient way of doing it, although it will add 1 more column to your insert.

    if you have many other tools accessing this particular table,  you may be better off adding a trigger on the table to soundex the contents of inserted and deleted.

    Although bare (bear?) in mind if you Bulk Load the table, the trigger will fire only once, and you might end up with a stack of NULL soundex keys - of course that could be overcome with a nightly scheduled script (if you're not a 24 hour shop) that updates all the nulls with the soundex of whatever field is appropriate.

    Cheers,

    Alex

     

     

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply