June 21, 2004 at 12:15 pm
I have data in text field of a table, now I want to have sound like matching with that field. Any suggestion.
June 21, 2004 at 12:56 pm
Try full text indexing
June 22, 2004 at 12:56 am
I wasnt sure exactly what you mean, David Poole has one suggestion, I read your request from a different point of view, and took it as meaning a sounds like match such as
Pool sounds like Poole, etc.
If you want that type of sound-matching, there is a function called Soundex that creates (i think 4 char) 'sound code' of the word and you can use that to match against the soundex of the other text field
select soundex('Robbie') as Text1
select soundex('Robby') as Text2
you can then say if text1 = text2 them match...
...
HTH
Alex Wilson
June 22, 2004 at 3:57 am
I'm really not a fan of soundex for a couple of reasons:
1. You don't always get what you expect - even where you'd think it was simple. Try
select soundex('knight') -- K523
select soundex('night') -- N230
Anything with a silent first letter will bring you undone. Likewise any homophonic first letters - 'Kook' and 'Cook'
2. It's a bit skewed to Anglo pronunciation - if you're trying to do soundex comparisons in a mixed-origin database you may be out of luck.
It probably depends what effect you're after, and what sort of data is in this field (i.e. is it name data, descriptive text, what?), but full-text indexing may give a better result.
June 22, 2004 at 4:08 am
Yes, I agree - there are better versions on the same principle available, for example double-metaphone works pretty well (atleast its better) and it handles ethnic names alot more gracefully.
June 22, 2004 at 4:17 am
I am also a fan of Metaphone. The guy who wrote it (Lawrence Phillips) went on to work for Verity, the search engine people so that should tell you something.
Unfortunately it is not part of the SQL toolset so you will either have to buy an external code library or write it yourself.
http://umlslex.nlm.nih.gov/lvg/2002/docs/designDoc/metaphone/
June 22, 2004 at 4:20 am
yes, i should have mentioned that.
i do believe there is already quite a few SQL udf source available on the web, but whether they violates any implied licenses from Lawrence Phillips im not sure.
-- Alex Wilson
June 22, 2004 at 7:09 pm
There are a bunch of SQL scripts that are better than the soundex functions. For example, the script at this site:
http://www.tek-tips.com/gfaqs.cfm/lev2/4/lev3/27/pid/183/fid/4110
correctly equates "Knight" to "night".
select dbo._SoundexAplhaFunction ('knight'),
dbo._SoundexAplhaFunction ('night')
returns "NAGHT" for each
There are many other options, including one I saw on this site. Unfortunately I can't find it though (is that because their search is powered by Microsoft?
Signature is NULL
June 23, 2004 at 1:21 am
I did consider writing some matching functions for SQL Server having written a number of deduplication programs.
In the end I shyed away from it because it involves heavy string manipulation and can be very resource hungry.
One possibility is to have a specific searching table that simply holds your SOUNDEX or metaphone keys. That means updates/inserts are done on a record by record basis rather than trying to generate the keys on the fly.
June 23, 2004 at 1:48 am
Absolutely, David.
That would probably be the most efficient way of doing it, although it will add 1 more column to your insert.
if you have many other tools accessing this particular table, you may be better off adding a trigger on the table to soundex the contents of inserted and deleted.
Although bare (bear?) in mind if you Bulk Load the table, the trigger will fire only once, and you might end up with a stack of NULL soundex keys - of course that could be overcome with a nightly scheduled script (if you're not a 24 hour shop) that updates all the nulls with the soundex of whatever field is appropriate.
Cheers,
Alex
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply