May 1, 2003 at 4:23 am
I have an interesting problem that I hope someone can help me with. I need to produce a query that will show the results of a name search on my database.
For example if the user types in the word 'Lopez' I need to list all people who's name is
like '%lopez%'.
This works fine until I encounter special characters in a name such as 'López'.
I need to be able to return both names as part of the same query.
I have users in different countries entering this information using different language settings hence the problem when trying to do a like comparison.
This apllies also for other language specific characters.
Any help will be greatly appreciated.
May 1, 2003 at 4:27 am
May 1, 2003 at 4:30 am
One solution would be to perform a REPLACE of your search variable before performing the search. This would allow you to replace 'o' with '[oó]'.
set @strName = replace(@strName, 'o', '[oó]')
Obviously you could add in other acented characters etc.
May 1, 2003 at 4:34 am
Soudex is anytime better options !! than replace as replace restricts you to all the similar pattern which may be entered from anyone across globe.
regards
sanjeev
What ever a Human can imagine , it can be achieved !!
What ever a Human can imagine , it can be achieved !!
May 1, 2003 at 4:39 am
I agree that Soundex functionality will return the all the various characters used in this 'Lopez' example. However, due to the way in which soundex works it is possible for it to return names similar to 'Lopez'.
So it comes down to how precise do you want your match to be? (Yes I know soundex has 4 levels of match......(ONLY 4!!))
May 1, 2003 at 4:49 am
Example: These are the results I get back from one of my databases using SOUNDEX.
select distinct Surname
from tblEmployee
where difference(Surname, 'Smith') = 4
Results:
Sainty
Samani
Samina Seikh
Sammimi
Sammut
Sanani
Sandhi
Sandhu
Sandow
Sandy
Sant
Saundh
Seaman
Seamen
Seemann
Seemon
Shanahan
Shand
Shando
Shanhan
Shannon
Shant
Shimmin
Siannot
Simeon
Simmnet
Simmonds
Simmonite
Simon
Simonds
Simonite
Sinnot
Sinnott
Smit
Smith
Smith - Left
Smith-Taylor
Smith?
Smooth
Smoothy
Smoth
Smout
Smyth
Smythe
Snead
Sneyd
Snoad
Somani
Someone
Sonnet
Soundy
Summon
Symmonds
Symon
Symonds
I'm not saying that SOUNDEX is the wrong choice, but you have to be aware that even at the highest level of soundex match (i.e. 4), you will get back names which bear no resemblance to the name being searched.
May 1, 2003 at 9:34 am
Thanks for all of the replies.
Does anyone know of any other method, Soundex appears to be a little too vague for the level of accuracy that I require.
May 6, 2003 at 12:14 am
DoubleMetaphone is a kind of Soundex, but more powerful.
The original article describing the method can be found at:
http://www.cuj.com/articles/2000/0006/0006d/0006d.htm?topic=articles
A translation of this function in T-SQL can be found at:
http://www.planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=519&lngWId=5
It is quite an elaborate function, but the nice thing is you can improve it yourself.
I myself use this method in a VB-variant and works fine.
Erik
May 6, 2003 at 3:08 am
Thanks evorsten, these links are very helpful. Just a side note, I have been looking into the collation settings of my SQL installation and notice that there is an option setting for 'ACCENT-INSENSITIVE/SENSITIVE'. Before I start looking into rebuilding all of my databases in order to change the collation settings, would having this option set to 'ACCENT-INSENSITIVE' maybe solve my problem? i.e. would 'o' and 'ó' now get treated the same? Any thoughts would be greatly appreciated.
May 8, 2003 at 8:05 am
By the way, when I use LIKE I get worse performance than using charindex('searchstring', column) > 0 - Has anyone else noticed this?
May 15, 2003 at 9:45 am
Scout7
quote:
Before I start looking into rebuilding all of my databases in order to change the collation settings, would having this option set to 'ACCENT-INSENSITIVE' maybe solve my problem? i.e. would 'o' and 'ó' now get treated the same? Any thoughts would be greatly appreciated.
It would solve your problem. è = e = é and so on in a ACCENT-INSENSITIVE collation.
May 23, 2003 at 4:15 am
quote:
By the way, when I use LIKE I get worse performance than using charindex('searchstring', column) > 0 - Has anyone else noticed this?
This is because of LIKE's flexibility, it has to run it's "rule-set" for every row
When you use CHARINDEX you are specifying that you only want to find characters within the field
LIKE is easier to read for general programming, but you should always check speed with alternatives, it's only good programming
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply