September 20, 2005 at 7:19 am
Hey Guru's,
I'm wanting to do a "Sounds Like" comparison to compare a list of addresses of one of my databases. Is SOUNDEX the best way to go? I'm just not getting the tyoe of results that I had hoped for.
Thanks for all the help
Shane
September 20, 2005 at 7:24 am
You can also check out difference.
September 20, 2005 at 7:56 am
I'm still having problems.
Do you know a way using either SOUNDEX or DIFFERNCE to get the following results?
I'm wanting to take to addresses (a parent record and a child record), if the addresses are similar I don't want to display them. Similar being, one person may have typed in Road a while the other person typed in Rd or Rd.
I only want to show records where the addresses are significantly different.
Hopefully this doesn't sound confusing.
September 20, 2005 at 8:00 am
How is the adress stored in the db?
September 20, 2005 at 8:02 am
As a VARCHAR(100)
September 20, 2005 at 8:05 am
You'll have to split it into its smaller parts for the task... and I suggest you do that for the db too. It's easier if you have the street number, stree name and zip code in different columns. YOu can use soundex where like and can't do the job.
September 20, 2005 at 8:09 am
OK
Currently I have it stored as
street_line1 VARCHAR(100)
street_line2 VARCHAR(100)
street_line3 VARCHAR(100)
city VARCHAR(30)
state VARCHAR(2)
zipcode VARCHAR(10)
So if I wanted to use the SOUNDEX in place of a LIKE or '', can you give me an example?
September 20, 2005 at 8:14 am
You'd have to use difference. Check it out under bols it's pretty good, but I don't think it was made for such a task.
September 21, 2005 at 2:23 am
You could try using the Levenshtein edit distance - there are plenty of examples on the net. You would have to write your own function for this but it should be OK if you just are comparing a couple of values. Performance would be a problem if you were doing it in a query.
September 21, 2005 at 2:43 am
from BOL :
The DIFFERENCE function compares the SOUNDEX values of two strings and evaluates the similarity between them, returning a value from 0 through 4, where 4 is the best match.
Here is a litle test I've done a while ago :
select 'bijnens', soundex('bijnens')
union all
select 'bainens', soundex('baainens')
union all
select 'beinens',soundex('beinens')
union all
select 'beijnens', soundex('beijnens')
union all
select 'bynens', soundex('bynens')
union all
select 'bijnans', soundex('bijnans')
union all
select 'bijkans', soundex('bijkans')
union all
select 'janssen', soundex('janssen')
union all
select 'janssens', soundex('janssens')
union all
select 'jansen', soundex('jansen')
union all
select 'jans', soundex('jans')
union all
select 'sjans', soundex('sjans')
union all
select 'chans', soundex('chans')
resulted at our servers (default installation English)
bijnens B255
bainens B552
beinens B552
beijnens B255
bynens B552
bijnans B255
bijkans B252
janssen J525
janssens J525
jansen J525
jans J520
sjans S252
chans C520
I geuss it's obvious the results are language dependant ! (phonetics)
You should do propre testing and evaluation.
you could use a case statement to conditionaly select your data :
declare @tmptest table (idnr int not null identity(1,1) primary key, LName varchar(50) not null )
insert into @tmptest (LName)
select lname
from (
select 'bijnens' as LName, soundex('bijnens') as SoundexValue
union all
select 'bainens', soundex('baainens')
union all
select 'beinens',soundex('beinens')
union all
select 'beijnens', soundex('beijnens')
union all
select 'bynens', soundex('bynens')
union all
select 'bijnans', soundex('bijnans')
union all
select 'bijkans', soundex('bijkans')
union all
select 'janssen', soundex('janssen')
union all
select 'janssens', soundex('janssens')
union all
select 'jansen', soundex('jansen')
union all
select 'jans', soundex('jans')
union all
select 'sjans', soundex('sjans')
union all
select 'chans', soundex('chans')
) A
Select T1.IdNr, T1.LName , T2.LName
from @tmptest T1
left join @tmptest T2
on T1.IdNr <> T2.IdNr
and soundex(T1.Lname) = soundex(T2.Lname)
order by T1.IdNr
--- 2
Select T1.IdNr, T1.LName , T2.LName
from @tmptest T1
left join @tmptest T2
on T1.IdNr <> T2.IdNr
and soundex(T1.Lname) = soundex(T2.Lname)
where DIFFERENCE(T1.LName, T2.LName) = 4
order by T1.IdNr
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
September 21, 2005 at 8:07 am
i though soundex didn't work if the value started with a number? so an address like 123 main street vs 321 main St was ineffective?
select soundex('123 main street') = 0000
select soundex('321 main street') = 0000
select soundex('main street') = M500
I've got a stack of REPLACE functions that i used to add a column that hadd a "cleaned address" based on the original address if htat might help;
it just copied the addr1 field to a new column, and then updated that new column to replace all variations of POBOX for example (ie ''PO','p.o.box','p.o. box') with the default .
it was a stack of like 100 replaces, but seemed to give me pretty clean results when compared to the original. let me know if it sounds useful and i'll post it here.
Lowell
September 21, 2005 at 8:11 am
September 21, 2005 at 8:23 am
You are better off using an external app for address matching.
SOUNDEX is OK but it always retains the 1st character and it deals with letters on an individual basic.
Philistine an Filistine are clearly intended to be the same word but will return different SOUNDEXs.
I am a great fan of the Lawrence Philips Metaphone for address matching but you would never want to build that functionality into SQL Server.
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply