October 11, 2012 at 12:06 pm
Hi people.
I have an address database and I'm having problems with duplicated street names.
For example:
souza street
souza street ,
Because of the trailing comma, T-SQL treats these as two different addresses. I know about the SOUNDEX function, but the data is in Portuguese.
Is there any way to compare strings in a non-English language?
October 11, 2012 at 12:19 pm
Can you strip out punctuation before you compare them? Replace commas, periods and a few others (I don't know what's common in Portugal/Brazil/wherever else you might be), then compare them.
In US addresses, hyphens need to stay in, and slashes, but not a lot of other punctuation marks. That might help.
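To make the punctuation idea concrete, here is a minimal sketch in Python rather than T-SQL (in T-SQL the same effect is usually approximated with nested REPLACE calls). The function name `normalize_street` and the choice of which characters to keep are assumptions for illustration: it keeps hyphens and slashes, as suggested above for US addresses, and drops other punctuation before comparing.

```python
import re

# Assumption: keep letters, digits, whitespace, "/" and "-"; drop everything else.
# \w is Unicode-aware in Python 3, so accented Portuguese letters are preserved.
_DROP = re.compile(r"[^\w\s/-]")

def normalize_street(name):
    """Build a comparison key: punctuation removed, spaces collapsed, case folded."""
    cleaned = _DROP.sub(" ", name)
    return " ".join(cleaned.split()).casefold()

# The two rows from the original question now compare equal.
print(normalize_street("souza street") == normalize_street("souza street ,"))
```

Comparing (or grouping) on the normalized key instead of the raw column should make "souza street" and "souza street ," collapse into one address, without relying on SOUNDEX.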
What I ended up having to do for multinational addresses was use a full-text index with a custom thesaurus, then use that to compare the strings using Contains(). That handles things like "123 Main Street" vs "123 Main St", or "Apt 31" vs "#31". It can get slow, though, so don't do that for an OLTP system. Mine stages the data in raw format, then cleans it up and compares it to existing addresses, etc., in an off-hours data load.
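The full-text index and thesaurus are configured inside SQL Server and aren't reproduced here. As a rough stand-in for the idea only, the sketch below uses a plain dictionary lookup in Python, not SQL Server full-text search. The `SYNONYMS` table and the function names are hypothetical; a real thesaurus would be much larger and locale-specific.

```python
import re

# Hypothetical synonym table mapping variants to one canonical token.
SYNONYMS = {"st": "street", "ave": "avenue", "apt": "unit", "#": "unit"}

def canonical_tokens(address):
    """Split into word/number tokens (keeping '#' as a token) and map synonyms."""
    tokens = re.findall(r"#|\w+", address.casefold())
    return [SYNONYMS.get(tok, tok) for tok in tokens]

def same_address(a, b):
    """Treat two addresses as equal when their canonical token lists match."""
    return canonical_tokens(a) == canonical_tokens(b)

print(same_address("123 Main Street", "123 Main St"))
print(same_address("Apt 31", "#31"))
```

As with the full-text approach, this kind of matching is better suited to a staged, off-hours cleanup pass than to row-by-row checks in an OLTP path.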
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC