December 21, 2011 at 11:08 pm
Jeff Moden (12/21/2011)
Dev (12/21/2011)
"Doctor & Mrs. Johnathan van Huron Esq. - DO, MD, MVP"
Title FName LName Degree
Agree on Names but Titles & Degrees shouldn't be in source data.
"Shouldn't be" and "aren't" are two totally different worlds. We just saw the op point out that they are in the source data and you have to deal with it. Dealing with it may be getting the provider of the data to turn to and deliver proper data but I trust such providers about as far as I can throw a truck. π
ha ha..
I have little bit confidence (may be overconfidence) that Title will not appear here. The reason for my assumption is, table has two columns (FName, LName) which have precise meaning. I wouldn't bet on it if he might have 'FullName' as column. I have seen myself creapy data that I cleansed in ETL (dirty job).
December 22, 2011 at 12:26 am
Yeah... that's why "LastName" had a degree indication in it. π
--Jeff Moden
Change is inevitable... Change for the better is not.
December 22, 2011 at 1:45 am
Jeff Moden (12/21/2011)
...Even though that works, what are you going to do with first names that begin with a salutation, last names that end with other titles and suffixes, and first names that look like "Doctor & Mrs. Johnathan van Huron Esq. - DO, MD, MVP" ???? πBefore you create any code for such a problem, you really need to identify the full extent of the problem.
No kidding. Postnominal processing also has to account for known short surnames like 'Al','An','Au','Bi','Do','Du','Fu','Gu','Ha','He'. If the data really is this dirty, then lookups help distinguish between different elements of the name strings. Titles, postnominals, short surnames - and firstnames too if possible. So;
'Doctor & Mrs.' matches in column [Rawtitle] of Titles lookup and returns [Cleantitle] 'Dr & Mrs'
'Johnathan' matches in Firstnames lookup
'Esq', 'DO', 'MD', 'MVP' each match in Postnominals lookup
Remainder of string 'van Huron' is assumed to be surname - another lookup of surname prefixes ('van', 'von') helps.
At this point, a data sample would go a long way - the dirtier, the better.
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
December 22, 2011 at 1:53 am
Dev (12/21/2011)
"Doctor & Mrs. Johnathan van Huron Esq. - DO, MD, MVP"
Title FName LName Degree
Agree on Names but Titles & Degrees shouldn't be in source data.
Try telling that to a direct mailing company!
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
December 22, 2011 at 3:52 am
ChrisM@home (12/22/2011)
Dev (12/21/2011)
"Doctor & Mrs. Johnathan van Huron Esq. - DO, MD, MVP"
Title FName LName Degree
Agree on Names but Titles & Degrees shouldn't be in source data.
Try telling that to a direct mailing company!
Itβs my lucky day. Two geniuses crossed my post...
I have seen postal / mailing company database. Their naming & address storage is normalized and they have strict rules for data entry as well. I have worked for a telecom customer where company was spending quite good amount on de-duplicating the address to identify the customer uniquely based on address.
But there is always a scope for goof up. :hehe:
December 22, 2011 at 4:03 am
Dev (12/22/2011)
ChrisM@home (12/22/2011)
Dev (12/21/2011)
"Doctor & Mrs. Johnathan van Huron Esq. - DO, MD, MVP"
Title FName LName Degree
Agree on Names but Titles & Degrees shouldn't be in source data.
Try telling that to a direct mailing company!
Itβs my lucky day. Two geniuses crossed my post...
I have seen postal / mailing company database. Their naming & address storage is normalized and they have strict rules for data entry as well. I have worked for a telecom customer where company was spending quite good amount on de-duplicating the address to identify the customer uniquely based on address.
But there is always a scope for goof up. :hehe:
You know Jeff AND Johnathan van Huron? π
The last task at my last gig was writing a name cleansing sproc for a DM company. You can't imagine how rubbish some of that data was. DM companies generally buy up lists and load them to their db, cleansing on route, but the algorithms are sometimes cr@p.
Let's see what the OP comes up with. It's impossible to write a simple solution without seeing the shape of the data.
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
Viewing 6 posts - 16 through 20 (of 20 total)
You must be logged in to reply to this topic. Login to reply