July 11, 2013 at 2:54 pm
I'm trying to split a full name column up into First / Middle / Last....but some of the names are like "Joe B. W. LastName" or "Jenny MiddleName MaidenName MarriedName" and so on...
In the past I've used parsename() and charindex() to solve this with "cleaner" data. How would I go about handling these?
Thanks
July 11, 2013 at 3:13 pm
I think this is nearly impossible using TSQL only as there's no way of knowing if a space means the start of the last name, or if it is a seperator in the first name. Or if it seperates a middle name, and so on.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
July 11, 2013 at 3:56 pm
brickpack (7/11/2013)
I'm trying to split a full name column up into First / Middle / Last....but some of the names are like "Joe B. W. LastName" or "Jenny MiddleName MaidenName MarriedName" and so on...In the past I've used parsename() and charindex() to solve this with "cleaner" data. How would I go about handling these?
Thanks
For any large list it's a nightmare! :crazy:
However, here's a community-inspired function thats been kicking around for years (not mine so I can't take any credit for it) that will make a good start on parsing names. Usually it will still take a bit of tweaking to cover unusual combinations (which can be added to the function ad infinitum). And then several passes through the data as well as visual inspection...like I said: a nightmare.
ALTER FUNCTION [dbo].[svfFormatName]
(
@NameString VARCHAR(100)
,@NameFormat VARCHAR(20)
)
RETURNS VARCHAR(100)
AS
BEGIN
--svfFormatName decodes a NameString into its component parts and returns it in a requested format.
--@NameString is the raw value to be parsed.
--@NameFormat is a string that defines the output format. Each letter in the string represents
--a component of the name in the order that it is to be returned.
-- [H] = Full honorific
-- [h] = Abbreviated honorific
-- [F] = First name
-- [f] = First initial
-- [M] = Middle name
-- [m] = Middle initial
-- [L] = Last name
-- [l] = Last initial
-- = Full suffix
-- = Abbreviated suffix
-- [.] = Period
-- [,] = Comma
-- [ ] = Space
--Sample Syntax : select dbo.svfFormatName('President Barack Hussein Obama Senior', 'h. F m. L s.')
--Returns : 'Pres. Barack H. Obama Sr.'
DECLARE @Honorific VARCHAR(20)
DECLARE @FirstName VARCHAR(20)
DECLARE @MiddleName VARCHAR(30)
DECLARE @LastName VARCHAR(30)
DECLARE @Suffix VARCHAR(20)
DECLARE @TempString VARCHAR(100)
DECLARE @TempString2 VARCHAR(100)
DECLARE @IgnorePeriod CHAR(1)
--Prepare the string
--Make sure each period is followed by a space character.
SET @NameString = RTRIM(LTRIM(REPLACE(@NameString,'.','. ')))
--Replace numeric suffixes
SET @NameString = REPLACE(@NameString,'2nd','II')
SET @NameString = REPLACE(@NameString,'3rd','III')
SET @NameString = REPLACE(@NameString,'4th','IV')
--Remove disallowed characters
DECLARE @PatternString VARCHAR(50)
SET @NameString = REPLACE(@NameString,'-','¬') --Replace dashes we want to save, as patindex does not allow escaping characters.
SET @PatternString = '%[^a-z ¬()`,'''']%~' --'''' includes single quote in permitted character list.
WHILE PATINDEX(@PatternString,@NameString) > 0
SET @NameString = STUFF(@NameString,PATINDEX(@PatternString,@NameString),1,' ')
SET @NameString = REPLACE(@NameString,'¬','-') --Put the dashes back
--Remove telephone ext
SET @NameString = LTRIM(RTRIM(REPLACE(' ' + @NameString + ' ',' EXT ',' ')))
--Make sure there is at least one space after commas
SET @NameString = REPLACE(@NameString,',',', ')
--Eliminate double-spaces.
WHILE CHARINDEX(' ',@NameString) > 0
SET @NameString = REPLACE(@NameString,' ',' ')
--Eliminate periods
WHILE CHARINDEX('.',@NameString) > 0
SET @NameString = REPLACE(@NameString,'.','')
--Remove spaces around hyphenated names
SET @NameString = REPLACE(REPLACE(@NameString,'- ','-'),' -','-')
--Join Irish surnames
SET @NameString = REPLACE(@NameString,'O'' ','O''')
--Remove commas before suffixes
SET @NameString = REPLACE(@NameString,', CLU',' CLU')
SET @NameString = REPLACE(@NameString,', CNP',' CNP') --Certified Notary Public
SET @NameString = REPLACE(@NameString,', ESQ',' ESQ')
SET @NameString = REPLACE(@NameString,', Jr',' Jr')
SET @NameString = REPLACE(@NameString,', LPN',' LPN')
SET @NameString = REPLACE(@NameString,', RN',' RN')
SET @NameString = REPLACE(@NameString,', Sr',' Sr')
SET @NameString = REPLACE(@NameString,', II',' II')
SET @NameString = REPLACE(@NameString,', III',' III')
SET @NameString = REPLACE(@NameString,', IV',' IV')
--Temporarily join multi-word firstnames
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Ann Marie ',' Ann~Marie '))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Anna Marie ',' Anna~Marie '))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Barbara Jo ',' Barbara~Jo '))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Betty Lou ',' Betty~Lou '))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Billie Jo ',' Billie~Jo '))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Bobbi Jo ',' Bobbi~Jo '))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Dee Dee ',' Dee~Dee '))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Fannie Mae ',' Fannie~Mae '))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Lisa Marie ',' Lisa~Marie '))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Mary Beth ',' Mary~Beth '))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Mary Ellen ',' Mary~Ellen '))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Mary Jane ',' Mary~Jane '))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Mary Jo ',' Mary~Jo '))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Mary Lou ',' Mary~Lou '))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Rose Mary ',' Rose~Mary '))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Rose Marie ',' Rose~Marie '))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Sugar Rae ',' Sugar~Rae '))
--For compound names ending in Ann, also include Anne variation.
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Beth Ann',' Beth~Ann'))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Dee Ann',' Dee~Ann'))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Dorothy Ann',' Dorothy~Ann'))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Ellen Ann',' Ellen~Ann'))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Jo Ann',' Jo~Ann'))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Lea Ann',' Lea~Ann'))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Lee Ann',' Lee~Ann'))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Leigh Ann',' Leigh~Ann'))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Lu Ann',' Lu~Ann'))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Mary Ann',' Mary~Ann'))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Rae Ann',' Rae~Ann'))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Ruth Ann',' Ruth~Ann'))
--Temporarily join multi-word surnames
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Dos ',' Dos~'))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' St ',' St.~'))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' St. ',' St.~'))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Da ',' Da~'))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Di ',' Di~'))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Del ',' Del~'))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Des ',' Des~'))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Vanden ',' Vanden~'))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Van De ',' Van~De~'))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Van Den ',' Van~Den~'))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Vander ',' Vander~'))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Van ',' Van~'))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Ver ',' Ver~'))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Van Der ',' Van~Der~'))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Von Der ',' Von~Der~'))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Von ',' Von~'))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Mc ',' Mc~'))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' Mac ',' Mac~'))
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' La ',' La~')) --Must be checked before "De", to handle "De La [Surname]"s.
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' De ',' De~'))
--Temporarily join 2nd lastname and nicknames defined within parethesis
SET @NameString = LTRIM(REPLACE(' ' + @NameString,' (','~('))
--If the lastname is listed first, strip it off.
SET @TempString = RTRIM(LEFT(@NameString,CHARINDEX(' ',@NameString)))
--Search for suffixes trailing the LastName
SET @TempString2 = LTRIM(RIGHT(@NameString,LEN(@NameString) - LEN(@TempString)))
SET @TempString2 = RTRIM(LEFT(@TempString2,CHARINDEX(' ',@TempString2)))
IF RIGHT(@TempString2,1) = ','
BEGIN
SET @Suffix = LEFT(@TempString2,LEN(@TempString2) - 1)
SET @LastName = LEFT(@TempString,LEN(@TempString))
END
IF RIGHT(@TempString,1) = ','
SET @LastName = LEFT(@TempString,LEN(@TempString) - 1)
IF LEN(@LastName) > 0
SET @NameString = LTRIM(RIGHT(@NameString,LEN(@NameString) - LEN(@TempString)))
IF LEN(@Suffix) > 0
SET @NameString = LTRIM(RIGHT(@NameString,LEN(@NameString) - LEN(@TempString2)))
--Get rid of any remaining commas
WHILE CHARINDEX(',',@NameString) > 0
SET @NameString = REPLACE(@NameString,',','')
--Get Honorific and strip it out of the string
SET @TempString = RTRIM(LEFT(@NameString,CHARINDEX(' ',@NameString + ' ')))
IF @TempString IN ('Admiral','Adm','Captain','Cpt','Capt','Commander','Cmd','Corporal','Cpl','Doctor','Dr','Father','Fr','General','Gen','Governor','Gov','Honorable','Hon','Lieutenant','Lt','Madam','Mdm','Madame','Mme','Mademoiselle','Mlle','Major','Maj','Miss','Ms','Mr','Mrs','President','Pres','Private','Pvt','Professor','Prof','Rabbi','Reverend','Rev','Senior','Sr','Seniora','Sra','Seniorita','Srta','Sergeant','Sgt','Sir','Sister')
SET @Honorific = @TempString
IF LEN(@Honorific) > 0
SET @NameString = LTRIM(RIGHT(@NameString,LEN(@NameString) - LEN(@TempString)))
--Get Suffix and strip it out of the string
IF @Suffix IS NULL
BEGIN
SET @TempString = LTRIM(RIGHT(@NameString,CHARINDEX(' ',REVERSE(@NameString) + ' ')))
WHILE @TempString IN ('Attorney','Att','Atty','BA','BS','CPA','CNP', --Certified Notary Public
'DDS','DVM','Esquire','Esq','II','III','IV','Junior','Jr','LPN','MBA','MD','OD','PHD','RN','Senior','Sr','ASA','SRA','CLU'--Realestate Certifictaions
)
BEGIN
SET @Suffix = @TempString + COALESCE(' ' + @Suffix,'')
SET @NameString = RTRIM(LEFT(@NameString,LEN(@NameString) - LEN(@TempString)))
SET @TempString = LTRIM(RIGHT(@NameString,CHARINDEX(' ',REVERSE(@NameString) + ' ')))
END
END
IF @LastName IS NULL
BEGIN
--Get LastName and strip it out of the string
SET @LastName = LTRIM(RIGHT(@NameString,CHARINDEX(' ',REVERSE(@NameString) + ' ')))
SET @NameString = RTRIM(LEFT(@NameString,LEN(@NameString) - LEN(@LastName)))
END
--Get FirstName and strip it out of the string
SET @FirstName = RTRIM(LEFT(@NameString,CHARINDEX(' ',@NameString + ' ')))
SET @NameString = LTRIM(RIGHT(@NameString,LEN(@NameString) - LEN(@FirstName)))
--Anything remaining is MiddleName
SET @MiddleName = @NameString
--Create the output string
SET @TempString = ''
WHILE LEN(@NameFormat) > 0
BEGIN
IF @IgnorePeriod = 'F'
OR LEFT(@NameFormat,1) <> '.'
BEGIN
SET @IgnorePeriod = 'F'
SET @TempString = @TempString
+ CASE ASCII(LEFT(@NameFormat,1))
WHEN '32' THEN
CASE RIGHT(@TempString,1) --Space
WHEN ' ' THEN ''
ELSE ' '
END
WHEN '44' THEN
CASE RIGHT(@TempString,1) --Comma
WHEN ' ' THEN ''
ELSE ','
END
WHEN '46' THEN
CASE RIGHT(@TempString,1) --Period
WHEN ' ' THEN ''
ELSE '.'
END
WHEN '70' THEN ISNULL(@FirstName,'') --F
WHEN '72' THEN
CASE @Honorific --H
WHEN 'Adm' THEN 'Admiral'
WHEN 'Capt' THEN 'Captain'
WHEN 'Cmd' THEN 'Commander'
WHEN 'Cpl' THEN 'Corporal'
WHEN 'Cpt' THEN 'Captain'
WHEN 'Dr' THEN 'Doctor'
WHEN 'Fr' THEN 'Father'
WHEN 'Gen' THEN 'General'
WHEN 'Gov' THEN 'Governor'
WHEN 'Hon' THEN 'Honorable'
WHEN 'Lt' THEN 'Lieutenant'
WHEN 'Maj' THEN 'Major'
WHEN 'Mdm' THEN 'Madam'
WHEN 'Mlle' THEN 'Mademoiselle'
WHEN 'Mme' THEN 'Madame'
WHEN 'Ms' THEN 'Miss'
WHEN 'Pres' THEN 'President'
WHEN 'Prof' THEN 'Professor'
WHEN 'Pvt' THEN 'Private'
WHEN 'Sr' THEN 'Senior'
WHEN 'Sra' THEN 'Seniora'
WHEN 'Srta' THEN 'Seniorita'
WHEN 'Rev' THEN 'Reverend'
WHEN 'Sgt' THEN 'Sergeant'
ELSE ISNULL(@Honorific,'')
END
WHEN '76' THEN ISNULL(@LastName,'') --L
WHEN '77' THEN ISNULL(@MiddleName,'') --M
WHEN '83' THEN
CASE @Suffix --S
WHEN 'Att' THEN 'Attorney'
WHEN 'Atty' THEN 'Attorney'
WHEN 'Esq' THEN 'Esquire'
WHEN 'Jr' THEN 'Junior'
WHEN 'Sr' THEN 'Senior'
ELSE ISNULL(@Suffix,'')
END
WHEN '102' THEN ISNULL(LEFT(@FirstName,1),'') --f
WHEN '104' THEN
CASE @Honorific --h
WHEN 'Admiral' THEN 'Adm'
WHEN 'Captain' THEN 'Capt'
WHEN 'Commander' THEN 'Cmd'
WHEN 'Corporal' THEN 'Cpl'
WHEN 'Doctor' THEN 'Dr'
WHEN 'Father' THEN 'Fr'
WHEN 'General' THEN 'Gen'
WHEN 'Governor' THEN 'Gov'
WHEN 'Honorable' THEN 'Hon'
WHEN 'Lieutenant' THEN 'Lt'
WHEN 'Madam' THEN 'Mdm'
WHEN 'Madame' THEN 'Mme'
WHEN 'Mademoiselle' THEN 'Mlle'
WHEN 'Major' THEN 'Maj'
WHEN 'Miss' THEN 'Ms'
WHEN 'President' THEN 'Pres'
WHEN 'Private' THEN 'Pvt'
WHEN 'Professor' THEN 'Prof'
WHEN 'Reverend' THEN 'Rev'
WHEN 'Senior' THEN 'Sr'
WHEN 'Seniora' THEN 'Sra'
WHEN 'Seniorita' THEN 'Srta'
WHEN 'Sergeant' THEN 'Sgt'
ELSE ISNULL(@Honorific,'')
END
WHEN '108' THEN ISNULL(LEFT(@LastName,1),'') --l
WHEN '109' THEN ISNULL(LEFT(@MiddleName,1),'') --m
WHEN '115' THEN
CASE @Suffix --s
WHEN 'Attorney' THEN 'Atty'
WHEN 'Esquire' THEN 'Esq'
WHEN 'Junior' THEN 'Jr'
WHEN 'Senior' THEN 'Sr'
ELSE ISNULL(@Suffix,'')
END
ELSE ''
END
--The following honorifics and suffixes have no further abbreviations, and so should not be followed by a period:
IF (
(
ASCII(LEFT(@NameFormat,1)) = 72
AND @Honorific IN ('Rabbi','Sister')
)
OR (
ASCII(LEFT(@NameFormat,1)) = 115
AND @Suffix IN ('ASA','BA','BS','CLU','CNP','DDS','DVM','II','III','IV','V','MBA','MD','PHD','RN','LPN','SRA')
)
)
SET @IgnorePeriod = 'T'
--If the FirstName or MiddleName is not an initial, then do not follow with a period.
IF ASCII(LEFT(@NameFormat,1)) = '70'
AND LEN(@FirstName) > 1
SET @IgnorePeriod = 'T'
IF ASCII(LEFT(@NameFormat,1)) = '77'
AND LEN(@MiddleName) > 1
SET @IgnorePeriod = 'T'
END
SET @NameFormat = RIGHT(@NameFormat,LEN(@NameFormat) - 1)
END
--select replace(@TempString, '~', ' ')
RETURN REPLACE(@TempString, '~', ' ')
END
July 11, 2013 at 4:00 pm
Thanks! I actually just found that same function. 😀
It looks like it will handle almost anything...
July 12, 2013 at 11:57 am
the thing to do, IMHO, is have a series of passes in the data, with a few procs/functions, that handle different items. This one posted looks good, but I bet you'll find some new ones, and you want to be able to plug in some new cleaning functions over time.
I've seen this done in T-SQL, with specialized functions, and it can usually catch 90+% of the issues, but a few always slip through when someone does something weird in data entry, like Steve Van. Dyke, Jr.
December 2, 2015 at 1:00 pm
You're a genius!!!!!!
February 28, 2018 at 2:52 pm
Hi, I am new to SQL but I would love to use your wonderful code to my usecase. May I ask how can I apply to my case as I tried paste in my VBA editor but most code turns red...
February 28, 2018 at 9:49 pm
cosette.leung - Wednesday, February 28, 2018 2:52 PMHi, I am new to SQL but I would love to use your wonderful code to my usecase. May I ask how can I apply to my case as I tried paste in my VBA editor but most code turns red...
The code is SQL code. It does not work in VBA.
March 1, 2018 at 9:34 am
brickpack - Thursday, July 11, 2013 2:54 PMI'm trying to split a full name column up into First / Middle / Last....but some of the names are like "Joe B. W. LastName" or "Jenny MiddleName MaidenName MarriedName" and so on...In the past I've used parsename() and charindex() to solve this with "cleaner" data. How would I go about handling these?Thanks
"the pumpkin is a failure as a shad tree" - Mark Twain
SQL is the wrong tool for semantic problems. Get a tool like Melissa Data. Names are too tricky.
Please post DDL and follow ANSI/ISO standards when asking for help.
November 8, 2018 at 3:42 pm
I've added some code around 'middle name' to handle <First> <MI> <Last> <Surname> <Mother's Maiden> as well as a name with two middle names that aren't hyphenated. The original code is the commented out lines below.
--Anything remaining is MiddleName
-- SET @MiddleName = @NameString
if (LEN(@NameString) > 2 and CHARINDEX(' ',@NameString) = 2)
BEGIN
set @MiddleName = LEFT(@NameString,CHARINDEX(' ',@NameString)-1)
SET @LastName = SUBSTRING(@NameString,CHARINDEX(' ',@NameString),LEN(@NameString) - CHARINDEX(' ',@NameString)+1) + ' ' + @LastName
END
else
BEGIN
SET @MiddleName = @NameString
END
November 8, 2018 at 9:28 pm
brickpack - Thursday, July 11, 2013 2:54 PMI'm trying to split a full name column up into First / Middle / Last....but some of the names are like "Joe B. W. LastName" or "Jenny MiddleName MaidenName MarriedName" and so on...In the past I've used parsename() and charindex() to solve this with "cleaner" data. How would I go about handling these?Thanks
Whenever I see someone request that they need help with splitting a full name column, I have to ask "WHY"? Why do you need to do such a thing in your case? What is the business reason for doing so?
--Jeff Moden
Change is inevitable... Change for the better is not.
November 9, 2018 at 7:39 am
For anyone in Europe doing this as part of a MiFID II requirement, a useful list of titles and multi-part name prefixes is below. Unfortunately, I couldn't find the original source for this list, so there could have been additions. Our approach was to replace each multi-part prefix with a unique token (that doesn't contain spaces) making it easier to use spaces to parse the rest of the name. The tokens then get replaced by the appropriate prefix afterwards.
Removing titles
Any prefixes to the names that denote titles, position, profession or academic qualifications, are to be removed. This includes, but is not limited to the following list; this list is not case sensitive:
atty, coach, dame, dr, fr, gov, honourable, madam(e), maid, master, miss, monsieur, mr, mrs, ms, mx, ofc, ph.d, pres, prof, rev, sir
Removing prefixes
am, auf, auf dem, aus der, d, da, de, de l’, del, de la, de le, di, do, dos, du, im, la, le, mac, mc, mhac, mhíc, mhic giolla, mic, ni, ní, níc, o, ó, ua, ui, uí, van, van de, van den, van der, vom, von, von dem, von den, von der
Prefixes to surnames that are not included above, or prefixes attached to the name, i.e. McDonald, MacChrystal, O'Brian, O'Neal, should not be removed;
but note that the apostrophes will be removed in the next step. The above list is not case sensitive.
November 9, 2018 at 8:09 am
Jeff Moden - Thursday, November 8, 2018 9:28 PMbrickpack - Thursday, July 11, 2013 2:54 PMI'm trying to split a full name column up into First / Middle / Last....but some of the names are like "Joe B. W. LastName" or "Jenny MiddleName MaidenName MarriedName" and so on...In the past I've used parsename() and charindex() to solve this with "cleaner" data. How would I go about handling these?ThanksWhenever I see someone request that they need help with splitting a full name column, I have to ask "WHY"? Why do you need to do such a thing in your case? What is the business reason for doing so?
Agree with this. If you don't already have business logic around data entry of names to standardize things and make this unnecessary, then you have no way of knowing if you are splitting the names correctly. If you're combining names where mother's maiden name is commonly included with other names that do not typically include it, how do you know which one is correct? What if they want their name to be Van Buren instead of VanBuren? etc.
Most of the time, you end up passing data back to someone to manually clean it up anyway.
-------------------------------------------------------------------------------------------------------------------------------------
Please follow Best Practices For Posting On Forums to receive quicker and higher quality responses
November 9, 2018 at 8:32 am
Chris Wooding - Friday, November 9, 2018 7:39 AMFor anyone in Europe doing this as part of a MiFID II requirement, a useful list of titles and multi-part name prefixes is below. Unfortunately, I couldn't find the original source for this list, so there could have been additions. Our approach was to replace each multi-part prefix with a unique token (that doesn't contain spaces) making it easier to use spaces to parse the rest of the name. The tokens then get replaced by the appropriate prefix afterwards.
Removing titles
Any prefixes to the names that denote titles, position, profession or academic qualifications, are to be removed. This includes, but is not limited to the following list; this list is not case sensitive:
atty, coach, dame, dr, fr, gov, honourable, madam(e), maid, master, miss, monsieur, mr, mrs, ms, mx, ofc, ph.d, pres, prof, rev, sir
Removing prefixes
am, auf, auf dem, aus der, d, da, de, de l’, del, de la, de le, di, do, dos, du, im, la, le, mac, mc, mhac, mhÃc, mhic giolla, mic, ni, nÃ, nÃc, o, ó, ua, ui, uÃ, van, van de, van den, van der, vom, von, von dem, von den, von derPrefixes to surnames that are not included above, or prefixes attached to the name, i.e. McDonald, MacChrystal, O'Brian, O'Neal, should not be removed;
but note that the apostrophes will be removed in the next step. The above list is not case sensitive.
Lordy. While you can probably get to some decent level of accuracy, there's not much hope that it will be error free and so you do need to plan on that. If it's the regular requirement in Europe that it appears to be, there should be a fundamental change in how such data is created, received, and processed. Yeah... I know you already know that but I had to say it out loud. 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
November 9, 2018 at 8:35 am
p.s. The list of titles is really far from being inclusive especially when it comes to such things as Microsoft and other certifications and awards (like MVP).
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply