November 15, 2018 at 11:05 am
Hi there! I have this code:
SELECT
CASE WHEN CHARINDEX(' ',name) = 0 THEN name
WHEN CHARINDEX(' ',name) = PATINDEX('% _[., ]%',name) THEN
RTRIM(SUBSTRING(name, 1, CHARINDEX(' ',name) + 2))
ELSE SUBSTRING(name,1, CHARINDEX(' ',name))END [firstname],
CASE
WHEN CHARINDEX(' ',name) = 0 THEN ''
WHEN CHARINDEX(' ',name) = PATINDEX('% _[., ]%',name)
THEN LTRIM(SUBSTRING(name,CHARINDEX(' ',name) + 3,1000))
ELSE SUBSTRING(name,CHARINDEX(' ',name)+1,1000)END [lastname]
FROM [myTable]
And it works just fine... however, my full name field has names all sorts of ways. This code takes care of names like this: First M. Last but if I have First Middle Last it doesn't parse it correctly. I thought the ELSE statement took care of that but I can't seem to troubleshoot how to get it to do the last name correctly if the middle name is spelled out instead of abbreviated with a period. I have a third variation as well that is just First M Last without the period after the middle initial.
Thanks!
November 15, 2018 at 11:59 am
amy26 - Thursday, November 15, 2018 11:05 AMHi there! I have this code:
SELECT
CASE WHEN CHARINDEX(' ',name) = 0 THEN name
WHEN CHARINDEX(' ',name) = PATINDEX('% _[., ]%',name) THEN
RTRIM(SUBSTRING(name, 1, CHARINDEX(' ',name) + 2))
ELSE SUBSTRING(name,1, CHARINDEX(' ',name))END [firstname],
CASE
WHEN CHARINDEX(' ',name) = 0 THEN ''
WHEN CHARINDEX(' ',name) = PATINDEX('% _[., ]%',name)
THEN LTRIM(SUBSTRING(name,CHARINDEX(' ',name) + 3,1000))
ELSE SUBSTRING(name,CHARINDEX(' ',name)+1,1000)END [lastname]
FROM [myTable]And it works just fine... however, my full name field has names all sorts of ways. This code takes care of names like this: First M. Last but if I have First Middle Last it doesn't parse it correctly. I thought the ELSE statement took care of that but I can't seem to troubleshoot how to get it to do the last name correctly if the middle name is spelled out instead of abbreviated with a period. I have a third variation as well that is just First M Last without the period after the middle initial.
Thanks!
Have you checked to see if you also have titles and salutations and multiword first, middle, and last names?
--Jeff Moden
Change is inevitable... Change for the better is not.
November 15, 2018 at 12:07 pm
Yes, I removed the titles and this code is putting any suffixes into the last name field which I'm removing after the process runs (ie Jr.).
Its just the middle name part I'm having an issue with.
November 15, 2018 at 12:39 pm
amy26 - Thursday, November 15, 2018 11:05 AMbut I can't seem to troubleshoot how to get it to do the last name correctly
If you take the substring of the reversed string up to the first space, then reverse it again to put it the right way round, you should have the surname.
select reverse(substring(reverse(name),1,charindex(' ',reverse(name)))) LastName
or even simpler with right:select right(name,charindex(' ',reverse(name))) LastName
November 15, 2018 at 2:14 pm
Another issue with name parsing to be aware of is how different cultures parse names. You can't always assume that the "first" part of a name is the given name and the "last" part of a name is the surname.
https://en.wikipedia.org/wiki/Surname
November 16, 2018 at 7:17 am
I'm just going to leave this here... https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/
Thomas Rushton
blog: https://thelonedba.wordpress.com
November 16, 2018 at 7:24 am
Thank you I appreciate the culture thing... but that is neither here nor there regarding this. I have to do it.
I think I got it like 98% working and that will have to be good enough. Thank you all!
November 16, 2018 at 12:42 pm
To begin with, trying to parse names is incredibly difficult and is something you should buy as a service from companies that have invested a lot of effort into it. A quick internet search came up with this, but there are many others: https://www.nameapi.org/en/demos/name-parser/%5B/url%5D
You can't really solve the general problem with a finite amount of T-SQL text parsing, but if you are determined or forced to try anyway, I would try to split it up into a series of transformations rather than attempt to create single handle-any-case expressions. I came up with this code to handle a small number of cases:
Names may be First, First Last, First Middle Last, "Last, First", or "Last, First Middle". Middle names may be a complete name or an initial optionally followed by a period.SELECT OrigFirst = n.FirstName,
OrigMiddle = n.MiddleName,
OrigLast = n.LastName,
MangledName,
-- NoCommaName, Parse1.CommaPos, Parse2.FirstSpace, Parse2.LastSpace,
Mod2.FirstName,
Mod2.MiddleName,
Mod2.LastName
FROM #names n
CROSS APPLY ( SELECT CommaPos = CHARINDEX(',', MangledName) ) Parse1
CROSS APPLY ( SELECT NoCommaName = CASE WHEN CommaPos = 0 THEN MangledName
ELSE LTRIM(RTRIM(STUFF(MangledName, 1, CommaPos, '') + ' ' + LEFT(MangledName, CommaPos - 1))) END ) Mod1
CROSS APPLY ( SELECT FirstSpace = CHARINDEX(' ', NoCommaName), LastSpace = LEN(NoCommaName) - CHARINDEX(' ', REVERSE(NoCommaName)) + 1 ) Parse2
CROSS APPLY (
SELECT FirstName = CASE WHEN FirstSpace = 0 THEN NoCommaName ELSE LEFT(NoCommaName, FirstSpace-1) END,
MiddleName = REPLACE(CASE WHEN FirstSpace = LastSpace THEN '' ELSE SUBSTRING(NoCommaName, FirstSpace + 1, LastSpace - FirstSpace - 1) END, '.', ''),
LastName = CASE WHEN FirstSpace = 0 THEN '' ELSE RIGHT(NoCommaName, LEN(NoCommaName) - LastSpace) END
) Mod2
It defines FirstName as the first word, LastName as the last word, and MiddleName is everything in between.
There is a lot it doesn't handle. It screws up names with suffixes like Jr or III, but you say you've already removed those.
It doesn't handle compound last names, but you didn't specify any rules for names with four or more parts. The web site linked above gives José Maria "Josema" Fernández de la Torre as an example, which I would hate to have to define the rules for.
So it is probably not the final version of the code you're looking for. It is a way to approach the problem that allows you to find the exceptions, decide what rules need to be added to handle them, and then add additional transformation steps. I don't think it can be done only by creating increasingly complex expressions.
You may want to add rules that depend on the number of name parts. If one step defined NameParts = LEN(Name) - LEN(REPLACE(Name, ' ', '') + 1 (assuming the name was trimmed and all multiple spaces were compressed), you could use something like
SELECT FN = a, MN = '', LN = '' WHERE NameParts = 1
UNION ALL SELECT FN = a, MN = '', LN = b WHERE NameParts = 2
UNION ALL SELECT FN = a, MN = b, LN = c WHERE NameParts = 3
UNION ALL ... (compound name rules)
November 16, 2018 at 12:47 pm
I should point out that the query had the original split name parts because the test data I used came from an Employee table with names that were already split. I merged them into a mangled name with 20% dotted initials and 33% as Last, First. This made it easy to spot the failures.
SELECT TOP 500
EmployeeMasterId, FirstName, MiddleName, LastName,
MangledName = RTRIM(CASE WHEN EmployeeMasterId % 6 < 2
THEN ISNULL(LastName + ', ', '') + ISNULL(FirstName + ' ', '') + ISNULL(ModMid + ' ', '')
ELSE ISNULL(FirstName + ' ', '') + ISNULL(ModMid + ' ', '') + ISNULL(LastName, '')
END)
INTO #names
FROM dbo.Employee_Master
CROSS APPLY ( SELECT ModMid = CASE WHEN EmployeeMasterId % 5 = 1 AND LEN(MiddleName) = 1 THEN MiddleName + '.' ELSE MiddleName END ) m1
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply