Parsing First and Last Names from Full Name

  • Hi there!  I have this code:


    SELECT
    CASE   WHEN CHARINDEX(' ',name) = 0 THEN name 
    WHEN CHARINDEX(' ',name) = PATINDEX('% _[., ]%',name) THEN
    RTRIM(SUBSTRING(name, 1, CHARINDEX(' ',name) + 2))  
    ELSE SUBSTRING(name,1, CHARINDEX(' ',name))END [firstname],
    CASE 
    WHEN CHARINDEX(' ',name) = 0 THEN '' 
    WHEN CHARINDEX(' ',name) = PATINDEX('% _[., ]%',name)
    THEN LTRIM(SUBSTRING(name,CHARINDEX(' ',name) + 3,1000))  
    ELSE SUBSTRING(name,CHARINDEX(' ',name)+1,1000)END [lastname]
    FROM [myTable]

    And it works just fine... however, my full name field has names all sorts of ways.  This code takes care of names like this:  First M. Last but if I have First Middle Last it doesn't parse it correctly.  I thought the ELSE statement took care of that but I can't seem to troubleshoot how to get it to do the last name correctly if the middle name is spelled out instead of abbreviated with a period.  I have a third variation as well that is just First M Last without the period after the middle initial.

    Thanks!

  • amy26 - Thursday, November 15, 2018 11:05 AM

    Hi there!  I have this code:


    SELECT
    CASE   WHEN CHARINDEX(' ',name) = 0 THEN name 
    WHEN CHARINDEX(' ',name) = PATINDEX('% _[., ]%',name) THEN
    RTRIM(SUBSTRING(name, 1, CHARINDEX(' ',name) + 2))  
    ELSE SUBSTRING(name,1, CHARINDEX(' ',name))END [firstname],
    CASE 
    WHEN CHARINDEX(' ',name) = 0 THEN '' 
    WHEN CHARINDEX(' ',name) = PATINDEX('% _[., ]%',name)
    THEN LTRIM(SUBSTRING(name,CHARINDEX(' ',name) + 3,1000))  
    ELSE SUBSTRING(name,CHARINDEX(' ',name)+1,1000)END [lastname]
    FROM [myTable]

    And it works just fine... however, my full name field has names all sorts of ways.  This code takes care of names like this:  First M. Last but if I have First Middle Last it doesn't parse it correctly.  I thought the ELSE statement took care of that but I can't seem to troubleshoot how to get it to do the last name correctly if the middle name is spelled out instead of abbreviated with a period.  I have a third variation as well that is just First M Last without the period after the middle initial.

    Thanks!

    Have you checked to see if you also have titles and salutations and multiword first, middle, and last names?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yes, I removed the titles and this code is putting any suffixes into the last name field which I'm removing after the process runs (ie Jr.).
    Its just the middle name part I'm having an issue with.

  • amy26 - Thursday, November 15, 2018 11:05 AM

    but I can't seem to troubleshoot how to get it to do the last name correctly

    If you take the substring of the reversed string up to the first space, then reverse it again to put it the right way round, you should have the surname.

    select reverse(substring(reverse(name),1,charindex(' ',reverse(name)))) LastName
    or even simpler with right:
    select right(name,charindex(' ',reverse(name))) LastName

  • Another issue with name parsing to be aware of is how different cultures parse names.  You can't always assume that the "first" part of a name is the given name and the "last" part of a name is the surname.
    https://en.wikipedia.org/wiki/Surname

  • I'm just going to leave this here...  https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/

    Thomas Rushton
    blog: https://thelonedba.wordpress.com

  • Thank you I appreciate the culture thing... but that is neither here nor there regarding this.  I have to do it.  

    I think I got it like 98% working and that will have to be good enough.  Thank you all!

  • To begin with, trying to parse names is incredibly difficult and is something you should buy as a service from companies that have invested a lot of effort into it.  A quick internet search came up with this, but there are many others:  https://www.nameapi.org/en/demos/name-parser/%5B/url%5D

    You can't really solve the general problem with a finite amount of T-SQL text parsing, but if you are determined or forced to try anyway, I would try to split it up into a series of transformations rather than attempt to create single handle-any-case expressions.  I came up with this code to handle a small number of cases:
    Names may be First, First Last, First Middle Last, "Last, First", or "Last, First Middle".  Middle names may be a complete name or an initial optionally followed by a period.

    SELECT    OrigFirst = n.FirstName,
            OrigMiddle = n.MiddleName,
            OrigLast = n.LastName,
            MangledName,
            -- NoCommaName, Parse1.CommaPos, Parse2.FirstSpace, Parse2.LastSpace,
            Mod2.FirstName,
            Mod2.MiddleName,
            Mod2.LastName
    FROM #names n
    CROSS APPLY ( SELECT CommaPos = CHARINDEX(',', MangledName) ) Parse1
    CROSS APPLY ( SELECT NoCommaName = CASE WHEN CommaPos = 0 THEN MangledName
                    ELSE LTRIM(RTRIM(STUFF(MangledName, 1, CommaPos, '') + ' ' + LEFT(MangledName, CommaPos - 1))) END ) Mod1
    CROSS APPLY ( SELECT    FirstSpace = CHARINDEX(' ', NoCommaName), LastSpace = LEN(NoCommaName) - CHARINDEX(' ', REVERSE(NoCommaName)) + 1 ) Parse2
    CROSS APPLY (
        SELECT    FirstName = CASE WHEN FirstSpace = 0 THEN NoCommaName ELSE LEFT(NoCommaName, FirstSpace-1) END,
                MiddleName = REPLACE(CASE WHEN FirstSpace = LastSpace THEN '' ELSE SUBSTRING(NoCommaName, FirstSpace + 1, LastSpace - FirstSpace - 1) END, '.', ''),
                LastName = CASE WHEN FirstSpace = 0 THEN '' ELSE RIGHT(NoCommaName, LEN(NoCommaName) - LastSpace) END
    ) Mod2

    It defines FirstName as the first word, LastName as the last word, and MiddleName is everything in between.
    There is a lot it doesn't handle.  It screws up names with suffixes like Jr or III, but you say you've already removed those.
    It doesn't handle compound last names, but you didn't specify any rules for names with four or more parts.  The web site linked above gives José Maria "Josema" Fernández de la Torre as an example, which I would hate to have to define the rules for.

    So it is probably not the final version of the code you're looking for.  It is a way to approach the problem that allows you to find the exceptions, decide what rules need to be added to handle them, and then add additional transformation steps.  I don't think it can be done only by creating increasingly complex expressions.
    You may want to add rules that depend on the number of name parts.  If one step defined NameParts = LEN(Name) - LEN(REPLACE(Name, ' ', '') + 1 (assuming the name was trimmed and all multiple spaces were compressed), you could use something like

    SELECT FN = a, MN = '', LN = '' WHERE NameParts = 1
    UNION ALL SELECT FN = a, MN = '', LN = b WHERE NameParts = 2
    UNION ALL SELECT FN = a, MN = b, LN = c WHERE NameParts = 3
    UNION ALL ... (compound name rules)

  • I should point out that the query had the original split name parts because the test data I used came from an Employee table with names that were already split.  I merged them into a mangled name with 20% dotted initials and 33% as Last, First.  This made it easy to spot the failures.

    SELECT TOP 500
            EmployeeMasterId, FirstName, MiddleName, LastName,
            MangledName = RTRIM(CASE WHEN EmployeeMasterId % 6 < 2
                        THEN ISNULL(LastName + ', ', '') + ISNULL(FirstName + ' ', '') + ISNULL(ModMid + ' ', '')
                        ELSE ISNULL(FirstName + ' ', '') + ISNULL(ModMid + ' ', '') + ISNULL(LastName, '')
                        END)
    INTO #names
    FROM dbo.Employee_Master
    CROSS APPLY    ( SELECT ModMid = CASE WHEN EmployeeMasterId % 5 = 1 AND LEN(MiddleName) = 1 THEN MiddleName + '.' ELSE MiddleName END ) m1

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply