extract name to 3 single columns

  • Is there a better way to extract to 3 single columns from one full name column: Lastname, FirstName MiddleInitials,

    Examples:

    Lyle, Julie C

    King, Joanne P

    Healy, Dena     (without initial)

    Allen Leslie, Lori V

    Thanks

    • This topic was modified 3 years, 9 months ago by  sqlfriend.
  • This ends up being a very complex task.  There are a number of threads about it on this site.

    Here's a doozy (from history): Claus Philipp Maria Justinian Schenk Graf von Stauffenberg.

    The "von" is an honorary German thing, and it's considered part of the last name.  So you potentially have those types of issues as well.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • The query could split the input into left and right strings based on the location of the comma.  Then determine if the trailing character of the right string is a middle initial.  Something like this works with the examples provided

    declare
    @strings table(string varchar(200));

    insert @strings values
    ('Lyle, Julie C'),
    ('King, Joanne P'),
    ('Healy, Dena'),
    ('Allen Leslie, Lori V'),
    ('Claus Philipp Maria Justinian Schenk Graf von Stauffenberg');

    select lr.l fname, calc.lname, calc.m_init
    from @strings
    cross apply (values (charindex(',', string),
    len(string))) st(cloc, lng)
    cross apply (values (case when st.cloc>0
    then left(string, st.cloc-1)
    else string end,
    case when st.cloc>0
    then right(string, st.lng-st.cloc-1)
    else '' end)) lr(l, r)
    cross apply (values (case when lr.r like '% [a-zA-Z]'
    then left(lr.r, len(lr.r)-2)
    else lr.r end,
    case when lr.r like '% [a-zA-Z]'
    then right(lr.r, 1)
    else '' end)) calc(lname, m_init);
    fnamelnamem_init
    LyleJulieC
    KingJoanneP
    HealyDena
    Allen LeslieLoriV
    Claus Philipp Maria Justinian Schenk Graf von Stauffenberg

     

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Thanks, what does st(cloc, lng) mean?

    I join with my table, but it says  'Incorrect syntax near 'st'.'

    SELECT employeeID

    ,DisplayName

    , lr.l fname, calc.lname, calc.m_init

    FROM dbo.ADStaff a

    cross apply (values (charindex(',', DisplayName,

    len(DisplayName))) st(cloc, lng)

    cross apply (values (case when st.cloc>0

    then left(DisplayName, st.cloc-1)

    else DisplayName end,

    case when st.cloc>0

    then right(DisplayName, st.lng-st.cloc-1)

    else '' end)) lr(l, r)

    cross apply (values (case when lr.r like '% [a-zA-Z]'

    then left(lr.r, len(lr.r)-2)

    else lr.r end,

    case when lr.r like '% [a-zA-Z]'

    then right(lr.r, 1)

    else '' end)) calc(lname, m_init);

  • This was removed by the editor as SPAM

  • Your missing the bracket at the end of display name.

    Cloc is the character location

    lng in the length of that particular segment

  • This is going to be more work than you think it is. I would strongly recommend that you get a package designed to handle names and addresses, such as Melissa data.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • It's extremely difficult to get even mostly right, and tends to be a trial and error process of discovering and fixing the logic for those that were mishandled... without breaking it for formats that already work.

    Does the data contain ay suffixes like Jr, Sr, III, Esq, etc. ? Or prefixes/honorifics  like Dr,, Mr., Ms. Mrs.,  Sir, Rev, etc. (and all the variations of full or abbreviation, with or without punctuation)?

    Is this a one-time conversion? How many rows of data do you have?  Name and address parsing are one of the cases where I've found making multiple passes (sometimes even in a spreadsheet!) to be helpful.

     

  • Yes I mostly agree with ratbak and I even narrowly agree with Joe.  If you've got access to pro software then use that 🙂  The code provided does sort of make "multiple passes" because the CROSS APPLY's are applied sequentially.  More rules could be layered on hierarchically with more JOIN's or OUTER/CROSS APPLY's within the same FROM clause.

    One check that seems to make a lot of sense and I probably should've mentioned is to make sure there's at most 1 comma.  If that's NOT the case, then this code would quite likely return nonsensical results.  The old-school way to check was to REPLACE the comma with 2 spaces and compare the string LEN's before and after.  The new-school way is to split the string and count the resultant VALUES.

    select lr.l fname, calc.lname, calc.m_init, cma_check.cnt
    from @strings s
    cross apply (select count(*)-1
    from string_split(s.string, ',')) cma_check(cnt)
    cross apply (values (charindex(',', string),
    len(string))) st(cloc, lng)
    cross apply (values (case when st.cloc>0
    then left(string, st.cloc-1)
    else string end,
    case when st.cloc>0
    then right(string, st.lng-st.cloc-1)
    else '' end)) lr(l, r)
    cross apply (values (case when lr.r like '% [a-zA-Z]'
    then left(lr.r, len(lr.r)-2)
    else lr.r end,
    case when lr.r like '% [a-zA-Z]'
    then right(lr.r, 1)
    else '' end)) calc(lname, m_init)
    where cma_check.cnt<=1;

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Thanks. The reason we want to do this is to update  firstname, middle and last name in a table with the displayname in active directory. but in active directory the format is lastname, firstname  middle initials.  Yes there could be a lot of different cases.

    For now, I will make it simple.  I will just need to figure out those that has format of Lastname, firstname. All others I will keep the same as original table's first and last.

    So basically there is only one format we want to solve, that is before the comma, it is last name, after the comma and space, only one word and it is the firstname.

    For example only solve this:

    Adam, Julie

    Baleto Ferguson, Rebecca

    How could I add a filter in where clause to handle only this format to single out lastname and firstname?

    Thanks,

  • I think you're looking for something like this.  The first CROSS APPLY 'cma_check(cnt)' counts the number of commas.  The second CROSS APPLY 'wrd_check(cnt)' counts the number of words in the first name.  To get the word count of the first name it uses Eirikur's very fast dbo.DelimittedSplit8k_LEAD ordinal splitter and then splits the 2nd item (the first name) by space delimiter.   The WHERE clause restricts the results set to rows where the comma count is 1 and the word count of the first name is 1.

    declare
    @strings table(string varchar(200));

    insert @strings values
    ('Lyle, Julie C'),
    ('King, Joanne P'),
    ('Healy, Dena'),
    ('Allen Leslie, Lori V'),
    ('Allen Leslie, Lori'),
    ('Claus Philipp Maria Justinian Schenk Graf von Stauffenberg');

    select left(string, st.cloc-1) lname, right(string, st.lng-st.cloc-1) fname
    from @strings s
    cross apply (select count(*)-1
    from string_split(s.string, ',')) cma_check(cnt)
    cross apply (select count(*)-1
    from dbo.DelimitedSplit8K_LEAD(s.string, ',') dsl
    cross apply string_split(dsl.Item, ' ')
    where dsl.ItemNumber=2) wrd_check(cnt)
    cross apply (values (charindex(',', string),
    len(string))) st(cloc, lng)
    where cma_check.cnt=1
    and wrd_check.cnt=1;
    lnamefname
    HealyDena
    Allen LeslieLori

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Thank you so much, that works.

    But I need to understand this more, esp,  those functions and cross apply. Will come back with questions.

  • Could you please explain what this part mean?

    cross apply (values (charindex(',', string),

    len(string))) st(cloc, lng)

    also can the cross apply replace with inner join?

    thanks

     

  • cross apply (values (charindex(',', string),
    len(string))) st(cloc, lng)

    The purpose of these 2 lines are to D.R.Y. (don't repeat yourself) out the code.  CROSS APPLY enables code to be evaluated for each row of the query results.  VALUES is the Table Value Constructor (or "creator of virtual tables") and it precedes (or announces to the SQL Engine) the definition of column(s), in this case 'cloc' (the offset location of the comma within the input string as returned by the CHARINDEX function) and 'lng' (the character length of the input string as returned by the LEN function).  So for each row of the query results a virtual table called 'st' is created with 2 columns, 'cloc' and 'lng'.  Because the LEFT and RIGHT functions in the SELECT list would otherwise require the underlying functions (of 'cloc' and 'lng') to be called multiple times, the query references the virtual table 'st' instead.

    No, CROSS APPLY cannot be replaced by INNER JOIN.  CROSS APPLY does not use the keyword ON to specify join conditions.  Column values from each row of the query results could be passed into the "virtual table" referenced by the CROSS APPLY as parameters.

    • This reply was modified 3 years, 9 months ago by  Steve Collins.

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply