Help with Derived Column To Split-Up An Individual's Name into Separate Parts

  • I'm working on an SSIS package for a client. They receive files exported from their clients and they don't put any restrictions on how the data is provided to them via the Excel files they send, so 99.9% of the files arrive with a "Full Name" field instead of individual "First Name, "Middle Initial/Name", "Last Name", and "Surname" columns.

    The names are generally in this pattern, which has a few possible variances as you can see as far as how the names are present in the column:

    Adams, Jr., Dwight

    ([Last Name], [Surname], [First Name])

    Allen, Frederick

    ([Last Name], [First Name])

    Bishop, Joseph A.

    ([Last Name], [First Name] [Middle Initial])

    Butler, III, Lamarrick J.

    ([Last Name], [Surname], [First Name] [Middle Initial])

    Nicholson, Tina Michelle

    ([Last Name], [First Name] [Middle Name])

    Cagle, Sr., Kimzey Alan

    ([Last Name], [Surname], [First Name] [Middle Name])

    I'm trying to determine exactly how to handle these varying scenarios as far as how the names will be present in the single column as to appropriate & accurately split them into their various parts.

    I wasn't sure if I should go with parsing formulas for each component of the full name directly within the derived column task, or if I should write a function that accepts the full name as an incoming parameter and then the parts of the name are parsed that way and somehow returned to the derived column task. I'm trying to tackle this the best way possible and I'm seeking suggestions.

    I've seen a few code samples out there for parsing through a full name into the various individual parts of the name, but I haven't run across any specific examples yet that deal with full names structured exactly the way that I've shown above in the examples that they can be structured in the incoming files.

    Thanks in advance.

  • As you probably know already, this is a very complex problem, especially when you are dealing with multipart names, non-western names, etc.

    You may want to invest in commercial software that is designed to handle this problem:

    http://www.softwarecompany.com/netgender-api.html

  • The complexity of the problem means that a derived column solution is unlikely to be practical.

    A Script Component would allow you to write code with as much complexity as you need, to handle the various combinations you have described, as long as you can come up with a set of logic rules to apply in order to perform the parsing.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply