Parsing Data

  • I have files coming in through DTS packages from Excel which have one field for name--including last, first, and middle (some also have a suffix--MD, etc..).  I wish to parse this information into F_Name, L_Name,M_Initial, and Degree.  I also have a field, Whole_Name, which is where this data gets imported to.  Should stored procedures be written to parse this data or should a function be written that will handle this?  If one of these is a yes--does anyone have any examples?

    Thanks for your assistance.


    Paula

  • If I understand you correctly, you want to split a single column of data contaning first name, surname etc into separate columns in a database as well as one straightforward column copy.

    In order to keep everything inside the package for ease of maintenance and debugging, the option I would choose is to write an activex script. It will be marginally slower than a function or stored procedure, but it will give you a lot more control on the import process.

    However, you need to have separating characters in the data in order to determine which column of the data you should be importing into. I presume you have this.

    You just need to write an activex script to split one line of data into separate coulmns and then copy across.

    Example:

    DTSDestination("destinationname") = DTSSource("sourcename")


    ------------------------------
    The Users are always right - when I'm not wrong!

  • IT would be helpfull if you could provide a few examples of the data. Such as

     

    Turner,Bob S MD

    Or

    Bob S Turner MD

     

    Just so we know what format you are dealing with.

  • Here are some examples

    City,State (just a comma no spaces)

    Pittsburgh,PA

    Whole Name (Unfortunately this combination is all within one spreadsheet--whole middle name mixed with initial)

    Struthers, Sally J.

    Barker, Bob P.

    Carmike, James Anthony

     

    Whole Name another file

    Parks David J, MD

    Struthers Sally Ann, PA-C

    Barker Robert H Jr, MD

    Hope this helps.

    Thanks,


    Paula

  • I agree with Jonathan, use activex in DTS like this

    Function Main()

        strName = DTSSource("Whole_Name")

        strName = Replace(strName, ",", " ", 1, -1)

        strName = Replace(strName, ".", " ", 1, -1)

        strName = Replace(strName, "  ", " ", 1, -1)

        intIndex = InStr(1, strName, " ")

        strLast = Left(strName, intIndex - 1)

        strName = Mid(strName, intIndex+1, Len(strName)-intIndex)

        intIndex = InStr(1, strName, " ")

        strFirst = Left(strName, intIndex - 1)

        strName = Mid(strName, intIndex+1, Len(strName)-intIndex)

        strInitial = Left(strName, 1)

        intIndex = InStr(1, strName, " ")

        If intIndex = 0 Then

            strName = ""

        Else

            strName = Mid(strName, intIndex+1, Len(strName)-intIndex)

        End If

        intIndex = InStr(1, strName, " ")

        If intIndex = 0 Then

            strDegree = strName

        Else

            strLast = strLast & " " & Left(strName, intIndex - 1)

            strDegree = Mid(strName, intIndex+1, Len(strName)-intIndex)

        End If

        DTSDestination("Whole_Name") = DTSSource("Whole_Name")

        DTSDestination("L_Name") = strLast

        DTSDestination("F_Name") = strFirst

        DTSDestination("M_Initial") = strInitial

        DTSDestination("Degree") = strDegree

        Main = DTSTransformStat_OK

    End Function

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Here is an example of several methods you can use for each case within an ActiveX transformation.

    Option Explicit

    Dim strName, arValues, strSuffix, strLastName

    '------------------------------------------------------------'

    '/                       City, State                        /'

    strName = "Pittsburgh,PA"

    Comma_Pair strName, arValues

    MsgBox arValues(0) ' City

    MsgBox arValues(1) ' State

    '------------------------------------------------------------'

    '/                 Last First Middle, Title                 /'

    strName = "Struthers Sally Ann, PA-C"

    Name_Title strName, arValues, strSuffix

    MsgBox strSuffix   ' Title

    MsgBox arValues(0) ' Last Name

    MsgBox arValues(1) ' First Name

    MsgBox arValues(2) ' Middle Name

    '------------------------------------------------------------'

    '/                    Last, First Middle                    /'

    strName = "Struthers, Sally J."

    Name strName, arValues, strLastName

    MsgBox strLastName ' Last Name

    MsgBox arValues(0) ' First Name

    MsgBox arValues(1) ' Middle Name

    '/**********************************************************/'

    '/                        Functions                         /'

    '/----------------------------------------------------------/'

    '/             Split 2 comma seperated values.              /'

    Function Comma_Pair(strIn, arOut)

     arOut = split(strIn,",")

    End Function

    '/----------------------------------------------------------/'

    '/      Split name in format Last First Middle, Title       /'

    Function Name_Title(strIn, arOut, strOut)

     Dim arTemp

     Call Comma_Pair(strIn, arTemp)

     arOut = split(trim(arTemp(0))," ")

     If UBound(arTemp) > 0 Then

      strOut = trim(arTemp(1))

     End If

    End Function

    '/----------------------------------------------------------/'

    '/         Split name in format Last, First Middle          /'

    Function Name(strIn, arOut, strOut)

     Dim arTemp

     Call Comma_Pair(strIn, arTemp)

     strOut = trim(arTemp(0))

     If UBound(arTemp) > 0 Then

      arOut = split(trim(arTemp(1))," ")

     End If

    End Function

    Note you can still use these when there is no middle portion just make sure you test for if the middle portion is in the array.

    As well if the cases are mixed you could write a function which can test for which possible name function you are dealing with and perform conditionally.

    Hoepfully thou you have a straight forward situation.

  • Thank you for the examples.  I would like to use the one offered by David, however, I am not a programmer and the code doesn't seem to work.  Is there anywhere I can get some assistance with understanding the function and the code associated?  I have a SQL Server 2000 DTS book and it is not helpful in this instance.

    Thanks,


    Paula

  • What do you mean by does not work, do you get a syntax error or are the results not what you expected.

    DTS ActiveX transformations can be written in either VBScript (Visual Basic Script) or JScript (Java Script), VBScript in my case.

    I do not know any books on VBScript but if you search the web you will find descriptions of all the functions and examples, Microsoft MSDN is pretty good. For example in Google try

    VBScript Replace site:microsoft.com

    You may have search through some of the links but I normally find everything I need is there.

    Maybe someone else can suggest other media.

    Far away is close at hand in the images of elsewhere.
    Anon.

  • I made another stab at the Active X transformation and did the following.  First, I used sql to remove all of the Jr., Sr. I, II, III, medical degrees, etc... from the end of the names.  I then used the following Active X script:  (the name is in the Last Name, First Name MI or Last Name, First Name Middle Name format in this file.)

    Function Main()

     Dim sIn

     Dim sFirst

     Dim sLast

     Dim sMiddleInit

      

     Dim sHold  ' Holding place while processing string

     

     Dim iLoc

     sIn = DTSSource("Whole_NAME")

     

     'look for comma in input string,  first level format check

     ' if not found instr will return 0 or -1

     

     iLoc = instr(1,sIn,",",0)

     if iLoc > 0 then

      

      'select the portion of the string before the location of the comma

      'this is the last name

      'trim the leading and traiing spaces from the string

      sLast = trim(mid(sIn,1,iLoc -1))

      

      

      'load segment of the string after the comma ioto hold string

      'trim leading and trailing spaces

      sHold = trim(mid(sIn,iLoc + 1))

      

      'check for middle initial in last position after space

      'get the last two characters from the hold var

      'trim the leading and trailing zeros out if this string matches format this should leave only the MI

      'test strign length for 1

      if len(trim(right(sHold,2))) = 1 then

      

       'trim leading and trailing spaces form values

       sFirst = trim(mid(sHold,1,len(sHold)-2))

       sMiddleInit = trim(right(sHold,2))

      else

       'no middle initial so set it to blank and set sFirstrist to sHold

       sFirst = trim(sHold)

       sMiddleInit = ""

      end if

     

      else

                 sLast = DTSSource("Whole_Name")

     sFirst="See Last Name"

     

        

     end if

     

     

     DTSDestination("L_NAME") = sLast

     DTSDestination("F_NAME") = sFirst

     DTSDestination("M_Name") = sMiddleInit

     

     

     Main = DTSTransformStat_OK

    End Function

    The result of this script was that the Last Name goes into the L_Name column.  However, the First and Middle names are still combined in the F_Name column.  Is there something that I should add to this script to also move the first name or alternatively, is there a way to tweak this script to look for a space and basically do a second transformation on the F_Name column to break apart the First Name and the Middle Name.  The problem is that sometime there is a middle initial and sometimes there is a middle name, so I would need to move everything after the space in the F_Name column to the M_Name column.

    Any help is appreciated.

     


    Paula

  • Try this

    if len(trim(right(sHold,2))) = 1 then

        'trim leading and trailing spaces form values

        sFirst = trim(mid(sHold,1,len(sHold)-2))

        sMiddleInit = trim(right(sHold,2))

    else

        iLoc = instr(1,sHold," ",0)

        if iLoc > 0 then

            'middle name, so

            'get first name

            sFirst = Left(sHold,iLoc -1)

            'get middlename

            sMiddleInit = mid(sHold,iLoc+1)

        else

            'no middle initial or name so set it to blank and set sFirstrist to sHold

            sFirst = trim(sHold)

            sMiddleInit = ""

        end if

    end if

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Just for diversity, here's one way to do it the Transact SQL way (ie setbased)

    ..assumptions: three columns output - first, middle and last from a single column that contains either first and lastname or a three-part name.

    create table #x ( nm varchar(30) not null )

    insert #x select 'John Travolta'

    insert #x select 'Samuel S Jackson'

    insert #x select 'John James Doe'

    select * from #x

    nm                            

    ------------------------------

    John Travolta

    Samuel S Jackson

    John James Doe

    select cast(left(nm, charindex(' ', nm)) as char(10)) as fName,

     -- check for middle (ie if there are more than 1 space)

     case when len(nm) - len(replace(nm, ' ', '')) > 1

          then  ltrim(substring(nm, charindex(' ', nm), charindex(' ', nm, charindex(' ', nm) +1) - charindex(' ', nm)))

          else ''

          end  as middle,

     cast(ltrim(right(nm, charindex(' ', reverse(nm)))) as char(20)) as lName

    from #x

    fName      middle     lName               

    ---------- ---------- --------------------

    John                  Travolta           

    Samuel     S          Jackson            

    John       James      Doe                

    (3 row(s) affected)

    /Kenneth

     

     

  • Haha, you have seen too much Pulp fiction, Kenneth

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Just for giggles. What about:

    create table #x ( nm varchar(30) not null )

    insert #x select 'John Travolta'

    insert #x select 'Samuel S Jackson'

    insert #x select 'John James Doe'

    select

     isnull(parsename(replace(nm,' ','.'),3),parsename(replace(nm,' ','.'),2))

     , case

        when len(nm)-len(replace(nm,' ',''))>1

        then parsename(replace(nm,' ','.'),2)

        else '' end

     , parsename(replace(nm,' ','.'),1)

    from #x

    drop table #x

    first                          middle                         last                          

    ------------------------------ ------------------------------ ------------------------------

    John                                                          Travolta                     

    Samuel                         S                              Jackson                      

    John                           James                          Doe                          

    (3 row(s) affected)

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Parsename works just fine for this (and actually would be perfect for the title-part, since that may make it a four-part name)

    ...and yeah, I just looooved Pulp Fiction

    /Kenneth

  • Another little "gotcha."

    The abbreviation "Sr" may mean either "Senior" or "Sister."

    For "Sister", it is usually given before the name,

    while "Senior" usually trails the rest or the name.

     

    Also, names like "Mary Ann L. Van Der Hoff" cause their own problems.

    Just my .02 worth.

     

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply