Data modification

  • I am currently migrating old data to new sql2K system. I have a problem importing client names into the new system. The old names were entered in one column "Mr. erick manza"

    I need to divide that into title, first name and last name to fit in the new design. Any Help?

  • Need some more details as to how consistent the data is.

    IE.

    Do they all begin with Mr or Mrs?

    Is there always a dot after the title?

    Is the name always stored title, firstname, surname?

     


    ------------------------------
    The Users are always right - when I'm not wrong!

  • Yes they always start with either Mr. Miss, mrs. but there is not a dot always. They are stored as title, lastname, firstname

  • Is the data consistent other than the period after the title? 

    Is there a comma between lastname and firstname?  If so, in EVERY record?

    Any records with more than three strings (i.e. title= "Herr Doktor", or lastname = "Alvarez Montoya", or a record with a title, lastname, firstname, and middlename/initial)?

  • yes the data is consistent with period after title. No there is no comma between names and there are no records with more than three strings

  • OK, this aint pretty but it should work.  You will need to adjust the code for your table names and if you don't want to create a table from the query, you will need to adjust the SELECT ... INTO statement to insert the records into your new table.  It might be easier to run this, then add the other fields needed to the table this script creates. 

    Two points:

    1) This will not strip out any periods that are part of the title.  You may want to do that after you have parsed the data.

    2) This assumes that EVERY record has a title, one space, last name, one space, and a first name.  If a record has extra spaces between the strings, or has more (or fewer) than 3 strings, you will get some unexpected results.  Legacy data is often poor data.  You will want to check the results.

     

    --Table with values to be parsed

    CREATE TABLE OldNames(FullName varchar(50))

    INSERT INTO OldNames (FullName) VALUES('Mr. Jones Eric')

    INSERT INTO OldNames (FullName) VALUES('Mr Doe John')

    INSERT INTO OldNames (FullName) VALUES('Mrs. Smith Janet')

    INSERT INTO OldNames (FullName) VALUES('Mrs Johnson Virginia')

    INSERT INTO OldNames (FullName) VALUES('Miss Reynolds Miranda')

    INSERT INTO OldNames (FullName) VALUES('Miss. Proctor Emily')

    --Parse the FullName field into Title, LastName and FirstName fields

    --   and insert the values into a new table

    SELECT  Left(FullName, CharIndex(' ', FullName) -1) AS Title,

          Right(FullName, CharIndex(' ',Reverse(FullName))- 1) AS LastName,

                 SubString(FullName, CharIndex(' ', FullName)+ 1, 

        (CharIndex(' ',FullName, CharIndex(' ',FullName)+1))-(CharIndex(' ', FullName) )) AS FirstName

    INTO NewNames

    FROM OldNames

    --Check results

    select * from OldNames

    select * from NewNames

    --Clean up samples

    drop table OldNames

    drop table NewNames

    Hope this helps.  I didn't have much time so this is an ugly, brute-force approach.  If someone has a more polished solution, I hope they will post it.

     

  • the task like this can be (and should be) much easier accomplished in ADO.NET by implementing the following steps:

    1. connect to the SQL server

    2. fetch all the data from your source table (FullNames in this case) into a dataset

    3. process all your strings in a loop by using Regular Expressions:

    4. Insert/Update your target table with altered (in this case split) strings

    -----------------------------------------------------

    here is a C# console that would split your stings into 3 parts according to your spec. The code in the console is Step 3) of the above algorithm. Regular exression that does the trick is just one line of code!

    //declare a regex pattern

    string pattern = @"(\w+\.?)";

    @"(\w+\.?)"       -  It means : match all alphanumeric characters and a dot if available (dot can be after any part of the name).

    using System;

    using System.Text.RegularExpressions;

       public class Regex_Match

       {

           public static void Main()

           {

               //Scan an input string for matches, print them

               //declare an input string of text

               string InputString = @"'Mr. Jones Eric'";

               //declare a regex pattern

               string pattern = @"(\w+\.?)";

               //declare new regex object with no options set

               Regex r = new Regex( pattern, RegexOptions.None);

               //declare a match object

               Match m;

               //match all the words in the input sentence using Match method

               m = r.Match(InputString);

               //print the matches

               while (m.Success)

               {

                   Console.WriteLine(m);

                   //get the next one to print

                   m = m.NextMatch();

               }

               Console.WriteLine("\nPress Enter to Exit");

               Console.ReadLine();

           }

       } //end of the class

    /*

    returns:

    Mr.

    Jones

    Eric

    */

    You can easily modify your Regex in case the pattern of your original data changes . And u would need to change just one line of the code:

    string pattern = @"(\w+\.?)";

    that's what people call 'the power of RegEx'

    Sergei

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply