July 14, 2005 at 12:21 am
I am currently migrating old data to new sql2K system. I have a problem importing client names into the new system. The old names were entered in one column "Mr. erick manza"
I need to divide that into title, first name and last name to fit in the new design. Any Help?
July 14, 2005 at 10:24 am
Need some more details as to how consistent the data is.
IE.
Do they all begin with Mr or Mrs?
Is there always a dot after the title?
Is the name always stored title, firstname, surname?
July 14, 2005 at 10:54 am
Yes they always start with either Mr. Miss, mrs. but there is not a dot always. They are stored as title, lastname, firstname
July 15, 2005 at 7:16 am
Is the data consistent other than the period after the title?
Is there a comma between lastname and firstname? If so, in EVERY record?
Any records with more than three strings (i.e. title= "Herr Doktor", or lastname = "Alvarez Montoya", or a record with a title, lastname, firstname, and middlename/initial)?
July 15, 2005 at 8:43 am
yes the data is consistent with period after title. No there is no comma between names and there are no records with more than three strings
July 15, 2005 at 11:17 am
OK, this aint pretty but it should work. You will need to adjust the code for your table names and if you don't want to create a table from the query, you will need to adjust the SELECT ... INTO statement to insert the records into your new table. It might be easier to run this, then add the other fields needed to the table this script creates.
Two points:
1) This will not strip out any periods that are part of the title. You may want to do that after you have parsed the data.
2) This assumes that EVERY record has a title, one space, last name, one space, and a first name. If a record has extra spaces between the strings, or has more (or fewer) than 3 strings, you will get some unexpected results. Legacy data is often poor data. You will want to check the results.
--Table with values to be parsed
CREATE TABLE OldNames(FullName varchar(50))
INSERT INTO OldNames (FullName) VALUES('Mr. Jones Eric')
INSERT INTO OldNames (FullName) VALUES('Mr Doe John')
INSERT INTO OldNames (FullName) VALUES('Mrs. Smith Janet')
INSERT INTO OldNames (FullName) VALUES('Mrs Johnson Virginia')
INSERT INTO OldNames (FullName) VALUES('Miss Reynolds Miranda')
INSERT INTO OldNames (FullName) VALUES('Miss. Proctor Emily')
--Parse the FullName field into Title, LastName and FirstName fields
-- and insert the values into a new table
SELECT Left(FullName, CharIndex(' ', FullName) -1) AS Title,
Right(FullName, CharIndex(' ',Reverse(FullName))- 1) AS LastName,
SubString(FullName, CharIndex(' ', FullName)+ 1,
(CharIndex(' ',FullName, CharIndex(' ',FullName)+1))-(CharIndex(' ', FullName) )) AS FirstName
INTO NewNames
FROM OldNames
--Check results
select * from OldNames
select * from NewNames
--Clean up samples
drop table OldNames
drop table NewNames
Hope this helps. I didn't have much time so this is an ugly, brute-force approach. If someone has a more polished solution, I hope they will post it.
July 16, 2005 at 10:05 am
the task like this can be (and should be) much easier accomplished in ADO.NET by implementing the following steps:
1. connect to the SQL server
2. fetch all the data from your source table (FullNames in this case) into a dataset
3. process all your strings in a loop by using Regular Expressions:
4. Insert/Update your target table with altered (in this case split) strings
-----------------------------------------------------
here is a C# console that would split your stings into 3 parts according to your spec. The code in the console is Step 3) of the above algorithm. Regular exression that does the trick is just one line of code!
//declare a regex pattern
string pattern = @"(\w+\.?)";
@"(\w+\.?)" - It means : match all alphanumeric characters and a dot if available (dot can be after any part of the name).
using System;
using System.Text.RegularExpressions;
public class Regex_Match
{
public static void Main()
{
//Scan an input string for matches, print them
//declare an input string of text
string InputString = @"'Mr. Jones Eric'";
//declare a regex pattern
string pattern = @"(\w+\.?)";
//declare new regex object with no options set
Regex r = new Regex( pattern, RegexOptions.None);
//declare a match object
Match m;
//match all the words in the input sentence using Match method
m = r.Match(InputString);
//print the matches
while (m.Success)
{
Console.WriteLine(m);
//get the next one to print
m = m.NextMatch();
}
Console.WriteLine("\nPress Enter to Exit");
Console.ReadLine();
}
} //end of the class
/*
returns:
Mr.
Jones
Eric
*/
You can easily modify your Regex in case the pattern of your original data changes . And u would need to change just one line of the code:
string pattern = @"(\w+\.?)";
that's what people call 'the power of RegEx'
Sergei
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply