December 8, 2009 at 1:29 pm
We have a need to break out the first, middle and last names from a full name string. I've got the last and middle name/initial close enough for what we need but can't get the right code to extract the first name. Any help is greatly appreciated!
/* Query to parse the full_name field and extract
Last Name, First Name and MI
nheyen 23-Nov-2009
*/
CREATE TABLE #names (full_name varchar(254))
INSERT INTO #names VALUES ('Simpson, Homer J')
INSERT INTO #names VALUES ('Simpson, Bartholomew JoJo')
INSERT INTO #names VALUES ('Simpson,Maggie')
INSERT INTO #names VALUES ('Bouvier Simpson, Margorie "Marge"')
INSERT INTO #names VALUES ('Simpson, Lisa M.')
INSERT INTO #names VALUES ('Burns,Charles Montgomery "Monty"')
INSERT INTO #names VALUES ('Smithers, Waylon Jr.')
SELECT full_name
,SUBSTRING(full_name,1,(CHARINDEX(',',full_name))-1) AS [Last Name]
,CASE
/* if no period in MI */
WHEN RIGHT(full_name,2) LIKE ' [a-Z]'
THEN LTRIM(SUBSTRING(SUBSTRING(full_name,CHARINDEX(',',full_name)+1,LEN(full_name)),1,CHARINDEX(' ',full_name)-2))
/* if period after MI */
WHEN RIGHT(full_name,1) LIKE '.'
THEN LTRIM(SUBSTRING(SUBSTRING(full_name,CHARINDEX(',',full_name)+1,LEN(full_name)),1,CHARINDEX(' ',full_name)-3))
/* if middle name used instead of MI */
WHEN RIGHT(full_name,2) LIKE '[a-Z][a-Z]'
THEN LTRIM(SUBSTRING(SUBSTRING(full_name,CHARINDEX(',',full_name)+1,LEN(full_name)),1,LEN(full_name)))
/* no middle name or MI */
ELSE LTRIM(SUBSTRING(SUBSTRING(full_name,CHARINDEX(',',full_name)+1,LEN(full_name)),1,LEN(full_name)))
END AS [First Name]
,CASE
WHEN RIGHT(full_name,2) LIKE ' [a-Z]' THEN RIGHT(full_name,1) /* no period after MI */
WHEN RIGHT(full_name,3) IN ('Jr.','Sr.') THEN NULL
WHEN RIGHT(full_name,2) IN ('Jr','Sr') THEN NULL
WHEN RIGHT(full_name,1) = '.' THEN SUBSTRING(full_name,LEN(full_name)-1,1) /* has period after MI */
ELSE NULL /* cannot determine */
END AS [MI]
FROM #names
DROP TABLE #names
December 8, 2009 at 1:49 pm
I should add this is part of a much larger extract to provide data to a third party, so I don't know if I can get by using variables. The speed isn't critical, it is only run once a month at most.
Thanks!
Norman
December 8, 2009 at 2:11 pm
December 8, 2009 at 2:35 pm
Sorry, should have done that - here is what I would like to get
Simpson, Homer JSimpsonHomer J
Simpson, Bartholomew JoJoSimpsonBartholomewNULL
Simpson,MaggieSimpsonMaggieNULL
Bouvier Simpson, Margorie "Marge"Bouvier SimpsonMargorieNULL
Simpson, Lisa M.SimpsonLisa M
Burns,Charles Montgomery "Monty"BurnsCharlesNULL
Smithers, Waylon Jr.SmithersWaylonNULL
Basically if there is no obvious middle initial given, return a NULL instead of trying to figure it out. What I'm trying to get is the first substring after the comma.
Simpson Homer J
simpson Bartholomew
Simpson Maggie
Bouvier Simpson Margorie
Simpson Lisa M
Burns Charles
Smithers Waylon
December 8, 2009 at 3:35 pm
To answer your question, you can get the first substring after the comma by doing something like this:
CASE WHEN CHARINDEX(' ',full_name,CHARINDEX(',',full_name)+1) > 0
THEN LTRIM(SUBSTRING(full_name,CHARINDEX(',',full_name)+1,CHARINDEX(' ',full_name,CHARINDEX(',',full_name)+1)))
ELSE RIGHT(full_name,DATALENGTH(full_name)-CHARINDEX(',',full_name))
END
This uses the third (optional) parameter of CHARINDEX to set a starting point, and uses the , as the starting point (adds one to it so that you don't find the immediate space) and finds the first space after it. If there is no space, you want to use the rest of the name, so you have the case statement.
Because I hadn't done any heavy string manipulation in a while, I took it a few steps further. This works for the sample data... but like most string manipulation, it will fail under certain circumstances. This is just off the top of my head with no real effort to cover situations outside of those presented by the sample data, so it probably won't really work for you, but I had fun writing it, and you're welcome to play with it if you want.
I added the calculated columns for readability (Yes, believe it or not, that IS the easy to read version, without those calculated fields or CTE's (2000 forum), it'd be much, much worse.
CREATE TABLE #names (full_name varchar(254))
INSERT INTO #names VALUES ('Simpson, Homer J')
INSERT INTO #names VALUES ('Simpson, Bartholomew JoJo')
INSERT INTO #names VALUES ('Simpson,Maggie')
INSERT INTO #names VALUES ('Bouvier Simpson, Margorie "Marge"')
INSERT INTO #names VALUES ('Simpson, Lisa M.')
INSERT INTO #names VALUES ('Burns,Charles Montgomery "Monty"')
INSERT INTO #names VALUES ('Smithers, Waylon Jr.')
ALTER TABLE #Names
ADD TL AS DATALENGTH(REPLACE(full_name,', ',',')),-- Length of the original Name
LN AS CHARINDEX(',',full_name)-1,-- Length of the Last Name
TrimLN AS
DATALENGTH(
ISNULL(
REPLACE(
STUFF(full_name,
CHARINDEX('"',full_name), -- First "
CHARINDEX('"',full_name,CHARINDEX('"',full_name)) -- Second "
,'')
,', ',',')
,REPLACE(full_name,', ',','))
),
TrimName AS
ISNULL(
REPLACE(
STUFF(full_name,
CHARINDEX('"',full_name), -- First "
CHARINDEX('"',full_name,CHARINDEX('"',full_name)) -- Second "
,'')
,', ',',')
,REPLACE(full_name,', ',','))
SELECTfull_name,
LEFT(full_name,LN) LastName,
CASEWHEN CHARINDEX(' ',RIGHT(TrimName, (TrimLN-LN)-1),1) > 0
THEN LEFT(RIGHT(TrimName, (TrimLN-LN)-1),CHARINDEX(' ',RIGHT(TrimName, (TrimLN-LN)-1),1))
ELSE RIGHT(TrimName, (TrimLN-LN)-1)
END FirstName,
CASE WHEN CHARINDEX(' ',LTRIM(RTRIM(RIGHT(TrimName, TrimLN-LN-1)))) = 0 THEN NULL
WHEN REPLACE(RIGHT(TrimName, TrimLN - CHARINDEX(' ',TrimName)),'.','') IN ('Jr','Sr') THEN NULL
ELSE RIGHT(TrimName, TrimLN - CHARINDEX(' ',TrimName))
END MiddleName,
CASE WHEN CHARINDEX(' ',LTRIM(RTRIM(RIGHT(TrimName, TrimLN-LN-1)))) = 0 THEN NULL
WHEN REPLACE(RIGHT(TrimName, TrimLN - CHARINDEX(' ',TrimName)),'.','') IN ('Jr','Sr') THEN NULL
ELSE LEFT(RIGHT(TrimName, TrimLN - CHARINDEX(' ',TrimName)),1)
END MiddleInitial,
CASE WHEN CHARINDEX(' ',LTRIM(RTRIM(RIGHT(TrimName, TrimLN-LN-1)))) = 0 THEN NULL
WHEN REPLACE(RIGHT(TrimName, TrimLN - CHARINDEX(' ',TrimName)),'.','') IN ('Jr','Sr') THEN RIGHT(TrimName, TrimLN - CHARINDEX(' ',TrimName))
ELSE NULL
END Suffix,
CASE WHEN CHARINDEX('"',full_name,CHARINDEX('"',full_name)+1) - (CHARINDEX('"',full_name)+1) > 0
THEN
SUBSTRING(full_name,
CHARINDEX('"',full_name)+1, -- First "
CHARINDEX('"',full_name,CHARINDEX('"',full_name)+1) - (CHARINDEX('"',full_name)+1)-- Second "
)
ELSE NULL END NickName
FROM #names
DROP TABLE #names
December 9, 2009 at 10:21 am
Thank you Seth, new functions to learn about. I'll try this when I get back to work tomorrow. (We had a big snow storm and I'm at home today.)
I've been working with this for a week or so and kind of running out of things to try.
Norman
December 10, 2009 at 12:17 am
Divid'n'Conquer...
CREATE TABLE #names (full_name varchar(254))
INSERT INTO #names VALUES ('Simpson, Homer J')
INSERT INTO #names VALUES ('Simpson, Bartholomew JoJo')
INSERT INTO #names VALUES ('Simpson,Maggie')
INSERT INTO #names VALUES ('Bouvier Simpson, Margorie "Marge"')
INSERT INTO #names VALUES ('Simpson, Lisa M.')
INSERT INTO #names VALUES ('Burns,Charles Montgomery "Monty"')
INSERT INTO #names VALUES ('Smithers, Waylon Jr.')
;
WITH
cteStart AS
(
SELECT SUBSTRING(full_name,1,CHARINDEX(',',full_name)-1) AS LastName,
LTRIM(SUBSTRING(full_name,CHARINDEX(',',full_name)+1,254)) AS FirstNameStart
FROM #Names
)
,
cteMiddle AS
(
SELECT LastName,
SUBSTRING(FirstNameStart,1,ISNULL(NULLIF(CHARINDEX(' ',FirstNameStart),0),254)) AS FirstName,
REPLACE(LTRIM(SUBSTRING(FirstNameStart,NULLIF(CHARINDEX(' ',FirstNameStart),0),254)),'.','') AS MiStart
FROM cteStart
)
SELECT LastName,
FirstName,
CASE WHEN LEN(MiStart) = 1 THEN MiStart ELSE NULL END AS MI
FROM cteMiddle
LastNameFirstNameMI
SimpsonHomer J
SimpsonBartholomew NULL
SimpsonMaggieNULL
Bouvier SimpsonMargorie NULL
SimpsonLisa M
BurnsCharles NULL
SmithersWaylon NULL
--Jeff Moden
Change is inevitable... Change for the better is not.
December 10, 2009 at 10:06 am
I recently had a need to accomplish the same task. I cam across this UDF that satisfied all of my criteria. Hope it helps.
December 10, 2009 at 12:33 pm
Thank you Jeff -
When I copy and paste and run this code, I get errors. Is this SQL2000 code? Or does it need SQL2005?
The errors are:
Msg 156, Level 15, State 1, Line 11
Incorrect syntax near the keyword 'WITH'.
Msg 170, Level 15, State 1, Line 17
Line 17: Incorrect syntax near ','.
Any thoughts?
Norman
December 10, 2009 at 12:42 pm
Thank you Seth,
I think this will do just what I need - apologies to those who read my erroneous message about errors.
Thanks again,
Norman
December 10, 2009 at 2:51 pm
n.heyen (12/10/2009)
Thank you Jeff -When I copy and paste and run this code, I get errors. Is this SQL2000 code? Or does it need SQL2005?
The errors are:
Msg 156, Level 15, State 1, Line 11
Incorrect syntax near the keyword 'WITH'.
Msg 170, Level 15, State 1, Line 17
Line 17: Incorrect syntax near ','.
Any thoughts?
Norman
Ahg....my apologies. I forgot I was in a 2k forum. I'll see if I can flip it to 2k code tonight.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 10, 2009 at 3:30 pm
Jeff Moden (12/10/2009)
Ahg....my apologies. I forgot I was in a 2k forum. I'll see if I can flip it to 2k code tonight.
Way to try and cheat Jeff. :hehe:
December 10, 2009 at 6:29 pm
Heh... try hell! I did cheat! I just got caught. :blush:
--Jeff Moden
Change is inevitable... Change for the better is not.
December 10, 2009 at 6:50 pm
Here's a bit of code for SQL Server 2000 that's equivalent to the 2k5 code I "cheated" with.:-D It uses "derived tables" instead of CTE's. Both are sometimes referred to as "inline views".
CREATE TABLE #names (full_name varchar(254))
INSERT INTO #names VALUES ('Simpson, Homer J')
INSERT INTO #names VALUES ('Simpson, Bartholomew JoJo')
INSERT INTO #names VALUES ('Simpson,Maggie')
INSERT INTO #names VALUES ('Bouvier Simpson, Margorie "Marge"')
INSERT INTO #names VALUES ('Simpson, Lisa M.')
INSERT INTO #names VALUES ('Burns,Charles Montgomery "Monty"')
INSERT INTO #names VALUES ('Smithers, Waylon Jr.')
SELECT LastName,
FirstName,
CASE WHEN LEN(MiStart) = 1 THEN MiStart ELSE NULL END AS MI
FROM (
SELECT LastName,
SUBSTRING(FirstNameStart,1,ISNULL(NULLIF(CHARINDEX(' ',FirstNameStart),0),254)) AS FirstName,
REPLACE(LTRIM(SUBSTRING(FirstNameStart,NULLIF(CHARINDEX(' ',FirstNameStart),0),254)),'.','') AS MiStart
FROM (
SELECT SUBSTRING(full_name,1,CHARINDEX(',',full_name)-1) AS LastName,
LTRIM(SUBSTRING(full_name,CHARINDEX(',',full_name)+1,254)) AS FirstNameStart
FROM #Names
)dtStart
)dtMiddle
As a sidebar for those that need to make such a "backwards" conversion, I didn't have to change any of the SELECT's, really... I just had to rearrange them from being CTE's to being "derived tables". Of course, that won't work if you made a recursive CTE. And, if a CTE is used more than once, you'll have to make the same derived table more than once.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 10, 2009 at 10:25 pm
Jeff Moden (12/10/2009)
Heh... try hell! I did cheat! I just got caught. :blush:
I mean, if you were gonna cheat, you could have at least done nickname as well. Poor Monty.:-D
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply