June 19, 2008 at 10:28 am
Hello all,
I'm trying to split a text (e.g. "MR. JOHN SMITH" or "MRS. JANE AUSTIN SMITH") into chunks in order to be able to extract certain blocks: in this case I'd like to extract "JOHN" and "JANE" as first names and "SMITH" and "AUSTIN SMITH" as last names.
Has anybody got an idea or hint how to achieve this?
Many thanks...
June 19, 2008 at 11:19 am
A quick way is to use the INSTR, MID, LEFT, RIGHT functions.
If your column is NAME,
left(NAME,instr(NAME," ")-1) will give you the first part
mid(NAME,instr(NAME," ")+1,50) will give you the start of the next part of the string. Set this in a local variable then use the LEFT function of the variable to get the next part, and so on. The 50 is just a number greater than the length of the original string to make sure we get it all. It will only return up to the end of the original. I'm sure there is a neater way to do it but I started as a TB programmer...old habits die hard.
June 19, 2008 at 11:27 am
INSTR and MID are not T-SQL functions so that won't work. Search SSC for 'split function. You'll find several viable options.
June 19, 2008 at 11:57 am
What are you planning on doing when you come across a last name with a space in it or a person with two middle initials?
June 19, 2008 at 12:25 pm
Hello all,
thanks for your prompt answers! As John remarks, I can't use MID LEFT and INSTR functions because they are not T-SQL...
As for Michael's question: I really need only to extract the first name (which is always the first block after the salutation), all following blocks can be regarded as last name(s) - also multiple first names can be neglected. Basically, I need this:
"Mr. John Smith" -> first_name = "John", last_name = "Smith"
"Mr. John Cooper Smith" -> first_name = "John", last_name = "Cooper Smith"
"Mr. John Cooper Smith Jones" -> first_name = "John", last_name = "Cooper Smith Jones"
"Mr. John William Cooper Smith Jones" -> first_name = "John", last_name = "William Cooper Smith Jones" (although it might not make sense semantically)
Thanks....
June 19, 2008 at 12:51 pm
Try something like this:
declare @T varchar(100)
select @t = 'Mr. John Smith Jones'
select substring(left(@t, charindex(' ', @t, charindex(' ', @t) + 1)),
charindex(' ', @t), charindex(' ', @t, charindex(' ', @t) + 1)),
right(@t, len(@t) - charindex(' ', @t, charindex(' ', @t) + 1))
You can, of course, use a column name instead of a variable.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
June 19, 2008 at 12:56 pm
Hello GSquared,
thanks a ton for the solution! It works perfectly - exactly what I need...
Cheers,
Marin
June 19, 2008 at 1:28 pm
Ummm... what you really need is a "split" function... and no While loop... please read the following... just replace the comma delimiter with a space when you get to the "split" part of the article...
http://www.sqlservercentral.com/articles/TSQL/62867/
--Jeff Moden
Change is inevitable... Change for the better is not.
June 19, 2008 at 1:49 pm
Jeff, he's not looking to parse out the whole string. Just grab the second word, and everything else.
String parser was my first thought too, but then I read it in a bit more detail, and it doesn't appear to be what's actually wanted.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
June 19, 2008 at 2:21 pm
GSquared (6/19/2008)
Jeff, he's not looking to parse out the whole string. Just grab the second word, and everything else.String parser was my first thought too, but then I read it in a bit more detail, and it doesn't appear to be what's actually wanted.
Understood, Gus... he'll change what he wants just as soon as he finds out his names don't all follow the same format... 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
June 19, 2008 at 5:53 pm
marin (6/19/2008)
Hello all,I'm trying to split a text (e.g. "MR. JOHN SMITH" or "MRS. JANE AUSTIN SMITH") into chunks in order to be able to extract certain blocks: in this case I'd like to extract "JOHN" and "JANE" as first names and "SMITH" and "AUSTIN SMITH" as last names.
Has anybody got an idea or hint how to achieve this?
Many thanks...
Hey, I just did this!
Here's a cool function that does the job:
ALTER function [dbo].[fnSplit1](
@parameter varchar(Max) -- the string to split
, @Seperator Varchar(64) )
RETURNS @Items TABLE(
ID INT -- the element number
, item VARCHAR(8000) -- the split-out string element
, OffSet int -- the original offest
--( not entirley accurate if LEN(@Seperator) > 1 because of the Replace() ))
AS
BEGIN
/*
"Monster" Split in SQL Server 2005
From Jeff Moden, 2008/05/22
BYoung, 2008/06/18: Modified to be a Table-Valued Function
And to handle CL/LF or LF-only line breaks
(Note: make it inline later, to make it faster)
Test: (scripts all triggers in your database)
Select Lines.Item
From sys.sql_modules M
Join sys.objects O on O.object_id = M.object_id
cross apply dbo.fnSplit1(M.definition, char(13)+char(10)) Lines
Where O.Type = 'TR'
Order by O.create_date, Lines.ID
*/
Declare @Sep char(1)
Set @Sep = char(10) --our seperator character
--NOTE: we make the @Sep character LF so that we will automatically
-- parse out rogue LF-only line breaks.
--===== Add start and end seprators to the Parameter so we can handle
-- all the elements the same way
-- Also change the seperator expressions to our seperator
-- character to keep all offsets = 1
SET @Parameter = @Sep+ Replace(@Parameter,@Seperator,@Sep) +@Sep
;WITH cteTally AS
(--==== Create a Tally CTE from 1 to whatever the length
-- of the parameter is
SELECT TOP (LEN(@Parameter))
ROW_NUMBER() OVER (ORDER BY t1.Object_ID) AS N
FROM Master.sys.All_Columns t1
CROSS JOIN Master.sys.All_Columns t2
)
INSERT into @Items
SELECT ROW_NUMBER() OVER (ORDER BY N) AS Number,
SUBSTRING(@Parameter, N+1, CHARINDEX(@Sep, @Parameter, N+1)-N-1) AS Value
, N+1
FROM cteTally
WHERE N < LEN(@Parameter)
AND SUBSTRING(@Parameter, N, 1) = @Sep --Notice how we find the seperator
Return
END
Here's how to use it with your examples:
Create table People(
ID int Identity,
FullName varchar(32))
GO
Insert into People(FullName)
Select 'MR. JOHN SMITH'
Union All Select 'MRS. JANE AUSTIN SMITH'
Select *
FROM People
Cross Apply dbo.fnSplit1(FullName, ' ')
Enjoy!
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
June 20, 2008 at 9:34 am
You are so correct...I had worked out a solution in ACCESS by using the INSTR and MID functions, as well as column aliases to be able to work the data down a stream finding each part. I learned....INSTR becomes CHARINDEX and MID becomes SUBSTRING in SQL2005. However, is there any SQL2005 equivalent to using aliases in ACCESS that will let me store a temporary change to a cell's data then rechange it into a new alias to be changed again in yet another alias in a domino effect. Another posting on this topic showed using the CHARINDEX and SUBSTRING functions to get the first and item, then the rest of the line. However, if we can set up something line the ACCESS aliases, we can get any part of the line we want.
For example, with a column "NAME", an ACCESS query might look like:
SELECT NAME AS A, LEFT(A,INSTR(A,' ')-1) AS B,MID(A,INSTR(A,' ')+1,50) AS C,LEFT(C,INSTR(C,' ')-1) AS D…MID(C,INSTR(C,' ')+1,50) AS E,F,G…
where each subsequent alias is built off the prior alias. 'A' becomes the whole line. 'B' becomes the first space-delimited item. 'C' becomes the rest of the line after the first item. 'D' becomes the next space-delimited item. 'E' then becomes the rest of the line after the second space-delimited item, and so on for F,G,H,I....to get each subsequent space-delimited item. However, SQL doesn't like the aliases used for anything but the column header in reports, and the ordering of data, etc. Any suggestions anybody? (I know there can be VB and other more detailed/complicated solutions, (TB can do it in two lines) ...I just like "keep-it-simple".)
June 20, 2008 at 9:43 am
You can get there using a variety of methods. The first one that comes to mind would be to go read up on Common Table Expressions, which kind of allow you to do such things. It's not going to save you any on writing, but it is going to allow you to "centralize" certain calcs, so you can do them once, and reuse them in multiple places.
There are lots of Common Table Expressions (or CTE's) floating about on here. Take a look, or head for Books Online - it has some decent examples as well....
Otherwise, in update queries, you can carry variables around, and reuse them in multiple places in your statements (which would allow you once again to calc something once, and reuse it a little later on in your updates).
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
June 20, 2008 at 12:19 pm
For a table called "ADDRESSES" with a column "NAME", the following will parse out the space-delimited NAME (the entire column) to its separate pieces into a temp table '#N2' (a perm table can also be created/used). From the temp table , you can do anything you want. I use column names of A,B,C... for typing ease. The lines updating columns B and C can be copied for however many columns you want to parse into. I go up to 'G'. Column 'B' is a temp holder that will eventually be empty. Column 'C' is the 1st piece, 'D' is 2nd, 'E' is 3rd.... Column 'A' holds onto the original data for compare purposes. If you only need the first three pieces, you only need to include up to column 'E' below. Make sure your table definition includes enough columns.
If you need to, create another column in the temp table, 'AA' for example, and in the 'SELECT' line below, select the original table primary key column into 'AA' so as to be able to reference directly between the two tables.
Also, LTRIM can be added in each step that re-figures 'B' incase there are multiple spaces between the pieces.
DROP TABLE #N2 ----->> for testing so I can rerun/f5 as much as I want comment out the first time
CREATE TABLE #N2 (A CHAR(50),B CHAR(50), C CHAR(50), D CHAR(50), E CHAR(50), F CHAR(50), G CHAR(50), H CHAR(50))
INSERT INTO #N2
SELECT NAME AS A,1 AS B,2 AS C,3 AS D,4 AS E,5 AS F,6 AS G,7 AS H FROM ADDRESSES
UPDATE #N2 SET B=A+' '
UPDATE #N2 SET C=left(B,CHARINDEX(' ',B)-1)
UPDATE #N2 SET B=SUBSTRING(B,CHARINDEX(' ',B)+1,50)
UPDATE #N2 SET D=left(B,CHARINDEX(' ',B)-1)
UPDATE #N2 SET B=SUBSTRING(B,CHARINDEX(' ',B)+1,50)
UPDATE #N2 SET E=left(B,CHARINDEX(' ',B)-1)
UPDATE #N2 SET B=SUBSTRING(B,CHARINDEX(' ',B)+1,50)
UPDATE #N2 SET F=left(B,CHARINDEX(' ',B)-1)
UPDATE #N2 SET B=SUBSTRING(B,CHARINDEX(' ',B)+1,50)
UPDATE #N2 SET G=left(B,CHARINDEX(' ',B)-1)
UPDATE #N2 SET B=SUBSTRING(B,CHARINDEX(' ',B)+1,50)
UPDATE #N2 SET H=left(B,CHARINDEX(' ',B)-1)
SELECT * FROM #N2;
June 20, 2008 at 12:44 pm
Alternately, you can do a standard string parse and process a whole table of names all at once.
SUBSTRING(@String_in+@Delimiter_in, number,
CHARINDEX(@Delimiter_in, @String_in+@Delimiter_in, number) - number) as Parsed,
row_number() over (order by number) as Row
FROM dbo.numbers
WHERE number <= LEN(@String_in)
AND SUBSTRING(@Delimiter_in + @String_in, number, 1) = @Delimiter_in
ORDER BY number
This assumes you have a numbers table.
For more details, see: http://www.sqlservercentral.com/articles/TSQL/62867/
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply