Split text by empty spaces

  • Hello all,

    I'm trying to split a text (e.g. "MR. JOHN SMITH" or "MRS. JANE AUSTIN SMITH") into chunks in order to be able to extract certain blocks: in this case I'd like to extract "JOHN" and "JANE" as first names and "SMITH" and "AUSTIN SMITH" as last names.

    Has anybody got an idea or hint how to achieve this?

    Many thanks...

  • A quick way is to use the INSTR, MID, LEFT, RIGHT functions.

    If your column is NAME,

    left(NAME,instr(NAME," ")-1) will give you the first part

    mid(NAME,instr(NAME," ")+1,50) will give you the start of the next part of the string. Set this in a local variable then use the LEFT function of the variable to get the next part, and so on. The 50 is just a number greater than the length of the original string to make sure we get it all. It will only return up to the end of the original. I'm sure there is a neater way to do it but I started as a TB programmer...old habits die hard.

  • INSTR and MID are not T-SQL functions so that won't work. Search SSC for 'split function. You'll find several viable options.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • What are you planning on doing when you come across a last name with a space in it or a person with two middle initials?

  • Hello all,

    thanks for your prompt answers! As John remarks, I can't use MID LEFT and INSTR functions because they are not T-SQL...

    As for Michael's question: I really need only to extract the first name (which is always the first block after the salutation), all following blocks can be regarded as last name(s) - also multiple first names can be neglected. Basically, I need this:

    "Mr. John Smith" -> first_name = "John", last_name = "Smith"

    "Mr. John Cooper Smith" -> first_name = "John", last_name = "Cooper Smith"

    "Mr. John Cooper Smith Jones" -> first_name = "John", last_name = "Cooper Smith Jones"

    "Mr. John William Cooper Smith Jones" -> first_name = "John", last_name = "William Cooper Smith Jones" (although it might not make sense semantically)

    Thanks....

  • Try something like this:

    declare @T varchar(100)

    select @t = 'Mr. John Smith Jones'

    select substring(left(@t, charindex(' ', @t, charindex(' ', @t) + 1)),

    charindex(' ', @t), charindex(' ', @t, charindex(' ', @t) + 1)),

    right(@t, len(@t) - charindex(' ', @t, charindex(' ', @t) + 1))

    You can, of course, use a column name instead of a variable.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Hello GSquared,

    thanks a ton for the solution! It works perfectly - exactly what I need...

    Cheers,

    Marin

  • Ummm... what you really need is a "split" function... and no While loop... please read the following... just replace the comma delimiter with a space when you get to the "split" part of the article...

    http://www.sqlservercentral.com/articles/TSQL/62867/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, he's not looking to parse out the whole string. Just grab the second word, and everything else.

    String parser was my first thought too, but then I read it in a bit more detail, and it doesn't appear to be what's actually wanted.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (6/19/2008)


    Jeff, he's not looking to parse out the whole string. Just grab the second word, and everything else.

    String parser was my first thought too, but then I read it in a bit more detail, and it doesn't appear to be what's actually wanted.

    Understood, Gus... he'll change what he wants just as soon as he finds out his names don't all follow the same format... 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • marin (6/19/2008)


    Hello all,

    I'm trying to split a text (e.g. "MR. JOHN SMITH" or "MRS. JANE AUSTIN SMITH") into chunks in order to be able to extract certain blocks: in this case I'd like to extract "JOHN" and "JANE" as first names and "SMITH" and "AUSTIN SMITH" as last names.

    Has anybody got an idea or hint how to achieve this?

    Many thanks...

    Hey, I just did this!

    Here's a cool function that does the job:

    ALTER function [dbo].[fnSplit1](

    @parameter varchar(Max) -- the string to split

    , @Seperator Varchar(64) )

    RETURNS @Items TABLE(

    ID INT -- the element number

    , item VARCHAR(8000) -- the split-out string element

    , OffSet int -- the original offest

    --( not entirley accurate if LEN(@Seperator) > 1 because of the Replace() ))

    AS

    BEGIN

    /*

    "Monster" Split in SQL Server 2005

    From Jeff Moden, 2008/05/22

    BYoung, 2008/06/18: Modified to be a Table-Valued Function

    And to handle CL/LF or LF-only line breaks

    (Note: make it inline later, to make it faster)

    Test: (scripts all triggers in your database)

    Select Lines.Item

    From sys.sql_modules M

    Join sys.objects O on O.object_id = M.object_id

    cross apply dbo.fnSplit1(M.definition, char(13)+char(10)) Lines

    Where O.Type = 'TR'

    Order by O.create_date, Lines.ID

    */

    Declare @Sep char(1)

    Set @Sep = char(10) --our seperator character

    --NOTE: we make the @Sep character LF so that we will automatically

    -- parse out rogue LF-only line breaks.

    --===== Add start and end seprators to the Parameter so we can handle

    -- all the elements the same way

    -- Also change the seperator expressions to our seperator

    -- character to keep all offsets = 1

    SET @Parameter = @Sep+ Replace(@Parameter,@Seperator,@Sep) +@Sep

    ;WITH cteTally AS

    (--==== Create a Tally CTE from 1 to whatever the length

    -- of the parameter is

    SELECT TOP (LEN(@Parameter))

    ROW_NUMBER() OVER (ORDER BY t1.Object_ID) AS N

    FROM Master.sys.All_Columns t1

    CROSS JOIN Master.sys.All_Columns t2

    )

    INSERT into @Items

    SELECT ROW_NUMBER() OVER (ORDER BY N) AS Number,

    SUBSTRING(@Parameter, N+1, CHARINDEX(@Sep, @Parameter, N+1)-N-1) AS Value

    , N+1

    FROM cteTally

    WHERE N < LEN(@Parameter)

    AND SUBSTRING(@Parameter, N, 1) = @Sep --Notice how we find the seperator

    Return

    END

    Here's how to use it with your examples:

    Create table People(

    ID int Identity,

    FullName varchar(32))

    GO

    Insert into People(FullName)

    Select 'MR. JOHN SMITH'

    Union All Select 'MRS. JANE AUSTIN SMITH'

    Select *

    FROM People

    Cross Apply dbo.fnSplit1(FullName, ' ')

    Enjoy!

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • You are so correct...I had worked out a solution in ACCESS by using the INSTR and MID functions, as well as column aliases to be able to work the data down a stream finding each part. I learned....INSTR becomes CHARINDEX and MID becomes SUBSTRING in SQL2005. However, is there any SQL2005 equivalent to using aliases in ACCESS that will let me store a temporary change to a cell's data then rechange it into a new alias to be changed again in yet another alias in a domino effect. Another posting on this topic showed using the CHARINDEX and SUBSTRING functions to get the first and item, then the rest of the line. However, if we can set up something line the ACCESS aliases, we can get any part of the line we want.

    For example, with a column "NAME", an ACCESS query might look like:

    SELECT NAME AS A, LEFT(A,INSTR(A,' ')-1) AS B,MID(A,INSTR(A,' ')+1,50) AS C,LEFT(C,INSTR(C,' ')-1) AS D…MID(C,INSTR(C,' ')+1,50) AS E,F,G…

    where each subsequent alias is built off the prior alias. 'A' becomes the whole line. 'B' becomes the first space-delimited item. 'C' becomes the rest of the line after the first item. 'D' becomes the next space-delimited item. 'E' then becomes the rest of the line after the second space-delimited item, and so on for F,G,H,I....to get each subsequent space-delimited item. However, SQL doesn't like the aliases used for anything but the column header in reports, and the ordering of data, etc. Any suggestions anybody? (I know there can be VB and other more detailed/complicated solutions, (TB can do it in two lines) ...I just like "keep-it-simple".)

  • You can get there using a variety of methods. The first one that comes to mind would be to go read up on Common Table Expressions, which kind of allow you to do such things. It's not going to save you any on writing, but it is going to allow you to "centralize" certain calcs, so you can do them once, and reuse them in multiple places.

    There are lots of Common Table Expressions (or CTE's) floating about on here. Take a look, or head for Books Online - it has some decent examples as well....

    Otherwise, in update queries, you can carry variables around, and reuse them in multiple places in your statements (which would allow you once again to calc something once, and reuse it a little later on in your updates).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • For a table called "ADDRESSES" with a column "NAME", the following will parse out the space-delimited NAME (the entire column) to its separate pieces into a temp table '#N2' (a perm table can also be created/used). From the temp table , you can do anything you want. I use column names of A,B,C... for typing ease. The lines updating columns B and C can be copied for however many columns you want to parse into. I go up to 'G'. Column 'B' is a temp holder that will eventually be empty. Column 'C' is the 1st piece, 'D' is 2nd, 'E' is 3rd.... Column 'A' holds onto the original data for compare purposes. If you only need the first three pieces, you only need to include up to column 'E' below. Make sure your table definition includes enough columns.

    If you need to, create another column in the temp table, 'AA' for example, and in the 'SELECT' line below, select the original table primary key column into 'AA' so as to be able to reference directly between the two tables.

    Also, LTRIM can be added in each step that re-figures 'B' incase there are multiple spaces between the pieces.

    DROP TABLE #N2 ----->> for testing so I can rerun/f5 as much as I want comment out the first time

    CREATE TABLE #N2 (A CHAR(50),B CHAR(50), C CHAR(50), D CHAR(50), E CHAR(50), F CHAR(50), G CHAR(50), H CHAR(50))

    INSERT INTO #N2

    SELECT NAME AS A,1 AS B,2 AS C,3 AS D,4 AS E,5 AS F,6 AS G,7 AS H FROM ADDRESSES

    UPDATE #N2 SET B=A+' '

    UPDATE #N2 SET C=left(B,CHARINDEX(' ',B)-1)

    UPDATE #N2 SET B=SUBSTRING(B,CHARINDEX(' ',B)+1,50)

    UPDATE #N2 SET D=left(B,CHARINDEX(' ',B)-1)

    UPDATE #N2 SET B=SUBSTRING(B,CHARINDEX(' ',B)+1,50)

    UPDATE #N2 SET E=left(B,CHARINDEX(' ',B)-1)

    UPDATE #N2 SET B=SUBSTRING(B,CHARINDEX(' ',B)+1,50)

    UPDATE #N2 SET F=left(B,CHARINDEX(' ',B)-1)

    UPDATE #N2 SET B=SUBSTRING(B,CHARINDEX(' ',B)+1,50)

    UPDATE #N2 SET G=left(B,CHARINDEX(' ',B)-1)

    UPDATE #N2 SET B=SUBSTRING(B,CHARINDEX(' ',B)+1,50)

    UPDATE #N2 SET H=left(B,CHARINDEX(' ',B)-1)

    SELECT * FROM #N2;

  • Alternately, you can do a standard string parse and process a whole table of names all at once.

    SUBSTRING(@String_in+@Delimiter_in, number,

    CHARINDEX(@Delimiter_in, @String_in+@Delimiter_in, number) - number) as Parsed,

    row_number() over (order by number) as Row

    FROM dbo.numbers

    WHERE number <= LEN(@String_in)

    AND SUBSTRING(@Delimiter_in + @String_in, number, 1) = @Delimiter_in

    ORDER BY number

    This assumes you have a numbers table.

    For more details, see: http://www.sqlservercentral.com/articles/TSQL/62867/

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply