First Two Words

  • Hi All,

    I want to extract first two words from the string. Example

    ABC Property Management Inc.

    I need to retrieve ABC Property

     

    Thanks

     

     

  • I would like to send you on an exploration mission here.

    Firstly take a look at the documentation for CHARINDEX and SUBSTRING

    https://learn.microsoft.com/en-us/sql/t-sql/functions/charindex-transact-sql?view=sql-server-ver16

    https://learn.microsoft.com/en-us/sql/t-sql/functions/substring-transact-sql?view=sql-server-ver16

    Using a combination of these functions you can easily get the CHARINDEX of the 2nd space in the string.  And then you can feed that CHARINDEX value into SUBSTRING to pull everything to the left of that 2nd space.

    If you have issues on doing this, please let us know.

  • I want to extract first two words from the string. Example

    ABC Property Management Inc.

    I need to retrieve ABC Property

    Even this case also

    Brinker Inc

    I need to retrieve Brinker Inc

  • Please show at least some working of the problem you have tried.

    You could easily wrap the CHARINDEX/SUBSTRING combination in a CASE statement and where there is no second space return the string itself.

    Or you could use something like a tally splitter and then use a CROSSTAB method to pull the string as you need it.

    There are a number of options here, but simply giving the answer doesn't improve in ones learning, so if you can show some understanding or at least attempt a solution yourself and post back what you're stuck on it will go a long way to show you are not here to just copy and paste code.

  • SUBSTRING(comp, 0, CHARINDEX(' ', comp, CHARINDEX(' ', comp, 0)+1))

    This query not returning exactly how I was lookin for it

  • Think of it this way, you're looking for the second space ' '. That's how you solve this.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • sathishkm wrote:

    SUBSTRING(comp, 0, CHARINDEX(' ', comp, CHARINDEX(' ', comp, 0)+1))

    This query not returning exactly how I was lookin for it

     

    OK so you got a way to get to the 2nd space, excellent.

    The issue with "Brinker Inc" is there is no second space, so the CHARINDEX result is 0, so there is nothing for SUBSTRING to return.

    So there you could wrap the query in a case statement

    case when (CharIndex(' ', comp, CharIndex(' ', comp) + 1)) = 0 THEN comp ELSE SubString(comp, 1, CharIndex(' ', comp, CharIndex(' ', comp) + 1)) end

    Or you could use NULLIF and ISNULL

    ISNULL(NULLIF(SubString(comp, 1, CharIndex(' ', comp, CharIndex(' ', comp) + 1)),''),comp)

    Or you could split using a tally splitter (DelimitedSplit - https://www.sqlservercentral.com/articles/tally-oh-an-improved-sql-8k-%e2%80%9ccsv-splitter%e2%80%9d-function), and then use a CROSSTAB method to concatenate the values, or maybe even STRING_AGG

    SELECT max(case when itemnumber = 1 then item end)+' '+max(case when ItemNumber = 2 then item end), string_agg(item, ' ')
    FROM
    (
    SELECT * FROM dbo.DelimitedSplit8K(comp,' ')
    ) AS Split
    WHERE ItemNumber <=2
  • declare @string varchar(50) = 'ABC Property Management Inc.'

    SELECT string_agg(x.value,' ') FROM (
    select top 2 * from string_split(@string,' ') y)x
  • Set up some test data:

    DROP TABLE IF EXISTS #temp
    GO
    SELECT *
    INTO #temp
    FROM (VALUES ('ABC Property Management Inc.'),
    ('ABC Property'),
    ('ABC'),
    ('')
    ) T(Col1)
    ;

    Query:

    SELECT LEFT(Col1 + '  ', T1.FirstSpace - 1) FirstWord, 
    SUBSTRING(t.Col1 + ' ', T1.FirstSpace+1, T2.SecondSpace - T1.FirstSpace - 1) SecondWord
    FROM #temp t
    CROSS APPLY (VALUES (charindex(' ', t.Col1 + ' '))) T1(FirstSpace)
    CROSS APPLY (VALUES (charindex(' ', t.Col1 + ' ', T1.FirstSpace+1))) T2(SecondSpace)

    As T1.FirstSpace appears more than once in the query it is a good idea to use cross apply to keep the code simple to avoid repeating code.

    This is ChatGPT's answer, which to be fair is quite acceptable:

    SELECT CASE WHEN CHARINDEX(' ', Col1) > 0 THEN LEFT(Col1, CHARINDEX(' ', Col1) - 1)
    ELSE Col1
    END AS FirstWord,
    CASE WHEN CHARINDEX(' ', Col1) > 0
    THEN SUBSTRING(Col1, CHARINDEX(' ', Col1) + 1, CHARINDEX(' ', Col1 + ' ', CHARINDEX(' ', Col1) + 1) - CHARINDEX(' ', Col1) - 1)
    ELSE NULL
    END AS SecondWord
    FROM #temp
  • Imo davidandrews13 has the right idea.  In SQL 2022

    select string_agg(sv.[value], ' ') within group (order by sv.ordinal) first2_in_order
    from #temp t
    cross apply (select top(2) ss.[value], ss.ordinal
    from string_split(t.Col1, ' ', 1) ss) sv([value], ordinal)
    group by Col1;

     

     

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Steve Collins wrote:

    Imo davidandrews13 has the right idea.  In SQL 2022

    select string_agg(sv.[value], ' ') within group (order by sv.ordinal) first2_in_order
    from #temp t
    cross apply (select top(2) ss.[value], ss.ordinal
    from string_split(t.Col1, ' ', 1) ss) sv([value], ordinal)
    group by Col1;

    I don't think calling complicated string_split and string_agg functions will be more efficient than calling simple string functions.

  • Jonathan AC Roberts wrote:

    I don't think calling complicated string_split and string_agg functions will be more efficient than calling simple string functions.

    Maybe so.  Can your chatbot make the case for it?  Splitting and re-aggregation means no nested case logic.  Without the ordinal split and the addition of WITHIN GROUP ORDER BY the earlier query would be incorrect imo

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Steve Collins wrote:

    Jonathan AC Roberts wrote:

    I don't think calling complicated string_split and string_agg functions will be more efficient than calling simple string functions.

    Maybe so.  Can your chatbot make the case for it?  Splitting and re-aggregation means no nested case logic.  Without the ordinal split and the addition of WITHIN GROUP ORDER BY the earlier query would be incorrect imo

    LOL - I wouldn't trust anything ChatGPT says without testing it first. I can set up a test for you to run, I don't have SQL 2022 installed so can't get the ordinal part of the string_split function, so I cannot test it:

    Here is the data setup:

    drop table if exists #temp
    go
    -- Create some test data (1 million rows)
    with cte as
    (
    select tly.n , x.Word
    from dbo.fnTally(1,1000000) tly
    cross apply (values ((ABS(CHECKSUM(NewId())) % 20) + 1)) t(RandomInt)
    cross apply (values ((ABS(CHECKSUM(NewId())) % 6) + 4)) t2(RandomInt)
    cross apply (select LEFT(NEWID(), t2.RandomInt) from dbo.fnTally(1,t.RandomInt)) x(Word)
    )
    select STRING_AGG(Word,' ') Col1
    into #temp
    from cte
    group by n
    ;

    Here are the queries to test:

    set statistics io, time on
    go
    drop table if exists #1
    print 'Start ***************************************************** CROSS APPLY'
    SELECT LEFT(Col1 + ' ', T1.FirstSpace - 1) FirstWord,
    SUBSTRING(t.Col1 + ' ', T1.FirstSpace+1, T2.SecondSpace - T1.FirstSpace - 1) SecondWord
    into #1
    FROM #temp t
    CROSS APPLY (VALUES (charindex(' ', t.Col1 + ' '))) T1(FirstSpace)
    CROSS APPLY (VALUES (charindex(' ', t.Col1 + ' ', T1.FirstSpace+1))) T2(SecondSpace)
    ;
    print 'End ***************************************************** CROSS APPLY'
    go
    drop table if exists #2
    print 'Start ***************************************************** CASE'
    SELECT CASE WHEN CHARINDEX(' ', Col1) > 0 THEN LEFT(Col1, CHARINDEX(' ', Col1) - 1)
    ELSE Col1
    END AS FirstWord,
    CASE WHEN CHARINDEX(' ', Col1) > 0
    THEN SUBSTRING(Col1, CHARINDEX(' ', Col1) + 1, CHARINDEX(' ', Col1 + ' ', CHARINDEX(' ', Col1) + 1) - CHARINDEX(' ', Col1) - 1)
    ELSE NULL
    END AS SecondWord
    into #2
    FROM #temp
    print 'End ***************************************************** CASE'
    ;
    go
    drop table if exists #3
    print 'Start ***************************************************** string_agg, string_split'
    select string_agg(sv.[value], ' ') within group (order by sv.ordinal) first2_in_order
    into #3
    from #temp t
    cross apply (select top(2) ss.[value], ss.ordinal
    from string_split(t.Col1, ' ', 1) ss) sv([value], ordinal)
    group by Col1;
    print 'End ***************************************************** string_agg, string_split'

    You need dbo.fnTally.

    Please let me know the results.

  • Already had dbo.fnTally.  Here are the results which show 7 or 8 seconds for CHARINDEX and 33+ seconds for STRING_SPLIT.  This is my tiny Azure SQL compatibility level 150 test instance with 50 DTU's.  O well I know which approach I would still use if I was a hurry to get the correct answer.  If ChatGPT set that up and it was quick and easy that's amazing.  Before it writes queries it would be nice if it did boring and boilerplate work

    SQL Server parse and compile time: 
    CPU time = 0 ms, elapsed time = 1 ms.

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.
    Start ***************************************************** CROSS APPLY

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.
    Table '#temp_______________________________________________________________________________________________________________000000012384'. Scan count 1, logical reads 11282, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

    SQL Server Execution Times:
    CPU time = 109 ms, elapsed time = 6926 ms.

    (1000000 row(s) affected)

    End ***************************************************** CROSS APPLY

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.
    SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 1 ms.

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.
    Start ***************************************************** CASE

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.
    Table '#temp_______________________________________________________________________________________________________________000000012384'. Scan count 1, logical reads 11282, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

    SQL Server Execution Times:
    CPU time = 47 ms, elapsed time = 8196 ms.

    (1000000 row(s) affected)

    End ***************************************************** CASE

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.
    SQL Server parse and compile time:
    CPU time = 15 ms, elapsed time = 1245 ms.

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.
    Start ***************************************************** string_agg, string_split

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
    Table '#temp_______________________________________________________________________________________________________________000000012384'. Scan count 1, logical reads 11282, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 33825 ms.

    (999445 row(s) affected)

    End ***************************************************** string_agg, string_split

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Heh... ChatGPT... when it comes to SQL, it's amazingly "average" because it's drawing on the "average"... which is usually amazingly low bar compared to what most of the denizens of this forum know.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply