Split String On Delimiter

  • I've a string as follows:

    45 | 00055 | 9/30/2016 | Vodafone | Jack Morris | Newyork | Test Msg | TBL101 | PC | 1.00 | COMP101 | CS | 1.00.............. etc

    I am expecting to get it as follows:

    String1 - 45 | 00055 | 9/30/2016 | Vodafone | Jack Morris | Newyork | Test Msg

    String2 - TBL101 | PC | 1.00 | COMP101 | CS | 1.00......... etc

    So what I am trying to do is to split the string into two string on 7th delimiter and tried the following but it works only for string1:

    DECLARE @STR VARCHAR(1000),

    @str1 VARCHAR(1000),

    @str2 VARCHAR(1000)

    SET @STR ='45 | 00055 | 9/30/2016 | Vodafone | Jack Morris | Newyork | Test Msg | TBL101 | PC | 1.00 | COMP101 | CS | 1.00.............. etc'

    /****I was actually following a tutorial on doing this but stuck on this statements. Would expect some explanation on this - Starts****/

    --Creating a number table

    ;WITH numcte AS(

    SELECT 1 AS rn UNION ALL SELECT rn + 1 FROM numcte WHERE rn <= LEN(@str)),

    --Get the position of the "|" charecters

    GetDelimitedCharPos AS(

    SELECT ROW_NUMBER() OVER(ORDER BY getdate()) seq, rn, delimitedcharpos

    FROM numcte

    CROSS APPLY(SELECT SUBSTRING(@str, rn, 1)delimitedcharpos) m WHERE delimitedcharpos = '|')

    /****I was actually following a tutorial on doing this but stuck on this statements. Would expect some explanation on this - Ends****/

    --Applying the formula SUBSTRING(@str, startseq + 1, endseq-startseq + 1)

    SELECT TOP 1

    (CASE WHEN SUBSTRING(

    @STR

    ,(SELECT TOP 1 rn + 1 FROM GetDelimitedCharPos WHERE seq = 6)

    ,(SELECT TOP 1 rn FROM GetDelimitedCharPos WHERE seq = 7 ) -

    (SELECT TOP 1 rn + 1 FROM GetDelimitedCharPos WHERE seq = 6)

    ) = SUBSTRING(

    @STR

    ,(SELECT TOP 1 rn + 1 FROM GetDelimitedCharPos WHERE seq = 6)

    ,(SELECT TOP 1 rn FROM GetDelimitedCharPos WHERE seq = 7 ) -

    (SELECT TOP 1 rn + 1 FROM GetDelimitedCharPos WHERE seq = 6)

    ) THEN SUBSTRING(

    @STR

    ,(SELECT TOP 1 rn + 1 FROM GetDelimitedCharPos WHERE seq = 1)

    ,(SELECT TOP 1 rn FROM GetDelimitedCharPos WHERE seq = 7 ) -

    (SELECT TOP 1 rn + 1 FROM GetDelimitedCharPos WHERE seq = 1)

    ) ELSE '' END) AS String1

    FROM GetDelimitedCharPos

    I would expect some ideas from the experts to implement it perfectly. Thanks.

  • Here's a simple option using the DelimitedSplit8k that can be found here along with its explanation: http://www.sqlservercentral.com/articles/Tally+Table/72993/

    A specific function could be made to split only on the 7th delimiter and avoid having to concatenate the strings again.

    DECLARE @STR VARCHAR(1000),

    @str1 VARCHAR(1000),

    @str2 VARCHAR(1000)

    SET @STR ='45 | 00055 | 9/30/2016 | Vodafone | Randy Singh | Newyork | Test Msg | TBL101 | PC | 1.00 | COMP101 | CS | 1.00.............. etc'

    SELECT s.Item, s.ItemNumber, (s.ItemNumber - 1) / 7 AS ItemGroup

    INTO #Splitted

    FROM dbo.DelimitedSplit8K( @STR, '|')s;

    SELECT STUFF(( SELECT '| ' + i.Item

    FROM #Splitted i

    WHERE i.ItemGroup = s.ItemGroup

    ORDER BY i.ItemNumber

    FOR XML PATH(''), TYPE).value('./text()[1]', 'varchar(max)'), 1, 1, '')

    FROM #Splitted s

    GROUP BY ItemGroup;

    DROP TABLE #Splitted;

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis is a master at showing set based solutions and if you plan on using SQL for any length of time, I would understand why those solutions are preferable over something like this....

    DECLARE @STR VARCHAR(1000),

    @str1 VARCHAR(1000),

    @str2 VARCHAR(1000),

    @pos INT,

    @counter INT

    SET @STR ='45 | 00055 | 9/30/2016 | Vodafone | Randy Singh | Newyork | Test Msg | TBL101 | PC | 1.00 | COMP101 | CS | 1.00.............. etc'

    SET @counter=0

    SET @pos=0

    WHILE @counter <=7

    BEGIN

    SET @pos=CHARINDEX('|',@str,@pos+1)

    set @counter=@counter+1

    END

    SET @str1=SUBSTRING(@str,1,@pos)

    SET @str2=SUBSTRING(@str,@pos+1,LEN(@str)-@pos)

    Print 'str1='+ @str1

    Print 'str2='+ @str2

    The solution I posted is the epitome of a RBAR type solution.....(looping through data to get to the final solution)

  • Luis Cazares (10/14/2016)


    Here's a simple option using the DelimitedSplit8k that can be found here along with its explanation: http://www.sqlservercentral.com/articles/Tally+Table/72993/

    A specific function could be made to split only on the 7th delimiter and avoid having to concatenate the strings again.

    DECLARE @STR VARCHAR(1000),

    @str1 VARCHAR(1000),

    @str2 VARCHAR(1000)

    SET @STR ='45 | 00055 | 9/30/2016 | Vodafone | Randy Singh | Newyork | Test Msg | TBL101 | PC | 1.00 | COMP101 | CS | 1.00.............. etc'

    SELECT s.Item, s.ItemNumber, (s.ItemNumber - 1) / 7 AS ItemGroup

    INTO #Splitted

    FROM dbo.DelimitedSplit8K( @STR, '|')s;

    SELECT STUFF(( SELECT '| ' + i.Item

    FROM #Splitted i

    WHERE i.ItemGroup = s.ItemGroup

    ORDER BY i.ItemNumber

    FOR XML PATH(''), TYPE).value('./text()[1]', 'varchar(max)'), 1, 1, '')

    FROM #Splitted s

    GROUP BY ItemGroup;

    DROP TABLE #Splitted;

    Please forgive me if I'm misunderstanding the requirement and for not coding an example (I'm on vacation without a PC).

    I don't think a splitter or concatenation is required here. Why not just use a tally table and row_number() to identify the location of the 7th delimiter: let's call that value X. Then for string1:

    SUBSTRING(yourstring, 1, X-1)

    For string2:

    SUBSTRING(yourstring, X+1, 8000)

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Here are 2 modified versions of the DelimitedSplit8k function. The second one will only work on 2012 and more recent versions.

    CREATE FUNCTION [dbo].[DelimitedSplit8KNOccurrence](@pString [varchar](8000), @pDelimiter [char](1), @pOccurrence int)

    RETURNS TABLE WITH SCHEMABINDING

    AS

    RETURN

    --===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...

    -- enough to cover NVARCHAR(4000)

    WITH E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ), --10E+1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

    cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front

    -- for both a performance gain and prevention of accidental "overruns"

    SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

    ),

    cteStart(N1,G1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)

    SELECT 1, 0 UNION ALL

    SELECT t.N+1 , ROW_NUMBER() OVER(ORDER BY N)

    FROM cteTally t

    WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter

    ),

    cteLen(N1,L1) AS(--==== Return start and length (for use in substring)

    SELECT s1.N1, ISNULL( s2.N1 - s1.N1 - 1, 8000)

    FROM cteStart s1

    LEFT JOIN cteStart s2 ON s1.G1/@pOccurrence = s2.G1/@pOccurrence - 1 AND s2.G1 % @pOccurrence = 0

    WHERE s1.G1 % @pOccurrence = 0

    )

    --===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.

    SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),

    Item = SUBSTRING(@pString, l.N1, l.L1)

    FROM cteLen l

    ;

    GO

    CREATE FUNCTION [dbo].[DelimitedSplit8KLeadNOccurrence](@pString [varchar](8000), @pDelimiter [char](1), @pOccurrence int)

    RETURNS TABLE WITH SCHEMABINDING

    AS

    RETURN

    WITH E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ), --10E+1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

    cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front

    -- for both a performance gain and prevention of accidental "overruns"

    SELECT 0 UNION ALL

    SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

    ),

    cteStart(N1, G1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)

    SELECT t.N+1, ROW_NUMBER() OVER(ORDER BY N) % @pOccurrence

    FROM cteTally t

    WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0)

    )

    --===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.

    SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),

    Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF((LEAD(s.N1,1,1) OVER (ORDER BY s.N1) - 1),0)-s.N1,8000))

    FROM cteStart s

    WHERE G1 = 1;

    GO

    With those, the final code would be simplified to a single statement.

    DECLARE @STR VARCHAR(1000)

    SET @STR ='45 | 00055 | 9/30/2016 | Vodafone | Randy Singh | Newyork | Test Msg | TBL101 | PC | 1.00 | COMP101 | CS | 1.00.............. etc'

    SELECT *

    FROM dbo.DelimitedSplit8KNOccurrence( @STR, '|', 7)s;

    I'm not able to claim any performance improvements, because I haven't test both approaches for performance. That doesn't mean that there won't be, it only means that you need to test them.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Alan.B (10/14/2016)


    Luis Cazares (10/14/2016)


    Here's a simple option using the DelimitedSplit8k that can be found here along with its explanation: http://www.sqlservercentral.com/articles/Tally+Table/72993/

    A specific function could be made to split only on the 7th delimiter and avoid having to concatenate the strings again.

    DECLARE @STR VARCHAR(1000),

    @str1 VARCHAR(1000),

    @str2 VARCHAR(1000)

    SET @STR ='45 | 00055 | 9/30/2016 | Vodafone | Randy Singh | Newyork | Test Msg | TBL101 | PC | 1.00 | COMP101 | CS | 1.00.............. etc'

    SELECT s.Item, s.ItemNumber, (s.ItemNumber - 1) / 7 AS ItemGroup

    INTO #Splitted

    FROM dbo.DelimitedSplit8K( @STR, '|')s;

    SELECT STUFF(( SELECT '| ' + i.Item

    FROM #Splitted i

    WHERE i.ItemGroup = s.ItemGroup

    ORDER BY i.ItemNumber

    FOR XML PATH(''), TYPE).value('./text()[1]', 'varchar(max)'), 1, 1, '')

    FROM #Splitted s

    GROUP BY ItemGroup;

    DROP TABLE #Splitted;

    Please forgive me if I'm misunderstanding the requirement and for not coding an example (I'm on vacation without a PC).

    I don't think a splitter or concatenation is required here. Why not just use a tally table and row_number() to identify the location of the 7th delimiter: let's call that value X. Then for string1:

    SUBSTRING(yourstring, 1, X-1)

    For string2:

    SUBSTRING(yourstring, X+1, 8000)

    That would work when there are 2 values. I'm not sure if that would always be the case.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Here's a simple, brute-force -- but still no looping -- method, if you prefer that.

    SELECT

    string AS original_string,

    CASE WHEN are_enough_delims_present = 0 THEN string

    ELSE LEFT(string, string_delim_7 - 1) END AS string1,

    CASE WHEN are_enough_delims_present = 0 THEN ''

    ELSE LTRIM(SUBSTRING(string, string_delim_7 + 1, 8000)) END AS string2

    FROM (

    VALUES('45 | 00055 | 9/30/2016 | Vodafone | Jack Morris | Newyork | Test Msg | TBL101 | PC | 1.00 | COMP101 | CS | 1.00.............. etc'),

    ('46 | xxx | not | enough | delims')

    ) AS test_data(string)

    CROSS APPLY (

    SELECT CAST(CASE WHEN string LIKE '%|%|%|%|%|%|%|%'

    THEN 1 ELSE 0 END AS bit) AS are_enough_delims_present

    ) AS ca1

    OUTER APPLY (

    SELECT CHARINDEX('|', string) AS string_delim_1

    WHERE are_enough_delims_present = 1

    ) AS ca2

    OUTER APPLY (

    SELECT CHARINDEX('|', string, string_delim_1 + 1) AS string_delim_2

    WHERE are_enough_delims_present = 1

    ) AS ca3

    OUTER APPLY (

    SELECT CHARINDEX('|', string, string_delim_2 + 1) AS string_delim_3

    WHERE are_enough_delims_present = 1

    ) AS ca4

    OUTER APPLY (

    SELECT CHARINDEX('|', string, string_delim_3 + 1) AS string_delim_4

    WHERE are_enough_delims_present = 1

    ) AS ca5

    OUTER APPLY (

    SELECT CHARINDEX('|', string, string_delim_4 + 1) AS string_delim_5

    WHERE are_enough_delims_present = 1

    ) AS ca6

    OUTER APPLY (

    SELECT CHARINDEX('|', string, string_delim_5 + 1) AS string_delim_6

    WHERE are_enough_delims_present = 1

    ) AS ca7

    OUTER APPLY (

    SELECT CHARINDEX('|', string, string_delim_6 + 1) AS string_delim_7

    WHERE are_enough_delims_present = 1

    ) AS ca8

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher (10/14/2016)


    Here's a simple, brute-force -- but still no looping -- method, if you prefer that.

    Not very efficient with 7 x ( source scan -> Filter -> Compute Scalar -> Nested Loops Join ) in the execution plan, better not use this on a large set.

    😎

    Here is a more efficient alternative, beats the c..p out of the string splitting functions

    USE TEEST;

    GO

    SET NOCOUNT ON;

    ;WITH SAMPLE_DATA AS

    (

    SELECT (SMPL_STR) FROM

    (

    VALUES('45 | 00055 | 9/30/2016 | Vodafone | Jack Morris | Newyork | Test Msg | TBL101 | PC | 1.00 | COMP101 | CS | 1.00.............. etc')

    ,('46 | xxx | not | enough | delims')

    ) AS X(SMPL_STR)

    )

    ,DELIMITER_COUNT(DLMCNT,SMPL_STR) AS

    (

    SELECT

    LEN(SD.SMPL_STR) - LEN(REPLACE(SD.SMPL_STR,CHAR(124),'')) AS DLMCNT

    ,SD.SMPL_STR AS SMPL_STR

    FROM SAMPLE_DATA SD

    )

    SELECT

    DC.SMPL_STR

    ,CASE

    WHEN DC.DLMCNT > 6 THEN SUBSTRING(DC.SMPL_STR,1,CI7.CI - 1)

    ELSE DC.SMPL_STR

    END AS PART_ONE

    ,CASE

    WHEN DC.DLMCNT > 6 THEN SUBSTRING(DC.SMPL_STR,CI7.CI + 1,LEN(DC.SMPL_STR))

    ELSE NULL

    END PART_TWO

    FROM DELIMITER_COUNT DC

    CROSS APPLY (SELECT CHARINDEX(CHAR(124),DC.SMPL_STR, 1) ) CI1(CI)

    CROSS APPLY (SELECT CHARINDEX(CHAR(124),DC.SMPL_STR,CI1.CI + 1) ) CI2(CI)

    CROSS APPLY (SELECT CHARINDEX(CHAR(124),DC.SMPL_STR,CI2.CI + 1) ) CI3(CI)

    CROSS APPLY (SELECT CHARINDEX(CHAR(124),DC.SMPL_STR,CI3.CI + 1) ) CI4(CI)

    CROSS APPLY (SELECT CHARINDEX(CHAR(124),DC.SMPL_STR,CI4.CI + 1) ) CI5(CI)

    CROSS APPLY (SELECT CHARINDEX(CHAR(124),DC.SMPL_STR,CI5.CI + 1) ) CI6(CI)

    CROSS APPLY (SELECT CHARINDEX(CHAR(124),DC.SMPL_STR,CI6.CI + 1) ) CI7(CI)

    ;

    Output

    SMPL_STR PART_ONE PART_TWO

    --------------------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------- -------------------------------------------------------------

    45 | 00055 | 9/30/2016 | Vodafone | Jack Morris | Newyork | Test Msg | TBL101 | PC | 1.00 | COMP101 | CS | 1.00.............. etc 45 | 00055 | 9/30/2016 | Vodafone | Jack Morris | Newyork | Test Msg TBL101 | PC | 1.00 | COMP101 | CS | 1.00.............. etc

    46 | xxx | not | enough | delims 46 | xxx | not | enough | delims NULL

    Edit: Typo

  • Eirikur Eiriksson (10/15/2016)


    Here is a more efficient alternative, beats the c..p out of the string splitting functions

    VERY cool concept but, in this case, it appears to be coming up with an incorrect result set. Yes, I absolutely agree that it has successfully split out the two parts correctly but it's spitting them out on the same row. The requirement, if I'm reading the original post correctly, is that each of the parts should reside on a separate row.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (10/16/2016)


    Eirikur Eiriksson (10/15/2016)


    Here is a more efficient alternative, beats the c..p out of the string splitting functions

    VERY cool concept but, in this case, it appears to be coming up with an incorrect result set. Yes, I absolutely agree that it has successfully split out the two parts correctly but it's spitting them out on the same row. The requirement, if I'm reading the original post correctly, is that each of the parts should reside on a separate row.

    Good catch Jeff, thanks. Here is a corrected version to fit the OP's requirements.

    😎

    USE TEEST;

    GO

    SET NOCOUNT ON;

    ;WITH SAMPLE_DATA AS

    (

    SELECT (SMPL_STR) FROM

    (

    VALUES('45 | 00055 | 9/30/2016 | Vodafone | Jack Morris | Newyork | Test Msg | TBL101 | PC | 1.00 | COMP101 | CS | 1.00.............. etc')

    ,('46 | xxx | not | enough | delims')

    ) AS X(SMPL_STR)

    )

    ,DELIMITER_COUNT(DLMCNT,SMPL_STR) AS

    (

    SELECT

    LEN(SD.SMPL_STR) - LEN(REPLACE(SD.SMPL_STR,CHAR(124),'')) AS DLMCNT

    ,SD.SMPL_STR AS SMPL_STR

    FROM SAMPLE_DATA SD

    )

    ,CHOPPED_STRINGS AS

    (

    SELECT

    CASE

    WHEN DC.DLMCNT > 6 THEN SUBSTRING(DC.SMPL_STR,1,CI7.CI - 1)

    ELSE DC.SMPL_STR

    END AS PART_ONE

    ,CASE

    WHEN DC.DLMCNT > 6 THEN SUBSTRING(DC.SMPL_STR,CI7.CI + 1,LEN(DC.SMPL_STR))

    ELSE NULL

    END PART_TWO

    FROM DELIMITER_COUNT DC

    CROSS APPLY (SELECT CHARINDEX(CHAR(124),DC.SMPL_STR, 1) ) CI1(CI)

    CROSS APPLY (SELECT CHARINDEX(CHAR(124),DC.SMPL_STR,CI1.CI + 1) ) CI2(CI)

    CROSS APPLY (SELECT CHARINDEX(CHAR(124),DC.SMPL_STR,CI2.CI + 1) ) CI3(CI)

    CROSS APPLY (SELECT CHARINDEX(CHAR(124),DC.SMPL_STR,CI3.CI + 1) ) CI4(CI)

    CROSS APPLY (SELECT CHARINDEX(CHAR(124),DC.SMPL_STR,CI4.CI + 1) ) CI5(CI)

    CROSS APPLY (SELECT CHARINDEX(CHAR(124),DC.SMPL_STR,CI5.CI + 1) ) CI6(CI)

    CROSS APPLY (SELECT CHARINDEX(CHAR(124),DC.SMPL_STR,CI6.CI + 1) ) CI7(CI)

    )

    SELECT

    STRING_PARTS.PART

    FROM CHOPPED_STRINGS

    CROSS APPLY

    (

    SELECT PART_ONE UNION ALL

    SELECT PART_TWO

    ) STRING_PARTS(PART)

    ;

    Output

    PART

    ---------------------------------------------------------------------

    45 | 00055 | 9/30/2016 | Vodafone | Jack Morris | Newyork | Test Msg

    TBL101 | PC | 1.00 | COMP101 | CS | 1.00.............. etc

    46 | xxx | not | enough | delims

    NULL

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply