Split input string into multicolumn - multirows

  • Hello all - I am having hard time to split an input string into multicolumn - multirows.

    Task - Create a stored procedure that reads an input string with pre-defined field and row terminators; splits the string into multicolumn - multirows; and inserts records into a table.

    The table structure is predefined:

    create table #temp (field1 varchar(200), field2 varchar(200));

    Here is an example input string: (the input string is of variable length; the fields are variable lengths as well)

    30;38469|31;38470|32;38471|33;38472|34;38473|35;38474|36;38475|37;38476|38;38477|

    ";" = field terminator

    "|" = row terminator

    The stored proc should insert the values in this fashion:

    field1 field2

    3038469

    3138470

    3238471

    3338472

    3438473

    3538474

    3638475

    3738476

    3838477

    Storing the input string as a text file is not an option. Life would have been much better if I could save the text file and use bulk insert!!

    Any help is greatly appreciated. Thanks!!

  • This will do something like that.

    It caters for any number of fields - just amend the final select for the number required

    declare @s-2 varchar(8000)

    select @s-2 = '30;38469;1|31;38470;1|32;38471;1|33;38472;1|34;38473;1|35;38474;1|36;38475;1|37;38476;1|38;38477;1|'

    select @s-2 = '30;38469|31;38470|32;38471|33;38472|34;38473|35;38474|36;38475|37;38476|38;38477|'

    declare @rowterm varchar(1) = '|'

    declare @fieldterm varchar(1) = ';'

    ;with cte as

    (

    select rowstrt = 1, rowend = charindex(@rowterm,@s)-1, seq = 1

    union all

    select rowstrt = rowend+2, rowend = charindex(@rowterm,@s,rowend+2)-1, seq = seq + 1

    from cte where charindex(@rowterm,@s,rowend+2)<>0

    ) ,

    cte2 as

    (

    select s = substring(@s, rowstrt,rowend-rowstrt+1), seq

    from cte

    ) ,

    cte3 as

    (

    select fldstrt = 1, fldend = charindex(@fieldterm,s)-1, seq, s, fldseq = 1 from cte2

    union all

    select fldstrt = fldend+2,

    fldend = case when charindex(@fieldterm,s,fldend+2) <> 0 then charindex(@fieldterm,s,fldend+2)-1 else len(s) end,

    seq, s, fldseq = fldseq+1

    from cte3 where fldend < len(s)

    )

    select s1.s, s2.s, s3.s

    from

    (select seq, s = substring(s,fldstrt,fldend-fldstrt+1) from cte3 where fldseq = 1) s1

    join (select seq, s = substring(s,fldstrt,fldend-fldstrt+1) from cte3 where fldseq = 2) s2 on s1.seq = s2.seq

    left join (select seq, s = substring(s,fldstrt,fldend-fldstrt+1) from cte3 where fldseq = 3) s3 on s1.seq = s3.seq


    Cursors never.
    DTS - only when needed and never to control.

  • My hope is that Jeff Moden won't take um-bridge to my ripping apart of his code. . . 😀

    I've based this answer on the 8K splitter[/url], with some modifications to make it split on multiple deliminators.

    IF EXISTS (SELECT 1

    FROM sysobjects

    WHERE id = OBJECT_ID('SSC_Multi_Split') AND OBJECTPROPERTY(id, 'IsProcedure') = 1)

    BEGIN

    DROP PROCEDURE SSC_Multi_Split;

    END

    GO

    CREATE PROCEDURE SSC_Multi_Split (@string VARCHAR(8000), @rowDeliminater CHAR(1), @colDeliminater CHAR(1))

    AS

    BEGIN

    IF object_id('tempdb..#temporaryResultHolder') IS NOT NULL

    BEGIN

    DROP TABLE #temporaryResultHolder;

    END;

    WITH CTE1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),

    CTE2(N) AS (SELECT 1 FROM CTE1 X CROSS JOIN CTE1 Y),

    CTE3(N) AS (SELECT 1 FROM CTE2 X CROSS JOIN CTE2 Y),

    CTE4(N) AS (SELECT 1 FROM CTE3 X CROSS JOIN CTE3 Y),

    CTE5(N) AS (SELECT 1 FROM CTE4 X CROSS JOIN CTE4 Y),

    CTE6(N) AS (SELECT 1 FROM CTE5 X CROSS JOIN CTE5 Y),

    TALLY(N) AS (SELECT TOP (ISNULL(DATALENGTH(@string),0))

    ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM CTE6),

    RowStart(start) AS (SELECT 1 UNION ALL

    SELECT N+1

    FROM TALLY

    WHERE SUBSTRING(@string,N,1) = @rowDeliminater

    ),

    RowLen(start,size) AS (SELECT start,

    ISNULL(NULLIF(CHARINDEX(@rowDeliminater,@string,start),0)-start,ISNULL(DATALENGTH(@string),8000))

    FROM RowStart

    ),

    RowData(rowNumber,data) AS (SELECT ROW_NUMBER() OVER(ORDER BY start),

    SUBSTRING(@string, start, size)

    FROM RowLen

    ),

    ColumnStart(start, rowNumber) AS (SELECT b.N, b.rowNumber

    FROM (SELECT N+1, rowNumber

    FROM TALLY

    CROSS APPLY (SELECT rowNumber,data

    FROM RowData) b

    WHERE SUBSTRING(data,N,1) = @colDeliminater

    )a(N, rowNumber)

    CROSS APPLY (SELECT 1, rowNumber UNION ALL SELECT N, rowNumber)b(N, rowNumber)

    ),

    ColumnLen(rowNumber,start,size,data) AS (SELECT b.rowNumber, start,

    ISNULL(NULLIF(CHARINDEX(@colDeliminater,data,start),0)-start,ISNULL(DATALENGTH(data),8000)),

    data

    FROM ColumnStart a

    CROSS APPLY (SELECT rowNumber,data

    FROM RowData

    WHERE a.rowNumber = rowNumber) b

    ),

    allData(rowNumber,columnNumber,data) AS (SELECT rowNumber, ROW_NUMBER() OVER(PARTITION BY rowNumber ORDER BY start),

    SUBSTRING(data, start, size)

    FROM ColumnLen

    )

    SELECT rowNumber,columnNumber,data

    INTO #temporaryResultHolder

    FROM allData;

    DECLARE @sql NVARCHAR(MAX);

    WITH CTE1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),

    CTE2(N) AS (SELECT 1 FROM CTE1 X CROSS JOIN CTE1 Y),

    CTE3(N) AS (SELECT 1 FROM CTE2 X CROSS JOIN CTE2 Y),

    CTE4(N) AS (SELECT 1 FROM CTE3 X CROSS JOIN CTE3 Y),

    CTE5(N) AS (SELECT 1 FROM CTE4 X CROSS JOIN CTE4 Y),

    CTE6(N) AS (SELECT 1 FROM CTE5 X CROSS JOIN CTE5 Y),

    TALLY(N) AS (SELECT TOP (ISNULL(DATALENGTH(@string),0))

    ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM CTE6)

    SELECT @sql = STUFF((SELECT ','+CHAR(13)+CHAR(10)+'MAX(CASE WHEN columnNumber = '+CAST(N AS VARCHAR(20))+' THEN data ELSE '+CHAR(39)+CHAR(39)+' END) AS field'+CAST(N AS VARCHAR(20))

    FROM TALLY

    CROSS APPLY (SELECT TOP 1 columnNumber

    FROM #temporaryResultHolder

    ORDER BY columnNumber DESC) b

    WHERE columnNumber >= N

    FOR XML PATH(''), TYPE

    ).value('.', 'NVARCHAR(MAX)'),1,3,'');

    SELECT @sql = 'SELECT '+@SQL+CHAR(13)+CHAR(10)+'FROM #temporaryResultHolder'+CHAR(13)+CHAR(10)+'GROUP BY rowNumber'+CHAR(13)+CHAR(10)+'ORDER BY rowNumber';

    EXECUTE sp_executeSQL @sql;

    END

    GO

    To test: -

    EXECUTE SSC_Multi_Split '30;38469|31;38470|32;38471|33;38472|34;38473|35;38474|36;38475|37;38476|38;38477|','|',';';

    Which returns: -

    field1 field2

    ------ -------

    30 38469

    31 38470

    32 38471

    33 38472

    34 38473

    35 38474

    36 38475

    37 38476

    38 38477


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Thanks much, nigelrivett .. this is a great help! I could understand the code. Really appreciate it.

  • Thanks Cadavre. This little too complicated for a newbie. But I'll go through the code and understand the beauty of it. Appreciate your quick reply.

  • Just some food for thought:

    declare @TestString varchar(8000) = '30;38469|31;38470|32;38471|33;38472|34;38473|35;38474|36;38475|37;38476|38;38477|';

    select

    max(case ds2.ItemNumber when 1 then ds2.Item else '' end) as field1,

    max(case ds2.ItemNumber when 2 then ds2.Item else '' end) as field2

    from

    dbo.DelimitedSplit8K(@TestString,'|') ds1

    cross apply dbo.DelimitedSplit8K(ds1.Item,';') ds2

    where

    ds1.Item <> ''

    group by

    ds1.ItemNumber;

    Uses Jeff's DelimitedSplit8K function as is.

    You will find the code for the dbo.DelimitedSplit8K function here: http://www.sqlservercentral.com/articles/Tally+Table/72993/.

  • Lynn Pettis (12/11/2012)


    Just some food for thought:

    declare @TestString varchar(8000) = '30;38469|31;38470|32;38471|33;38472|34;38473|35;38474|36;38475|37;38476|38;38477|';

    select

    max(case ds2.ItemNumber when 1 then ds2.Item else '' end) as field1,

    max(case ds2.ItemNumber when 2 then ds2.Item else '' end) as field2

    from

    dbo.DelimitedSplit8K(@TestString,'|') ds1

    cross apply dbo.DelimitedSplit8K(ds1.Item,';') ds2

    where

    ds1.Item <> ''

    group by

    ds1.ItemNumber;

    Uses Jeff's DelimitedSplit8K function as is.

    You will find the code for the dbo.DelimitedSplit8K function here: http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    You're right, that's a much better idea. To make it work for the OPs requirements, I'd stick it in a sproc to generate the result-set dynamically so that the number of "field" doesn't need to be known before-hand.

    So first, we need Jeff's splitter: -

    IF EXISTS (SELECT 1

    FROM sysobjects

    WHERE id = OBJECT_ID('DelimitedSplit8K'))

    BEGIN

    DROP FUNCTION [dbo].[DelimitedSplit8K];

    END

    GO

    CREATE FUNCTION [dbo].[DelimitedSplit8K]

    --===== Define I/O parameters

    (@pString VARCHAR(8000), @pDelimiter CHAR(1))

    --WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!

    RETURNS TABLE WITH SCHEMABINDING AS

    RETURN

    --===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...

    -- enough to cover VARCHAR(8000)

    WITH E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ), --10E+1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

    cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front

    -- for both a performance gain and prevention of accidental "overruns"

    SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

    ),

    cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)

    SELECT 1 UNION ALL

    SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter

    ),

    cteLen(N1,L1) AS(--==== Return start and length (for use in substring)

    SELECT s.N1,

    ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)

    FROM cteStart s

    )

    --===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.

    SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),

    Item = SUBSTRING(@pString, l.N1, l.L1)

    FROM cteLen l

    ;

    GO

    Then we need the calling sproc: -

    IF EXISTS (SELECT 1

    FROM sysobjects

    WHERE id = OBJECT_ID('SSC_Multi_Split'))

    BEGIN

    DROP PROCEDURE SSC_Multi_Split;

    END

    GO

    CREATE PROCEDURE SSC_Multi_Split (@string VARCHAR(8000), @rowDeliminater CHAR(1), @colDeliminater CHAR(1))

    AS

    BEGIN

    IF object_id('tempdb..#temporaryResultHolder') IS NOT NULL

    BEGIN

    DROP TABLE #temporaryResultHolder;

    END;

    SELECT ds1.ItemNumber AS rowNumber, ds2.ItemNumber AS columnNumber, ds2.Item AS data

    INTO #temporaryResultHolder

    FROM dbo.DelimitedSplit8K(@string, @rowDeliminater) ds1

    CROSS APPLY dbo.DelimitedSplit8K(ds1.Item, @colDeliminater) ds2

    WHERE ds1.Item <> '';

    DECLARE @sql NVARCHAR(MAX);

    WITH CTE1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),

    CTE2(N) AS (SELECT 1 FROM CTE1 X CROSS JOIN CTE1 Y),

    CTE3(N) AS (SELECT 1 FROM CTE2 X CROSS JOIN CTE2 Y),

    CTE4(N) AS (SELECT 1 FROM CTE3 X CROSS JOIN CTE3 Y),

    CTE5(N) AS (SELECT 1 FROM CTE4 X CROSS JOIN CTE4 Y),

    CTE6(N) AS (SELECT 1 FROM CTE5 X CROSS JOIN CTE5 Y),

    TALLY(N) AS (SELECT TOP (ISNULL(DATALENGTH(@string),0))

    ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM CTE6)

    SELECT @sql = STUFF((SELECT ','+CHAR(13)+CHAR(10)+'MAX(CASE WHEN columnNumber = '+CAST(N AS VARCHAR(20))+' THEN data ELSE '+CHAR(39)+CHAR(39)+' END) AS field'+CAST(N AS VARCHAR(20))

    FROM TALLY

    CROSS APPLY (SELECT TOP 1 columnNumber

    FROM #temporaryResultHolder

    ORDER BY columnNumber DESC) b

    WHERE columnNumber >= N

    FOR XML PATH(''), TYPE

    ).value('.', 'NVARCHAR(MAX)'),1,3,'');

    SELECT @sql = 'SELECT '+@SQL+CHAR(13)+CHAR(10)+'FROM #temporaryResultHolder'+CHAR(13)+CHAR(10)+'GROUP BY rowNumber'+CHAR(13)+CHAR(10)+'ORDER BY rowNumber';

    EXECUTE sp_executeSQL @sql;

    END

    GO

    And we execute as follows: -

    EXECUTE SSC_Multi_Split '30;38469|31;38470|32;38471|33;38472|34;38473|35;38474|36;38475|37;38476|38;38477|','|',';';


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Thanks Cadavre. This is cool stuff.

    Can your code take two row terminators? pipe (|) and line break ()?

    Example data:

    field1;field2|

    30;38469|

    31;38469|

    32;38469|

    33;38469|

    34;38469|

    I didn't notice the line break in the original data. Very sorry.

    The intent is that my user copies example data from excel and enters into a multiline text box in a webpage. He then hits a button that would transfer the input string (example data) to a stored proc that reads the data and splits into multiple column - multiple rows.

    Thanks for all your help.

  • murthyvs (12/11/2012)


    Thanks Cadavre. This is cool stuff.

    Can your code take two row terminators? pipe (|) and line break ()?

    Example data:

    field1;field2|

    30;38469|

    31;38469|

    32;38469|

    33;38469|

    34;38469|

    I didn't notice the line break in the original data. Very sorry.

    The intent is that my user copies example data from excel and enters into a multiline text box in a webpage. He then hits a button that would transfer the input string (example data) to a stored proc that reads the data and splits into multiple column - multiple rows.

    Thanks for all your help.

    Why not just remove the line breaks?

    DECLARE @exampleData VARCHAR(8000) = '30;38469|

    31;38469|

    32;38469|

    33;38469|

    34;38469|';

    SET @exampleData = REPLACE(REPLACE(@exampleData,CHAR(13),''),CHAR(10),'');

    EXECUTE SSC_Multi_Split @exampleData,'|',';'


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Awesome .. thanks!!

  • Here's another way:

    DECLARE @MyString VARCHAR(8000) =

    '30;38469|31;38470|32;38471|33;38472|34;38473|35;38474|36;38475|37;38476|38;38477|'

    SELECT col1=MAX(CASE ItemNumber%4 WHEN 1 THEN Item END)

    ,col2=MAX(CASE ItemNumber%4 WHEN 3 THEN Item END)

    FROM PatternSplitCM(@MyString, '[0-9]')

    WHERE [Matched] = 1

    GROUP BY ItemNumber/4

    PatternSplitCM is described here: http://www.sqlservercentral.com/articles/String+Manipulation/94365/

    Not sure it will be faster than using DelimitedSplit8K but since you're calling that function twice, it might give it a run for its money.

    Since this is posted in a SQL 2005 forum, you'll need to note that the Tally Table CTE used in PatternSplitCM in the article requires SQL 2008. However it is easily replaced by an Itzik Ben-Gan style CTE tally table like the one that appears (I think) in PatternSplitQU (also in the article).

    Edit: Revised (simplified) my vector of attack slightly.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Hi - the solution works as expected. Is there anyway to increase the size of the input string to 100,000. Its only limited to 8,000.

    Thanks much!

  • Jeff has said (he can correct me if I'm wrong) that DelimitedSplit8K is optimized for VARCHAR(8000) so change it to VARCHAR(MAX) with that in mind.

    I think although I don't have rigid testing results that PatternSplitCM is not significantly affected if you change to VARCHAR(MAX).


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Yes, I have tried it. I have observed performance degradation. Query runs much slower with VARCHAR(MAX).

  • murthyvs (12/14/2012)


    Hi - the solution works as expected. Is there anyway to increase the size of the input string to 100,000. Its only limited to 8,000.

    Thanks much!

    Unless your example is just a striking coincidence, I've seen such a thing before and I'd like to make a totally alternate suggestion if what I suspect is actually true.

    It would appear that both fields in the string are sequential in their own right. If that is true (as it was with what I've seen before), then it will save you a huge amount of aggravation, the server a huge amount of processing time and I/O, your network an unnecessary blurb of 100K bytes for each usage, and the loss of feeling in your left arm when everyone jumps on you for resource usage, if you were to send just a starting value for each field and the number of field-pairs and let SQL very quickly generate even millions of pairs more quickly than you could ever transmit them.

    For example, your example has fields that start with 30 and 38469 respectively and then both increment by 1 for 9 pairs of fields. Using the method I proposed above, you would pass only those 3 parameters over the pipe and then let SQL Server have at it. That would be capable of generating millions of rows if you needed it an it would happen a whole lot quicker than trying to send a million such field pairs over the pipe. For example, here code that does what I said o a million rows. I believe you'll find it to be remarkably quick.

    --===== These would be parameters in a stored procedure

    DECLARE @F1Start INT,

    @F2Start INT,

    @Pairs INT

    ;

    SELECT @F1Start = 30,

    @F2Start = 38469,

    @Pairs = 1000000

    ;

    --===== This would do the deed as previously described.

    WITH

    cteTally AS

    (

    SELECT TOP (@Pairs)

    N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1

    FROM master.sys.all_columns ac1

    CROSS JOIN master.sys.all_columns ac2

    )

    SELECT Field1 = @F1Start + N,

    Field2 = @F2Start + N

    INTO #Temp

    FROM cteTally

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 57 total)

You must be logged in to reply to this topic. Login to reply