Split input string into multicolumn - multirows

  • dwain.c (12/14/2012)


    Jeff has said (he can correct me if I'm wrong) that DelimitedSplit8K is optimized for VARCHAR(8000) so change it to VARCHAR(MAX) with that in mind.

    I think although I don't have rigid testing results that PatternSplitCM is not significantly affected if you change to VARCHAR(MAX).

    It is, indeed, optimized for 8K. You also have to remember that VARCHAR(MAX) doesn't like to be joined to even by a Tally Table and will cause at least a 2 to 1 slowdown just by changing the function variables from VARCHAR(8000) to VARCHAR(MAX) even if the data stays under 8K.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff - Sorry those things are not pairs/dont increment by one. I just made that example in excel.

  • Yes, DelimitedSplit8K is miraculously optimized for VARCHAR(8000). It runs very fast!

  • Hi - Quick question. Do you know why VARCHAR(MAX) doesnt work in this query?

    It returns the following error:

    Msg 240, Level 16, State 1, Line 110

    Types don't match between the anchor and the recursive part in column "rowstrt" of recursive query "cte".

    Thanks!

    nigelrivett (12/11/2012)


    This will do something like that.

    It caters for any number of fields - just amend the final select for the number required

    declare @s-2 varchar(8000)

    select @s-2 = '30;38469;1|31;38470;1|32;38471;1|33;38472;1|34;38473;1|35;38474;1|36;38475;1|37;38476;1|38;38477;1|'

    select @s-2 = '30;38469|31;38470|32;38471|33;38472|34;38473|35;38474|36;38475|37;38476|38;38477|'

    declare @rowterm varchar(1) = '|'

    declare @fieldterm varchar(1) = ';'

    ;with cte as

    (

    select rowstrt = 1, rowend = charindex(@rowterm,@s)-1, seq = 1

    union all

    select rowstrt = rowend+2, rowend = charindex(@rowterm,@s,rowend+2)-1, seq = seq + 1

    from cte where charindex(@rowterm,@s,rowend+2)<>0

    ) ,

    cte2 as

    (

    select s = substring(@s, rowstrt,rowend-rowstrt+1), seq

    from cte

    ) ,

    cte3 as

    (

    select fldstrt = 1, fldend = charindex(@fieldterm,s)-1, seq, s, fldseq = 1 from cte2

    union all

    select fldstrt = fldend+2,

    fldend = case when charindex(@fieldterm,s,fldend+2) <> 0 then charindex(@fieldterm,s,fldend+2)-1 else len(s) end,

    seq, s, fldseq = fldseq+1

    from cte3 where fldend < len(s)

    )

    select s1.s, s2.s, s3.s

    from

    (select seq, s = substring(s,fldstrt,fldend-fldstrt+1) from cte3 where fldseq = 1) s1

    join (select seq, s = substring(s,fldstrt,fldend-fldstrt+1) from cte3 where fldseq = 2) s2 on s1.seq = s2.seq

    left join (select seq, s = substring(s,fldstrt,fldend-fldstrt+1) from cte3 where fldseq = 3) s3 on s1.seq = s3.seq

  • Yes... both variables and any/all string literals must be defined as VARCHAR(MAX) including the ones that hold the single character delimiters. It's one of the great joys of recursive CTE's. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • murthyvs (12/14/2012)


    Hi Jeff - Sorry those things are not pairs/dont increment by one. I just made that example in excel.

    Like I said... remarkable coincidence. Thanks for the feedback.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • dwain.c (12/11/2012)


    Here's another way:

    DECLARE @MyString VARCHAR(8000) =

    '30;38469|31;38470|32;38471|33;38472|34;38473|35;38474|36;38475|37;38476|38;38477|'

    SELECT col1=MAX(CASE ItemNumber%4 WHEN 1 THEN Item END)

    ,col2=MAX(CASE ItemNumber%4 WHEN 3 THEN Item END)

    FROM PatternSplitCM(@MyString, '[0-9]')

    WHERE [Matched] = 1

    GROUP BY ItemNumber/4

    PatternSplitCM is described here: http://www.sqlservercentral.com/articles/String+Manipulation/94365/

    Not sure it will be faster than using DelimitedSplit8K but since you're calling that function twice, it might give it a run for its money.

    Since this is posted in a SQL 2005 forum, you'll need to note that the Tally Table CTE used in PatternSplitCM in the article requires SQL 2008. However it is easily replaced by an Itzik Ben-Gan style CTE tally table like the one that appears (I think) in PatternSplitQU (also in the article).

    Edit: Revised (simplified) my vector of attack slightly.

    Hi - Just exploring some VARCHAR(MAX) options. Will patternsplitcm work for this dataset?

    dataset1:

    A123W1;EESLGKNR912|

    A12W2;EESLGKNR913|

    A3W3;EESLGR914|

    123W4;12EESLGKNR915|

    123W5;EESL12NR916|

    A123W;E31LGKNR917|

    A123W7;EE18|

    A123W8;1234|

    1238;1234|

    dataset2:

    A123W1EESLGKNR912

    A12W2EESLGKNR913

    A3W3EESLGR914

    123W412EESLGKNR915

    123W5EESL12NR916

    A123WE31LGKNR917

    A123W7EE18

    A123W81234

    12381234

    Thanks

  • Here's a way that uses XML, which you could generate in the web application before passing the data to SQL, or even just convert to XML and pump straight into a SQL table all within the web app...

    Anyway, this is the SQL version:

    -- set up some sample input data

    declare @input varchar(max);

    set @input = '30;38469|

    31;38470|

    32;38471|

    33;38472|

    34;38473|

    35;38474|

    36;38475|

    37;38476|

    38;38477|'

    -- convert to xml

    -- of course, this could be done in your web app and may be quicker.

    declare @xml xml;

    SET @xml =

    CONVERT(xml,

    '<row><col>'+

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    @input,

    CHAR(13),

    ''

    ),

    CHAR(10),

    ''

    ),

    '|',

    '</col></row><row><col>'

    ),

    ';',

    '</col><col>'

    )

    +'</col></row>'

    );

    -- and select the values out as rows/columns

    select

    nd.value('(col/text())[1]','varchar(200)') as field1,

    nd.value('(col/text())[2]','varchar(200)') as field2

    from @xml.nodes('row') as x(nd)

    where not nd.value('(col/text())[1]','varchar(200)') is null

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • murthyvs (12/14/2012)


    dwain.c (12/11/2012)


    Here's another way:

    DECLARE @MyString VARCHAR(8000) =

    '30;38469|31;38470|32;38471|33;38472|34;38473|35;38474|36;38475|37;38476|38;38477|'

    SELECT col1=MAX(CASE ItemNumber%4 WHEN 1 THEN Item END)

    ,col2=MAX(CASE ItemNumber%4 WHEN 3 THEN Item END)

    FROM PatternSplitCM(@MyString, '[0-9]')

    WHERE [Matched] = 1

    GROUP BY ItemNumber/4

    PatternSplitCM is described here: http://www.sqlservercentral.com/articles/String+Manipulation/94365/

    Not sure it will be faster than using DelimitedSplit8K but since you're calling that function twice, it might give it a run for its money.

    Since this is posted in a SQL 2005 forum, you'll need to note that the Tally Table CTE used in PatternSplitCM in the article requires SQL 2008. However it is easily replaced by an Itzik Ben-Gan style CTE tally table like the one that appears (I think) in PatternSplitQU (also in the article).

    Edit: Revised (simplified) my vector of attack slightly.

    Hi - Just exploring some VARCHAR(MAX) options. Will patternsplitcm work for this dataset?

    dataset1:

    A123W1;EESLGKNR912|

    A12W2;EESLGKNR913|

    A3W3;EESLGR914|

    123W4;12EESLGKNR915|

    123W5;EESL12NR916|

    A123W;E31LGKNR917|

    A123W7;EE18|

    A123W8;1234|

    1238;1234|

    dataset2:

    A123W1EESLGKNR912

    A12W2EESLGKNR913

    A3W3EESLGR914

    123W412EESLGKNR915

    123W5EESL12NR916

    A123WE31LGKNR917

    A123W7EE18

    A123W81234

    12381234

    Thanks

    For dataset1 the answer is yes. Use pattern: '[0-9A-Z]' or if the database is case sensitive: '[0-9A-Za-z]'

    For dataset2 it is possible the same pattern will yield the results you seek, except it is unclear what the row split character is. CHAR(10) or CHAR(15) perhaps?


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Hi - this a nice method. didnt know this before.

    I have created this function and using it in a stored proc..

    CREATE FUNCTION [dbo].[FnXmlSplit](@input_string VARCHAR(max))

    RETURNS @temptable TABLE (

    field1 VARCHAR(max)

    ,field2 VARCHAR(max))

    AS

    BEGIN

    DECLARE @xml XML;

    SET @xml = CONVERT(XML, '<row><col>'

    + REPLACE(REPLACE(@input_string, CHAR(13), '</col></row><row><col>'), CHAR(9), '</col><col>' )

    + '</col></row>');

    INSERT @temptable

    select

    Ltrim(Rtrim(replace(replace(field1,char(10),''),char(9),''))) as field1, Ltrim(Rtrim(replace(replace(field2,char(10),''),char(9),''))) as field2

    from (

    SELECT nd.value('(col/text())[1]', 'varchar(200)') AS field1

    ,nd.value('(col/text())[2]', 'varchar(200)') AS field2

    FROM @xml.nodes('row') AS x(nd)

    WHERE NOT nd.value('(col/text())[1]', 'varchar(200)') IS NULL) as a;

    RETURN;

    END;

    Function call in the stored proc: select * from dbo.FnXmlSplit (@input_string).

    For some reason, this method when used inside a stored proc, it runs slow. Any idea why? Thanks!

    mister.magoo (12/15/2012)


    Here's a way that uses XML, which you could generate in the web application before passing the data to SQL, or even just convert to XML and pump straight into a SQL table all within the web app...

    Anyway, this is the SQL version:

    -- set up some sample input data

    declare @input varchar(max);

    set @input = '30;38469|

    31;38470|

    32;38471|

    33;38472|

    34;38473|

    35;38474|

    36;38475|

    37;38476|

    38;38477|'

    -- convert to xml

    -- of course, this could be done in your web app and may be quicker.

    declare @xml xml;

    SET @xml =

    CONVERT(xml,

    '<row><col>'+

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    @input,

    CHAR(13),

    ''

    ),

    CHAR(10),

    ''

    ),

    '|',

    '</col></row><row><col>'

    ),

    ';',

    '</col><col>'

    )

    +'</col></row>'

    );

    -- and select the values out as rows/columns

    select

    nd.value('(col/text())[1]','varchar(200)') as field1,

    nd.value('(col/text())[2]','varchar(200)') as field2

    from @xml.nodes('row') as x(nd)

    where not nd.value('(col/text())[1]','varchar(200)') is null

    Edited: My input is two columns of excel rows

  • Hi dwain.c - I tried changing VARCHAR(8000) to VARCHAR(MAX). Ran the query with new pattern [0-9A-Za-z]. My input string is a very long string.

    sample:

    3038469

    3138470

    The results were as expected BUT the query returns fewer than expected rows. I believe somewhere in the function its hitting the 8000 limit. Any ideas? Thanks!

  • murthyvs (12/16/2012)


    Hi - this a nice method. didnt know this before.

    I have created this function and using it in a stored proc..

    Function call in the stored proc: select * from dbo.FnXmlSplit (@input_string).

    For some reason, this method when used inside a stored proc, it runs slow. Any idea why? Thanks!

    Edited: My input is two columns of excel rows

    Yeah, it's not the fastest way to do it.

    If you want speed, split the data at source or persist with dwain's patter split or Jeff's DelimitedSplit8K

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • murthyvs (12/16/2012)


    Hi dwain.c - I tried changing VARCHAR(8000) to VARCHAR(MAX). Ran the query with new pattern [0-9A-Za-z]. My input string is a very long string.

    sample:

    3038469

    3138470

    The results were as expected BUT the query returns fewer than expected rows. I believe somewhere in the function its hitting the 8000 limit. Any ideas? Thanks!

    The Delimittedsplit8k only counts up to E4's max which is 10,000. If you have more than 10,000 characters in the string you'll need to increase the size of the tally based CTE by creating an E8 which would cross 10,000 by 10,000 and handle

    100,000,000 columns of data for searching for delimitters.



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • mtassin (12/17/2012)


    murthyvs (12/16/2012)


    Hi dwain.c - I tried changing VARCHAR(8000) to VARCHAR(MAX). Ran the query with new pattern [0-9A-Za-z]. My input string is a very long string.

    sample:

    3038469

    3138470

    The results were as expected BUT the query returns fewer than expected rows. I believe somewhere in the function its hitting the 8000 limit. Any ideas? Thanks!

    The Delimittedsplit8k only counts up to E4's max which is 10,000. If you have more than 10,000 characters in the string you'll need to increase the size of the tally based CTE by creating an E8 which would cross 10,000 by 10,000 and handle

    100,000,000 columns of data for searching for delimitters.

    The same applies to the Tally tables generated within PatternSplitCM.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • I know Jeff Moden's splitter function has already been discussed. But this is how I would do it with no aggregates or case statements:

    DECLARE

    @InputString VARCHAR(8000)

    ,@Delimiter1 CHAR(1)

    ,@Delimiter2 CHAR(1)

    SET @InputString = '30;38469|31;38470|32;38471|33;38472|34;38473|35;38474|36;38475|37;38476|38;38477|'

    SET @Delimiter1 = '|'

    SET @Delimiter2 = ';'

    --to get rid of the extra '|' character at the end

    SET @InputString = LEFT(@InputString,LEN(@InputString)-1)

    SELECT

    (SELECT Item FROM dbo.DelimitedSplit8K(r1.Item,@Delimiter2) WHERE ItemNumber = 1) AS Col1

    ,(SELECT Item FROM dbo.DelimitedSplit8K(r1.Item,@Delimiter2) WHERE ItemNumber = 2) AS Col2

    FROM

    (

    SELECT

    dsk1.Item

    FROM

    dbo.DelimitedSplit8K(@InputString,@Delimiter1) AS dsk1

    ) r1

    Output:

    Col1Col2

    3038469

    3138470

    3238471

    3338472

    3438473

    3538474

    3638475

    3738476

    3838477

     

  • Viewing 15 posts - 16 through 30 (of 57 total)

    You must be logged in to reply to this topic. Login to reply