Tally OH! An Improved SQL 8K “CSV Splitter” Function

  • Sean Lange (2/11/2013)


    ahpitre (2/11/2013)


    How can I use this function to search for more than 1 character? I have data that is delimited by either a comma or semicolon (,;). Need to split contents based on either character.

    Easiest way I can think of would be to simply use replace.

    select *

    from YourTable

    cross apply dbo.DelimitedSplit8k(replace(YourColumn, ';', ','), ',')

    This way you are still splitting on commas but it will work for either character. 😉

    Be careful now! [Sorry I just love it when Jeff says that] Try this:

    CREATE TABLE #Strings

    (strcol VARCHAR(8000))

    ;WITH Tally (n) AS (

    SELECT TOP 1000 1 FROM sys.all_columns a, sys.all_columns b)

    INSERT INTO #Strings

    SELECT REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ',' +

    REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ';' +

    REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ',' +

    REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ';' +

    REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ',' +

    REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ';' +

    REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ',' +

    REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ';' +

    REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ',' +

    REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20)

    FROM Tally

    DECLARE @BlackHole VARCHAR(8000)

    PRINT 'Sean''s suggestion'

    SET STATISTICS TIME ON

    select @BlackHole=Item

    from #Strings

    cross apply dbo.DelimitedSplit8k(replace(strcol, ';', ','), ',')

    SET STATISTICS TIME OFF

    PRINT 'Dwain''s suggestion'

    SET STATISTICS TIME ON

    select @BlackHole=Item

    from #Strings

    CROSS APPLY (SELECT MyString=REPLACE(strcol COLLATE Latin1_General_BIN, ';', ',')) a

    cross apply dbo.DelimitedSplit8k(MyString, ',') b

    SET STATISTICS TIME OFF

    PRINT 'Sean''s suggestion with COLLATE'

    SET STATISTICS TIME ON

    select @BlackHole=Item

    from #Strings

    cross apply dbo.DelimitedSplit8k(replace(strcol COLLATE Latin1_General_BIN, ';', ','), ',')

    SET STATISTICS TIME OFF

    DROP TABLE #Strings

    I found that applying a built-in function to the string to be split in the DelimitedSplit8K FUNCTION's call has adverse performance effects. I can't say as to why. But these are the results:

    Sean's suggestion

    SQL Server Execution Times:

    CPU time = 27971 ms, elapsed time = 28411 ms.

    Dwain's suggestion

    SQL Server Execution Times:

    CPU time = 468 ms, elapsed time = 471 ms.

    Sean's suggestion with COLLATE

    SQL Server Execution Times:

    CPU time = 13323 ms, elapsed time = 13567 ms.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Steven Willis (2/11/2013)


    ...I can't guess what effect this would have on performance of the function.

    I was wondering about that! Thanks for doing the proof testing.

     

  • Steven Willis (2/11/2013)


    Steven Willis (2/11/2013)


    ...I can't guess what effect this would have on performance of the function.

    I was wondering about that! Thanks for doing the proof testing.

     

    I would love to hear an explanation for it. It is fortunate that I tried something like this once before so I knew the consequences.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Note that I posted the query plan in my previous post in case someone wants to take a shot at that analysis.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Great. Thanks for the prompt response. Do you think that the code could be modified to be recursive, in other words, if I send 1 delimiter, code within function runs once, if I send 2 delimiters, ir runs twice.

    What would be the performance penalty of recursion vs your suggestion of just simply replacing? If recursion takes more time and CPU cycles, then, could the function be modified to accespt the delimiters as part of an array, then, loop thru the array doring the replace portion, before doing the actual split?;-)

  • ahpitre (2/12/2013)


    Great. Thanks for the prompt response. Do you think that the code could be modified to be recursive, in other words, if I send 1 delimiter, code within function runs once, if I send 2 delimiters, ir runs twice.

    What would be the performance penalty of recursion vs your suggestion of just simply replacing? If recursion takes more time and CPU cycles, then, could the function be modified to accespt the delimiters as part of an array, then, loop thru the array doring the replace portion, before doing the actual split?;-)

    The whole point of this function to not do any looping. Looping is what causes sql server to crawl like a snail.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • OK. Great advice. I guess I could modify it just so the user can provide multiple delimiters, then use the replace before doing the split. Thanks.

  • Very interesting Dwain. I tried another idea to see how it would hold up. It seems that it is possible to use another temp table to hold the replaced values. I tried with your sample data and scaled up x10 up to a million rows and this approach seems to have a slight edge at all those sizes. Now if the table had more columns this is going to degrade as is scales but it is certainly interesting.

    CREATE TABLE #Strings

    (strcol VARCHAR(8000))

    ;WITH Tally (n) AS (

    SELECT TOP 10000 1 FROM sys.all_columns a, sys.all_columns b)

    INSERT INTO #Strings

    SELECT REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ',' +

    REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ';' +

    REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ',' +

    REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ';' +

    REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ',' +

    REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ';' +

    REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ',' +

    REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ';' +

    REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ',' +

    REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20)

    FROM Tally

    DECLARE @BlackHole VARCHAR(8000)

    PRINT 'Sean''s new suggestion'

    SET STATISTICS TIME ON

    --create a new table using the replace logic

    select replace(strcol, ';', ',') as strcol

    into #NewStrings

    from #Strings

    select @BlackHole=Item

    from #NewStrings

    cross apply dbo.DelimitedSplit8k(strcol, ',')

    SET STATISTICS TIME OFF

    PRINT 'Dwain''s suggestion'

    SET STATISTICS TIME ON

    select @BlackHole=Item

    from #Strings

    CROSS APPLY (SELECT MyString=REPLACE(strcol COLLATE Latin1_General_BIN, ';', ',')) a

    cross apply dbo.DelimitedSplit8k(MyString, ',') b

    SET STATISTICS TIME OFF

    DROP TABLE #Strings

    DROP TABLE #NewStrings

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • dwain.c (2/11/2013)


    I found that applying a built-in function to the string to be split in the DelimitedSplit8K function's call has adverse performance effects.

    If you look closely at the execution plans, you'll see the ones that do not perform well end up doing the REPLACE on a big string a *ridiculous* number of times. This is because the optimizer hardly costs scalar functions at all, so it does not care very much how many times they are executed so long as the result is correct.

    Physically separating the replace from the function call using Sean's method is a supported way to work around this limitation, though it does involve writing a copy of the whole input set. There are also *unreliable* tricks like the following, which may cause the replace to be applied only once:

    SELECT @BlackHole = dsk.Item

    FROM

    (

    SELECT

    strcol = REPLACE(strcol, ';', ',') + LEFT(NEWID(), 0)

    FROM #Strings

    ) AS s

    CROSS APPLY dbo.DelimitedSplit8K(s.strcol, ',') AS dsk;

    None of the methods shown so far performs as well (for me) as simply applying the SQLCLR function twice:

    SELECT

    dsk2.Item

    FROM #Strings

    CROSS APPLY dbo.SplitterB(strcol, ',') AS dsk1

    CROSS APPLY dbo.SplitterB(dsk1.Item, ';') AS dsk2;

    That returns results so quickly I didn't even bother coding up a CLR function that would accept an array of delimiters. No doubt that would be even faster. The same idea could be applied to the T-SQL function, I suppose, but the implementation and testing looks decidedly non-trivial to me. For anyone that needs the SplitterB code:

    CREATE ASSEMBLY [Split]

    FROM 0x4D5A90000300000004000000FFFF0000B800000000000000400000000000000000000000000000000000000000000000000000000000000000000000800000000E1FBA0E00B409CD21B8014CCD21546869732070726F6772616D2063616E6E6F742062652072756E20696E20444F53206D6F64652E0D0D0A2400000000000000504500004C010300F7F34D500000000000000000E00002210B010800000E00000006000000000000AE2D0000002000000040000000004000002000000002000004000000000000000400000000000000008000000002000000000000030040850000100000100000000010000010000000000000100000000000000000000000602D00004B000000004000006803000000000000000000000000000000000000006000000C00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000200000080000000000000000000000082000004800000000000000000000002E74657874000000B40D000000200000000E000000020000000000000000000000000000200000602E7273726300000068030000004000000004000000100000000000000000000000000000400000402E72656C6F6300000C0000000060000000020000001400000000000000000000000000004000004200000000000000000000000000000000902D0000000000004800000002000500E42100007C0B00000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000A2026F1300000A2D0E026F1400000A0373040000062B0C168D140000011673040000068C030000022A000000133002001A0000000100001102A5040000020A031200280800000654041200280A000006512A1E02281500000A2AC202037D0100000402047D0300000402027B010000048E697D02000004027C05000004FE150400000202167D040000042A0013300500C400000002000011027B04000004027B020000043102162A027B040000040A2B56027B010000040693027B030000043342027C0500000425280800000617582809000006027C05000004027B01000004027B0400000406027B0400000459731700000A280B000006020617587D04000004172A0617580A06027B0200000432A1027C0500000425280800000617582809000006027C05000004027B01000004027B04000004027B02000004027B0400000459731700000A280B00000602027B0200000417587D04000004172A32027B050000048C040000022A1A731800000A7A1E027B060000042A2202037D060000042A1E027B070000042A2202037D070000042A000042534A4201000100000000000C00000076322E302E35303732370000000005006C000000CC030000237E0000380400009004000023537472696E677300000000C80800000800000023555300D0080000100000002347554944000000E00800009C02000023426C6F6200000000000000020000015717A2030902000000FA253300160000010000001900000004000000070000000B00000009000000010000001900000012000000020000000200000003000000050000000300000001000000020000000200000000000A00010000000000060052004B00060059004B000600760063000A00A3008E000A002D02120206007502560206009F028D020600B6028D020600D3028D020600F2028D0206000B038D02060024038D0206003F038D0206005A038D02060073035602060087038D020600C003A0030600E003A0030A000404120206002E044B00060033045602060049045602060054044B0006005B044B0006007304A003000000000100000000000100010001001000140000000500010001000B011000290000000900010004000B01100039000000090006000800210057012E0021005D01320021006401350001006E0132000100740138000100CE0132000100E80152005020000000009600AC000A0001007C20000000009600B60012000300A220000000008618BE001B000600AA20000000008318BE001F000600DC2000000000E101C40026000800AC2100000000E109F5002A000800B92100000000E1012C011B000800C021000000008308A20140000800C821000000008308AF0144000800D121000000008308BC0149000900D921000000008308C5014D000900000001000C02000002003F02000001004902020002004D02020003008202000001000C02000002003F0200000100870200000100870203000D001900EC002600190020012A00190051011B002900BE001B003100BE001B003900BE004D004100BE004D004900BE004D005100BE004D005900BE004D006100BE004D006900BE004D007100BE004D007900BE0070008100BE004D008900BE0044009100BE001B009900BE001B00210019042600210024040D020900BE001B00A900BE001702B900BE001D02C100BE001B00C900BE001B00200093007500240023005D002E0033002E022E0043003D022E008B007C022E004B0043022E0053002E022E0073003D022E003B003D022E00830073022E005B0052022E0063003D02C100CB002902E100CB0029020001CB0029022001CB0029024001CB0029026001CB00290212022502030001000400020000007B013C000000FE01550000000702590002000600030001000900050002000800050002000A00070001000B00070003000A00030003000C00050003000E00070004800000010000001D12E315000000000000FE03000002000000000000000000000001004200000000000200000000000000000000000100820000000000030002000400020000000000003C4D6F64756C653E0053706C69742E646C6C0055736572446566696E656446756E6374696F6E730053706C6974456E756D657261746F720053706C6974526F77006D73636F726C69620053797374656D004F626A6563740056616C7565547970650053797374656D2E436F6C6C656374696F6E730049456E756D657261746F720053797374656D2E446174610053797374656D2E446174612E53716C54797065730053716C43686172730053706C6974746572420046696C6C526F77002E63746F720053797374656D2E436F6C6C656374696F6E732E49456E756D657261746F722E4D6F76654E657874004D6F76654E6578740053797374656D2E436F6C6C656374696F6E732E49456E756D657261746F722E6765745F43757272656E74006765745F43757272656E740053797374656D2E436F6C6C656374696F6E732E49456E756D657261746F722E526573657400526573657400696E707574006C656E6774680064656C696D69746572007374617274007265636F72640053797374656D2E436F6C6C656374696F6E732E49456E756D657261746F722E43757272656E74006765745F53657175656E6365007365745F53657175656E6365006765745F4974656D007365745F4974656D003C53657175656E63653E6B5F5F4261636B696E674669656C64003C4974656D3E6B5F5F4261636B696E674669656C640053657175656E6365004974656D00496E707574004D6963726F736F66742E53716C5365727665722E5365727665720053716C46616365744174747269627574650044656C696D69746572006F626A0073657175656E63650053797374656D2E52756E74696D652E496E7465726F705365727669636573004F7574417474726962757465006974656D0076616C75650053797374656D2E5265666C656374696F6E00417373656D626C795469746C6541747472696275746500417373656D626C794465736372697074696F6E41747472696275746500417373656D626C79436F6E66696775726174696F6E41747472696275746500417373656D626C79436F6D70616E7941747472696275746500417373656D626C7950726F6475637441747472696275746500417373656D626C79436F7079726967687441747472696275746500417373656D626C7954726164656D61726B41747472696275746500417373656D626C7943756C7475726541747472696275746500436F6D56697369626C6541747472696275746500417373656D626C7956657273696F6E4174747269627574650053797374656D2E52756E74696D652E436F6D70696C6572536572766963657300436F6D70696C6174696F6E52656C61786174696F6E734174747269627574650052756E74696D65436F6D7061746962696C6974794174747269627574650053706C69740053716C46756E6374696F6E417474726962757465006765745F49734E756C6C006765745F56616C75650043686172005374727563744C61796F7574417474726962757465004C61796F75744B696E6400537472696E67004E6F74496D706C656D656E746564457863657074696F6E00436F6D70696C657247656E6572617465644174747269627574650000000003200000000000769A3CD63DE4C742A197817A9DEE94EC0008B77A5C561934E089070002120D121103080003011C1008100E03200001062002011D0303032000020320001C03061D03020608020603030611100328001C0320000804200101080320000E042001010E02060E032800080328000E12010001005408074D617853697A65FFFFFFFF04200101028196010006005455794D6963726F736F66742E53716C5365727665722E5365727665722E446174614163636573734B696E642C2053797374656D2E446174612C2056657273696F6E3D322E302E302E302C2043756C747572653D6E65757472616C2C205075626C69634B6579546F6B656E3D623737613563353631393334653038390A446174614163636573730000000054557F4D6963726F736F66742E53716C5365727665722E5365727665722E53797374656D446174614163636573734B696E642C2053797374656D2E446174612C2056657273696F6E3D322E302E302E302C2043756C747572653D6E65757472616C2C205075626C69634B6579546F6B656E3D623737613563353631393334653038391053797374656D446174614163636573730000000054020F497344657465726D696E69737469630154020949735072656369736501540E1146696C6C526F774D6574686F644E616D650746696C6C526F77540E0F5461626C65446566696E6974696F6E2173657175656E636520494E542C206974656D204E564152434841522834303030290420001D030407011110052001011159072003011D0308080307010804010000000E01000953706C69747465724200000501000000000E0100094D6963726F736F667400002001001B436F7079726967687420C2A9204D6963726F736F6674203230313100000801000800000000001E01000100540216577261704E6F6E457863657074696F6E5468726F77730100882D000000000000000000009E2D0000002000000000000000000000000000000000000000000000902D00000000000000005F436F72446C6C4D61696E006D73636F7265652E646C6C0000000000FF250020400000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001001000000018000080000000000000000000000000000001000100000030000080000000000000000000000000000001000000000048000000584000000C03000000000000000000000C0334000000560053005F00560045005200530049004F004E005F0049004E0046004F0000000000BD04EFFE0000010000000100E3151D1200000100E3151D123F000000000000000400000002000000000000000000000000000000440000000100560061007200460069006C00650049006E0066006F00000000002400040000005400720061006E0073006C006100740069006F006E00000000000000B0046C020000010053007400720069006E006700460069006C00650049006E0066006F00000048020000010030003000300030003000340062003000000034000A00010043006F006D00700061006E0079004E0061006D006500000000004D006900630072006F0073006F006600740000003C000A000100460069006C0065004400650073006300720069007000740069006F006E0000000000530070006C0069007400740065007200420000003C000E000100460069006C006500560065007200730069006F006E000000000031002E0030002E0034003600330037002E003500360030003300000034000A00010049006E007400650072006E0061006C004E0061006D0065000000530070006C00690074002E0064006C006C0000005C001B0001004C006500670061006C0043006F007000790072006900670068007400000043006F0070007900720069006700680074002000A90020004D006900630072006F0073006F006600740020003200300031003100000000003C000A0001004F0072006900670069006E0061006C00460069006C0065006E0061006D0065000000530070006C00690074002E0064006C006C00000034000A000100500072006F0064007500630074004E0061006D00650000000000530070006C00690074007400650072004200000040000E000100500072006F006400750063007400560065007200730069006F006E00000031002E0030002E0034003600330037002E003500360030003300000044000E00010041007300730065006D0062006C0079002000560065007200730069006F006E00000031002E0030002E0034003600330037002E0035003600300033000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002000000C000000B03D00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

    WITH PERMISSION_SET = SAFE;

    GO

    CREATE FUNCTION [dbo].[SplitterB]

    (@Input [nvarchar](max), @Delimiter [nchar](1))

    RETURNS TABLE

    (

    [sequence] [int] NULL,

    [item] [nvarchar](4000) NULL

    ) WITH EXECUTE AS CALLER

    AS EXTERNAL NAME [Split].[UserDefinedFunctions].[SplitterB];

  • Sean Lange (2/12/2013)


    ahpitre (2/12/2013)


    Great. Thanks for the prompt response. Do you think that the code could be modified to be recursive, in other words, if I send 1 delimiter, code within function runs once, if I send 2 delimiters, ir runs twice.

    What would be the performance penalty of recursion vs your suggestion of just simply replacing? If recursion takes more time and CPU cycles, then, could the function be modified to accespt the delimiters as part of an array, then, loop thru the array doring the replace portion, before doing the actual split?;-)

    The whole point of this function to not do any looping. Looping is what causes sql server to crawl like a snail.

    The other way instead of looping (no recursion required) is to use cascading CROSS APPLYs, as Paul has done in his example where he calls the CLR splitter twice.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Sean Lange (2/12/2013)


    Very interesting Dwain. I tried another idea to see how it would hold up. It seems that it is possible to use another temp table to hold the replaced values. I tried with your sample data and scaled up x10 up to a million rows and this approach seems to have a slight edge at all those sizes. Now if the table had more columns this is going to degrade as is scales but it is certainly interesting.

    CREATE TABLE #Strings

    (strcol VARCHAR(8000))

    ;WITH Tally (n) AS (

    SELECT TOP 10000 1 FROM sys.all_columns a, sys.all_columns b)

    INSERT INTO #Strings

    SELECT REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ',' +

    REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ';' +

    REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ',' +

    REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ';' +

    REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ',' +

    REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ';' +

    REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ',' +

    REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ';' +

    REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20) + ',' +

    REPLICATE(LEFT('abcdefghijklmnopqrsatuvwxyz', 1+ABS(CHECKSUM(NEWID()))%26),ABS(CHECKSUM(NEWID()))%20)

    FROM Tally

    DECLARE @BlackHole VARCHAR(8000)

    PRINT 'Sean''s new suggestion'

    SET STATISTICS TIME ON

    --create a new table using the replace logic

    select replace(strcol, ';', ',') as strcol

    into #NewStrings

    from #Strings

    select @BlackHole=Item

    from #NewStrings

    cross apply dbo.DelimitedSplit8k(strcol, ',')

    SET STATISTICS TIME OFF

    PRINT 'Dwain''s suggestion'

    SET STATISTICS TIME ON

    select @BlackHole=Item

    from #Strings

    CROSS APPLY (SELECT MyString=REPLACE(strcol COLLATE Latin1_General_BIN, ';', ',')) a

    cross apply dbo.DelimitedSplit8k(MyString, ',') b

    SET STATISTICS TIME OFF

    DROP TABLE #Strings

    DROP TABLE #NewStrings

    Sean - Very nice touch. Your temp table approach beats the cascading CROSS APPLYs and a couple of variants I tried it against, even without adding the COLLATE on REPLACE.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Paul White (2/12/2013)


    dwain.c (2/11/2013)


    I found that applying a built-in function to the string to be split in the DelimitedSplit8K function's call has adverse performance effects.

    If you look closely at the execution plans, you'll see the ones that do not perform well end up doing the REPLACE on a big string a *ridiculous* number of times. This is because the optimizer hardly costs scalar functions at all, so it does not care very much how many times they are executed so long as the result is correct.

    Physically separating the replace from the function call using Sean's method is a supported way to work around this limitation, though it does involve writing a copy of the whole input set. There are also *unreliable* tricks like the following, which may cause the replace to be applied only once:

    SELECT @BlackHole = dsk.Item

    FROM

    (

    SELECT

    strcol = REPLACE(strcol, ';', ',') + LEFT(NEWID(), 0)

    FROM #Strings

    ) AS s

    CROSS APPLY dbo.DelimitedSplit8K(s.strcol, ',') AS dsk;

    Paul - Thanks for the analysis. Sorry for the basic question but, can you show me where exactly in the execution plan you're seeing this? I'm not really very good at reading them but I want to improve.

    And yes, the CLR approach certainly rules the roost here. I was just trying to avoid causing the suggestion of doing the REPLACE inside the DelimitedSplit8K call from doing something unexepected, as like I said I'd seen this issue before.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (2/12/2013)


    Sorry for the basic question but, can you show me where exactly in the execution plan you're seeing this?

    Taking the following code as an example (which ran for 5m 33s on my SQL Server 2012 machine):

    select @BlackHole=Item

    from #Strings

    cross apply dbo.DelimitedSplit8k(replace(strcol, ';', ','), ',')

    The execution plan is:

    The Filter operator executes 1,000 times applying the following predicate to the 1,291,917 rows it receives:

    substring(replace([tempdb].[dbo].[#Strings].[strcol],';',','),CONVERT_IMPLICIT(int,[Expr1054],0),(1))=','

    So that particular REPLACE executes 1.3M times.

    The other references are in the Compute Scalars:

    Now there are some added complications regarding exactly when each defined expression gets evaluated and how many times, but that's enough to give you the flavour.

  • Thanks very much Paul for the detailed answer.

    I'll need to spend some time working through it to make sure that I understand it all.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • How do you use this function? Also, how can I pass an additional parameter, so it's always inserted into the new table? I have a column named Part. I want the table with the split to include Part (which is repeated for all substrings that are splitted from main string). My final output should be something like this :

    Input_table

    Part Specs

    ---------------------------------

    123 D-dfldkk; P-4987843; D48974587

    456 A-dfldkk; Z-4987843

    Output_table (created by Split function)

    Part Specs

    ---------------------------------

    123 D-dfldkk

    123 P-4987843

    123 D48974587

    456 A-dfldkk

    456 Z-4987843

Viewing 15 posts - 496 through 510 (of 990 total)

You must be logged in to reply to this topic. Login to reply