May 24, 2012 at 11:43 pm
Jeff Moden (5/24/2012)
dwain.c (5/24/2012)
Jeff - I've been a fan of this splitter for a long time even though I've been a slow adopter, 🙂 recommending it highly around my office at any opportunity.I just went to use it today and realized that it only supports 1 character delimiter.
I'm sure I could modify it to handle a longer delimiter but I'm afraid if I bumble around and make those modifications I'm going to do something nasty to its performance.
Any suggestions on how to do this and not impact the performance to any significant degree?
Why does it need to handle more than a single character delimiter? Replace multiple character delimiters with 1 before you pass it to the splitter. Then, beat the tar out of the person that designed the data with a multi-character delimiter. 😉
Rats! Now why didn't I think of that?
Just goes to show, the right approach was to ask your advice first.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
June 20, 2012 at 5:37 am
Naomi N (5/3/2011)
Jeff,That was one of the best articles I read recently. Terrific job!
BTW, is there a way to contact you privately somehow?
Thanks again.
My apologies. I missed this response.
Thank you very much for the feedback. You could always initiate "private" message through this site's PM messaging system.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 20, 2012 at 7:43 am
Jeff Moden (5/24/2012)
dwain.c (5/24/2012)
Jeff - I've been a fan of this splitter for a long time even though I've been a slow adopter, 🙂 recommending it highly around my office at any opportunity.I just went to use it today and realized that it only supports 1 character delimiter.
I'm sure I could modify it to handle a longer delimiter but I'm afraid if I bumble around and make those modifications I'm going to do something nasty to its performance.
Any suggestions on how to do this and not impact the performance to any significant degree?
Why does it need to handle more than a single character delimiter? Replace multiple character delimiters with 1 before you pass it to the splitter. Then, beat the tar out of the person that designed the data with a multi-character delimiter. 😉
I too have struggled with this occasionally. I use this splitter in a lot of very unconventional ways. For example say I want to find all anchor tags in some html. I can't split on either < or a but being able to split on <a would be highly helpful. I have used your idea of replacing with some other character but this in itself can be somewhat challenging to find a suitable character because you have to scour the data first to see if the character to use as a replacement is nowhere else. hmmm thanks Dwain I think now I may have to make another version of this splitter that allows for varchar as the delimiter. Seems like performance may take a hit but it would be a great addition to the toolbox.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
June 20, 2012 at 9:18 am
Sean Lange (6/20/2012)
I too have struggled with this occasionally. I use this splitter in a lot of very unconventional ways. For example say I want to find all anchor tags in some html. I can't split on either < or a but being able to split on <a would be highly helpful. I have used your idea of replacing with some other character but this in itself can be somewhat challenging to find a suitable character because you have to scour the data first to see if the character to use as a replacement is nowhere else. hmmm thanks Dwain I think now I may have to make another version of this splitter that allows for varchar as the delimiter. Seems like performance may take a hit but it would be a great addition to the toolbox.
For finding a suitable single character value, look outside the printable values. In particular, the ASCII unit and record separators make excellent choices these days. They're almost never used, and they're explicitly defined for the purpose.
Unit separator:
SELECT * FROM YourSplitterName(CHAR(31) + 'One' + CHAR(31) + 'Two' + CHAR(31) + 'Three', CHAR(31))
Record separator:
SELECT * FROM YourSplitterName(CHAR(30) + 'One' + CHAR(30) + 'Two' + CHAR(30) + 'Three', CHAR(30))
June 20, 2012 at 9:30 am
Nadrek (6/20/2012)
Sean Lange (6/20/2012)
I too have struggled with this occasionally. I use this splitter in a lot of very unconventional ways. For example say I want to find all anchor tags in some html. I can't split on either < or a but being able to split on <a would be highly helpful. I have used your idea of replacing with some other character but this in itself can be somewhat challenging to find a suitable character because you have to scour the data first to see if the character to use as a replacement is nowhere else. hmmm thanks Dwain I think now I may have to make another version of this splitter that allows for varchar as the delimiter. Seems like performance may take a hit but it would be a great addition to the toolbox.
For finding a suitable single character value, look outside the printable values. In particular, the ASCII unit and record separators make excellent choices these days. They're almost never used, and they're explicitly defined for the purpose.
Unit separator:
SELECT * FROM YourSplitterName(CHAR(31) + 'One' + CHAR(31) + 'Two' + CHAR(31) + 'Three', CHAR(31))
Record separator:
SELECT * FROM YourSplitterName(CHAR(30) + 'One' + CHAR(30) + 'Two' + CHAR(30) + 'Three', CHAR(30))
/facepalm
That is a great idea. Sometimes the easiest solution is right in front you and you just can't see it. Thanks!!
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
June 20, 2012 at 2:14 pm
Nadrek (6/20/2012)
Sean Lange (6/20/2012)
I too have struggled with this occasionally. I use this splitter in a lot of very unconventional ways. For example say I want to find all anchor tags in some html. I can't split on either < or a but being able to split on <a would be highly helpful. I have used your idea of replacing with some other character but this in itself can be somewhat challenging to find a suitable character because you have to scour the data first to see if the character to use as a replacement is nowhere else. hmmm thanks Dwain I think now I may have to make another version of this splitter that allows for varchar as the delimiter. Seems like performance may take a hit but it would be a great addition to the toolbox.
For finding a suitable single character value, look outside the printable values. In particular, the ASCII unit and record separators make excellent choices these days. They're almost never used, and they're explicitly defined for the purpose.
Unit separator:
SELECT * FROM YourSplitterName(CHAR(31) + 'One' + CHAR(31) + 'Two' + CHAR(31) + 'Three', CHAR(31))
Record separator:
SELECT * FROM YourSplitterName(CHAR(30) + 'One' + CHAR(30) + 'Two' + CHAR(30) + 'Three', CHAR(30))
+1 on this. We do the same thing with needing to parse CSV params with possible quoted fields.
/* Anything is possible but is it worth it? */
June 20, 2012 at 6:13 pm
"Back in the old days"... the ASCII set of values has some pretty neat stuff in it. We didn't use to try to make things readable by human on things like paper tape and we sure didn't try to make them readable on mag tape. Instead of tabs, commas, CrLf combinations, and all sorts of "tag" markers, we'd use ASCII characters 28 through 31 and they worked great especially for transmitting things that looked like tables.
28 = File Separator (think "table" here)
29 = Group Separator (we used this to group hierarchical information)
30 = Record Separator (think table "row" here)
31 = Unit Separator (think table "column" here)
What that meant is that you could also pass "documents" that had embedded tabs, commas, and CrLfs very easily and without having to worry about using double quotes within CSV data because it wasn't CSV data. It was SSV (Separator Separated Values) or "true ASCII transmission file".
It would still work great except that humans are annoyed by little square boxes they don't actually know what to do with when they're trying to read data that was meant for loading into a computer. People laugh at this but would you ever actually try to read the old style of Excel data by eye? Heck no!
Throw in the Esc character (27) and you have the ability to easily replace things like XML (really bloated, IMHO), JSON, and many other formats that you need special code to import data with. With just a little imagination, you might even be able to make it handle Unicode.
Since the "Bell" character (7) isn't used anymore (it actually caused a physical bell to ring on Teletypes and some monitors and printers), I'll sometimes use that as a "special" delimiter.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 20, 2012 at 7:02 pm
Jeff,
Ctrl+G all the way! I used to do that when character 7 did make the computer beep. 🙂
Erik
June 21, 2012 at 2:51 am
dwain.c (5/24/2012)
I just went to use it today and realized that it only supports 1 character delimiter.
For anyone interested, here's a multi-character SQLCLR splitter based on Adam Machanic's code:
CREATE ASSEMBLY [MultiSplit]
FROM 0x4D5A90000300000004000000FFFF0000B800000000000000400000000000000000000000000000000000000000000000000000000000000000000000800000000E1FBA0E00B409CD21B8014CCD21546869732070726F6772616D2063616E6E6F742062652072756E20696E20444F53206D6F64652E0D0D0A2400000000000000504500004C010300E3DEE24F0000000000000000E00002210B010800000E00000006000000000000FE2C0000002000000040000000004000002000000002000004000000000000000400000000000000008000000002000000000000030040850000100000100000000010000010000000000000100000000000000000000000A42C000057000000004000009003000000000000000000000000000000000000006000000C00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000200000080000000000000000000000082000004800000000000000000000002E74657874000000040D000000200000000E000000020000000000000000000000000000200000602E7273726300000090030000004000000004000000100000000000000000000000000000400000402E72656C6F6300000C0000000060000000020000001400000000000000000000000000004000004200000000000000000000000000000000E02C000000000000480000000200050054220000500A00000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000D2026F1000000A2D1A036F1000000A2D12026F1100000A036F1100000A73050000062A168D15000001168D1500000173050000062A000000133002002A000000010000110274030000020A03067B01000004281200000A810400000104067B02000004281300000A81050000012A1E02281400000A2A5602281400000A02037D0100000402047D020000042A03300300530000000000000002281400000A02037D0600000402038E697D0800000402047D0700000402048E69D27D0900000402027B0900000417FE017D0A00000402167D0500000402167D0300000402027B09000004155A7D040000042AAA027B05000004027B06000004027B03000004027B04000004027B0300000459731500000A73040000062A000013300300C90000000200001102257B0500000417587D05000004027B04000004027B080000043202162A02027B04000004027B09000004587D03000004027B030000040A2B65170B027B0A0000042C16027B060000040693027B0700000416932E39160B2B35160C2B28060858027B080000042F14027B0600000406085893027B0700000408932E04160B2B0E081758D20C08027B0900000432CF072C0902067D04000004172A0617580A06027B08000004329202027B04000004027B09000004587D0300000402027B080000047D04000004172A5A02167D0300000402027B09000004155A7D040000042A42534A4201000100000000000C00000076322E302E35303732370000000005006C0000002C030000237E000098030000A803000023537472696E6773000000004007000008000000235553004807000010000000234755494400000058070000F802000023426C6F6200000000000000020000015717A2010902000000FA2533001600000100000016000000040000000A000000080000000900000001000000150000000D0000000200000001000000010000000100000001000000020000000200000000000A0001000000000006005C0055000600760063000A00A3008E000A00BE008E000A00C7008E000A00840169010600C301A4010600FA01E80106001102E80106002E02E80106004D02E80106006602E80106007F02E80106009A02E8010600B502E8010600CE02A4010600E202E80106001B03FB0206003B03FB020A006403690106008E03550006009F03550000000000010000000000010001000100100019000000050001000100030110002E000000050001000400020110003B0000000500030005002300E50022002300EE00250001000E0122000100160122000100E500220021001E013E00210028013E0021003201220021003C0142002100490145005020000000009600AC000A0001008820000000009600D10013000300BE20000000008618DF001E000600C620000000008618DF0028000600DC20000000008618DF002E0008003B2100000000E609F30036000A00682100000000E601FF003A000A003D2200000000E60108011E000A0000000100630100000200960100000100A00102000200E50002000300EE0000000100D00100000200D90100000100DE01000002009601040009003100DF001E003900DF001E004100DF00B0004900DF00B0005100DF00B0005900DF00B0006100DF00B0006900DF00B0007100DF00B0007900DF00B0008100DF00B5008900DF00B0009100DF00BA009900DF001E00A100DF001E00190079033A00190084036302210093036802290093036E020900DF001E00B100DF00790220007B00BF0024000B004C002E0033009D022E001B0087022E00230097022E002B0097022E004B0097022E007300D6022E004300AC022E003B0087022E005B0097022E006B00CD0244000B007E00740281020400010000005B0148000200060003000480000001000000CB11D9980000000000005903000002000000000000000000000001004C0000000000020000000000000000000000010082000000000003000200040002000000003C4D6F64756C653E004D756C746953706C69742E646C6C0055736572446566696E656446756E6374696F6E73004F75747075745265636F72640053706C6974537472696E674D756C7469006D73636F726C69620053797374656D004F626A6563740053797374656D2E436F6C6C656374696F6E730049456E756D657261746F720053797374656D2E446174610053797374656D2E446174612E53716C54797065730053716C43686172730053706C6974537472696E675F4D756C74690053716C496E7433320053716C537472696E670046696C6C526F775F4D756C7469002E63746F720073657175656E6365006974656D006765745F43757272656E74004D6F76654E657874005265736574006C617374506F73006E657874506F7300746865537472696E670064656C696D6974657200737472696E674C656E0064656C696D697465724C656E00697353696E676C654368617244656C696D0043757272656E7400496E707574004D6963726F736F66742E53716C5365727665722E5365727665720053716C46616365744174747269627574650044656C696D69746572006F626A0053797374656D2E52756E74696D652E496E7465726F705365727669636573004F75744174747269627574650053657175656E6365004974656D00546865537472696E670053797374656D2E5265666C656374696F6E00417373656D626C795469746C6541747472696275746500417373656D626C794465736372697074696F6E41747472696275746500417373656D626C79436F6E66696775726174696F6E41747472696275746500417373656D626C79436F6D70616E7941747472696275746500417373656D626C7950726F6475637441747472696275746500417373656D626C79436F7079726967687441747472696275746500417373656D626C7954726164656D61726B41747472696275746500417373656D626C7943756C7475726541747472696275746500436F6D56697369626C6541747472696275746500417373656D626C7956657273696F6E4174747269627574650053797374656D2E52756E74696D652E436F6D70696C6572536572766963657300436F6D70696C6174696F6E52656C61786174696F6E734174747269627574650052756E74696D65436F6D7061746962696C697479417474726962757465004D756C746953706C69740053716C46756E6374696F6E417474726962757465006765745F49734E756C6C006765745F56616C75650043686172006F705F496D706C6963697400537472696E670000000003200000000000A3AEF14706753B429B464E1C5D2862700008B77A5C561934E0890800021209120D120D0A0003011C1011111011150320000102060802060E05200201080E072002011D031D030320001C0320000203061D030206050206020328001C31010003005408074D617853697A65FFFFFFFF54020D497346697865644C656E6774680054020A49734E756C6C61626C650031010003005408074D617853697A65FF00000054020D497346697865644C656E6774680054020A49734E756C6C61626C6500042001010E0420010102042001010881A201000600540E1146696C6C526F774D6574686F644E616D650D46696C6C526F775F4D756C74695455794D6963726F736F66742E53716C5365727665722E5365727665722E446174614163636573734B696E642C2053797374656D2E446174612C2056657273696F6E3D322E302E302E302C2043756C747572653D6E65757472616C2C205075626C69634B6579546F6B656E3D623737613563353631393334653038390A446174614163636573730000000054557F4D6963726F736F66742E53716C5365727665722E5365727665722E53797374656D446174614163636573734B696E642C2053797374656D2E446174612C2056657273696F6E3D322E302E302E302C2043756C747572653D6E65757472616C2C205075626C69634B6579546F6B656E3D623737613563353631393334653038391053797374656D446174614163636573730000000054020F497344657465726D696E69737469630154020949735072656369736501540E0F5461626C65446566696E6974696F6E274974656D4E756D62657220696E74656765722C204974656D206E766172636861722834303030290420001D0305000111110805000111150E040701120C072003011D0308080507030802050F01000A4D756C746953706C697400000501000000000E0100094D6963726F736F667400002001001B436F7079726967687420C2A9204D6963726F736F6674203230313200000801000800000000001E01000100540216577261704E6F6E457863657074696F6E5468726F777301000000CC2C00000000000000000000EE2C0000002000000000000000000000000000000000000000000000E02C00000000000000000000000000000000000000005F436F72446C6C4D61696E006D73636F7265652E646C6C0000000000FF250020400000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100100000001800008000000000000000000000000000000100010000003000008000000000000000000000000000000100000000004800000058400000380300000000000000000000380334000000560053005F00560045005200530049004F004E005F0049004E0046004F0000000000BD04EFFE0000010000000100D998CB1100000100D998CB113F000000000000000400000002000000000000000000000000000000440000000100560061007200460069006C00650049006E0066006F00000000002400040000005400720061006E0073006C006100740069006F006E00000000000000B00498020000010053007400720069006E006700460069006C00650049006E0066006F00000074020000010030003000300030003000340062003000000034000A00010043006F006D00700061006E0079004E0061006D006500000000004D006900630072006F0073006F0066007400000040000B000100460069006C0065004400650073006300720069007000740069006F006E00000000004D0075006C0074006900530070006C00690074000000000040000F000100460069006C006500560065007200730069006F006E000000000031002E0030002E0034003500350035002E00330039003100320039000000000040000F00010049006E007400650072006E0061006C004E0061006D00650000004D0075006C0074006900530070006C00690074002E0064006C006C00000000005C001B0001004C006500670061006C0043006F007000790072006900670068007400000043006F0070007900720069006700680074002000A90020004D006900630072006F0073006F0066007400200032003000310032000000000048000F0001004F0072006900670069006E0061006C00460069006C0065006E0061006D00650000004D0075006C0074006900530070006C00690074002E0064006C006C000000000038000B000100500072006F0064007500630074004E0061006D006500000000004D0075006C0074006900530070006C00690074000000000044000F000100500072006F006400750063007400560065007200730069006F006E00000031002E0030002E0034003500350035002E00330039003100320039000000000048000F00010041007300730065006D0062006C0079002000560065007200730069006F006E00000031002E0030002E0034003500350035002E00330039003100320039000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002000000C000000003D00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
WITH PERMISSION_SET = SAFE;
GO
CREATE FUNCTION dbo.SplitString_Multi
(
@Input nvarchar(max),
@Delimiter nvarchar(255)
)
RETURNS TABLE
(
ItemNumber integer NULL,
Item nvarchar(4000) NULL
)
-- ORDER (ItemNumber) -- SQL Server 2008 onward
AS EXTERNAL NAME MultiSplit.UserDefinedFunctions.SplitString_Multi;
Example usage:
SELECT
ssm.ItemNumber,
ssm.Item
FROM dbo.SplitString_Multi (N'First-*Second-*Third', N'-*') AS ssm;
Source code:
using System.Collections;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
public partial class UserDefinedFunctions
{
[SqlFunction
(
FillRowMethodName = "FillRow_Multi",
DataAccess = DataAccessKind.None,
SystemDataAccess = SystemDataAccessKind.None,
IsDeterministic = true,
IsPrecise = true,
TableDefinition = "ItemNumber integer, Item nvarchar(4000)"
)
]
public static IEnumerator SplitString_Multi
(
[SqlFacet(MaxSize = -1, IsFixedLength = false, IsNullable = false)]
SqlChars Input,
[SqlFacet(MaxSize = 255, IsFixedLength = false, IsNullable = false)]
SqlChars Delimiter
)
{
return
(
(Input.IsNull || Delimiter.IsNull) ?
new SplitStringMulti(new char[0], new char[0]) :
new SplitStringMulti(Input.Value, Delimiter.Value));
}
private sealed class OutputRecord
{
internal readonly int sequence;
internal readonly string item;
public OutputRecord(int Sequence, string Item)
{
this.sequence = Sequence;
this.item = Item;
}
}
public static void FillRow_Multi(object obj, out SqlInt32 sequence, out SqlString item)
{
OutputRecord r = (OutputRecord)obj;
sequence = r.sequence;
item = r.item;
}
public sealed class SplitStringMulti : IEnumerator
{
public SplitStringMulti(char[] TheString, char[] Delimiter)
{
theString = TheString;
stringLen = TheString.Length;
delimiter = Delimiter;
delimiterLen = (byte)(Delimiter.Length);
isSingleCharDelim = (delimiterLen == 1);
sequence = 0;
lastPos = 0;
nextPos = delimiterLen * -1;
}
#region IEnumerator Members
public object Current
{
get
{
return new OutputRecord(sequence, new string(theString, lastPos, nextPos - lastPos));
}
}
public bool MoveNext()
{
sequence++;
if (nextPos >= stringLen)
return false;
else
{
lastPos = nextPos + delimiterLen;
for (int i = lastPos; i < stringLen; i++)
{
bool matches = true;
if (isSingleCharDelim)
{
if (theString != delimiter[0])
matches = false;
}
else
{
for (byte j = 0; j < delimiterLen; j++)
{
if (((i + j) >= stringLen) || (theString != delimiter[j]))
{
matches = false;
break;
}
}
}
if (matches)
{
nextPos = i;
return true;
}
}
lastPos = nextPos + delimiterLen;
nextPos = stringLen;
return true;
}
}
public void Reset()
{
lastPos = 0;
nextPos = delimiterLen * -1;
}
#endregion
private int lastPos;
private int nextPos;
private int sequence;
private readonly char[] theString;
private readonly char[] delimiter;
private readonly int stringLen;
private readonly byte delimiterLen;
private readonly bool isSingleCharDelim;
}
};
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
July 4, 2012 at 5:13 am
Messing around with some code yesterday got me thinking. DS8K throws away all the tally table rows where substring(string,n,1) isn't a delimiter. The discarded rows contain information - the distance to the next delimiter. If you could count them, you could do away with the CHARINDEX used to find the length of an item. Counting the rows in between delimiters using the DS8K code block wouldn't work, it would introduce too much cost - it would be an islands'n'gaps analysis.
But what about CHARINDEX?
DECLARE @OneRowOfData VARCHAR(8000), @pDelimiter VARCHAR(1)
-- delimiters at 4, 13, 20, 41, 50, 54, end + 1 = 91
SET @OneRowOfData = '255.a55xxxxA.bxxxxB.cxxxxxxxxxxxxxxxxxxC.dxxxxxxD.255.255000000000000000000000000000000000'
SET @pDelimiter = '.'
;WITH Tally AS (
SELECT TOP (1+DATALENGTH(@OneRowOfData))
n = ROW_NUMBER() OVER(ORDER BY @@SPID)
FROM sys.columns a, sys.columns b
)
SELECT
n,
NextDelimiter = CHARINDEX('.',@OneRowOfData+'.',n)
FROM Tally
This generates output like the following:
nNextDelimiter
14
24
34
44
513
613
..
1313
1420
1520
Grouping the output over [NextDelimiter] yields the next delimiter position, and the length of the preceeding segment:
;WITH Tally AS (
SELECT TOP (1+DATALENGTH(@OneRowOfData))
n = ROW_NUMBER() OVER(ORDER BY @@SPID)
FROM sys.columns a, sys.columns b
)
SELECT [WordLength] = COUNT(*)-1, NextDelimiter
FROM (
SELECT
n,
NextDelimiter = CHARINDEX('.',@OneRowOfData+'.',n)
FROM Tally
) d
GROUP BY NextDelimiter
Output:
WordLengthNextDelimiter
34
813
620
2041
850
354
3691
- which is all the information you need to put together a splitter. Note that to record a value other than 0 for the last segment of the input string it has to be terminated with a delimiter, and the number of rows collected from the tally has to be adjusted to account for this.
Putting it all together with the code to provide a list sequence number yields the following:
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (
SELECT TOP (1+DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
)
SELECT
ItemNumber = ROW_NUMBER() OVER(ORDER BY p),
Item = SUBSTRING(@pString ,p-(COUNT(*)-1), (COUNT(*)-1))
FROM cteTally
OUTER APPLY (SELECT CHARINDEX(@pDelimiter,@pString+@pDelimiter,n)) x (p)
GROUP BY p
- where I've used the inline tally cte from DS8K because folks are familiar with it.
So how does it perform? Plugged into the test harness used for this article, it works well for small numbers but horribly when the number of segments rises beyond five or six. Out of curiosity, I tested it against a mimic of some real data (10 rows multiplied out 10,000 times) and the results were surprising:
APPLY splitter ======================================================
Table 'Worktable'. Scan count 10, logical reads 200121, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Temp_______________________________________________________________________________________________________________000000000044'. Scan count 1, logical reads 1360, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 998 ms, elapsed time = 6939 ms.
DelimitedSplit8K ====================================================
Table 'Worktable'. Scan count 10, logical reads 200121, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Temp_______________________________________________________________________________________________________________000000000044'. Scan count 1, logical reads 1360, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 3963 ms, elapsed time = 6889 ms.
Heh, interesting. So what happens if you eliminate some of the io (stream the results into a @BlackHole)?
APPLY splitter ======================================================
Table '#Temp_______________________________________________________________________________________________________________000000000044'. Scan count 3, logical reads 1360, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 10, logical reads 200122, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1420 ms, elapsed time = 1343 ms.
DelimitedSplit8K ====================================================
Table 'Worktable'. Scan count 10, logical reads 200121, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Temp_______________________________________________________________________________________________________________000000000044'. Scan count 1, logical reads 1360, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 3900 ms, elapsed time = 4099 ms.
It's been fun to play with. Make of this what you will - but if you choose to use it, be sure to test it rigorously first.
Here's the test harness:
SET NOCOUNT ON
DECLARE @OneRowOfData VARCHAR(8000), @pDelimiter VARCHAR(1)
SET @pDelimiter = '.'
/*
-- 100,000 rows of quick and dirty data: 10 rows repeated 10000 times.
DROP TABLE #Temp
;WITH Tally AS (
SELECT n = n1+n2+n3+n4+1
FROM (VALUES (0),(100),(200),(300),(400),(500),(600),(700),(800),(900)) t2(n2)
CROSS APPLY (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) t3(n3)
CROSS APPLY (VALUES (0),(10),(20),(30),(40),(50),(60),(70),(80),(90)) t1(n1)
CROSS APPLY (VALUES (0),(1000),(2000),(3000),(4000),(5000),(6000),(7000),(8000),(9000)) t4(n4)
)
SELECT OneRowOfData
INTO #Temp
FROM Tally
CROSS JOIN (
SELECT OneRowOfData = '255.a55xxxxA.bxxxxB.cxxxxxxxxxxxxxxxxxxC.dxxxxxxD.251.2550000000000000000A00000000000000000' UNION ALL
SELECT '2DDD55.a55xxDxxA.bxxB.cxxxxxxxxxxxxxC.dxxxxD.252.25500000000000B000000000000' UNION ALL
SELECT '2FFF55.a55xxDDDDDDDDDxxA.bxxxxB.cxxxxxxxxxxxxxxxxxxC.dxxxxxxD.253.255000000000000000000C000000000000000' UNION ALL
SELECT '25SS5.a55xxDxxA.bxxxxB.cxxxxxxxxxxxxxxxxxxC.dxxxxxxD.254.255000000000000000D000000000000000000' UNION ALL
SELECT '25x.a55xxxxA.bxxxxB.cxxxxxxxxxxxxxxxxxxC.dxxxxxxD.255.25500000000000000000E0000000000000000' UNION ALL
SELECT '2755.a55xxDDDxxA.bxxxxB.cxxxxxxxxxxxxxxxxxxC.dxxxxxxD.256.25500000000000000000000F0000000000000' UNION ALL
SELECT '2BBBBBBBBB55.a55xxDDxxA.bxxxxB.cxxxxxxxxxxxxxxxxxxC.dxxxxxxD.257.25500000000000000G0000000000000000000' UNION ALL
SELECT '25EEE5.a55xDDDDDDDxxxA.bxxxxB.cxxxxxxxxxxxxxxxxxxC.dxxxxxxD.258.25500000000000000000H0000000000000000' UNION ALL
SELECT '2WW55.a55xxxxA.bxxxxB.cxxxxxxxxxxxxxxxxxxC.dxxxxxxD.259.25500000000000000000I0000000000000000' UNION ALL
SELECT '25LLLLLLLLLLLLLL5.a55xxDDDDDDDDDDxxA.bxxxxB.cxxxxxxxxxxxxxxxxxxC.dxxxxxxD.260.25500000000000000000J0000000000000000'
) d
*/
DECLARE @BlackHole VARCHAR(10)
PRINT 'APPLY splitter ======================================================'
SET STATISTICS IO,TIME ON
SELECT @BlackHole = Item
FROM (
SELECT x.*
FROM #Temp t
CROSS APPLY [dbo].[DelimitedSplitCA] (t.OneRowOfData, @pDelimiter) x
) t
SET STATISTICS IO,TIME OFF
PRINT 'DelimitedSplit8K ===================================================='
SET STATISTICS IO,TIME ON
SELECT @BlackHole = Item
FROM (
SELECT x.*
FROM #Temp t
CROSS APPLY [dbo].[DelimitedSplit8K] (t.OneRowOfData, @pDelimiter) x
) t
SET STATISTICS IO,TIME OFF
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 4, 2012 at 9:18 am
Chris,
Which version of DS8K did you use? The one from the article or the updated one in the attachments?
--Jeff Moden
Change is inevitable... Change for the better is not.
July 4, 2012 at 9:21 am
Hi Jeff
This one, which I'm pretty sure is the most recent:
ALTER FUNCTION [dbo].[DelimitedSplit8K]
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT 0 UNION ALL
SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT t.N+1
FROM cteTally t
WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0)
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),
Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000))
FROM cteStart s
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 4, 2012 at 9:46 am
Jeff Moden (7/4/2012)
Chris,Which version of DS8K did you use? The one from the article or the updated one in the attachments?
It was an old version :blush: the updated one is significantly more efficient.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 4, 2012 at 1:47 pm
Jeff Moden (7/4/2012)
Chris,Which version of DS8K did you use? The one from the article or the updated one in the attachments?
Update the dratted article already! 😎 This happened to me too. :crazy:
July 4, 2012 at 8:38 pm
+1 to Chris for trying!
I guess now I'll need to check whether I'm using the right version of DelimitedSplit8K!
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
Viewing 15 posts - 331 through 345 (of 990 total)
You must be logged in to reply to this topic. Login to reply