December 9, 2018 at 9:44 pm
Comments posted to this topic are about the item An Overview of STRING_SPLIT()
December 9, 2018 at 10:46 pm
Nice article Steve
Would be interested in seeing how the new function stacks up against the "custom" split functions that every DBA seems to have a few versions of.
Cheers
Phil
--------------------
Colt 45 - the original point and click interface
December 9, 2018 at 11:36 pm
Thanks for the write-up Steve.
😎
One thing, there are performance references but no mention of the fastest non-CLR string splitting function (on SQL Server 2012 and later).
December 10, 2018 at 7:03 am
Eirikur Eiriksson - Sunday, December 9, 2018 11:36 PM
Thanks for the write-up Steve.
😎
One thing, there are performance references but no mention of the fastest non-CLR string splitting function (on SQL Server 2012 and later).
Added a link
December 10, 2018 at 7:03 am
philcart - Sunday, December 9, 2018 10:46 PM
Nice article Steve
Would be interested in seeing how the new function stacks up against the "custom" split functions that every DBA seems to have a few versions of.
Cheers
Phil
If you look at the reference to Aaron Bertrand's article, he rates performance in different ways. STRING_SPLIT() stacks up nicely.
December 10, 2018 at 8:24 am
Steve Jones - SSC Editor - Monday, December 10, 2018 7:03 AM
philcart - Sunday, December 9, 2018 10:46 PM
Nice article Steve
Would be interested in seeing how the new function stacks up against the "custom" split functions that every DBA seems to have a few versions of.
Cheers
Phil
If you look at the reference to Aaron Bertrand's article, he rates performance in different ways. STRING_SPLIT() stacks up nicely.
If you look at Aaron Bertrand's article, he comes to the conclusion that the XML method for splitting is as good as the DelimitedSplit8K method. Nothing could be further from the truth. The problem is with the "Devils in the Data" (I've gotta finish the articles on that subject for you soon). He creates "grooved" test data that has a cardinality of "1". That seriously skews the test results because the XML method can take advantage of such low cardinalities for a large set. But, if you have a table of, say, 10,000 rows with a cardinality of 10,000, the XML splitter method becomes a huge performance issue.
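To make the "grooved" point concrete, here's a hypothetical sketch (table names and values are my own invention, not Aaron's actual test harness) of low-cardinality versus high-cardinality test data; a fair splitter benchmark needs the latter:

```sql
-- Hypothetical test-data sketch (NOT Aaron Bertrand's actual harness).
-- "Grooved" data: 10,000 rows that all contain the SAME CSV string
-- (cardinality of 1). The XML splitter can effectively reuse its work here.
IF OBJECT_ID('tempdb..#Grooved') IS NOT NULL DROP TABLE #Grooved;
SELECT TOP (10000)
       CSV = CAST('1,2,3,4,5,6,7,8,9,10' AS VARCHAR(8000))
INTO   #Grooved
FROM   sys.all_columns ac1
CROSS JOIN sys.all_columns ac2;

-- High-cardinality data: 10,000 rows where (nearly) every CSV string is
-- different. This is the case Jeff describes, where the XML method tanks.
IF OBJECT_ID('tempdb..#Varied') IS NOT NULL DROP TABLE #Varied;
SELECT TOP (10000)
       CSV = CAST(CONCAT(ABS(CHECKSUM(NEWID())), ',',
                         ABS(CHECKSUM(NEWID())), ',',
                         ABS(CHECKSUM(NEWID()))) AS VARCHAR(8000))
INTO   #Varied
FROM   sys.all_columns ac1
CROSS JOIN sys.all_columns ac2;
```

Running any splitter over both tables and comparing timings exposes the skew described above.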
--Jeff Moden
Change is inevitable... Change for the better is not.
December 10, 2018 at 8:42 am
Thank you for sharing good news.
On another note, I've been using the free KNIME analytics platform for data manipulation, and most of the time it's necessary to add the values as columns rather than rows.
December 10, 2018 at 8:44 am
Jeff Moden - Monday, December 10, 2018 8:24 AM
If you look at Aaron Bertrand's article, he comes to the conclusion that the XML method for splitting is as good as the DelimitedSplit8K method. Nothing could be further from the truth. The problem is with the "Devils in the Data" (I've gotta finish the articles on that subject for you soon). He creates "grooved" test data that has a cardinality of "1". That seriously skews the test results because the XML method can take advantage of such low cardinalities for a large set. But, if you have a table of, say, 10,000 rows with a cardinality of 10,000, the XML splitter method becomes a huge performance issue.
+100
😎
December 10, 2018 at 9:05 am
Steve Jones - SSC Editor - Monday, December 10, 2018 7:03 AM
Eirikur Eiriksson - Sunday, December 9, 2018 11:36 PM
Thanks for the write-up Steve.
😎
One thing, there are performance references but no mention of the fastest non-CLR string splitting function (on SQL Server 2012 and later).
Added a link
Thanks Steve, don't think there are too many (poor buggers) on 2008 or earlier,
😎
December 10, 2018 at 9:09 am
Jeff Moden - Monday, December 10, 2018 8:24 AM
If you look at Aaron Bertrand's article, he comes to the conclusion that the XML method for splitting is as good as the DelimitedSplit8K method. Nothing could be further from the truth. The problem is with the "Devils in the Data" (I've gotta finish the articles on that subject for you soon). He creates "grooved" test data that has a cardinality of "1". That seriously skews the test results because the XML method can take advantage of such low cardinalities for a large set. But, if you have a table of, say, 10,000 rows with a cardinality of 10,000, the XML splitter method becomes a huge performance issue.
AB's article isn't bad but it's incomplete and hasn't been appropriately updated.
😎
One thing AB missed entirely is the efficiency of fixed-width parameter strings, which blows the chops off any other method by a considerable margin (must finish that article)
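As a rough illustration of the fixed-width idea (my own sketch, not from Eirikur's unfinished article): if every element is padded to a known width, each value can be extracted by position alone with SUBSTRING, with no delimiter scan and the original order known for free:

```sql
-- Hypothetical fixed-width parameter string: four elements, 8 characters each.
-- Position alone recovers both the value and the original ordinal.
DECLARE @fixed VARCHAR(8000) = '00000009000000060000000300000001'; -- 9, 6, 3, 1

SELECT  rn    = t.N,
        Value = SUBSTRING(@fixed, (t.N - 1) * 8 + 1, 8)
FROM    (VALUES (1),(2),(3),(4)) t (N); -- a real tally table would scale this
```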
December 10, 2018 at 9:40 am
Eirikur Eiriksson - Monday, December 10, 2018 9:09 AM
AB's article isn't bad but it's incomplete and hasn't been appropriately updated.
😎
One thing AB missed entirely is the efficiency of fixed-width parameter strings, which blows the chops off any other method by a considerable margin (must finish that article)
Sorry, but I have to disagree there. I did try to coach Aaron publicly and privately and, instead, he simply stated he wasn't going to do any additional testing and closed the article to further comment. Anyone reading it can easily come to the same wrong conclusion that he did because of the "Holy Grail" nature of his article (an article that contains code to test functionality and performance, as well as charts that "prove" the findings, which are horribly wrong in that case). People end up drinking from the wrong cup and "dying" when they deploy such poor methods to a production environment. Even worse, his article didn't take into account any special characters, and the XML code he published causes a failure there.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 10, 2018 at 12:15 pm
Eirikur Eiriksson - Monday, December 10, 2018 9:09 AM
Jeff Moden - Monday, December 10, 2018 8:24 AM
If you look at Aaron Bertrand's article, he comes to the conclusion that the XML method for splitting is as good as the DelimitedSplit8K method. Nothing could be further from the truth. The problem is with the "Devils in the Data" (I've gotta finish the articles on that subject for you soon). He creates "grooved" test data that has a cardinality of "1". That seriously skews the test results because the XML method can take advantage of such low cardinalities for a large set. But, if you have a table of, say, 10,000 rows with a cardinality of 10,000, the XML splitter method becomes a huge performance issue.
AB's article isn't bad but it's incomplete and hasn't been appropriately updated.
😎
One thing AB missed entirely is the efficiency of fixed-width parameter strings, which blows the chops off any other method by a considerable margin (must finish that article)
If you consider the conclusions reached because of the incorrect test data used, it's bad.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 10, 2018 at 5:08 pm
Eirikur Eiriksson - Monday, December 10, 2018 9:05 AM
Steve Jones - SSC Editor - Monday, December 10, 2018 7:03 AM
Eirikur Eiriksson - Sunday, December 9, 2018 11:36 PM
Thanks for the write-up Steve.
😎
One thing, there are performance references but no mention of the fastest non-CLR string splitting function (on SQL Server 2012 and later).
Added a link
Thanks Steve, don't think there are too many (poor buggers) on 2008 or earlier,
😎
Our shop is still one of the few "poor buggers" on 2008 R2. (Sigh.) Working on management for a while now...
Just sharing. 😉 This has been a really helpful article and discussion. Thank you all. I'm thinking positively that we will upgrade soon, and I will be able to start using all the new, helpful functionality.
December 10, 2018 at 10:39 pm
Here's an idea that I've been kicking around for a little while now but haven't had the chance to torture test yet...
It's one of those "not guaranteed to work every time... but... seems to work every time" kind of deals.
The idea is simple. Just encapsulate STRING_SPLIT with ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), so that the ROW_NUMBER() is able to register the rows as they come out of STRING_SPLIT, before any other part of the query can affect the ordering.
Like I said, I haven't had a chance to really go nuts trying to break it, but it has stood up to the few things I have tried. I figured I'd throw it out and see if anyone wants to take a crack at proving that an idea that shouldn't work, doesn't work... (because so far it does)
The original idea...
SET QUOTED_IDENTIFIER ON;
GO
SET ANSI_NULLS ON;
GO
CREATE OR ALTER FUNCTION dbo.SPLIT_STRING_RNv1
/* ===============================================================================
12/10/2018 JL, Created: Adds a ROW_NUMBER() to the built in STRING_SPLIT function.
IMPORTANT! From the STRING_SPLIT BOL: "The order is not guaranteed to match the
order of the substrings in the input string."
Just because this trick works "most of the time", it is not guaranteed to
work every time. DO NOT use it in a business-critical situation where the
original order is critical.
=============================================================================== */
--===== Define I/O parameters
(
@string NVARCHAR(MAX),
@separator NCHAR(1)
)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
SELECT
rn = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
ss.[Value]
FROM
STRING_SPLIT(@string, @separator) ss;
GO
This one was more of an odd-nut idea that popped into my head after recalling that using TOP w/ ORDER BY forces an internal sort to handle the TOP(N) criteria, even when N is large enough to handle any result set. I have absolutely no idea if it's doing anything at all... I haven't been able to get the 1st one to screw up to see if this one does any better...
SET QUOTED_IDENTIFIER ON;
GO
SET ANSI_NULLS ON;
GO
CREATE OR ALTER FUNCTION dbo.SPLIT_STRING_RNv2
/* ===============================================================================
12/10/2018 JL, Created: Adds a ROW_NUMBER() to the built in STRING_SPLIT function.
This is an alternate version that adds a TOP & ORDER BY to the query
in order to see if it behaves differently.
IMPORTANT! From the STRING_SPLIT BOL: "The order is not guaranteed to match the
order of the substrings in the input string."
Just because this trick works "most of the time", it is not guaranteed to
work every time. DO NOT use it in a business-critical situation where the
original order is critical.
=============================================================================== */
--===== Define I/O parameters
(
@string NVARCHAR(MAX),
@separator NCHAR(1)
)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
SELECT TOP (2147483647)
rn = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
ss.[Value]
FROM
STRING_SPLIT(@string, @separator) ss
ORDER BY
rn;
GO
This 3rd option is based on a good idea from Aaron Bertrand. It's guaranteed to work as long as your input string is composed of unique values and you're willing to eat the resulting sort operation.
It works by using CHARINDEX to find each Value's position in the original input string...
If you want to see the conversation that prompted me to actually start testing the idea (or read Aaron's thoughts on the idea [spoiler: not a fan]), here's a link: https://stackoverflow.com/questions/53562331/selecting-data-against-numeric-values-saved-as-comma-separated-string/53563089#53563089
SET QUOTED_IDENTIFIER ON;
GO
SET ANSI_NULLS ON;
GO
CREATE FUNCTION dbo.SPLIT_STRING_RNvAB
/* ===============================================================================
12/10/2018 JL, Created: Adds a ROW_NUMBER() to the built in STRING_SPLIT function.
Based on Aaron Bertrand's idea of using CHARINDEX to find the item's location
in the original string.
Note: Array items must be DISTINCT for this approach to work properly.
=============================================================================== */
--===== Define I/O parameters
(
@string NVARCHAR(MAX),
@separator NCHAR(1)
)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
SELECT
rn = ROW_NUMBER() OVER (ORDER BY value_position),
ss.[Value]
FROM
STRING_SPLIT(@string, @separator) ss
CROSS APPLY ( VALUES (CHARINDEX(@separator + ss.Value + @separator, @separator + @string + @separator)) ) vp (value_position);
GO
Here's a simple test script I threw together...
CREATE TABLE #TestData (Num INT NOT NULL PRIMARY KEY); -- Adding the clustered key to impose a "natural sort" that causes the stock
INSERT #TestData (Num) VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9); --STRING_SPLIT to not display in the order of the original string.
DECLARE @string VARCHAR(8000) = '9,6,3,1,4,8,5,6,1,0,2';
-- original STRING_SPLIT function...
SELECT
td.Num,
ss.value
FROM
STRING_SPLIT(@string, ',') ss
JOIN #TestData td
ON CONVERT(INT, ss.Value) = td.Num;
-- ROW_NUMBER() alone
SELECT
td.Num,
ssr.rn,
ssr.Value
FROM
dbo.SPLIT_STRING_RNv1(@string, ',') ssr
JOIN #TestData td
ON CONVERT(INT, ssr.Value) = td.Num;
-- ROW_NUMBER() w/ TOP & ORDER BY
SELECT
td.Num,
ssr.rn,
ssr.Value
FROM
dbo.SPLIT_STRING_RNv2(@string, ',') ssr
JOIN #TestData td
ON CONVERT(INT, ssr.Value) = td.Num;
-- ROW_NUMBER() with order set by CHARINDEX
SELECT
td.Num,
ssr.rn,
ssr.Value
FROM
dbo.SPLIT_STRING_RNvAB(@string, ',') ssr
JOIN #TestData td
ON CONVERT(INT, ssr.Value) = td.Num;
DROP TABLE #TestData;
December 10, 2018 at 11:58 pm
Lots of people have been doing that trick with ROW_NUMBER(). Thinking about the code behind the scenes, it should be guaranteed, but MS refuses to state or even imply that. It's really a shame they didn't take the time to return the element ordinal.
--Jeff Moden
Change is inevitable... Change for the better is not.
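For anyone landing on this thread later: Microsoft eventually did exactly that. On SQL Server 2022 and Azure SQL Database, STRING_SPLIT accepts an optional third argument, enable_ordinal; passing 1 adds an ordinal column with each element's position in the input string, which makes the workarounds above unnecessary on those versions:

```sql
-- Requires SQL Server 2022 (or Azure SQL Database); fails on earlier versions.
SELECT   ordinal, value
FROM     STRING_SPLIT('9,6,3,1,4,8,5,6,1,0,2', ',', 1)
ORDER BY ordinal; -- ordinal is the element's position in the input string
```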