July 26, 2012 at 9:58 pm
Jeff Moden (7/26/2012)
dwain.c (7/26/2012)
NEWS FLASH -/-The SQL-verse was set aflutter today by a purported attempt to debunk the myth perpetrated by Jeff Moden that his DelimitedSplit8K was the fastest 8K SQL-based string splitter that the known SQL-verse had ever seen.
Reports are coming in that the debunkers of the debunkers are planning a mass rally in Jeff’s favor, and that the code churning machines have been turned up to full volume production levels. SQL Server performance across the known SQL-verse has now dropped to all-time lows as these unprecedented levels of code flurry, expend CPU cycles attempting to make sense out of these unverified reports.
The victor in these skirmishes will no doubt rise to the top of the string-splitting throne, with scepter in hand, to declare “all those that have come before me shall finally bow to the prowess of this solution.”
Stay tuned to BBSQL for breaking news updates as this drama unfolds.
Easy, big guy. First, it's not my splitter. It's our splitter. A lot of good folks have put some good time and effort into it and 2 folks in the discussion of the "Tally Oh!" article actually took my code and, in the SSC spirit, made it even better.
Although Aaron was apparently unaware of the affect that simply changing the input variable to a MAX datatype would have and he certainly went about it all the wrong way, his goal wasn't to challenge N4K or 8K splitters. Despite my personal disappointment in how he went about it, his goal was to find out what the best MAX splitter(s) were.
Just having a bit of fun with this is all. 😛
No offense was intended to the other contributors to the splitter-verse either of course. It's just that everybody I think considers you the 'big daddy' on this topic.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
July 26, 2012 at 10:04 pm
Re: Aaron's article.
If I were the author of the SSC article concerned, these are the main points I would take away:
It is too harsh to say Aaron's article is "poorly written" (he clearly put considerable time and effort into it). One might not agree with the change to MAX data types (though it is a natural change many people might make, absent any explicit warnings to the contrary in the original article). One might also not agree with the testing methodology (Aaron does explain his reasoning, though).
From the CLR side, I might complain that the generic CLR splitter used (Adam's) is not the optimal one to compare since it handles multi-character delimiters. If it were important, I would point Aaron to the splitter code I wrote for Jeff, which is optimized for single-character delimiters. I probably won't, since the difference is relatively small, and he comes to the same correct conclusion that Jeff does anyway: "The current best way to split delimited strings in SQL Server is to use a CLR splitter.".
That said, it ought to be possible to make those sort of technical responses without causing Aaron's Snark-O-Meter™ to explode.
July 26, 2012 at 10:14 pm
Ok so here is the response im planning with code. i cant account for the major difference in time. as i state i make only minor changes to Jeffs (Our) code and made no changes to the moden style splitter Aaron posted in his blog.
Aaron, the changes you made to jeffs splitter seriously impacted the performance. I can not find the specific place as the changes you made seem very minor but even still in my tests with the minor changes to Jeffs code i have made (Changing the variable types to NVARCHAR(MAX) and NCHAR(1)) it still performs much better than what you posted. i have used the following code to set up a test bed and generate 100 rows of random length elements from 10 to 90 charecters for lengths around 550,000 characters long. the times are astonishing for the diffence in time it took to perform these tests. In Jeff's article on his new DelimitedSplit8k even his testing showed that a CLR was faster. however i believe your changes to his splitter have decreased the performance to a point where you do his splitter a severe injustice. here are the results of my testing on a dual core pentium E5400 (2.7 GHZ) and 3 gig of ram, and i think we can agree that the virtual box you were running these tests on can smoke my machine any day of the week.
The results:
(Ran multiple times and all results were similar)
------Almost pure moden splitter modified slightly to take a NVARCHAR(MAX)
and have a cteTally big enough and eliminate the OR when calculating the start of the string-----
SQL Server Execution Times:
CPU time = 4368 ms, elapsed time = 5021 ms.
------Aaron version of Moden Splitter with NVARCHAR(255)----------------
SQL Server Execution Times:
CPU time = 336665 ms, elapsed time = 339644 ms.
And the code so any one else can run the same tests:
The setup code was borowed from Jeff and you can get the original version here
--View needed for the CreateCsv Function
CREATE VIEW iFunction AS
SELECT NEWID() AS MyNewID
GO
--Create a massivly large CSV string with random length elements
CREATE FUNCTION dbo.CreateCsv
(
@pNumberOfRows INT,
@pNumberOfElementsPerRow INT,
@pMinElementwidth INT,
@pMaxElementWidth INT
)
RETURNS TABLE
AS
RETURN
SELECT TOP (@pNumberOfRows) --Controls the number of rows in the test table
ISNULL(ROW_NUMBER() OVER (ORDER BY(SELECT NULL)),0) AS RowNum,
CSV =
(--==== This creates each CSV
SELECT CAST(
STUFF( --=== STUFF get's rid of the leading comma
( --=== This builds CSV row with a leading comma
SELECT TOP (@pNumberOfElementsPerRow) --Controls the number of CSV elements in each row
','
+ LEFT(--==== Builds random length variable within element width constraints
LEFT(REPLICATE('1234567890',CEILING(@pMaxElementWidth/10.0)), @pMaxElementWidth),
ABS(CHECKSUM((SELECT MyNewID FROM dbo.iFunction)))
% (@pMaxElementWidth - @pMinElementwidth + 1) + @pMinElementwidth
)
FROM sys.All_Columns ac3 --Classic cross join pseudo-cursor
CROSS JOIN sys.All_Columns ac4 --can produce row sets up 16 million.
WHERE ac3.Object_ID <> ac1.Object_ID --Without this line, all rows would be the same.
FOR XML PATH('')
)
,1,1,'')
AS VARCHAR(MAX))
)
FROM sys.All_Columns ac1 --Classic cross join pseudo-cursor
CROSS JOIN sys.All_Columns ac2 --can produce row sets up 16 million rows
;
GO
--The table to hold our strings
CREATE TABLE LongCSVTest (ID INT, CSV NVARCHAR(MAX))
--A clustered index as you had on your tables
CREATE UNIQUE CLUSTERED INDEX uix_LongCsvTest ON LongCSVTest(ID)
--Create our strings and populate the data
INSERT INTO LongCSVTest
SELECT * FROM dbo.CreateCsv(100, 11000, 10, 90)
GO
--Slightly modified Moden splitter to take NVARCHAR(MAX) and NCHAR(1) for the delimiter
--I also changed the finding of the start of the first element by eliminating the OR and
-- using SELECT 1 UNION ALL since the first string is going to have a start of 1
-- Its how i use the DelimitedSplit8k in my databases
CREATE FUNCTION [dbo].[DelimitedSplitMAX]
(@pString NVARCHAR(MAX), @pDelimiter NCHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table” produces values from 0 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
E5(N) AS (SELECT 1 FROM E4 a, E4 b),
cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT 0 UNION ALL
SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (SELECT 1 UNION ALL
SELECT t.N+1
FROM cteTally t
WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter)
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),
Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000))
FROM cteStart s
;
GO
CREATE FUNCTION dbo.SplitStrings_Moden
(
@List NVARCHAR(MAX),
@Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING AS
RETURN
WITH E1(N) AS ( SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),
E2(N) AS (SELECT 1 FROM E1 a, E1 b),
E4(N) AS (SELECT 1 FROM E2 a, E2 b),
E42(N) AS (SELECT 1 FROM E4 a, E2 b),
cteTally(N) AS (SELECT 0 UNION ALL SELECT TOP (DATALENGTH(ISNULL(@List,1)))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E42),
cteStart(N1) AS (SELECT t.N+1 FROM cteTally t
WHERE (SUBSTRING(@List,t.N,1) = @Delimiter OR t.N = 0))
SELECT Item = SUBSTRING(@List, s.N1, ISNULL(NULLIF(CHARINDEX(@Delimiter,@List,s.N1),0)-s.N1,8000))
FROM cteStart s;
GO
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;
DECLARE @Shutup VARCHAR(MAX)
PRINT '------Almost pure moden splitter modified slightly to take a NVARCHAR(MAX)
and have a cteTally big enough and eliminate the OR when calculating the start of the string-----'
SET STATISTICS TIME ON
SELECT @Shutup = Item FROM LongCSVTest CROSS APPLY DelimitedSplitMAX (CSV,',')
SET STATISTICS TIME OFF
PRINT '------Aaron version of Moden Splitter with NVARCHAR(255)----------------'
SET STATISTICS TIME ON
SELECT @Shutup = Item FROM LongCSVTest CROSS APPLY SplitStrings_Moden (CSV,',')
SET STATISTICS TIME OFF
For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]
Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
Jeff Moden's Cross tab and Pivots Part 1[/url]
Jeff Moden's Cross tab and Pivots Part 2[/url]
July 26, 2012 at 10:23 pm
capn.hector (7/26/2012)
------Almost pure moden splitter modified slightly to take a NVARCHAR(MAX)
and have a cteTally big enough and eliminate the OR when calculating the start of the string-----
------Aaron version of Moden Splitter with NVARCHAR(255)----------------
I suggest you call it the "Moden community splitter" to allow Jeff to retain his humility. 🙂
Who's your daddy?
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
July 26, 2012 at 10:26 pm
dwain.c (7/26/2012)
capn.hector (7/26/2012)
------Almost pure moden splitter modified slightly to take a NVARCHAR(MAX)
and have a cteTally big enough and eliminate the OR when calculating the start of the string-----
------Aaron version of Moden Splitter with NVARCHAR(255)----------------
I suggest you call it the "Moden community splitter" to allow Jeff to retain his humility. 🙂
Who's your daddy?
Will do. i was amazed at how slow the exact splitter Aaron used was for what seemed to be minor changes. i cant really account for the time as i also tried Aaron's modified splitter with an NCHAR(1) and it did not finish after 11 min running it. at that point i gave up on it and you have the post in its final form.
For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]
Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
Jeff Moden's Cross tab and Pivots Part 1[/url]
Jeff Moden's Cross tab and Pivots Part 2[/url]
July 26, 2012 at 10:41 pm
SQL Kiwi (7/26/2012)
Re: Aaron's article.If I were the author of the SSC article concerned, these are the main points I would take away:
- The in-line article code should be updated to ensure people always use the latest/best version.
- It should be clearly stated that performance is poor above 8000 bytes, and the code should not be modified to do that.
- A T-SQL splitter that can handle > 8000 bytes is required.
It is too harsh to say Aaron's article is "poorly written" (he clearly put considerable time and effort into it). One might not agree with the change to MAX data types (though it is a natural change many people might make, absent any explicit warnings to the contrary in the original article). One might also not agree with the testing methodology (Aaron does explain his reasoning, though).
From the CLR side, I might complain that the generic CLR splitter used (Adam's) is not the optimal one to compare since it handles multi-character delimiters. If it were important, I would point Aaron to the splitter code I wrote for Jeff, which is optimized for single-character delimiters. I probably won't, since the difference is relatively small, and he comes to the same correct conclusion that Jeff does anyway: "The current best way to split delimited strings in SQL Server is to use a CLR splitter.".
That said, it ought to be possible to make those sort of technical responses without causing Aaron's Snark-O-Meter™ to explode.
Way ahead of you. I've already submitted such changes to the "Tally OH!" article.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 26, 2012 at 10:47 pm
capn.hector (7/26/2012)
Ok so here is the response im planning with code. i cant account for the major difference in time.
Your cteTally references E4, not the E5 I presume you intended. That means you are only looking at the first 10,000 characters. You have also left the hard-coded 8000 in the final ISNULL(NULLIF(CHARINDEX... expression, so that cuts it down to 8000. As it stands, your code does not split very much of the test data, and every row in the test data table is the same (for me, anyway, tested on 2012), so I wouldn't post it just yet if I were you.
edit: I misremembered the purpose of the 8000 constant. The remainder of the explanation stands, however.
July 26, 2012 at 10:50 pm
Jeff Moden (7/26/2012)
Way ahead of you. I've already submitted such changes to the "Tally OH!" article.
Oh good. I see there are new comments on Aaron's post now too, which seem to have improved the direction of the debate a great deal.
July 26, 2012 at 11:03 pm
SQL Kiwi (7/26/2012)
Jeff Moden (7/26/2012)
Way ahead of you. I've already submitted such changes to the "Tally OH!" article.Oh good. I see there are new comments on Aaron's post now too, which seem to have improved the direction of the debate a great deal.
No need to get sarcastic with me, Paul. I didn't see Capn Hector's post here to stop it because I was working on the correction to the article.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 26, 2012 at 11:05 pm
well no more tequila and coding late at night. just made an *** of my self.
For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]
Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
Jeff Moden's Cross tab and Pivots Part 1[/url]
Jeff Moden's Cross tab and Pivots Part 2[/url]
July 26, 2012 at 11:10 pm
Jeff Moden (7/26/2012)
SQL Kiwi (7/26/2012)
Jeff Moden (7/26/2012)
Way ahead of you. I've already submitted such changes to the "Tally OH!" article.Oh good. I see there are new comments on Aaron's post now too, which seem to have improved the direction of the debate a great deal.
No need to get sarcastic with me, Paul. I didn't see Capn Hector's post here to stop it because I was working on the correction to the article.
No sarcasm, Jeff; a genuine observation. (Hector's post wasn't there when I wrote that). Gosh people are on a short fuse today.
July 26, 2012 at 11:11 pm
Jeff Moden (7/26/2012)
SQL Kiwi (7/26/2012)
Jeff Moden (7/26/2012)
Way ahead of you. I've already submitted such changes to the "Tally OH!" article.Oh good. I see there are new comments on Aaron's post now too, which seem to have improved the direction of the debate a great deal.
No need to get sarcastic with me, Paul. I didn't see Capn Hector's post here to stop it because I was working on the correction to the article.
Since you decided to jump in on this, you might also want to suggest to Aaron that he also needs to make a change to his article because there's a shed load of bad code with my name on it that I didn't write.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 26, 2012 at 11:15 pm
capn.hector (7/26/2012)
well no more tequila and coding late at night. just made an *** of my self.
I'm curious. I see you posted "i must post a retraction with the corrected cteTally the times are the same." - does that mean the times with your code are the same as with Aaron's MAXed version now? I realize it is difficult to post apologies sometimes, but that seems brief to the point where it's not quite clear what you are saying.
July 26, 2012 at 11:17 pm
Jeff Moden (7/26/2012)
Since you decided to jump in on this, you might also want to suggest to Aaron that he also needs to make a change to his article because there's a shed load of bad code with my name on it that I didn't write.
I'm going to let this slide, because you misinterpreted my earlier comment as sarcasm, when it wasn't. Take a break and a deep breath Jeff!
July 26, 2012 at 11:21 pm
I say we all take a deep breath and have ¡Mas tequila!
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
Viewing 15 posts - 37,096 through 37,110 (of 66,749 total)
You must be logged in to reply to this topic. Login to reply