May 6, 2011 at 7:37 am
Hi Jeff,
After looking at it some more, I discovered an obvious tweak, I think you will like this A LOT.
create FUNCTION dbo.DelimitedSplit8K_T0
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
select 1 union all -- does away with 0 base CTE, and the OR condition in one go!
SELECT t.N+1 FROM cteTally t WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter)
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),
Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000))
FROM cteStart s
;
I disposed of the need for a zero based tally, and also removed the need for the OR condition you were trying to get rid off, by moving the union all one step in the processing order, towards to the end. My earlier change to the top operator has also been included as well. Unless I messed things up a little, this version seems to run circles (15-20%) around the articles original code.
Here is another alternative, often slightly faster still, but i would need to see comparative graphs to make sure its constant in its behavior.
alter FUNCTION dbo.DelimitedSplit8K_T1
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
select 1 union all -- does away with 0 base CTE, and the OR condition in one go!
select t.N+1 from cteTally t where (substring(@pString,t.N,1) = @pDelimiter)
),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
SELECT
s.N1
, ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
FROM
cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
;
In experimenting I found that even the slightest modification can totally wreck performance. This makes me a little unsure how well these routines will operate as a replacement of existing code.
May 6, 2011 at 7:55 am
Wow, what a difference 2 years makes:
Jeff Moden, February 2009, http://www.sqlservercentral.com/Forums/FindPost652105.aspx:
With the exception of RegEx, damned near all CLR's are simply not the right way to do it.
Jeff Moden, May 2011, http://www.sqlservercentral.com/articles/Tally+Table/72993/[/url]:
Make no doubt about it. The current best way to split delimited strings in SQL Server is to use a CLR splitter.
And no -- this is meant to be neither prod nor provocation.:-D (Also, I don't think I've quoted out of context.)
My point is that if one of the best and brightest among us (Jeff!) can (1) object to indiscriminate use of a new coding approach and then (B) nevertheless keep an open mind and see the utility of that new approach, then maybe I'll pause before I condemn the next "new thing" that comes along.
Thanks,
Rich
P.S. With acknowledgments to Car Talk....
May 6, 2011 at 10:50 am
New graphs would be nice, I got the feeling that the on-the-fly generated tally is now just as fast, if not faster then a 0 based physical tally table. The seedup numbers are in a similar range, if not a little higher.
Please, pretty please, Jeff 🙂
May 6, 2011 at 11:13 am
peter-757102 (5/6/2011)
New graphs would be nice, I got the feeling that the on-the-fly generated tally is now just as fast, if not faster then a 0 based physical tally table. The seedup numbers are in a similar range, if not a little higher.Please, pretty please, Jeff 🙂
Do what I did when Wayne asked me: Download the Build script from Jeff's article, drop yours in, run the tests, verify that your assumptions hold, then export to CSV, and either:
1) Do better than I did and make the graphs
or
2) post the raw CSV results here in a code block for others to view.
May 6, 2011 at 12:15 pm
I did modify the test set and tested the code of course, I just want to see comparable graphs from identical servers as the others originated from. Maybe it is just better on servers configured like mine, and is terrible on others ;).
May 6, 2011 at 12:20 pm
Nadrek (5/6/2011)
peter-757102 (5/6/2011)
New graphs would be nice, I got the feeling that the on-the-fly generated tally is now just as fast, if not faster then a 0 based physical tally table. The seedup numbers are in a similar range, if not a little higher.Please, pretty please, Jeff 🙂
Do what I did when Wayne asked me: Download the Build script from Jeff's article, drop yours in, run the tests, verify that your assumptions hold, then export to CSV, and either:
1) Do better than I did and make the graphs
or
2) post the raw CSV results here in a code block for others to view.
2. Do like the build script says: copy the data from your results grid into excel (with the column headers), then post the excel file here as an attachment.
FYI: the graphs are a "Scatter" graph, with smooth lines and markers. They are a wee bit tedious to set up, but straightforward. Then do a screen shot of the excel sheet with the graph, paste into your favorite graphic editor, copy just the graph portion, and paste it into a new image. Save, and upload. Use the img tag in your post to have it display.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
May 6, 2011 at 1:43 pm
WayneS (5/6/2011)
Nadrek (5/6/2011)
peter-757102 (5/6/2011)
New graphs would be nice, I got the feeling that the on-the-fly generated tally is now just as fast, if not faster then a 0 based physical tally table. The seedup numbers are in a similar range, if not a little higher.Please, pretty please, Jeff 🙂
Do what I did when Wayne asked me: Download the Build script from Jeff's article, drop yours in, run the tests, verify that your assumptions hold, then export to CSV, and either:
1) Do better than I did and make the graphs
or
2) post the raw CSV results here in a code block for others to view.2. Do like the build script says: copy the data from your results grid into excel (with the column headers), then post the excel file here as an attachment.
FYI: the graphs are a "Scatter" graph, with smooth lines and markers. They are a wee bit tedious to set up, but straightforward. Then do a screen shot of the excel sheet with the graph, paste into your favorite graphic editor, copy just the graph portion, and paste it into a new image. Save, and upload. Use the img tag in your post to have it display.
It was not really straightforward (using open office), but I managed to get some graphs, see attachments.
But now how do i link an attachment as an image on this forum?
I see no straight way to do this, the img tag only adds some plain text!
Ok, i figured it out, but it is a horrible interface 🙁
The code was posted here: http://www.sqlservercentral.com/Forums/FindPost1104575.aspx
May 6, 2011 at 2:30 pm
Peter: On my machine, you've definitely beat the article's original try, and beat my Union version, but the 0 based permanent temp table is still better.
Wayne: thank you for the information on graphing. I don't have time to do a comprehensive job, but here's some samples, attached, split between smaller and larger number of elements to make differences more obvious.
May 6, 2011 at 2:36 pm
Nadrek (5/6/2011)
Peter: On my machine, you've definitely beat the article's original try, and beat my Union version, but the 0 based permanent temp table is still better.
Interesting... I would be interested to see Peter's versions with a permanent Tally table. (I don't have time right now, but if nobody gets to it before this weekend I'll take a stab at it.)
May 6, 2011 at 2:46 pm
Nadrek (5/6/2011)
Peter: On my machine, you've definitely beat the article's original try, and beat my Union version, but the 0 based permanent temp table is still better.Wayne: thank you for the information on graphing. I don't have time to do a comprehensive job, but here's some samples, attached, split between smaller and larger number of elements to make differences more obvious.
Can you provide me a generation script for the exact tally table you used in your tests, I am then going to repeat my tests with your code added as well to see how it works on my system.
I will also do a test with that order by you did not have in your select on the tally table and check if that makes a speed difference at all.
Maybe one of the tricks I applied might also work on your code, and speed it up even more.
May 6, 2011 at 2:52 pm
UMG Developer (5/6/2011)
Nadrek (5/6/2011)
Peter: On my machine, you've definitely beat the article's original try, and beat my Union version, but the 0 based permanent temp table is still better.Interesting... I would be interested to see Peter's versions with a permanent Tally table. (I don't have time right now, but if nobody gets to it before this weekend I'll take a stab at it.)
I am more thinking along the lines of a modified tally table as was introduces a year or more ago. It has not only N, but also a N + 1 column, doing away with a calculation at the expense of some IO.
Hard to imagine a simple +1 makes much computational difference, but removing such calculations seems to prevent the optimizer being thrown off track.
We got to test it tho, with some code it might work wonders, on other not at all.
Still, the nice thing about the CTE method is that you do not need a table at all, the function stands on its own, which can be quite valuable as a universal tool.
May 6, 2011 at 2:53 pm
peter-757102 (5/6/2011)
Nadrek (5/6/2011)
Peter: On my machine, you've definitely beat the article's original try, and beat my Union version, but the 0 based permanent temp table is still better.Wayne: thank you for the information on graphing. I don't have time to do a comprehensive job, but here's some samples, attached, split between smaller and larger number of elements to make differences more obvious.
Can you provide me a generation script for the exact tally table you used in your tests, I am then going to repeat my tests with your code added as well to see how it works on my system.
I will also do a test with that order by you did not have in your select on the tally table and check if that makes a speed difference at all.
Maybe one of the tricks I applied might also work on your code, and speed it up even more.
Can we get someone to put Peter's and Nadrek's code into a new test script, so that all can easily test it? Just modify Jeff's script to remove all but the DelimitedSplit8K, and add the four newer ones from Peter and Nadrek. If you need a tally table, it needs to be 1-10000, with a CI with 100% fill factor.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
May 6, 2011 at 4:06 pm
WayneS (5/6/2011)
Can we get someone to put Peter's and Nadrek's code into a new test script, so that all can easily test it? Just modify Jeff's script to remove all but the DelimitedSplit8K, and add the four newer ones from Peter and Nadrek. If you need a tally table, it needs to be 1-10000, with a CI with 100% fill factor.
I am working on that as we speak.
update
I made the script but am still in the process of optimizing / correcting a few things. Right now, there are too many processes running on my server to get reliable clean data for the graphs, so i will resume tomorrow. Hopefully by then, someone else will have also taken a second look and we can compare findings.
May 6, 2011 at 5:10 pm
This thread is getting more and more exciting.
Awesome.
-- Gianluca Sartori
May 6, 2011 at 8:10 pm
Gianluca Sartori (5/6/2011)
This thread is getting more and more exciting.Awesome.
I know what you mean. I, for one, am looking forward to the revealing of the ultimate DelimitedSplit function. It would be ultra-sweet if it were to handle varchar(max) also!
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
Viewing 15 posts - 181 through 195 (of 990 total)
You must be logged in to reply to this topic. Login to reply