May 3, 2011 at 7:24 am
Anipaul (5/3/2011)
As usual, you have written another excellent article, Jeff. It was a great one. :-)
Thanks, Anipaul. Good to "see" you again. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
May 3, 2011 at 7:27 am
adnan.korkmaz (5/3/2011)
Great job! Thank you to all those who have written, contributed, or inspired Jeff. I have done some tests on real data before adopting this new one, executing both under the same parameters 10,000 times and logging the times.
The previous splitter function we used took 1084, 1986 and 633 milliseconds (average, max and min) on a given parameter. This new one takes 286, 416 and 176 milliseconds (again avg, max, min).
It's around 4 times faster in my scenario.
So, just congrats and thanks!
Very cool feedback, Adnan. Is there a chance of you posting your previous splitter function so we can take a gander at the differences the two may have?
--Jeff Moden
Change is inevitable... Change for the better is not.
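A minimal sketch of the kind of timing loop Adnan describes (the splitter name dbo.DelimitedSplit8K is the article's function; the sample string and temp-table name are assumptions):
--===== Time 10,000 executions of a splitter and log each run's duration in ms.
DECLARE @i INT = 0, @t0 DATETIME2, @Bitbucket VARCHAR(8000);
CREATE TABLE #Timing (Ms INT);
WHILE @i < 10000
BEGIN
        SET @t0 = SYSDATETIME();
        SELECT @Bitbucket = split.Item -- assign to a variable so display time
          FROM dbo.DelimitedSplit8K('a,b,c,d,e', ',') split; -- isn't part of the measurement
        INSERT INTO #Timing (Ms) VALUES (DATEDIFF(ms, @t0, SYSDATETIME()));
        SET @i += 1;
END;
SELECT AVG(Ms) AS AvgMs, MAX(Ms) AS MaxMs, MIN(Ms) AS MinMs FROM #Timing;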
May 3, 2011 at 7:28 am
srikant maurya (5/3/2011)
Great job, I will use this code in the future.
Thanks, Srikant. I appreciate the feedback.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 3, 2011 at 7:32 am
ALZDBA (5/3/2011)
It's a pity some obscure character caught the attention of CNN, Bloomberg, Al Jazeera, ... the day your great article got published. For sure it would have been a headliner!
Great code, great benchmark info,
Great job Jeff
Thanks again for sharing the knowledge.
Thanks, Johan. Always good to hear from you and thanks for the laugh. 🙂
As a side bar, I've submitted one abstract for PASS so far and will submit another soon. I can't afford to go unless I present because I have to pay for everything out of my own pocket. If I get selected, it would be a real pleasure to meet with you again. Do you know if you're going?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 3, 2011 at 7:36 am
Jeff Moden (5/3/2011)
adnan.korkmaz (5/3/2011)
Great job! Thank you to all those who have written, contributed, or inspired Jeff. I have done some tests on real data before adopting this new one, executing both under the same parameters 10,000 times and logging the times.
The previous splitter function we used took 1084, 1986 and 633 milliseconds (average, max and min) on a given parameter. This new one takes 286, 416 and 176 milliseconds (again avg, max, min).
It's around 4 times faster in my scenario.
So, just congrats and thanks!
Very cool feedback, Adnan. Is there a chance of you posting your previous splitter function so we can take a gander at the differences the two may have?
Sure, here it is:
ALTER FUNCTION [dbo].[FastSplit] (@s varchar(512))
RETURNS TABLE
AS
RETURN (
    WITH Pieces(pn, start, stop) AS (
        -- Anchor: position of the first delimiter (0 if none found)
        SELECT 1, 1, CHARINDEX(',', @s)
        UNION ALL
        -- Recurse: position of the next delimiter after the previous one
        SELECT pn + 1, stop + 1, CHARINDEX(',', @s, stop + 1)
        FROM Pieces
        WHERE stop > 0
    )
    SELECT pn,
           SUBSTRING(@s, start, CASE WHEN stop > 0 THEN stop - start ELSE 512 END) AS ID
    FROM Pieces
)
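A quick usage sketch for the function above; because it is built on a recursive CTE, strings with more than 100 elements need a MAXRECURSION hint at the call site:
SELECT pn, ID
  FROM dbo.FastSplit('10,20,30,40')
 ORDER BY pn
OPTION (MAXRECURSION 0); -- recursive CTEs default to a 100-level limit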
May 3, 2011 at 7:49 am
Except for the way they handled the final element, that's almost identical to the rCTE code I have in the test harness I included with the article. Thanks for posting it, Adnan. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
May 3, 2011 at 7:53 am
Jeff Moden (5/3/2011)
ALZDBA (5/3/2011)
It's a pity some obscure character caught the attention of CNN, Bloomberg, Al Jazeera, ... the day your great article got published. For sure it would have been a headliner!
Great code, great benchmark info,
Great job Jeff
Thanks again for sharing the knowledge.
Thanks, Johan. Always good to hear from you and thanks for the laugh. 🙂
As a side bar, I've submitted one abstract for PASS so far and will submit another soon. I can't afford to go unless I present because I have to pay for everything out of my own pocket. If I get selected, it would be a real pleasure to meet with you again. Do you know if you're going?
I'll probably apply for travel approval by the end of this month, because SQLPass will have published an agenda by then on which I can build the POI that serves our needs.
I'll keep you updated.
Like in the previous years, as soon as I get my approval it will show up in my forum signature 😎
Fingers crossed :Whistling:
Johan
Learn to play, play to learn !
Don't drive faster than your guardian angel can fly ...
but keeping both feet on the ground won't get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I? Sometimes this is me but most of the time this is me
May 3, 2011 at 7:59 am
As usual, great post Jeff. Five-star article. 🙂
-Roy
May 3, 2011 at 8:06 am
Jeff, please please please write a book, without changing your style. Give us all an opportunity to repay you for what you've done for us.
For better assistance in answering your questions, please read this.
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop / Jeff Moden
May 3, 2011 at 8:18 am
Jeff
Thank you for your patience. I must not be asking my question in the right way. Sorry. I'll try one last time.
There are two parts to the parsing:
* finding delimiter positions
* parsing the string
When I do performance comparisons, I time the individual components. I wanted to know the performance difference between using the CharIndex() in a loop and the Tally Table to populate a temp table of delimiter positions.
Although we would both use an "inchworm" method to parse the string, I was curious about the performance of a cursor on the delimiter table, compared to the IsNull(NullIf()) approach.
As I stated earlier, traditional performance tuning primarily involves simplification, reuse, and pre-calculation. If that methodology and approach are not valid in the TSQL environment, I'll gladly get a new toolbox.
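To make the first sub-question concrete, a minimal sketch of the two ways to populate a table of delimiter positions (dbo.Tally and the temp-table names are assumptions):
DECLARE @s VARCHAR(8000) = 'a,bb,ccc,dddd';
--===== 1) Set-based: one pass against a Tally table
SELECT t.N AS DelimPos
  INTO #DelimsTally
  FROM dbo.Tally t
 WHERE t.N <= DATALENGTH(@s)
   AND SUBSTRING(@s, t.N, 1) = ',';
--===== 2) Procedural: CHARINDEX in a WHILE loop
CREATE TABLE #DelimsLoop (DelimPos INT);
DECLARE @pos INT = CHARINDEX(',', @s);
WHILE @pos > 0
BEGIN
        INSERT INTO #DelimsLoop (DelimPos) VALUES (@pos);
        SET @pos = CHARINDEX(',', @s, @pos + 1);
END;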
May 3, 2011 at 8:44 am
mark hutchinson (5/3/2011)
Jeff, thank you for your patience. I must not be asking my question in the right way. Sorry. I'll try one last time.
There are two parts to the parsing:
* finding delimiter positions
* parsing the string
When I do performance comparisons, I time the individual components. I wanted to know the performance difference between using the CharIndex() in a loop and the Tally Table to populate a temp table of delimiter positions.
Although we would both use an "inchworm" method to parse the string, I was curious about the performance of a cursor on the delimiter table, compared to the IsNull(NullIf()) approach.
As I stated earlier, traditional performance tuning primarily involves simplification, reuse, and pre-calculation. If that methodology and approach are not valid in the TSQL environment, I'll gladly get a new toolbox.
The biggest reason for not using a while loop or cursor to step through each position of a string looking for the specified delimiter is that you cannot use those in an inline TVF (iTVF). You would have to use a multi-statement TVF (mTVF), and those are slower.
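A minimal sketch of the difference Lynn describes (the function and table names are hypothetical): the set-based body can be inlined, while the WHILE loop forces the multi-statement shape with its own declared return table.
--===== Inline TVF (iTVF): a single SELECT that the optimizer expands
     -- into the calling query's plan.
CREATE FUNCTION dbo.DelimPosInline (@s VARCHAR(8000))
RETURNS TABLE
AS
RETURN
 SELECT t.N AS DelimPos
   FROM dbo.Tally t
  WHERE t.N <= DATALENGTH(@s)
    AND SUBSTRING(@s, t.N, 1) = ',';
GO
--===== Multi-statement TVF (mTVF): the loop requires a declared return
     -- table, which is the slower shape.
CREATE FUNCTION dbo.DelimPosLoop (@s VARCHAR(8000))
RETURNS @r TABLE (DelimPos INT)
AS
BEGIN
        DECLARE @pos INT = CHARINDEX(',', @s);
        WHILE @pos > 0
        BEGIN
                INSERT INTO @r VALUES (@pos);
                SET @pos = CHARINDEX(',', @s, @pos + 1);
        END;
        RETURN;
END;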
May 3, 2011 at 9:06 am
Lynn
Thanks. Then this is an environmental property of 'compiled' TSQL of which I was not aware.
May 3, 2011 at 9:19 am
Jeff Moden (5/2/2011)
gary.rumble (5/2/2011)
Well, so far I got:...
I think your code is too much for my server. 😉
Yowch. The code is even split into batches. What you may have to do is run the code a section at a time up to where the test loop begins and then let the test loop rip.
Thanks for trying, Gary. If for some reason, your server just won't take it, let me know and we'll test your code for you.
I set up a new SQL Server instance on my laptop and got the tests to run. Looks like my code tracks your results fairly closely, but of course it doesn't create the numbers table on the fly, so it probably has an advantage there.
I emailed you the results. I couldn't generate your pretty graphs, though.
Thanks for the article.
14090 SW TENNESSEE LN
May 3, 2011 at 9:27 am
Thanks Jeff - great stuff
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
May 3, 2011 at 9:42 am
Jeff Moden (5/2/2011)
Well, that makes 3 of us. We have a quorum! 😛 Phil Factor wrote an article fairly recently on all the XML, JSON, and other gobbledygook and suggested that someone should come up with a better way. I was considering writing an article about the all-too-"secret" characters 28-31 and a couple of other goodies in "control character land". Whatcha tink? Worthwhile or not?
I haven't run it but your real Tally Table splitter looks right. You picked up on everything including the use of the "TOP".
Change your test harness script to use CHAR(30) instead of ',' and start leading by example :); examples are what will help with the control characters. Peoplesoft names, valid uses of tabs in text (formatted text blurbs, for example), valid uses of pipes in VARCHAR fields (program documentation, storing REGEX expressions in a database, secure usernames with special characters allowed), etc.
Also, an easy way to get SQL Server to output SELECT results to a special-character-delimited VARCHAR() table, and directly to a text file, would be very worthwhile for the article. The main uses I see are getting data to/from arbitrary SELECT results into a string with one or two delimiters/dimensions, and the same thing to/from a text file.
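A quick sketch of the CHAR(30) idea using the article's DelimitedSplit8K (the sample strings are made up): CHAR(30) is the ASCII "record separator", so it almost never collides with real data the way commas, tabs, and pipes do.
DECLARE @Delim CHAR(1) = CHAR(30); -- ASCII record separator
DECLARE @List VARCHAR(8000) =
        'first field' + @Delim + 'second, with a comma' + @Delim + 'third | with a pipe';
SELECT ItemNumber, Item
  FROM dbo.DelimitedSplit8K(@List, @Delim);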
As for my previous single CTE real Tally Table (0 based) splitter, please ignore it. It takes too long on higher numbers of elements, much like the original tally table splitter's problem. Instead, try one or both of the two double CTE versions below:
CREATE FUNCTION [dbo].[YourFunctionUnion](@pString [varchar](8000), @pDelimiter [char](1))
RETURNS TABLE
AS
RETURN
WITH
cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front
                -- for both a performance gain and prevention of accidental "overruns"
 SELECT 0 UNION ALL
 SELECT TOP (DATALENGTH(ISNULL(@pString,1))) N FROM YourDB.dbo.YourTallyTable1Based WITH (NOLOCK) -- the 1 based + UNION ALL is almost identical in performance to a 0 based TOP (DATALENGTH()+1)
),
cteStart(N1) AS (--==== This returns N+1 (the starting position of each "element", just once for each delimiter)
 SELECT t.N+1
   FROM cteTally t
  WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0)
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
 SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1), -- not including StringNumber actually increases duration and CPU, but uses slightly fewer writes
        Item       = SUBSTRING(@pString,s.N1,ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000))
   FROM cteStart s -- no table hint here; a locking hint belongs on the base table inside the CTE, not on the CTE reference
;
CREATE FUNCTION [dbo].[YourFunction0Based](@pString [varchar](8000), @pDelimiter [char](1))
RETURNS TABLE
AS
RETURN
WITH
cteTally(N) AS (--==== This limits the number of rows right up front
                -- for both a performance gain and prevention of accidental "overruns"
 SELECT TOP (DATALENGTH(ISNULL(@pString,1))+1) N FROM YourDB.dbo.YourTallyTable0Based WITH (NOLOCK) -- the 1 based + UNION ALL is almost identical in performance to a 0 based TOP (DATALENGTH()+1)
),
cteStart(N1) AS (--==== This returns N+1 (the starting position of each "element", just once for each delimiter)
 SELECT t.N+1
   FROM cteTally t
  WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0)
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
 SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1), -- not including StringNumber actually increases duration and CPU, but uses slightly fewer writes
        Item       = SUBSTRING(@pString,s.N1,ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000))
   FROM cteStart s -- no table hint here; a locking hint belongs on the base table inside the CTE, not on the CTE reference
;
The UNION version is up to about 5% faster on small but reasonably useful numbers of elements and uses the "standard" 1 based tally table, and the 0 based version is up to about 5% faster on medium to large numbers of elements, and uses a special tally table that starts at 0. Both can be slightly slower than your CTE Tally example on very small numbers of elements/string sizes, depending on some combination of SQL version and platform parameters.
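For anyone trying the two functions above, a minimal call looks like this (the function name is the placeholder used above; the 0-based version is called identically):
SELECT ItemNumber, Item
  FROM dbo.YourFunctionUnion('alpha,bravo,charlie', ',');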