December 30, 2012 at 2:28 pm
Lynn Pettis (12/30/2012)
John Hardin (12/30/2012)
L' Eomot Inversé (12/30/2012)
Jeff Moden (12/28/2012)
If you look at the execution plan, it doesn't actually use the "same list". CTEs have some wonderful advantages but they also have a hidden fault. Instead of using the same internal result set from a CTE when things like self joins are used, the entire CTE is re-executed.Now, if only one could force the optimiser to spool the CTE and reuse it, CTEs would be far more useful.
Don't become so blinded by CTEs that you forget about temporary tables or table-type variables; another problem with CTEs is they are not indexed.
On page 27 I posted a suggested version of this that avoided CHARINDEX() and used an indexed table-type variable and it showed performance gains over the pure-CTE version, but I don't think anyone has seen fit to comment on that suggested version... 🙁
Page 27 passed on how many posts per page? I ask as I display 50 posts per page. It would help if you posted the url of your post so we could go directly to that post.
D'oh! I didn't consider non-default posts-per-page settings. Sorry. Here's the direct link: http://www.sqlservercentral.com/Forums/FindPost1129283.aspx
The second code block is what I am referring to.
December 30, 2012 at 5:58 pm
L' Eomot Inversé (12/30/2012)
Why can't the optimiser look and see if spooling would enhance performance (only in the cases where it make no semantic difference, of course) and use it when it does? Or if that's too difficult a task for the optimiser (it shouldn't be, but they haven't done it so I can imagine them claiming it is) why can't we have a query hint that tells them to do it? I hope Paul or someone will jump in and explain all this - maybe tell me I've got it all wrong, but that's OK too, I like learning.
You're right that the optimizer doesn't consider CTE materialization today (they are strictly in-line view definitions, expanded once per reference) and there is no hint for materialization either.
There is a 2006 active Connect suggestion by Adam Machanic for a hint. Though some duplicates of that item have been closed as Won't Fix (= perhaps for the time being) my overall impression from public Microsoft comments is that something along these lines is being actively considered for a future release. We will have to wait and see.
Related: more generalized subtree matching and reuse has been trialled in the product, see this Microsoft Research paper (PDF).
December 30, 2012 at 10:23 pm
Tony.l (12/29/2012)
Hiunbelievable - so good. I have been doing these puter things for 30 years now and i am still amazed how little i know or probably at just how clever other people are. Really well explained and totally useful.
keep up the good work.
APL. My initials not the language.
Thanks for the great feedback, Tony. I feel the same way. I've been working with computers in one form or another since 1968 and I'm still learning something new about them everyday.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 30, 2012 at 10:48 pm
John Hardin (12/30/2012)
On page 27 I posted a suggested version of this that avoided CHARINDEX() and used an indexed table-type variable and it showed performance gains over the pure-CTE version, but I don't think anyone has seen fit to comment on that suggested version... 🙁
You'll have to forgive me a bit on that. If you look at even the most recent pages of this thread, there are still people saying "Try this" or "How about this"? Even you said...
How about this to get rid of the CHARINDEX() string operation:
{snip}
...how does that affect performance?
Two days later (lots of water had flowed under the bridge by then and we're not notified by edits), you added...
edited to add: Actually performance goes down a bit because the subquery in making cteStartEnd is not indexed. This version does exhibit performance gains in my very modest testing:
...and followed that up with...
(Note: I'm doing my version as a SP so the above exact code hasn't actually been tested by me...)
I just didn't (and still don't) have the time to test everyone's suggestion. That's why I spent so much time on building a test harness for everyone to use.
If you think your method has merit in the area of performance, please test it and post the test results. Who knows? You might be on to something.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 31, 2012 at 10:24 am
SQL Kiwi (12/30/2012)
There is a 2006 active Connect suggestion by Adam Machanic for a hint. Though some duplicates of that item have been closed as Won't Fix (= perhaps for the time being) my overall impression from public Microsoft comments is that something along these lines is being actively considered for a future release. We will have to wait and see.
Well, I've added my vote to that. It will probably make no difference, but that isn't an excuse for not doing so whem I think it's right.
Related: more generalized subtree matching and reuse has been trialled in the product, see this Microsoft Research paper (PDF).
For some reason the link doesn't work. But looking in the obvious place indicated by your link (http://research.microsoft.com/en-us/um/people/jrzhou/publications.htm) finds it and the link from there, although it appears to be identical to your link, works fine. I have no idea what's happening there.
Edit: yes, I see what was happening: your link is relative, not absolute, so it tries to connect to http://www.sqlservercentral.com/Forums/research.microsoft.com/en-us/um/people/jrzhou/pub/cse.pdf instead of to the real target. I suspect I've done that sometimes too - probably every time I've typed the link instead of copyying and pasting from the address field of my browser.
Tom
December 31, 2012 at 4:52 pm
L' Eomot Inversé (12/31/2012)
For some reason the link doesn't work. But looking in the obvious place indicated by your link (http://research.microsoft.com/en-us/um/people/jrzhou/publications.htm) finds it and the link from there, although it appears to be identical to your link, works fine. I have no idea what's happening there.Edit: yes, I see what was happening: your link is relative, not absolute, so it tries to connect to http://www.sqlservercentral.com/Forums/research.microsoft.com/en-us/um/people/jrzhou/pub/cse.pdf instead of to the real target. I suspect I've done that sometimes too - probably every time I've typed the link instead of copyying and pasting from the address field of my browser.
Thanks Tom, I have corrected the link in the original post.
January 1, 2013 at 6:24 pm
Nice update to this article Jeff!
I especially liked your mysteriously labelled black line (????). Nice trick with the 0 based Tally table too - will need to remember that one.
I have already replaced the old DelimitedSplit8K in my sandbox and now I must seek out other places where it may have been used and do the same.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
January 2, 2013 at 8:02 am
Interesting stuff, thank you.
Table type parameters for stored procedures have eliminated much of my need for doing this within SQL, but it's always nice to have options.
January 3, 2013 at 7:26 am
Jeff or Paul:
I didn't see the source code for the CLR function. I also didn't read all 473 posts before mine, so maybe there's a link to it somewhere..
Paul, would you mind sharing that with us? I would love to see its guts.
Kurt
January 3, 2013 at 9:42 am
January 4, 2013 at 7:12 am
hi sorry i haven't tested for performance or sure if this has been done but removed the ctelen
IF OBJECT_ID('dbo.DelimitedSplit8K') IS NOT NULL
DROP FUNCTION [dbo].[DelimitedSplit8K]
go
CREATE FUNCTION [dbo].[DelimitedSplit8K]
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...
-- enough to cover NVARCHAR(4000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
)
, cteStart(N1,L1) AS (
SELECT 1,CHARINDEX(@pDelimiter,@pString,1) -1
UNION ALL
SELECT t.N+1
, ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,t.N+1),0)-t.N,8001) -1
FROM cteTally t
WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
)
--select * from cteStart
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteStart l
January 4, 2013 at 10:30 am
telcogod (1/4/2013)
hi sorry i haven't tested for performance or sure if this has been done but removed the ctelenIF OBJECT_ID('dbo.DelimitedSplit8K') IS NOT NULL
DROP FUNCTION [dbo].[DelimitedSplit8K]
go
CREATE FUNCTION [dbo].[DelimitedSplit8K]
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...
-- enough to cover NVARCHAR(4000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
)
, cteStart(N1,L1) AS (
SELECT 1,CHARINDEX(@pDelimiter,@pString,1) -1
UNION ALL
SELECT t.N+1
, ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,t.N+1),0)-t.N,8001) -1
FROM cteTally t
WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
)
--select * from cteStart
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteStart l
Pretty good performance compared to DelimitedSplit8K! However, when running the tests using Jeff Moden's test harness the function failed whenever there was only one element and was still not quite as fast. Very close in performance though, so the change you made didn't really make much difference.
RowNumSplitterNameNumberOfRowsNumberOfElementsMinElementLengthMaxElementLengthDurationMinLengthAvgLengthMaxLength
1DelimitedSplit8K100011100.123001510
2DelimitedSplit_TELCO100011100.076001510
3DelimitedSplit8K100021100.1530031121
4DelimitedSplit_TELCO100021100.1860031121
5DelimitedSplit8K100041100.2600092443
6DelimitedSplit_TELCO100041100.1500092443
7DelimitedSplit8K100081100.33000215073
8DelimitedSplit_TELCO100081100.13000215073
9DelimitedSplit8K1000161100.3460064102140
10DelimitedSplit_TELCO1000161100.4660064102140
11DelimitedSplit8K1000321100.33300158206268
12DelimitedSplit_TELCO1000321100.41600158206268
13DelimitedSplit8K1000641100.53600349415511
14DelimitedSplit_TELCO1000641100.48600349415511
15DelimitedSplit8K10001281100.87600701830920
16DelimitedSplit_TELCO10001281100.96600701830920
17DelimitedSplit8K10002561101.59600153016631835
18DelimitedSplit_TELCO10002561101.96000153016631835
19DelimitedSplit8K10005121103.09300307533233541
20DelimitedSplit_TELCO10005121103.32300307533233541
21DelimitedSplit8K100011501107.23000714174667790
22DelimitedSplit_TELCO100011501108.41600714174667790
23DelimitedSplit8K1000110200.01000101420
24DelimitedSplit_TELCO1000110200.01000101420
25DelimitedSplit8K1000210200.03600213041
26DelimitedSplit_TELCO1000210200.05300213041
27DelimitedSplit8K1000410200.10000476282
28DelimitedSplit_TELCO1000410200.08000476282
29DelimitedSplit8K1000810200.1200093126155
30DelimitedSplit_TELCO1000810200.1260093126155
31DelimitedSplit8K10001610200.20000219254292
32DelimitedSplit_TELCO10001610200.23600219254292
33DelimitedSplit8K10003210200.36000458511559
34DelimitedSplit_TELCO10003210200.39300458511559
35DelimitedSplit8K10006410200.7100093110211099
36DelimitedSplit_TELCO10006410200.7600093110211099
37DelimitedSplit8K100012810201.45300194320472158
38DelimitedSplit_TELCO100012810201.48300194320472158
39DelimitedSplit8K100025610202.92000394640954238
40DelimitedSplit_TELCO100025610202.98300394640954238
41DelimitedSplit8K100048010205.37000743776767900
42DelimitedSplit_TELCO100048010205.66300743776767900
43DelimitedSplit8K1000120300.05000202530
44DelimitedSplit_TELCO1000120300.01000202530
45DelimitedSplit8K1000220300.02300415161
46DelimitedSplit_TELCO1000220300.05600415161
47DelimitedSplit8K1000420300.0700084102120
48DelimitedSplit_TELCO1000420300.1030084102120
49DelimitedSplit8K1000820300.13600179206232
50DelimitedSplit_TELCO1000820300.16300179206232
51DelimitedSplit8K10001620300.26600372414449
52DelimitedSplit_TELCO10001620300.32600372414449
53DelimitedSplit8K10003220300.52600768830896
54DelimitedSplit_TELCO10003220300.57000768830896
55DelimitedSplit8K10006420301.04300159316631733
56DelimitedSplit_TELCO10006420301.09600159316631733
57DelimitedSplit8K100012820302.09000318233283441
58DelimitedSplit_TELCO100012820302.20000318233283441
59DelimitedSplit8K100025620304.24000649766546803
60DelimitedSplit_TELCO100025620304.30300649766546803
61DelimitedSplit8K100029020304.75300733175387696
62DelimitedSplit_TELCO100029020304.88600733175387696
63DelimitedSplit8K1000130400.01300303440
64DelimitedSplit_TELCO1000130400.01000303440
65DelimitedSplit8K1000230400.02600617181
66DelimitedSplit_TELCO1000230400.03000617181
67DelimitedSplit8K1000430400.09300126143161
68DelimitedSplit_TELCO1000430400.12300126143161
69DelimitedSplit8K1000830400.18300263287315
70DelimitedSplit_TELCO1000830400.23300263287315
71DelimitedSplit8K10001630400.35300529575619
72DelimitedSplit_TELCO10001630400.38300529575619
73DelimitedSplit8K10003230400.69300109811501212
74DelimitedSplit_TELCO10003230400.73300109811501212
75DelimitedSplit8K10006430401.38300221923012391
76DelimitedSplit_TELCO10006430401.43300221923012391
77DelimitedSplit8K100012830402.87300449646054736
78DelimitedSplit_TELCO100012830402.83300449646054736
79DelimitedSplit8K100021030404.57600741275617710
80DelimitedSplit_TELCO100021030404.67000741275617710
81DelimitedSplit8K1000140500.01600404450
82DelimitedSplit_TELCO1000140500.01000404450
83DelimitedSplit8K1000240500.043008191101
84DelimitedSplit_TELCO1000240500.030008191101
85DelimitedSplit8K1000440500.14000167182201
86DelimitedSplit_TELCO1000440500.14300167182201
87DelimitedSplit8K1000840500.22000342366395
88DelimitedSplit_TELCO1000840500.25300342366395
89DelimitedSplit8K10001640500.43600693735779
90DelimitedSplit_TELCO10001640500.49000693735779
91DelimitedSplit8K10003240500.86000139914701530
92DelimitedSplit_TELCO10003240500.91300139914701530
93DelimitedSplit8K10006440501.71600285229423022
94DelimitedSplit_TELCO10006440501.81300285229423022
95DelimitedSplit8K100012840503.56600576758886012
96DelimitedSplit_TELCO100012840503.59000576758886012
97DelimitedSplit8K100016540504.44600746575907717
98DelimitedSplit_TELCO100016540504.55000746575907717
January 4, 2013 at 8:43 pm
ah well, I have seen that before where you think removing the extra select will improve performance but it doesn't. Anyway I am working on a look ma no ddl version as well. This is brilliant stuff I always knew had to be a better way than select substring(string,charindex(delimiter,string), substring(string(charindex(delimiter,string,(charindex(delimiter,string))) etc. Thanks for checking I do not have a sandbox at work where is ok to freeproccache.
January 8, 2013 at 3:48 am
SQL Kiwi (12/30/2012)
Related: more generalized subtree matching and reuse has been trialled in the product, see this Microsoft Research paper (PDF).
More generalized matching is exactly what I always wanted, and would hate to see a query hint specific for subexpression reuse. CTE's expanding as they do now is logical as on each site, the "inline view" defined in the CTE can be used differently and thus must be optimised seperately as well. A more generalised optimisation as in the paper will exploit the similarities still found afer that process.
As for the paper, I only quickly scanned it to see what they were up to, and hope their table signatures are based on what is there after expanding contributing code to a query/batch. You want CTE's and non inline views expanded to have a wide as possible base for finding matches. Thus the optimisation should be 100% seperate from the SQL code and no query hint should be wished for!
January 16, 2013 at 9:22 am
The performance of this is amazing, as is the thought behind the code. These kind of posts really make you think and learn a whole new set of features in SQL
Thanks Jeff.
A couple of questions (sorry if it has been addressed - I didn't see it mentioned in the thread):
I would not expect a record back if the string being sent in is NULL, a single space or all spaces. In my case, I have records where the string being split is often blank and I would want no records coming back from the CROSS APPLY. I could filter them out using in the WHERE clause. I fixed this by adding LTRIM and RTRIM to the source string in the function.
Also, some of my "not so good" data also had multiple spaces in between the values causing numerous records returned that were blank.
For example:
dbo.DelimitedSplit8k('123 467 789', ' ') -- There are two spaces between 467 and 789
ItemNumber Item
1 123
2 467
3 <-- Empty String
4 789
Thanks Again,
Anton
Viewing 15 posts - 466 through 480 (of 990 total)
You must be logged in to reply to this topic. Login to reply