September 16, 2010 at 10:05 am
Sorry, Jeff, I had a deadline yesterday that kept me busy.
I'll re-do my tests and see what I come up with.
--Brad
September 16, 2010 at 10:38 am
Jeff,
Here's the key to the mystery as to why you and I get differing results.
All of my tests included compile time also, because I did a DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS before each test.
If I get rid of those and re-do the time test (on the WHILE LOOP approach), I get the same results as you do... the Duration is pretty much equal to the CPU... 31ms or less.
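Roughly, the difference between the two ways of timing it (the table and splitter names are the ones used elsewhere in this thread; the point is only where the DBCC calls sit):
-- cold timing: plan cache and buffer pool are flushed first, so the measured duration
-- includes compile time and physical reads
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;

SET STATISTICS TIME ON;

SELECT COUNT(*)
FROM dbo.CsvTest
CROSS APPLY dbo.Split8KX(CsvParameter, ',');   -- or whichever approach is under test

SET STATISTICS TIME OFF;

-- warm timing: skip the two DBCC calls and run the same statement again; with the plan and
-- data already cached, the duration comes out roughly equal to the CPU (the ~31ms figure above)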
Sorry for the confusion.
--Brad
September 16, 2010 at 6:56 pm
Brad Schulz (9/16/2010)
Jeff, here's the key to the mystery as to why you and I get differing results.
All of my tests included compile time also, because I did a DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS before each test.
If I get rid of those and re-do the time test (on the WHILE LOOP approach), I get the same results as you do... the Duration is pretty much equal to the CPU... 31ms or less.
Sorry for the confusion.
--Brad
Ah... got it. And in a real environment, you'd let each of these cache so you'd realize the 31 ms.
Do you also get the same results if you let the numbers table solution cache?
And, understood on the "busy" thing.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 16, 2010 at 6:58 pm
Peter Brinkhaus (9/13/2010)
I am not an XML Ninja either, but you can try an inline table-valued function
CREATE FUNCTION dbo.Split8KX_Inline (@Parameter VARCHAR(MAX), @Delimiter VARCHAR(1))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
    SELECT
        ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS ItemNumber,
        XC.R.query('.').value('/r[1]', 'varchar(8000)') AS ItemValue
    FROM
        (
            SELECT CAST('<r>' + REPLACE(@Parameter, @Delimiter, '</r><r>') + '</r>' AS XML)
        ) X(P)
    CROSS APPLY
        P.nodes('/r') XC(R)
;
I have run the following code against your test data:
set statistics io on
set statistics time on
print 'Jeff''s Split8KX function'
select
count(*)
from
CsvTest
cross apply
dbo.Split8KX(CsvParameter, ',')
set statistics io off
set statistics time off
and
set statistics io on
set statistics time on
print 'Inline table valued function'
select
count(*)
from
CsvTest
cross apply
dbo.Split8KX_Inline(CsvParameter, ',')
set statistics io off
set statistics time off
Here are the test results
(1 row(s) affected)
Table '#060DEAE8'. Scan count 10000, logical reads 10000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CsvTest'. Scan count 1, logical reads 774, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 40966 ms, elapsed time = 42803 ms.
and
(1 row(s) affected)
Table 'CsvTest'. Scan count 5, logical reads 849, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 8846 ms, elapsed time = 2268 ms.
Edit: as you can see from the execution times, the inline table-valued function also benefits heavily from the 4-core CPU in my PC.
Peter
I'm finally going to get to test this. I'll let you know the results I get. Thanks again for the code and for the time. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
September 16, 2010 at 7:14 pm
Jeff:
Using the Numbers table approach (the code from my blog) with the procedure in cache, I get the following:
CPU = 6486 ms
Reads = 244155
Duration = 1887 ms (avg)
The optimizer (in SQL2008) assumes 30% selectivity against the Numbers table (i.e. 300,000 rows), so the plan uses hash joins and parallelism and is just a downright inefficient plan.
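For reference, a bare-bones sketch of the kind of Numbers-table splitter being discussed (hypothetical names, not necessarily the exact code from my blog; assumes a dbo.Numbers table of integers 1 through 8000 in column n):
DECLARE @Parameter VARCHAR(8000) = 'abc,def,ghi';
DECLARE @Delimiter CHAR(1) = ',';

SELECT  ROW_NUMBER() OVER (ORDER BY N.n)                                      AS ItemNumber,
        SUBSTRING(@Parameter, N.n,
                  CHARINDEX(@Delimiter, @Parameter + @Delimiter, N.n) - N.n)  AS ItemValue
FROM    dbo.Numbers AS N
WHERE   N.n <= LEN(@Parameter)
AND     SUBSTRING(@Delimiter + @Parameter, N.n, 1) = @Delimiter;
-- the SUBSTRING/CHARINDEX predicates give the optimizer nothing concrete to estimate with,
-- which is what produces the fixed selectivity guess and the hash-join/parallel plan mentioned above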
Now here's the interesting thing...
If I do the same thing in SQL2005, it's also a lousy-looking plan (hash joins and parallelism, etc), but the plan has a completely different construction... and it performs waaaaay better:
CPU = 71 ms
Reads = 805
Duration = 44 ms
So SQL2008 is a complete dog with the Numbers table approach.
I'll have to see if I can scrape together the plans so you can see the difference. I have to run home right now, but perhaps tomorrow I can post pictures or I'll put .SQLPLAN files on the web for you to look at if you're interested.
--Brad
September 16, 2010 at 7:21 pm
Wow... I guess my machine doesn't like XML so much. I stopped the test after 10 minutes on a 10K by 100 run.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 17, 2010 at 2:02 am
Jeff Moden (9/16/2010)
Wow... I guess my machine doesn't like XML so much. I stopped the test after 10 minutes on a 10K by 100 run.
As Steve already pointed out, the SELECT COUNT(*) I used to test the performance was the wrong test. I improved the function by replacing the line
XC.R.query('.').value('/r[1]', 'varchar(8000)') ItemValue
with
XC.R.value('text()[1]', 'varchar(8000)') ItemValue
which gives an improvement of about 30%. Still, the inline version is horrible. I don't see much difference with your function, but it is really much faster than the inline version.
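As a standalone illustration of the two styles (a literal XML value rather than the full function):
DECLARE @x XML = '<r>a</r><r>b</r><r>c</r>';

-- original: run a sub-query against each node, then pull the value back out of the fragment
SELECT XC.R.query('.').value('/r[1]', 'varchar(8000)') AS ItemValue
FROM @x.nodes('/r') AS XC(R);

-- revised: read the text node of each shredded row directly (the ~30% improvement above)
SELECT XC.R.value('text()[1]', 'varchar(8000)') AS ItemValue
FROM @x.nodes('/r') AS XC(R);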
Peter
September 17, 2010 at 6:18 am
Yep... sorry, Peter. I meant to say that I tested that method as well. It's not nearly as bad as the original dot version but, you're correct, it's still pretty bad. It came up at 25 minutes on my machine. And, yeah... I actually am setting up a new machine (64-bit, 2-CPU laptop this time).
I've managed to collect and test a whole gamut of splitters, and I'll include them in the article update along with their performance numbers, including some of these XML splitters.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 17, 2010 at 7:36 am
Jeff:
Did you read my blog post about XML that I mentioned earlier?
http://bradsruminations.blogspot.com/2010/01/delimited-string-tennis-again-final.html
I had the same loooonnnnng performance issues as you did until someone suggested I add a .query('.') to the derived table.
In other words, instead of
... FROM (CAST( ... AS XML)) X(P)
change it to this:
... FROM (CAST( ... AS XML).query('.')) X(P)
It makes a world of difference.
and change the value() to:
.value('(./text())[1]','varchar(8000)')
Again, check out the blog article, because it goes into all of this and explains WHY it works better.
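Putting the two changes together with Peter's function, the body would look something like this (a sketch, not a tested drop-in replacement):
CREATE FUNCTION dbo.Split8KX_Inline (@Parameter VARCHAR(MAX), @Delimiter VARCHAR(1))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
    SELECT  ROW_NUMBER() OVER (ORDER BY (SELECT NULL))      AS ItemNumber,
            XC.R.value('(./text())[1]', 'varchar(8000)')    AS ItemValue
    FROM    (
                SELECT CAST('<r>' + REPLACE(@Parameter, @Delimiter, '</r><r>') + '</r>' AS XML)
                       .query('.')    -- the .query('.') on the derived table
            ) AS X(P)
    CROSS APPLY X.P.nodes('/r') AS XC(R);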
--Brad
September 17, 2010 at 5:33 pm
To reply to Kevin on why the UDF rather than the inline code:
I recently used this type of iTVF to build a keyword list from the item descriptions on order detail records (the items were mostly bottles of wine), disregarding certain parts of the descriptions, such as commas and ampersands. The function created the whole keyword table of items and keywords (80,000+ rows) in under 2 seconds when cross applied to the item table. Performance was nice, but the real reason for the UDF was to exactly duplicate the logic that decides which keywords are valid.
In this application the user would enter some keywords separated by spaces and look for orders they had placed containing all of the keywords. The whole string was passed to a stored procedure as the search criteria. I used the same iTVF to parse the input and create a temp table for the keywords. Since the logic was encapsulated in the same function used to create the keyword/item table, I never had to worry about missing a keyword on either end, and the search always worked.
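A rough sketch of the pattern (hypothetical table, column, and function names standing in for the actual iTVF):
-- build the keyword/item table once, by cross applying the splitter to every description
INSERT INTO dbo.ItemKeyword (ItemID, Keyword)
SELECT  i.ItemID, k.ItemValue
FROM    dbo.Item AS i
CROSS APPLY dbo.SplitKeywords(i.Description, ' ') AS k;

-- in the search procedure, parse the user's space-delimited input with the SAME function,
-- so both sides agree on what counts as a keyword
DECLARE @Search VARCHAR(8000) = 'cabernet 2007 reserve';

SELECT  ik.ItemID
FROM    dbo.SplitKeywords(@Search, ' ') AS s
JOIN    dbo.ItemKeyword AS ik ON ik.Keyword = s.ItemValue
GROUP BY ik.ItemID
HAVING  COUNT(DISTINCT s.ItemValue) = (SELECT COUNT(DISTINCT ItemValue)
                                       FROM dbo.SplitKeywords(@Search, ' '));   -- items matching ALL keywords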
Todd Fifield
September 18, 2010 at 7:40 pm
Brad Schulz (9/17/2010)
Jeff: Did you read my blog post about XML that I mentioned earlier?
http://bradsruminations.blogspot.com/2010/01/delimited-string-tennis-again-final.html
I had the same loooonnnnng performance issues as you did until someone suggested I add a .query('.') to the derived table.
In other words, instead of
... FROM (CAST( ... AS XML)) X(P)
change it to this:
... FROM (CAST( ... AS XML).query('.')) X(P)
It makes a world of difference.
and change the value() to:
.value('(./text())[1]','varchar(8000)')
Again, check out the blog article, because it goes into all of this and explains WHY it works better.
--Brad
I'll check it out, Brad. And, yep... I was already aware of the text()[1] thingy but it's nice to have confirmation.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 18, 2010 at 8:09 pm
Hey folks... especially BRAD... would you mind participating in a little test at the following URL please? Thanks.
http://www.sqlservercentral.com/Forums/Topic988784-391-1.aspx
--Jeff Moden
Change is inevitable... Change for the better is not.
September 18, 2010 at 10:06 pm
Hmm. How'd I miss this thread? :w00t:
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
September 18, 2010 at 10:45 pm
Paul White NZ (9/18/2010)
Hmm. How'd I miss this thread? :w00t:
Because of being busy writing those great optimizer articles?
September 19, 2010 at 12:31 am
Peter Brinkhaus (9/18/2010)
Paul White NZ (9/18/2010)
Hmm. How'd I miss this thread? :w00t:
Because of being busy writing those great optimizer articles?
:blush:
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi