September 15, 2009 at 11:35 am
And that is exactly why you should test, test, and test again in your specific environment. What works best in one environment may not work in another.
September 15, 2009 at 11:51 am
Bob Hovious (9/15/2009)
To avoid confusing screen display times with runtimes, I'm just tracking the milliseconds to write to a temporary table. On my laptop, the CTE/RowNumber solution ran in an average of 645 milliseconds and the Subquery solution averaged 1519 ms. That means the CTE solution was twice as fast for this test, which was only 120,000 rows.
The CTE does win in this scenario, but not by 2x. You forgot to reset the timer with GETDATE() after the CTE version, so all of that time is unfairly rolled into the subquery version's total across the repeated attempts.
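To make the fix concrete, here is a minimal sketch of the corrected timing loop; the actual queries aren't reproduced in this thread, so the comments are placeholders for the CTE and subquery versions being tested:

DECLARE @timer datetime;
DECLARE @i int;

SET @timer = GETDATE();
SET @i = 1;
WHILE @i <= 10
BEGIN
    -- CTE/ROW_NUMBER version of the query goes here
    SET @i = @i + 1;
END;
PRINT 'CTE: ' + CAST(DATEDIFF(ms, @timer, GETDATE()) AS varchar) + ' ms';

SET @timer = GETDATE();  -- the reset that was missing from the original harness
SET @i = 1;
WHILE @i <= 10
BEGIN
    -- correlated subquery version of the query goes here
    SET @i = @i + 1;
END;
PRINT 'Subquery: ' + CAST(DATEDIFF(ms, @timer, GETDATE()) AS varchar) + ' ms';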
This time the IN query used a HASH MATCH instead of the Nested Loops or Merge Join that I saw in my tests.
September 15, 2009 at 1:33 pm
Ahhhh :w00t:
You're absolutely right. The first SET @timer was OUTSIDE the while loop. My bad, and thanks for pointing that out, Seth. I will go back and correct it in the original.
Revised performance comparison, on my laptop for this particular query:
CTE/RowNumber: 579 milliseconds
Subquery: 828 milliseconds
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
September 15, 2009 at 1:40 pm
I can't argue with "It depends."
Lynn, I need a judgment call from an official:
Is the above a more polite way of saying "I agree" or will I be carded?
I'm doubling and redoubling AdventureWorks to scale up the test. I'll have another look at the execution plans at a million-plus rows.
I'm curious, Seth: Did you capture any times or stats for your million row test?
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
September 15, 2009 at 1:49 pm
Bob Hovious (9/15/2009)
Okay.... I knocked this out before taking off for lunch. Now anybody reading can satisfy themselves using Jonathan's measurement of time to execute, since he doesn't seem to think that logical reads count. There is probably a cleaner test harness setup, but I'm hungry and in a hurry. To avoid confusing screen display times with runtimes, I'm just tracking the milliseconds to write to a temporary table. On my laptop, the CTE/RowNumber solution ran in an average of 645 milliseconds and the Subquery solution averaged 1519 ms. That means the CTE solution was twice as fast for this test, which was only 120,000 rows.
As both queries return exactly the same result set, I prefer selecting to the screen over inserting into tempdb, where there may be many factors beyond one's control. A select used to insert may also generate a different execution plan, and I believe this sort of query is typically used for reporting, not inserting.
I consistently get results like this on my laptop:
Correlated subquery:
(31896 row(s) affected)
Table 'SalesOrderDetail'. Scan count 31466, logical reads 103038, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
1843ms
CTE:
(31896 row(s) affected)
Table 'SalesOrderDetail'. Scan count 1, logical reads 1238, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
1893ms
Even when I know the data is cached, I always run a test like this many times within a short period. I left STATISTICS IO on to demonstrate that the data is fully cached. Perhaps you can learn from the correlation between the statistics and the actual timings; note that the scans in the first query differ from those in the second. The timings are obtained by simply running this before the first batch:
DECLARE @t datetime;
PRINT 'Correlated subquery:';
SET @t = GETDATE();
This between the batches:
PRINT CAST(DATEDIFF(ms,@t,GETDATE()) AS varchar) + 'ms';
PRINT '';
PRINT 'CTE:';
SET @t = GETDATE();
And this after the last batch:
PRINT CAST(DATEDIFF(ms,@t,GETDATE()) AS varchar) + 'ms';
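Assembled into one runnable harness, the pattern looks like the sketch below; the two SELECTs are hypothetical stand-ins, since the actual report queries aren't reproduced in this post:

SET STATISTICS IO ON;  -- left on to show logical reads alongside the timings

DECLARE @t datetime;

PRINT 'Correlated subquery:';
SET @t = GETDATE();

-- correlated subquery version goes here (stand-in query below)
SELECT COUNT(*) FROM Sales.SalesOrderDetail;

PRINT CAST(DATEDIFF(ms,@t,GETDATE()) AS varchar) + 'ms';
PRINT '';
PRINT 'CTE:';
SET @t = GETDATE();

-- CTE/ROW_NUMBER version goes here (stand-in query below)
SELECT COUNT(*) FROM Sales.SalesOrderDetail;

PRINT CAST(DATEDIFF(ms,@t,GETDATE()) AS varchar) + 'ms';

SET STATISTICS IO OFF;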
When I first started posting here, many years ago, I once wrote that a solution I posted was "better" than that of another poster. After some feedback, I found that his solution was faster with his data and my solution was faster with my data. The other poster was wise enough not to write that his solution was "better," and I learned from that.
I used to contribute often to this site, but years ago I stopped even reading these fora when I found many were posting for the sake of posting, i.e. quantity over quality. It was depressing to have to be the villain, always correcting others' posts. I logged in today because Steve wrote that he was (finally) making screen names unique (which we had discussed when he switched board software), and I wanted to see what my new name would be. It seems that many now prefer arguing to learning, so I think I will again take my leave.
--Jonathan
September 15, 2009 at 1:58 pm
where there may be many factors beyond one's control
This is exactly why you DON'T want screen display time in the mix. It includes system processes that have nothing to do with database performance.
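A minimal sketch of the temp-table approach (the column list is hypothetical; the point is that the write to tempdb replaces the screen render):

DECLARE @timer datetime;
SET @timer = GETDATE();

SELECT SalesOrderID, ProductID, UnitPrice  -- hypothetical column list
INTO #Results                              -- rows land in tempdb, not on the screen
FROM Sales.SalesOrderDetail;

PRINT CAST(DATEDIFF(ms, @timer, GETDATE()) AS varchar) + ' ms';
DROP TABLE #Results;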
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
September 15, 2009 at 2:19 pm
Jonathan
I logged in today because Steve wrote that he was (finally) making screen names unique (which we had discussed when he switched board software), and I wanted to see what my new name would be. It seems that many now prefer arguing to learning, so I think I will again take my leave.
This is where we differ. I've always felt that you learn a lot more by 'arguing' your points than by simply accepting what others tell you. Whenever I present an opinion (or argument) on these boards, I know it will be read by people who know FAR more than I do, so I personally do much more research (usually) than I would if I just accepted the other person's word for it.
Honestly, that's what sets these forums apart, in my opinion, from simply reading a book or a blog. Here you *can* challenge the opinions or methods of others, and right or wrong, you can learn a lot by doing so. I've definitely put forth my opinion on many issues only to have Jeff/Gail/Matt/etc. come in and tell me I'm wrong and why.
September 15, 2009 at 2:20 pm
Bob Hovious (9/15/2009)
where there may be many factors beyond one's control
This is exactly why you DON'T want screen display time in the mix. It includes system processes that have nothing to do with database performance.
I hope you someday learn enough to be embarrassed if you reread your posts. I'm out of here.
--Jonathan
September 15, 2009 at 2:21 pm
I've posted here for a number of years now and have taken my share of lumps too. I doubt the process is over. But this is also where it was demonstrated to me that simple timings can be misleading when displaying a list of rows.
Now having said that, I am prepared to concede defeat at a million rows' volume, where a clustered index is in place to support the subquery. The cost of the sort is much too high at that level. I concede not because of anything you said, but because I just tested at that volume using measurements I trust... which is the one thing we all agree on. 😉
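For anyone following along, here is a sketch of the two shapes under comparison, framed against AdventureWorks as a hypothetical "latest order per customer" problem (the thread's actual queries may differ):

-- CTE/ROW_NUMBER version: a single pass, but it must sort to number the rows
WITH Ranked AS (
    SELECT CustomerID, SalesOrderID, OrderDate,
           ROW_NUMBER() OVER (PARTITION BY CustomerID
                              ORDER BY OrderDate DESC) AS rn
    FROM Sales.SalesOrderHeader
)
SELECT CustomerID, SalesOrderID, OrderDate
FROM Ranked
WHERE rn = 1;

-- Correlated subquery version: no sort, but one probe per outer row,
-- which is where a supporting clustered index earns its keep at volume
SELECT h.CustomerID, h.SalesOrderID, h.OrderDate
FROM Sales.SalesOrderHeader h
WHERE h.OrderDate = (SELECT MAX(h2.OrderDate)
                     FROM Sales.SalesOrderHeader h2
                     WHERE h2.CustomerID = h.CustomerID);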
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
September 15, 2009 at 2:26 pm
Bob Hovious (9/15/2009)
I can't argue with "It depends."I'm curious, Seth: Did you capture any times or stats for your million row test?
I examined them, but unfortunately I didn't save them. I can tell you that all three methods (your CTE, Jon's subquery, my derived table) took approximately 38 seconds. Watching this debate over logical reads, I'm reminded of another place where logical reads are far higher in one method than in the other: the comparison of a tally table and a while loop at the end of Jeff's article on Tally tables:
Jeff Moden
The Tally table method wins for Duration, CPU time, and RowCounts. It loses in Reads and that bothers some folks, but it shouldn't. Part of the reason the Tally table method works so well is because parts of the Tally table get cached and those are "Logical Reads" from memory.
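To make that comparison concrete, a rough sketch of the two approaches for generating a number sequence (dbo.Tally is assumed to be a pre-built table of sequential integers, as described in Jeff's article):

-- WHILE loop: one INSERT per iteration; few reads, but slow
DECLARE @i int;
SET @i = 1;
CREATE TABLE #LoopNums (N int NOT NULL);
WHILE @i <= 10000
BEGIN
    INSERT INTO #LoopNums (N) VALUES (@i);
    SET @i = @i + 1;
END;

-- Tally table: one set-based statement; the extra logical reads come
-- largely from cached pages, so it still wins on duration and CPU
SELECT N
INTO #TallyNums
FROM dbo.Tally
WHERE N <= 10000;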
September 15, 2009 at 3:35 pm
Jonathan (9/15/2009)
Bob Hovious (9/15/2009)
where there may be many factors beyond one's control
This is exactly why you DON'T want screen display time in the mix. It includes system processes that have nothing to do with database performance.
I hope you someday learn enough to be embarrassed if you reread your posts. I'm out of here.
Ummmmm.... did you even try what Bob suggested just to see?
--Jeff Moden
Change is inevitable... Change for the better is not.
September 15, 2009 at 4:51 pm
This conversation about logical reads and test code is going way over my head now. The row count in my app is quite small, at about 10,000, so for me it is neither here nor there which way I go. I implemented Bob's solution a couple of days ago and it works perfectly.
I've run into a whole bunch of new problems since then!
Thanks again Bob.
Cheers
Jeff
September 15, 2009 at 5:09 pm
You're welcome, new Jeff. But by all means, make note of the subquery solution as well. As I finally conceded, it will become the better performer at some point as your volumes get higher. One thing everyone agrees on is "Test... test... test."
One of the great things about SSC is that you can constantly learn new things, or get reminded of old things, because of disagreements. Arguments backed by numbers and tests that can be replicated are always educational. Some discussions become little seminars. Although I appear to have gotten under Jonathan's skin, I re-learned something today, and I'm grateful for that.
Jeff (Moden)... if memory serves, it was you who beat my brains out over an issue where I was using simple timing but displaying all the data rather than writing to a temp table. There must still be a bump on my head somewhere because I still remember. 😛
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
September 15, 2009 at 5:31 pm
Bob Hovious (9/15/2009)
Jeff (Moden)... if memory serves, it was you who beat my brains out over an issue where I was using simple timing but displaying all the data rather than writing to a temp table. There must still be a bump on my head somewhere because I still remember. 😛
BWAA-HAA! Now that you mention it, I believe it was me. Check the bump for residual pork chop grease. 😛
I've not tried any of the code on this particular thread, but I did look at Jonathan's results from his notebook run... with such a huge difference in IO between the two methods but no real change in run duration, I'm thinking that (as so often happens with notebooks) the display may have become the "great equalizer". Of course, I could certainly be wrong on this particular thread because I simply haven't tried the code.
I just went through this today... It looked like the final SELECT in a costing/report-building proc was taking 99 seconds, but no one could optimize it nor get any benefit from any type of indexing. I short-circuited the display to disk using an "INTO #tablename" and the run time went down from 1:39 (m:ss) to about 15 seconds. I'll follow that up tomorrow by getting someone in NY to run the proc from a more local machine to find out for sure if the "long haul" over the wire between NY and MI is the problem. As a sidebar, ya just gotta love output that's destined to become a spreadsheet... why they won't let us summarize the data for them is totally beyond me.
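The short-circuit itself is just a small change to the final SELECT; the table and column names here are hypothetical, since the proc isn't shown:

-- Before: the final SELECT streams the full report result set to the client
-- SELECT OrderID, CustomerName, LineTotal FROM dbo.CostingWork;

-- After: redirect the rows to a temp table, so the measurement covers only
-- the query itself, not the trip over the wire or the screen render
SELECT OrderID, CustomerName, LineTotal
INTO #tablename
FROM dbo.CostingWork;  -- dbo.CostingWork is a hypothetical stand-in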
The extended run time also shows up in Profiler, which really confuses folks.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 15, 2009 at 6:53 pm
Jeff, make sure you go back and look at the code I posted again, as it has been revised. (Seth pointed out that I wasn't resetting the @timer.) So the results aren't nearly so drastic. The CTE is only about 70% of the subquery runtime at 120,000 rows.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills