March 25, 2008 at 9:07 pm
Thanks for doing the leg work, as usual 🙂
If they're all the same on the big tables and the OUTER JOIN is the slowest on smaller tables, why would you ever use the OUTER JOIN method to find the exceptions?
It's good to know what works best. I tend to only use except when I need to quickly identify what is missing or nonexistent from tables; It saves me on typing :D. In most cases, I use a mixture of outer join and not in, but after looking at the results I am asking myself. Why should I spend extra time typing when I can do it in less lines and the query will perform better. I think I will stick with not in and not exists from now on.
Thanks again.
March 25, 2008 at 9:26 pm
I think I will stick with not in and not exists from now on.
Heh... Except on Tuesdays... that's when the "SQL Run Fast" button is automatically recycled to overcome all the work it had to do from cursors and the like... 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
March 26, 2008 at 7:27 am
Hmmm.... I'm not so sure that the LEFT JOIN is so bad...
I'd say it's the better one of the four by a longshot.
This is the 'small table' examples from Adventureworks, and it's also with a hot cache.
The three first gets identical plans, the left join has a different plan.
A thing I noted was that the CPU times reported are flaky at best. I got between 16ms and 47ms for the left join query just by repeatedly executing it.
What's more interesting is the I/O load that these queries generate.
Have a look a these, and say again why one would sometimes use a left join... 😉
(btw the first three suffer even more with a cold cache)
-- WHERE NOT IN
Table 'WorkOrder'. Scan count 504, logical reads 1088, physical reads 0...
Table 'Product'. Scan count 1, logical reads 4, physical reads 0...
-- WHERE NOT EXISTS
Table 'WorkOrder'. Scan count 504, logical reads 1088, physical reads 0...
Table 'Product'. Scan count 1, logical reads 4, physical reads 0...
-- EXCEPT
Table 'WorkOrder'. Scan count 504, logical reads 1088, physical reads 0...
Table 'Product'. Scan count 1, logical reads 4, physical reads 0...
-- LEFT JOIN
Table 'WorkOrder'. Scan count 1, logical reads 101, physical reads 0...
Table 'Product'. Scan count 1, logical reads 15, physical reads 0...
Just my opinion, but given these numbers, I'd take the left join as the 'best in this case'.
/Kenneth
March 26, 2008 at 8:16 am
Sure, I get the same on the small test... times for the outer join vary between 16 ms and 67 ms with the norm being 47 ms... but look at the CPU times for the other 3 methods... they're all zero.
Guess it "depends" on what you value the most... speed, IO, or memory ("logical reads"). For me, using memory for almost 0 seconds is pretty much worth the speed advantage.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 26, 2008 at 8:25 am
Though cpu time is really of no concern - at least not in this case.
It's a volatile measure at best (as we've seen, since it fluctuates > 100% between times)
The optimizer couldn't care less about it, all that it knows and counts are I/O.
(btw, I managed to get all 4 at 15-16ms cpu)
Thus, what makes a query expensive or cheap is it's associated I/O cost.
In that respect, the left join beats the others by a mile 🙂
/Kenneth
March 26, 2008 at 12:24 pm
Yep... on the small stuff for sure.
Just a note, though... CPU time makes all the difference in the world to me... most of the 8 hour jobs at work that I've gotten down to 10-30 minutes were very CPU intensive and not so intensive on IO. For me, speed does matter because there's only so much CPU time available in a given 24 hour period.
Of course, IO matters as well... but usually logical reads are pretty easy because the data is already cached and, once there, is very fast. It's what makes things like the Tally table just beat the dickens out of CTE's that do the same thing.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 6 posts - 16 through 20 (of 20 total)
You must be logged in to reply to this topic. Login to reply