December 4, 2007 at 9:12 am
I'm trying to determine which query is better and when I say better, I mean will perform faster. I'm comparing data in two tables by using "not in" vs. "left join". My two queries are as follows:
select distinct tablea.docid, tablea.docname
from tablea
left join tableb on
tablea.parentid = tableb.docid
where tableb.docid is null
select distinct tablea.docid, tablea.docname
from tablea
where parentid not in (select docid from tableb)
I can see how the execution plans are different, but I'm not sure I can interpret these to determine which one is better. I'm guessing the left join is better, but I don't know if the nested loop I see in my execution plan is good or bad. Maybe I need a lesson in reading the execution plans...
December 4, 2007 at 9:39 am
K Currie (12/4/2007)
I'm trying to determine which query is better and when I say better, I mean will perform faster. I'm comparing data in two tables by using "not in" vs. "left join". My two queries are as follows:select distinct tablea.docid, tablea.docname
from tablea
left join tableb on
tablea.parentid = tableb.docid
where tableb.docid is null
select distinct tablea.docid, tablea.docname
from tablea
where parentid not in (select docid from tableb)
I can see how the execution plans are different, but I'm not sure I can interpret these to determine which one is better. I'm guessing the left join is better, but I don't know if the nested loop I see in my execution plan is good or bad. Maybe I need a lesson in reading the execution plans...
Why don't you just test it to find out?
Unless your server is jammed up on a specific resource - fastest usually =best in my mind. Assuming they're about the same speed - I then check resources.
There's a third syntax to check out - not EXISTS.
Try this to see:
declare @g datetime
select @g=getdate()
select distinct tablea.docid, tablea.docname
from tablea
left join tableb on
tablea.parentid = tableb.docid
where tableb.docid is null
select datediff(ms,@g,getdate())
select @g=getdate()
select distinct
tablea.docid,
tablea.docname
from
tablea
where
parentid not in (select docid from tableb)
select datediff(ms,@g,getdate())
select @g=getdate()
select distinct
tablea.docid,
tablea.docname
from
tablea
where
NOT EXISTS (select docid from tableb where tableb.docid=tablea.docid)
select datediff(ms,@g,getdate())
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
December 4, 2007 at 10:54 am
From my experience the LEFT JOIN typically performs better, but in reality it depends. I will usually start with the LEFT JOIN and only change it if the reads seem high for what I'm trying to do. Your best bet is to follow Matt's advice and to test the difference for each scenario. There is no hard and fast rule where one will always out perform the other.
December 4, 2007 at 10:56 am
Thanks. Both of my queries took less than a second to run so I couldn't tell the difference, but according to your calculation the milliseconds are being displayed, right? And in my case, the left join runs the fastest. Not Exists is next, but double the time and the not in is the worst, triple the time. (I guess I could have turned on profiler too.)
Although this does answer my question, can anyone tell me more info regarding the execution plans? If I reviewed each plan, how would I know which way was better? That would help in reviewing more complex queries.
December 4, 2007 at 11:00 am
Post the plans and we can take a peek at them. Also, run each query in query analyzer using SET STATISTICS IO ON. Make sure to clean out the buffers and cache in between runs. Look at the logical reads value and compare the two.
December 4, 2007 at 11:12 am
I will add that unless you have a significant amount of test data you may not be able to tell a difference becuae of the way data can be in the cache making and reducing reads. Generally LEFT JOIN and NOT EXISTS will out perform NOT IN. But I will say from experience you sometimes can get unexpected occurrances just becuase. I alwasy test on a copy of the production data each method to be sure I don't overlook some un thought of advantage one may present.
December 4, 2007 at 11:18 am
Also it is common to see DISTINCT replaced by GROUP BY becuase of the way each is handled GROUP BY generally will perform faster. As well, you should always use the schema owner. Sorry I knew the answer as to why but forget and don't have an articl I can point you too right now.
SELECT
A.docid
, A.docname
FROM
dbo.tablea A
LEFT JOIN
dbo.tableb B
ON
A.parentid = B.docid
WHERE
B.docid is null
GROUP BY
A.docid
, A.docname
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply