Which Query Is Performs Better

  • I'm trying to determine which query is better and when I say better, I mean will perform faster. I'm comparing data in two tables by using "not in" vs. "left join". My two queries are as follows:

    select distinct tablea.docid, tablea.docname

    from tablea

    left join tableb on

    tablea.parentid = tableb.docid

    where tableb.docid is null

    select distinct tablea.docid, tablea.docname

    from tablea

    where parentid not in (select docid from tableb)

    I can see how the execution plans are different, but I'm not sure I can interpret these to determine which one is better. I'm guessing the left join is better, but I don't know if the nested loop I see in my execution plan is good or bad. Maybe I need a lesson in reading the execution plans...

  • K Currie (12/4/2007)


    I'm trying to determine which query is better and when I say better, I mean will perform faster. I'm comparing data in two tables by using "not in" vs. "left join". My two queries are as follows:

    select distinct tablea.docid, tablea.docname

    from tablea

    left join tableb on

    tablea.parentid = tableb.docid

    where tableb.docid is null

    select distinct tablea.docid, tablea.docname

    from tablea

    where parentid not in (select docid from tableb)

    I can see how the execution plans are different, but I'm not sure I can interpret these to determine which one is better. I'm guessing the left join is better, but I don't know if the nested loop I see in my execution plan is good or bad. Maybe I need a lesson in reading the execution plans...

    Why don't you just test it to find out?

    Unless your server is jammed up on a specific resource - fastest usually =best in my mind. Assuming they're about the same speed - I then check resources.

    There's a third syntax to check out - not EXISTS.

    Try this to see:

    declare @g datetime

    select @g=getdate()

    select distinct tablea.docid, tablea.docname

    from tablea

    left join tableb on

    tablea.parentid = tableb.docid

    where tableb.docid is null

    select datediff(ms,@g,getdate())

    select @g=getdate()

    select distinct

    tablea.docid,

    tablea.docname

    from

    tablea

    where

    parentid not in (select docid from tableb)

    select datediff(ms,@g,getdate())

    select @g=getdate()

    select distinct

    tablea.docid,

    tablea.docname

    from

    tablea

    where

    NOT EXISTS (select docid from tableb where tableb.docid=tablea.docid)

    select datediff(ms,@g,getdate())

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • From my experience the LEFT JOIN typically performs better, but in reality it depends. I will usually start with the LEFT JOIN and only change it if the reads seem high for what I'm trying to do. Your best bet is to follow Matt's advice and to test the difference for each scenario. There is no hard and fast rule where one will always out perform the other.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Thanks. Both of my queries took less than a second to run so I couldn't tell the difference, but according to your calculation the milliseconds are being displayed, right? And in my case, the left join runs the fastest. Not Exists is next, but double the time and the not in is the worst, triple the time. (I guess I could have turned on profiler too.)

    Although this does answer my question, can anyone tell me more info regarding the execution plans? If I reviewed each plan, how would I know which way was better? That would help in reviewing more complex queries.

  • Post the plans and we can take a peek at them. Also, run each query in query analyzer using SET STATISTICS IO ON. Make sure to clean out the buffers and cache in between runs. Look at the logical reads value and compare the two.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • I will add that unless you have a significant amount of test data you may not be able to tell a difference becuae of the way data can be in the cache making and reducing reads. Generally LEFT JOIN and NOT EXISTS will out perform NOT IN. But I will say from experience you sometimes can get unexpected occurrances just becuase. I alwasy test on a copy of the production data each method to be sure I don't overlook some un thought of advantage one may present.

  • Also it is common to see DISTINCT replaced by GROUP BY becuase of the way each is handled GROUP BY generally will perform faster. As well, you should always use the schema owner. Sorry I knew the answer as to why but forget and don't have an articl I can point you too right now.

    SELECT

    A.docid

    , A.docname

    FROM

    dbo.tablea A

    LEFT JOIN

    dbo.tableb B

    ON

    A.parentid = B.docid

    WHERE

    B.docid is null

    GROUP BY

    A.docid

    , A.docname

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply