Attempting to replace not in with left outer join

  • I'm attempting to improve a vendor created query.

    Sorry about the formatting and lack of test data - I've been looking at this longer than I care to admit.

    Here is the original -

    SELECT m.lID

    FROM Message m

    inner join Message_Cc mCC on m.lID=mCC.lMessage and mCC.lOfficeRecipient = 200321

    INNER JOIN UserRole d on mCC.szRecipient=d.szRoleName

    inner Join Map_UserAtOfficeToRole a2 on a2.lUserRole = d.lid AND d.nRecordStatus = 1

    INNER JOIN Map_UserToOffice b ON a2.lMap_UserToOffice = b.lID AND b.lOffice=200321

    INNER JOIN [User] c ON b.lUser = c.lID and c.szUsername='rogerb'

    where m.lID not in (SELECT lMessage

    FROM Message_CC_Role_Exception a

    WHERE a.OfficeRecipient = 200321

    AND a.Recipient = 'rogerb')

    (If I run the query without the entire not in statement, I get 648 rows. If I include it I get 3 rows.)

    My edit -

    SELECT m.lID FROM Message m

    inner join Message_Cc mCC ON m.lID = mCC.lMessage

    INNER JOIN UserRole d ON mCC.szRecipient = d.szRoleName

    inner Join Map_UserAtOfficeToRole a2 ON a2.lUserRole = d.lid

    INNER JOIN Map_UserToOffice b ON a2.lMap_UserToOffice = b.lID

    INNER JOIN [User] c ON b.lUser = c.lID

    LEFT OUTER JOIN Message_CC_Role_Exception exc ON exc.lMessage = m.lid

    where mCC.lOfficeRecipient = 200321

    AND d.nRecordStatus = 1

    AND b.lOffice=200321

    and c.szUsername='rogerb'

    and exc.lMessage is null

    If I run this without the LEFT OUTER JOIN and the is null statement I get 648 rows. But If I include it I get 0 rows. I can't understand why I get 0 rows with the outer join.

    Any help is greatly appreciated!

  • I think you should be left joining on all the conditions in the subquery.

    Try this and see if it is any different.

    SELECTm.lID

    FROMMessage m

    INNER JOIN Message_Cc mCC ON m.lID = mCC.lMessage

    AND mCC.lOfficeRecipient = 200321

    INNER JOIN UserRole d ON mCC.szRecipient = d.szRoleName

    INNER JOIN Map_UserAtOfficeToRole a2 ON a2.lUserRole = d.lid

    ANDd.nRecordStatus = 1

    INNER JOIN Map_UserToOffice b ON a2.lMap_UserToOffice = b.lID

    AND b.lOffice = 200321

    INNER JOIN [User] c ON b.lUser = c.lID

    AND c.szUsername = 'rogerb'

    left join Message_CC_Role_Exception a on a.lMessage = m.lID

    AND a.OfficeRecipient = 200321

    ANDa.Recipient = 'rogerb'

    WHEREa.lMessage is null

    Catch-all queries done right [/url]
    Gail Shaw's Performance Blog[/url]

  • NOT EXISTS would possibly help:

    select m.lID

    from Message m

    join Message_Cc mCC on m.lID = mCC.lMessage

    and mCC.lOfficeRecipient = 200321

    join UserRole d on mCC.szRecipient = d.szRoleName

    join Map_UserAtOfficeToRole a2 on a2.lUserRole = d.lid

    and d.nRecordStatus = 1

    join Map_UserToOffice b on a2.lMap_UserToOffice = b.lID

    and b.lOffice = 200321

    join [User] c on b.lUser = c.lID

    and c.szUsername = 'rogerb'

    where not exists ( select 1

    from Message_CC_Role_Exception a

    where a.OfficeRecipient = 200321

    and a.Recipient = 'rogerb'

    and a.lMessage = m.lID )

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • As an aside, you will find that your SQL becomes more readable if you use meaningful aliases.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • If you're trying to improve performance, these may be of interest

    http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/

    http://sqlinthewild.co.za/index.php/2010/03/23/left-outer-join-vs-not-exists/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks for all the replies.

    I tried Phils suggestion of the not exists with the select 1 and I get 0 rows returned.

    I'm expecting 2 rows.

    I also tried MadAdmins edits and while it certainly is much cleaner to me and makes perfect sense there is not improvement in performance. I guess my thinking was that I expected this query to run faster. At the end of the day it tries to return rows from 650 or so, that are not in 10000 (the subquery returns 10017 rows and the initial query returns 648).

    Returning the 648 rows is very fast and returning the 10017 rows is very fast but restricting the list of one not in the other seems to be the cost.

    SQL Server Execution Times:

    CPU time = 1172 ms, elapsed time = 1198 ms.

  • jackimo (4/15/2015)


    Thanks for all the replies.

    I tried Phils suggestion of the not exists with the select 1 and I get 0 rows returned.

    I'm expecting 2 rows.

    I also tried MadAdmins edits and while it certainly is much cleaner to me and makes perfect sense there is not improvement in performance. I guess my thinking was that I expected this query to run faster. At the end of the day it tries to return rows from 650 or so, that are not in 10000 (the subquery returns 10017 rows and the initial query returns 648).

    Returning the 648 rows is very fast and returning the 10017 rows is very fast but restricting the list of one not in the other seems to be the cost.

    SQL Server Execution Times:

    CPU time = 1172 ms, elapsed time = 1198 ms.

    You could try attaching the actual execution plan if you'd like to dig deeper.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Here are the two execution plans - one for the left outer join and one for the not exists.

    Thanks again for all your help!

  • jackimo (4/15/2015)


    Here are the two execution plans - one for the left outer join and one for the not exists.

    Thanks again for all your help!

    Both of the plans look fairly clean to me, though the actual rows vs estimated rows is heavily wrong in a few places, suggesting that your statistics may be stale.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Query tuning really isn't my forte, but the interesting thing is that I can break this query down into 3 steps (1 first small select, 2nd larger select and then the outer join)

    First select (~650 rows)

    SQL Server Execution Times:

    CPU time = 15 ms, elapsed time = 10 ms.

    Second select (~10000 rows)

    SQL Server Execution Times:

    CPU time = 15 ms, elapsed time = 12 ms.

    Left Outer Join (using tables that I inserted the results of the two select statements)

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 8 ms.

    But if I run the original query I end up with;

    SQL Server Execution Times:

    CPU time = 1281 ms, elapsed time = 1272 ms.

    Thinking out loud - should I break the query up using temp tables?

    I you have any other thoughts please let me know.

    Thanks!

  • Another interesting bit is that if I use the left outer join with the query hint - OPTION (HASH JOIN), the query runs in about 1/3 the time.

  • jackimo (4/15/2015)


    Query tuning really isn't my forte, but the interesting thing is that I can break this query down into 3 steps (1 first small select, 2nd larger select and then the outer join)

    First select (~650 rows)

    SQL Server Execution Times:

    CPU time = 15 ms, elapsed time = 10 ms.

    Second select (~10000 rows)

    SQL Server Execution Times:

    CPU time = 15 ms, elapsed time = 12 ms.

    Left Outer Join (using tables that I inserted the results of the two select statements)

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 8 ms.

    But if I run the original query I end up with;

    SQL Server Execution Times:

    CPU time = 1281 ms, elapsed time = 1272 ms.

    Thinking out loud - should I break the query up using temp tables?

    I you have any other thoughts please let me know.

    Thanks!

    If you get it to run faster, I would, but then again I do things different that most of the people here. Did you try creating this query using CTE?

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • jackimo (4/16/2015)


    Another interesting bit is that if I use the left outer join with the query hint - OPTION (HASH JOIN), the query runs in about 1/3 the time.

    Which makes it sound like the optimiser is getting the cardinality wrong. Are your statistics up to date?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I tried it with the left outer join as previously suggested.

    I tried it with CTEs the two selects then the join.

    I tried it with table variable the two selects then the join.

    then I tried it with temp tables (two temp tables then the join) and boom, total execution goes way way down.

    not sure what is going on here but I guess I should be happy with the improvement.

  • Can you answer Gail's question?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply