NOT IN vs NOT EXISTS

  • Perhaps my initial post mislead people, but the query in my initial post wasn't what was being "debated", when i was informed that NOT EXISTS is "the worst option", it was a totally different query in a different environment; I simply wasn't in a position to test when the discussion came up. It's something that I finally found the time to do yesterday, and was unsurprised by my findings (that in truth there is little difference for the performance of NOT IN and NOT EXISTS, but that NOT EXISTS (for me) was the fastest answer).

    Thus what I was asking here was more if those (far) more educated that me were of the same belief that NOT IN > NOT EXISTS < all other methods, or if I was being fed utter nonsense and told to believe based on the "experts" self opinion (sorry, I probably wasn't clear on that). I've not had the pleasure of talking to the person since, but wanted to find out if there was any evidence to his claim. The topic here, however, strongly suggests not; which was my initial understanding.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A - Thursday, October 25, 2018 4:59 AM

    Perhaps my initial post mislead people, but the query in my initial post wasn't what was being "debated", when i was informed that NOT EXISTS is "the worst option", it was a totally different query in a different environment; I simply wasn't in a position to test when the discussion came up. It's something that I finally found the time to do yesterday, and was (un)surprised by my findings (that in truth there is little difference for the performance of NOT IN and NOT EXISTS, but that NOT EXISTS (for me) was the fastest answer).

    Thus what I was asking here was more if those (far) more educated that me were of the same belief that NOT IN > NOT EXISTS < all other methods, or if I was being fed utter nonsense and told to believe based on the "experts" self opinion (sorry, I probably wasn't clear on that). I've not had the pleasure of talking to the person since, but wanted to find out if there was any evidence to his claim. The topic here, however, strongly suggests not; which was my initial understanding.

    The optimiser should work out that the NOT EXISTS and NOT IN (or EXISTS and IN) need the same execution plan. So there should be no difference. You need to check that the same execution plan is generated for both.

  • Thom A - Thursday, October 25, 2018 4:59 AM

    Perhaps my initial post mislead people, but the query in my initial post wasn't what was being "debated", when i was informed that NOT EXISTS is "the worst option", it was a totally different query in a different environment; I simply wasn't in a position to test when the discussion came up. It's something that I finally found the time to do yesterday, and was unsurprised by my findings (that in truth there is little difference for the performance of NOT IN and NOT EXISTS, but that NOT EXISTS (for me) was the fastest answer).

    Thus what I was asking here was more if those (far) more educated that me were of the same belief that NOT IN > NOT EXISTS < all other methods, or if I was being fed utter nonsense and told to believe based on the "experts" self opinion (sorry, I probably wasn't clear on that). I've not had the pleasure of talking to the person since, but wanted to find out if there was any evidence to his claim. The topic here, however, strongly suggests not; which was my initial understanding.

    It seems to me everyone agrees that there is no evidence for his theory.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Brandie Tarvin - Thursday, October 25, 2018 5:11 AM

    Thom A - Thursday, October 25, 2018 4:59 AM

    Perhaps my initial post mislead people, but the query in my initial post wasn't what was being "debated", when i was informed that NOT EXISTS is "the worst option", it was a totally different query in a different environment; I simply wasn't in a position to test when the discussion came up. It's something that I finally found the time to do yesterday, and was unsurprised by my findings (that in truth there is little difference for the performance of NOT IN and NOT EXISTS, but that NOT EXISTS (for me) was the fastest answer).

    Thus what I was asking here was more if those (far) more educated that me were of the same belief that NOT IN > NOT EXISTS < all other methods, or if I was being fed utter nonsense and told to believe based on the "experts" self opinion (sorry, I probably wasn't clear on that). I've not had the pleasure of talking to the person since, but wanted to find out if there was any evidence to his claim. The topic here, however, strongly suggests not; which was my initial understanding.

    It seems to me everyone agrees that there is no evidence for his theory.

    Without evidence, a theory is all it is.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • I have seen it said that EXISTS/NOT EXISTS can be faster because it can "short circuit". That is, as soon as it find a case satisfying its condition, it's done, doesn't have to look any farther, and with an appropriate index it can be more efficient because it may be able to search fewer data pages. Is that true? Seems to make sense, and if you're seeing EXISTS/NOT EXIST having better performance, maybe that's why.

  • dmbaker - Thursday, October 25, 2018 7:13 AM

    I have seen it said that EXISTS/NOT EXISTS can be faster because it can "short circuit". That is, as soon as it find a case satisfying its condition, it's done, doesn't have to look any farther, and with an appropriate index it can be more efficient because it may be able to search fewer data pages. Is that true? Seems to make sense, and if you're seeing EXISTS/NOT EXIST having better performance, maybe that's why.

    Here's an example, where it's stated: 

    "How does SQL Server process an EXISTS expression? To put it as simply as possible: data processing stops as soon as the condition is met, and only as much data is actually read as is needed to test whether the condition is satisfied by *any* row since no rows need to be returned to the client. This means that if an appropriate index exists, the database engine (theoretically) only needs to access a single index page."

    https://milambda.blogspot.com/2006/10/exists-or-not-exists-that-is-question.html

  • Jeffrey Williams 3188 - Wednesday, October 24, 2018 2:50 PM

    One thing to take into consideration is that the first time you ran the test - the NOT IN actually loaded the buffer cache which is probably the majority of the performance hit.  Look at the read-ahead reads for icp_brpolicy and icp_brcledger...

    The second execution of the NOT IN with the added parameter does not have those reads.

    What happens if you switch up the order and run the NOT EXISTS first and then NOT IN?

    Table 'icp_brpolicy'. Scan count 5, logical reads 26052, physical reads 0, read-ahead reads 1794, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'icp_brcledger'. Scan count 5, logical reads 194280, physical reads 1, read-ahead reads 186739, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'icp_yyclient'. Scan count 8, logical reads 696, physical reads 1, read-ahead reads 628, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Gave that a quick go, Exists is still faster:
    NOT EXISTS clause

    SQL Server Execution Times:
     CPU time = 0 ms, elapsed time = 0 ms.

    (3168 rows affected)
    Table 'icp_brcledger'. Scan count 38058, logical reads 167729, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'icp_brpolicy'. Scan count 5, logical reads 26052, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'icp_yyclient'. Scan count 8, logical reads 696, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
     CPU time = 875 ms, elapsed time = 237 ms.
    NOT IN clause

    SQL Server Execution Times:
     CPU time = 0 ms, elapsed time = 0 ms.

    (3168 rows affected)
    Table 'icp_brcledger'. Scan count 38058, logical reads 206068, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'icp_brpolicy'. Scan count 5, logical reads 26052, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'icp_yyclient'. Scan count 8, logical reads 696, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
     CPU time = 1046 ms, elapsed time = 277 ms.

    SQL Server Execution Times:
     CPU time = 0 ms, elapsed time = 0 ms.

    Jonathan AC Roberts - Thursday, October 25, 2018 5:11 AM

    The optimiser should work out that the NOT EXISTS and NOT IN (or EXISTS and IN) need the same execution plan. So there should be no difference. You need to check that the same execution plan is generated for both.

    Depending on the query, yes, there is no difference:

    Thom A - Wednesday, October 24, 2018 8:19 AM

    The NOT IN (when using BTX.Polref@ = BPY.Polref@) and NOT EXISTS have identical plans, the LEFT JOIN has a different one.The estimated plan does differ if the BTX.Polref@ = BPY.Polref@ isn't included in the NOT IN, there are additional parallelism added and the Clustered Index Seek on the Ledger changes to a Clustered Index Scan.

    As we seem to have concluded, I suspect that the "expert" was either trying to lead me down the rabbit hole, or they believe what they believe, and no matter what differing opinions they are given, or evidence they receive, their opinion will never change (I suspect the latter, based on their attitude).

    This certainly, however, puts me at rest that I'm fine with using EXISTS, and (at least for me) seems to actually be the better performer.

    Thanks all for the conversation, appreciate everyone's input.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A - Thursday, October 25, 2018 4:59 AM

    Perhaps my initial post mislead people, but the query in my initial post wasn't what was being "debated", when i was informed that NOT EXISTS is "the worst option", it was a totally different query in a different environment; I simply wasn't in a position to test when the discussion came up. It's something that I finally found the time to do yesterday, and was unsurprised by my findings (that in truth there is little difference for the performance of NOT IN and NOT EXISTS, but that NOT EXISTS (for me) was the fastest answer).

    Thus what I was asking here was more if those (far) more educated that me were of the same belief that NOT IN > NOT EXISTS < all other methods, or if I was being fed utter nonsense and told to believe based on the "experts" self opinion (sorry, I probably wasn't clear on that). I've not had the pleasure of talking to the person since, but wanted to find out if there was any evidence to his claim. The topic here, however, strongly suggests not; which was my initial understanding.

    I can't speak for anyone else but that was exactly what I got out of your post, Thom.  That and the fact that fellow really needs a calibration session.  While there are certainly some good "beliefs" to help guide someone in their programming efforts, it always boils down to just one thing... "It Depends".  It depends on the data that's available, the structure of the data and the related indexes and, of course, the problem at hand.  The only proof of what works best is the results of a good test.

    As for "proof of evidence to his  claim", he made the claim... he should provide the proof.  In the absence of that and the presence of his unmitigated ring knocking, you did a test yourself and know you know more of the possibilities and have confirmed what you already knew... in the absence of proof, "expert opinion" is still just an opinion and should only be used as a suggestion as to what may be possible.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The [NOT] EXISTS() is one of the few we've got in SQL. The can return only TRUE and FALSE; most of the others can return an UNKNOWN logical value. The [NOT] IN () predicate is defined as a string of OR-ed predicates in a shorthand. When the list includes a null, you can give an UNKNOWN risk logical value as a result. These two predicates are not the same
    .
    Existence was generally faster because the first test that is TRUE short-circuits the evaluation and returns a TRUE result. You might also want to look at the little used "<exp> <comp op> [ALL | SOME |ANY] <columnar table exp>" predicates.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • Thom A - Thursday, October 25, 2018 7:22 AM

    Jeffrey Williams 3188 - Wednesday, October 24, 2018 2:50 PM

    One thing to take into consideration is that the first time you ran the test - the NOT IN actually loaded the buffer cache which is probably the majority of the performance hit.  Look at the read-ahead reads for icp_brpolicy and icp_brcledger...

    The second execution of the NOT IN with the added parameter does not have those reads.

    What happens if you switch up the order and run the NOT EXISTS first and then NOT IN?

    Table 'icp_brpolicy'. Scan count 5, logical reads 26052, physical reads 0, read-ahead reads 1794, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'icp_brcledger'. Scan count 5, logical reads 194280, physical reads 1, read-ahead reads 186739, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'icp_yyclient'. Scan count 8, logical reads 696, physical reads 1, read-ahead reads 628, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Gave that a quick go, Exists is still faster:
    NOT EXISTS clause

    SQL Server Execution Times:
     CPU time = 0 ms, elapsed time = 0 ms.

    (3168 rows affected)
    Table 'icp_brcledger'. Scan count 38058, logical reads 167729, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'icp_brpolicy'. Scan count 5, logical reads 26052, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'icp_yyclient'. Scan count 8, logical reads 696, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
     CPU time = 875 ms, elapsed time = 237 ms.
    NOT IN clause

    SQL Server Execution Times:
     CPU time = 0 ms, elapsed time = 0 ms.

    (3168 rows affected)
    Table 'icp_brcledger'. Scan count 38058, logical reads 206068, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'icp_brpolicy'. Scan count 5, logical reads 26052, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'icp_yyclient'. Scan count 8, logical reads 696, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
     CPU time = 1046 ms, elapsed time = 277 ms.

    SQL Server Execution Times:
     CPU time = 0 ms, elapsed time = 0 ms.

    Depending on the query, yes, there is no difference:

    As we seem to have concluded, I suspect that the "expert" was either trying to lead me down the rabbit hole, or they believe what they believe, and no matter what differing opinions they are given, or evidence they receive, their opinion will never change (I suspect the latter, based on their attitude).

    This certainly, however, puts me at rest that I'm fine with using EXISTS, and (at least for me) seems to actually be the better performer.

    Thanks all for the conversation, appreciate everyone's input.

    I figured they would be almost the same either way - but your first test showed that NOT IN ran significantly longer than the NOT EXISTS.  This shows that they are almost the same which is what I expected.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • jcelko212 32090 - Thursday, October 25, 2018 3:46 PM

    The [NOT] EXISTS() is one of the few we've got in SQL. The can return only TRUE and FALSE; most of the others can return an UNKNOWN logical value. The [NOT] IN () predicate is defined as a string of OR-ed predicates in a shorthand. When the list includes a null, you can give an UNKNOWN risk logical value as a result. These two predicates are not the same
    .
    Existence was generally faster because the first test that is TRUE short-circuits the evaluation and returns a TRUE result. You might also want to look at the little used "<exp> <comp op> [ALL | SOME |ANY] <columnar table exp>" predicates.

    Good reminder, Joe.  I wonder how many people have never found that out the hard way because of the "silent failure".  😀  To your very good point,  here's a quick little demo of what can happen with NOT IN when it bumps into a NULL... and the work around for it.


    -- DROP TABLE #Source,#Lookup
     SELECT SomeInt = 1
       INTO #Lookup
      UNION ALL SELECT 2
      UNION ALL SELECT NULL
      UNION ALL SELECT 4
      UNION ALL SELECT 11
      UNION ALL SELECT 12
    ;
     SELECT TOP 10 SomeInt = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
       INTO #Source
       FROM sys.all_columns
    ;
    --===== This works just fine with the understanding that NULLs
         -- just aren't going to appear.
     SELECT l.SomeInt
       FROM #Lookup l
      WHERE l.SomeInt IN (SELECT s.SomeInt FROM #Source s)
    ;
    --===== So does this and with the same info about NULLs.
     SELECT l.SomeInt
       FROM #Lookup l
      WHERE l.SomeInt NOT IN (SELECT s.SomeInt FROM #Source s)
    ;
    ---------------------------------------------------------------------------------------
    --===== Now we add a NULL to the #Source table.
     INSERT INTO #Source
            (SomeInt)
     SELECT NULL
    ;
    --===== This still works just fine...
     SELECT l.SomeInt
       FROM #Lookup l
      WHERE l.SomeInt IN (SELECT s.SomeInt FROM #Source s)
    ;
    --===== But this does not.  It suddenly returns nothing
         -- and it does so without generating an error.
     SELECT l.SomeInt
       FROM #Lookup l
      WHERE l.SomeInt NOT IN (SELECT s.SomeInt FROM #Source s)
    ;
    ---------------------------------------------------------------------------------------
    --===== Of course, if you know about that "little" problem,
         -- then you also know how to fix it because WHERE NOT IN
         -- is damned convenient to use. ;-)
         -- And, of course, don't forget that NULL rows still
         -- won't be returned.
     SELECT l.SomeInt
       FROM #Lookup l
      WHERE l.SomeInt NOT IN (SELECT s.SomeInt FROM #Source s WHERE s.SomeInt IS NOT NULL)
    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • jcelko212 32090 - Thursday, October 25, 2018 3:46 PM

    Existence was generally faster because the first test that is TRUE short-circuits the evaluation and returns a TRUE result. You might also want to look at the little used "<exp> <comp op> [ALL | SOME |ANY] <columnar table exp>" predicates.

    Can't say I'd ever hard of that Syntax Joe. For those that haven't either you can find the documentation here: SOME | ANY (Transact-SQL) and ALL (Transact-SQL).

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • dmbaker - Thursday, October 25, 2018 7:13 AM

    I have seen it said that EXISTS/NOT EXISTS can be faster because it can "short circuit". That is, as soon as it find a case satisfying its condition, it's done, doesn't have to look any farther, and with an appropriate index it can be more efficient because it may be able to search fewer data pages. Is that true? Seems to make sense, and if you're seeing EXISTS/NOT EXIST having better performance, maybe that's why.

    dmbaker could this be row goals? Seems to fit the description.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • IN vs EXISTS: https://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/
    NOT IN vs NOT EXISTS (beware the behaviour difference with NULLs) https://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thom A - Friday, October 26, 2018 1:56 AM

    jcelko212 32090 - Thursday, October 25, 2018 3:46 PM

    Existence was generally faster because the first test that is TRUE short-circuits the evaluation and returns a TRUE result. You might also want to look at the little used "<exp> <comp op> [ALL | SOME |ANY] <columnar table exp>" predicates.

    Can't say I'd ever hard of that Syntax Joe. For those that haven't either you can find the documentation here: SOME | ANY (Transact-SQL) and ALL (Transact-SQL).

    I've heard of it, but I've never found a reason to use it given that SOME/ANY is logically equivalent to EXISTS and ALL can be expressed as NOT EXISTS.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

Viewing 15 posts - 16 through 30 (of 32 total)

You must be logged in to reply to this topic. Login to reply