Alternate technique to "NOT IN" operator

  • Hi all,

    I would like to know, if any method exists to substitute a NOT IN comparison. Here is an example of the situation I encounter.

    Table: EMP

    EmpcodeSalary

    10011500.00

    10021250.00

    Table: EMPLEAVE

    EmpcodeNumDays

    10013

    10035

    Here EMP is the master table and EMPLEAVE is the child table and they both can be linked using the column "Empcode". My requirement is to pick out those employee codes in EMPLEAVE that do not have a matching entry in EMP. The output should be like the one given below

    EmpcodeNumDays

    10035

    I tried a sub-query coupled with a 'NOT IN' comparison and the query is given below for your convenience. This technique works, but when the number of rows in the sub-query is too high, the performance decreases (especially when we use TOP n) in the outer most query.

    SELECT empleave.* FROM empleave WHERE empcode NOT IN (SELECT empcode FROM emp)

    Is there any other way to get this output without using 'NOT IN'.

    TIA

  • Something like the following code is what you want.

    SELECT EMPLEAVE.* FROM EMPLEAVE

    left outer join EMP on EMPLEAVE.Empcode = EMP.Empcode

    Where EMP.Empcode is null


    Joseph

  • Even NOT EXISTS is better than NOT IN

    SELECT empleave.* FROM empleave WHERE NOT EXISTS (SELECT * FROM emp WHERE Emp.EmpCode = empleave.EmpCode)

    Guarddata-

  • Hi! I would say that EXISTS is not always better then IN. Ok, BOL says they are semantically equal but that does not mean they are the same performance wise...

    BOL IN:

    quote:


    USE pubs

    SELECT pub_name

    FROM publishers

    WHERE pub_id IN

    (SELECT pub_id

    FROM titles

    WHERE type = 'business')

    (...) is evaluated in two steps. First, the inner query returns the identification numbers of the publishers that have published business books (1389 and 0736). Second, these values are substituted into the outer query (...)


    What does it mean? Well, it means the inner query is just executed once.

    BOL EXISTS:

    quote:


    USE pubs

    SELECT pub_name

    FROM publishers

    WHERE EXISTS

    (SELECT *

    FROM titles

    WHERE pub_id = publishers.pub_id

    AND type = 'business')

    (...)first pub_id = 1389. Are there any rows in the titles table in which pub_id is 1389 and type is business? If so, Algodata Infosystems should be one of the values selected. The same process is repeated for each of the other publisher names


    So, what does this mean? The query is executed for every outer record.

    Semantics are important. While they produce the same result, the ways of doing this can differ tremendously.

    Imagine table TITLES as a x GB table. Imagine the case when column 'pub_id' or 'type' aren't indexed. Say that type = 'business' returns 100 rows. In the first case (using the operator IN) we have to do on tablescan (with subsequest in-memory iterations).

    In the second case we are left out with a table scan for every existance check.

    So if the tables in the EXISTS subquery is badly indexed (or index use is impossible due to computations or similar) having a static precomputed IN clause can sometimes improve performance beyond belief.

    True that EXISTS is more powerful in it's capability as a sub-query but it also has the same flaws as a sub-query. Hence, as always, it is almost impossible to say: Always Write Your Queries Like This.

    I agree with Andy saying 'Often...' instead of 'Always...'

    Regards, Hans!

    Edited by - hanslindgren on 07/22/2003 02:41:26 AM

  • Often when using exists or in subqueries I use the "top 1" in the select so query eecution stops of first row retrieved.

  • AnzioBake:

    quote:


    Often when using exists or in subqueries I use the "top 1" in the select so query eecution stops of first row retrieved.


    ...which is unneeded since the execution still stops as soon as it can be evaluated as 'true'. The EXISTS query, still has to be executed.

    It does not matter if you put SELECT *, SELECT TOP 1 * or SELECT TOP 1 1, it still just returns 'true' or 'false'. It runs the query until it can decide what to return Which in the case of SELECTs always is the first row found.

    Regards, Hans!

  • I agree with hanslindgren, if you look at the Execution Plan of both the "NOT EXISTS" and "NOT IN" you'll find it is exactly the same! sometimes on complex queries the optimiser will provide a plan slightly different for the two statements, leading to peoples opinions of one being faster than the other, depending on what the optimiser thinks is fastest.

    As always - the user can optimise queries further by keeping an eye on the execution plans to make sure it is the most efficient execution of a specific statement.

  • Thank you, gentlemen, for catching my mistake. Nice to have a second pair of eyes to keep out mistakes.

    Guarddata-

  • You've had the alternatives in previous replies. Having Declarative Referential Integrity (DRI) in place would have prevented those rows from getting inserted.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply