July 21, 2003 at 7:46 am
Hi all,
I would like to know, if any method exists to substitute a NOT IN comparison. Here is an example of the situation I encounter.
Table: EMP
EmpcodeSalary
10011500.00
10021250.00
Table: EMPLEAVE
EmpcodeNumDays
10013
10035
Here EMP is the master table and EMPLEAVE is the child table and they both can be linked using the column "Empcode". My requirement is to pick out those employee codes in EMPLEAVE that do not have a matching entry in EMP. The output should be like the one given below
EmpcodeNumDays
10035
I tried a sub-query coupled with a 'NOT IN' comparison and the query is given below for your convenience. This technique works, but when the number of rows in the sub-query is too high, the performance decreases (especially when we use TOP n) in the outer most query.
SELECT empleave.* FROM empleave WHERE empcode NOT IN (SELECT empcode FROM emp)
Is there any other way to get this output without using 'NOT IN'.
TIA
July 21, 2003 at 7:53 am
Often an outer join will do it.
Andy
July 21, 2003 at 7:59 am
Something like the following code is what you want.
SELECT EMPLEAVE.* FROM EMPLEAVE
left outer join EMP on EMPLEAVE.Empcode = EMP.Empcode
Where EMP.Empcode is null
Joseph
July 21, 2003 at 8:06 am
Even NOT EXISTS is better than NOT IN
SELECT empleave.* FROM empleave WHERE NOT EXISTS (SELECT * FROM emp WHERE Emp.EmpCode = empleave.EmpCode)
Guarddata-
July 22, 2003 at 2:40 am
Hi! I would say that EXISTS is not always better then IN. Ok, BOL says they are semantically equal but that does not mean they are the same performance wise...
BOL IN:
quote:
USE pubsSELECT pub_name
FROM publishers
WHERE pub_id IN
(SELECT pub_id
FROM titles
WHERE type = 'business')
(...) is evaluated in two steps. First, the inner query returns the identification numbers of the publishers that have published business books (1389 and 0736). Second, these values are substituted into the outer query (...)
What does it mean? Well, it means the inner query is just executed once.
BOL EXISTS:
quote:
USE pubsSELECT pub_name
FROM publishers
WHERE EXISTS
(SELECT *
FROM titles
WHERE pub_id = publishers.pub_id
AND type = 'business')
(...)first pub_id = 1389. Are there any rows in the titles table in which pub_id is 1389 and type is business? If so, Algodata Infosystems should be one of the values selected. The same process is repeated for each of the other publisher names
So, what does this mean? The query is executed for every outer record.
Semantics are important. While they produce the same result, the ways of doing this can differ tremendously.
Imagine table TITLES as a x GB table. Imagine the case when column 'pub_id' or 'type' aren't indexed. Say that type = 'business' returns 100 rows. In the first case (using the operator IN) we have to do on tablescan (with subsequest in-memory iterations).
In the second case we are left out with a table scan for every existance check.
So if the tables in the EXISTS subquery is badly indexed (or index use is impossible due to computations or similar) having a static precomputed IN clause can sometimes improve performance beyond belief.
True that EXISTS is more powerful in it's capability as a sub-query but it also has the same flaws as a sub-query. Hence, as always, it is almost impossible to say: Always Write Your Queries Like This.
I agree with Andy saying 'Often...' instead of 'Always...'
Regards, Hans!
Edited by - hanslindgren on 07/22/2003 02:41:26 AM
July 22, 2003 at 4:18 am
Often when using exists or in subqueries I use the "top 1" in the select so query eecution stops of first row retrieved.
July 22, 2003 at 5:24 am
AnzioBake:
quote:
Often when using exists or in subqueries I use the "top 1" in the select so query eecution stops of first row retrieved.
...which is unneeded since the execution still stops as soon as it can be evaluated as 'true'. The EXISTS query, still has to be executed.
It does not matter if you put SELECT *, SELECT TOP 1 * or SELECT TOP 1 1, it still just returns 'true' or 'false'. It runs the query until it can decide what to return Which in the case of SELECTs always is the first row found.
Regards, Hans!
July 22, 2003 at 3:37 pm
I agree with hanslindgren, if you look at the Execution Plan of both the "NOT EXISTS" and "NOT IN" you'll find it is exactly the same! sometimes on complex queries the optimiser will provide a plan slightly different for the two statements, leading to peoples opinions of one being faster than the other, depending on what the optimiser thinks is fastest.
As always - the user can optimise queries further by keeping an eye on the execution plans to make sure it is the most efficient execution of a specific statement.
July 22, 2003 at 3:51 pm
Thank you, gentlemen, for catching my mistake. Nice to have a second pair of eyes to keep out mistakes.
Guarddata-
July 25, 2003 at 1:46 am
You've had the alternatives in previous replies. Having Declarative Referential Integrity (DRI) in place would have prevented those rows from getting inserted.
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply