Avoiding NOT EXISTS or NOT IN

Question

Avoiding NOT EXISTS or NOT IN

billk

SSC Veteran

Points: 285
More actions
September 23, 2003 at 4:50 am

#176721

I have 2 tables whith the same structure
and i'want to find all the values from table1 where not exists in table2
Table1(field1,field2,field3)
Table2(field1,field2,field3)
SQL with NOT EXISTS:
SELECT
field1,field2,field3
FROM Table1, Tabl2
WHERE NOT EXISTS
(
SELECT field1,field2,field3
FROM Table1
WHERE table1.field1=table2.field1
AND table1.field2=table2.field2
AND table1.field3=table2.field3
)
AND table1.field1=table2.field1
how can avoid the NOT EXISTs for performance reasons?
Thanks
Bill

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

David Burrows SSC Guru Points: 65143 More actions · Answer 1

I prefer joins but don't know if performance is better though.

SELECT t1.field1,t1.field2,t1.field3 

FROM Table1 t1 

LEFT OUTER JOIN Table2 t2 

ON t2.field1 = t1.field1 

AND t2.field2 = t1.field2 

AND t2.field3 = t1.field3 

WHERE t2.field1 IS NULL

Far away is close at hand in the images of elsewhere.
Anon.

NPeeters SSChampion Points: 12554 More actions · Answer 2

I don't believe there is a performance difference. The query optimiser should take care of this.

Bill, I would like to point out that your query is not entirely correct.



SELECT field1,field2,field3 

FROM Table1

WHERE NOT EXISTS

(

     SELECT Table2.field1,

            Table2.field2,

            Table2.field3 

     FROM Table2

     WHERE table1.field1=table2.field1

       AND table1.field2=table2.field2

       AND table1.field3=table2.field3

)

Two changes :

1. Remove the implicit (old style join) from the outer query.

2. Add explicitly the table qualifier in the NOT EXISTS clause.

Don't know if this makes any difference, but it is worth a try.

jollyguy4all SSC Veteran Points: 270 More actions · Answer 3

Why do you need to do a NOT EXIST or Join queries.

I too have a similar situation where we create labels file for our subscribers every month and if we need to know who are the new subscribers this month then we have to find out the records in the current lables file which were not in the last months.

what is do is as follows:

select * from labels_19_jul_2003 where id not in (select id from labels_19_jun_2003)

and get what i want it can also work vice versa.

Hope this helps

Regards

Hitendra

frehan Say Hey Kid Points: 673 More actions · Answer 4

Maybe not a direct answer to your question but I've used this construction

SELECT field1,field2,field3

FROM

(SELECT field1, field2, field3, 1 as mask FROM Table1

UNION ALL

SELECT field1, field2, field3, 2 as mask FROM Table2) AS Tab

GROUP BY field1, field2, field3 HAVING SUM(tab.mask) = 1

Performance is most likely not better but this can be used with more tables and

it is very easy to find records that exists in table1 and table2 but not in table3 (see below)

SELECT field1,field2,field3

FROM

(SELECT field1, field2, field3, 1 as mask FROM Table1

UNION ALL

SELECT field1, field2, field3, 2 as mask FROM Table2

UNION ALL

SELECT field1, field2, field3, 4 as mask FROM Table3) AS Tab

GROUP BY field1, field2, field3 HAVING SUM(tab.mask) = 3

/Fredrik

David Burrows SSC Guru Points: 65143 More actions · Answer 5

jollyguy4all,

quote:
select * from labels_19_jul_2003 where id not in (select id from labels_19_jun_2003)

I used to do it this way but found that performance can be poor, especially the larger the data set. I found using joins a vast improvement, not always but most of the time.

Far away is close at hand in the images of elsewhere.
Anon.

cneuhold Hall of Fame Points: 3375 More actions · Answer 6

hi guys!

concerning: "outer-joins w/is null" vs. "not in" - to my experience if you've got larger tables the performance of the outer-join w/is null soluting significantly degrades compared to the "not in" solution.

best regards,

chris.

NPeeters SSChampion Points: 12554 More actions · Answer 7

I believe the performance of either three solutions is highly dependant on the data that is contained in the tables.

1. The number of records in both tables

2. The number of records that match in both tables

E.g. the NOT IN solution probably won't yield good performance if Table2 contains a lot of records.

The Outer Join and NOT EXISTS solution probably won't yield good performance if both tables have a lot of matching rows.

lambje SSCrazy Points: 2085 More actions · Answer 8

One note. Whenever you do an EXISTS or NOT EXISTS change your subquery to "SELECT 1" instead of referencing cells. May not be a huge gain but results in less data coming back and is always a good practice. At least you aren't doing "SELECT *". This would be especially bad on a wide table.

billk SSC Veteran Points: 285 More actions · Answer 9

after a few tests the solution with the Left Outer Join (DavidBurrows) yields the best performance compared to NOT EXISTS...

SELECT t1.field1,t1.field2,t1.field3 FROM Table1 t1 LEFT OUTER JOIN Table2 t2 ON t2.field1 = t1.field1 AND t2.field2 = t1.field2 AND t2.field3 = t1.field3 WHERE t2.field1 IS NULL

thanks...

bill

katesl SSCommitted Points: 1743 More actions · Answer 10

When I have to be sure there will not be a performance issue, I do this

create table #notInTbl2 (id int)

insert into #notinTbl2 select ID from tbl1

delete from #notInTbl2 where id in

(select id from tbl2)

select id from #notInTbl2

Four statements in place of one, but SQL Server, as was the case with Ingres before SQL Server, is dependably fast with the positive as contrasted with the negative. The incentive for DBMS vendors has been fast inserts and deletes, not fast selects on "not in/not exists."

_________________
"Look, those sheep have been shorn."
data analyst replies, "On the sides that we can see.."

Nicholas Wang Default port Points: 1463 More actions · Answer 11

quote:
One note. Whenever you do an EXISTS or NOT EXISTS change your subquery to "SELECT 1" instead of referencing cells. May not be a huge gain but results in less data coming back and is always a good practice. At least you aren't doing "SELECT *". This would be especially bad on a wide table.

Please see my topic "EXISTS (SELECT *) or (SELECT 1)"

From MS's internal training for SQL Server 2000 Programming, it is said that there wouldn't be a performance different using SELECT *. In fact, it is a better idea to use SELECT * than to pick a column inside the tables as the column you pick may not have the appropriate index.

I think NPeeters replied saying that the Query Optimizer knows that you don't need the data inside an "EXISTS", thus Select * is just fine. I was wondering if this holds true if inside the EXISTS, I'm doing a complicated join of multiple tables, but I guess SELECT * is still good... it is more important to use derived tables inside the joins so to minimize the data set being joined together. (of course need to consider the index too.)

As for NOT IN / NOT EXISTS, I think that lady (her name is Kimberly Tripp-Simonnet) said to avoid NOT if at all possible because Query Optimizer doesn't work w/ NOTs. Thus I do what DavidBurrows does. LEFT OUTER JOIN and returns where right side is NULLS.