September 23, 2003 at 4:50 am
I have 2 tables whith the same structure
and i'want to find all the values from table1 where not exists in table2
Table1(field1,field2,field3)
Table2(field1,field2,field3)
SQL with NOT EXISTS:
SELECT
field1,field2,field3
FROM Table1, Tabl2
WHERE NOT EXISTS
(
SELECT field1,field2,field3
FROM Table1
WHERE table1.field1=table2.field1
AND table1.field2=table2.field2
AND table1.field3=table2.field3
)
AND table1.field1=table2.field1
how can avoid the NOT EXISTs for performance reasons?
Thanks
Bill
September 23, 2003 at 6:14 am
I prefer joins but don't know if performance is better though.
SELECT t1.field1,t1.field2,t1.field3
FROM Table1 t1
LEFT OUTER JOIN Table2 t2
ON t2.field1 = t1.field1
AND t2.field2 = t1.field2
AND t2.field3 = t1.field3
WHERE t2.field1 IS NULL
Far away is close at hand in the images of elsewhere.
Anon.
September 23, 2003 at 8:43 am
I don't believe there is a performance difference. The query optimiser should take care of this.
Bill, I would like to point out that your query is not entirely correct.
SELECT field1,field2,field3
FROM Table1
WHERE NOT EXISTS
(
SELECT Table2.field1,
Table2.field2,
Table2.field3
FROM Table2
WHERE table1.field1=table2.field1
AND table1.field2=table2.field2
AND table1.field3=table2.field3
)
Two changes :
1. Remove the implicit (old style join) from the outer query.
2. Add explicitly the table qualifier in the NOT EXISTS clause.
Don't know if this makes any difference, but it is worth a try.
September 24, 2003 at 12:46 am
Why do you need to do a NOT EXIST or Join queries.
I too have a similar situation where we create labels file for our subscribers every month and if we need to know who are the new subscribers this month then we have to find out the records in the current lables file which were not in the last months.
what is do is as follows:
select * from labels_19_jul_2003 where id not in (select id from labels_19_jun_2003)
and get what i want it can also work vice versa.
Hope this helps
Regards
Hitendra
September 24, 2003 at 1:20 am
Maybe not a direct answer to your question but I've used this construction
SELECT field1,field2,field3
FROM
(SELECT field1, field2, field3, 1 as mask FROM Table1
UNION ALL
SELECT field1, field2, field3, 2 as mask FROM Table2) AS Tab
GROUP BY field1, field2, field3 HAVING SUM(tab.mask) = 1
Performance is most likely not better but this can be used with more tables and
it is very easy to find records that exists in table1 and table2 but not in table3 (see below)
SELECT field1,field2,field3
FROM
(SELECT field1, field2, field3, 1 as mask FROM Table1
UNION ALL
SELECT field1, field2, field3, 2 as mask FROM Table2
UNION ALL
SELECT field1, field2, field3, 4 as mask FROM Table3) AS Tab
GROUP BY field1, field2, field3 HAVING SUM(tab.mask) = 3
/Fredrik
September 24, 2003 at 2:03 am
jollyguy4all,
quote:
select * from labels_19_jul_2003 where id not in (select id from labels_19_jun_2003)
I used to do it this way but found that performance can be poor, especially the larger the data set. I found using joins a vast improvement, not always but most of the time.
Far away is close at hand in the images of elsewhere.
Anon.
September 24, 2003 at 3:29 am
hi guys!
concerning: "outer-joins w/is null" vs. "not in" - to my experience if you've got larger tables the performance of the outer-join w/is null soluting significantly degrades compared to the "not in" solution.
best regards,
chris.
September 24, 2003 at 4:09 am
I believe the performance of either three solutions is highly dependant on the data that is contained in the tables.
1. The number of records in both tables
2. The number of records that match in both tables
E.g. the NOT IN solution probably won't yield good performance if Table2 contains a lot of records.
The Outer Join and NOT EXISTS solution probably won't yield good performance if both tables have a lot of matching rows.
September 24, 2003 at 7:11 am
One note. Whenever you do an EXISTS or NOT EXISTS change your subquery to "SELECT 1" instead of referencing cells. May not be a huge gain but results in less data coming back and is always a good practice. At least you aren't doing "SELECT *". This would be especially bad on a wide table.
September 25, 2003 at 12:05 am
after a few tests the solution with the Left Outer Join (DavidBurrows) yields the best performance compared to NOT EXISTS...
SELECT t1.field1,t1.field2,t1.field3 FROM Table1 t1 LEFT OUTER JOIN Table2 t2 ON t2.field1 = t1.field1 AND t2.field2 = t1.field2 AND t2.field3 = t1.field3 WHERE t2.field1 IS NULL
thanks...
bill
September 25, 2003 at 9:59 am
When I have to be sure there will not be a performance issue, I do this
create table #notInTbl2 (id int)
insert into #notinTbl2 select ID from tbl1
delete from #notInTbl2 where id in
(select id from tbl2)
select id from #notInTbl2
Four statements in place of one, but SQL Server, as was the case with Ingres before SQL Server, is dependably fast with the positive as contrasted with the negative. The incentive for DBMS vendors has been fast inserts and deletes, not fast selects on "not in/not exists."
_________________
"Look, those sheep have been shorn."
data analyst replies, "On the sides that we can see.."
September 25, 2003 at 10:21 am
quote:
One note. Whenever you do an EXISTS or NOT EXISTS change your subquery to "SELECT 1" instead of referencing cells. May not be a huge gain but results in less data coming back and is always a good practice. At least you aren't doing "SELECT *". This would be especially bad on a wide table.
Please see my topic "EXISTS (SELECT *) or (SELECT 1)"
From MS's internal training for SQL Server 2000 Programming, it is said that there wouldn't be a performance different using SELECT *. In fact, it is a better idea to use SELECT * than to pick a column inside the tables as the column you pick may not have the appropriate index.
I think NPeeters replied saying that the Query Optimizer knows that you don't need the data inside an "EXISTS", thus Select * is just fine. I was wondering if this holds true if inside the EXISTS, I'm doing a complicated join of multiple tables, but I guess SELECT * is still good... it is more important to use derived tables inside the joins so to minimize the data set being joined together. (of course need to consider the index too.)
As for NOT IN / NOT EXISTS, I think that lady (her name is Kimberly Tripp-Simonnet) said to avoid NOT if at all possible because Query Optimizer doesn't work w/ NOTs. Thus I do what DavidBurrows does. LEFT OUTER JOIN and returns where right side is NULLS.
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply