June 3, 2011 at 6:14 am
Hello,
We have two tables from two different sources. Say, EMP1 and EMP2
Both the tables have billions of rows. The columns are first name and last name in both the tables.
The need is to find out all the records in EMP2 which are not there in EMP1 What should be the best way to do it? Will the simple outer join suffice?
June 3, 2011 at 6:21 am
Something like this should do:
SELECT *
FROM EMP2
WHERE NOT EXISTS (
SELECT 1
FROM EMP1
WHERE EMP2.Name = EMP1.Name
AND EMP2.[Last Name] = EMP1.[Last Name]
The physical join operator will be determined by the optimizer based on indexes and stats, don't worry about that. It should turn into a LEFT ANTI SEMI-JOIN.
Hope this helps
Gianluca
-- Gianluca Sartori
June 3, 2011 at 6:57 am
Thanks for the reply. What should be the ideal index plan for these two tables?
June 3, 2011 at 7:04 am
It should be an index with both fields in it. It should not matter which one first, especially when both tables have billions of rows, as it would probably end up with a SCAN + HASH.
If one of the tables has a significant amount of rows less than the other one, the optimizer could decide to implement the join with a MERGE operator, and then the order of the fields should be the same on both indexes, having the most selective column first.
-- Gianluca Sartori
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply