Difference between tables

  • Hi All,

    What I am trying to achieve is to obtain the difference b/w two tables.

    Rather than using the 'not in' clause in SQL, I was trying to do the same using a Join query. My test data as below:

    if object_id('tempdb.dbo.#temp_PhoneNumber1') is not null

    drop table #temp_PhoneNumber1

    if object_id('tempdb.dbo.#temp_PhoneNumber2') is not null

    drop table #temp_PhoneNumber2

    select phonenumber into

    #temp_Phonenumber1

    from (

    select '095215225' as phonenumber union all

    select '02102985447' union all

    select '00919739054668'

    )t

    select phonenumber into

    #temp_Phonenumber2

    from (

    select '095215225' as phonenumber union all

    select '02102985447' union all

    select '00919739054668' union all

    select '00919645702533'

    )t

    I was able to get the difference using the following query:

    select t2.phonenumber

    from #temp_Phonenumber2 t2 left join

    (select t2.phonenumber

    from #temp_Phonenumber1 t1 inner join #temp_Phonenumber2 t2

    on t1.phonenumber = t2.phonenumber)t

    on t2.phonenumber = t.phonenumber

    where t.phonenumber is null

    result:

    00919645702533

    Now instead of 'where t.phonenumber is null' if I try

    'and t.phonefullnumber is null' I get all the numbers in t2 instead of the result above. Can someone please explain why this is the case.

  • Instead of your difference query use this

    select * from #temp_Phonenumber2

    except

    select * from #temp_Phonenumber1

  • Tablediff is also useful for these sorts of things and can even generate scripts to fix differences between tables.

    http://msdn.microsoft.com/en-us/library/ms162843.aspx

  • Thanks Kabi and Akeel... Your suggestions definitely work...Just that I am interested in finding out what is wrong with my initial attempts.

  • Thanks Kabi and Akeel... Your suggestions definitely work...Just that I am interested in finding out what is wrong with my initial attempts.

  • Do you mean that you replaced the WHERE with an AND to put the t.phonenumber is null part of the ON clause of the LEFT OUTER JOIN?

    If this is what you mean it is because of the way that SQL Server processes the queries. For this query:

    select

    t2.phonenumber,

    t.phonenumber

    from

    #temp_Phonenumber2 t2 left join

    (

    select

    t2.phonenumber

    from

    #temp_Phonenumber1 t1 inner join

    #temp_Phonenumber2 t2

    on t1.phonenumber = t2.phonenumber

    ) t

    on t2.phonenumber = t.phonenumber and

    t.phonenumber is NULL

    SQL Server essentially says you want all the rows from #temp_PhoneNumber2 because it first does a cross join between both tables, then applies the ON which will give you the 3 matching rows, then it applies the outer join which says now add any rows from the left table that don't have a match in the second table which adds the unmatched row. Thus all 4 rows are returned. And actually the optimzer is smart enough that is actually only does a table scan of #temp_PhoneNumber2 and ignores the derived table and joins all together. Check the execution plan.

    In this query:

    select

    t2.phonenumber

    from

    #temp_Phonenumber2 t2 left join

    (

    select

    t2.phonenumber

    from

    #temp_Phonenumber1 t1 inner join

    #temp_Phonenumber2 t2

    on t1.phonenumber = t2.phonenumber

    ) t

    on t2.phonenumber = t.phonenumber

    WHERE

    t.phonenumber is null

    The first part of the process is the same. The difference is that after applying the outer join and adding in the additional row then the WHERE clause is applied which eliminates the 3 matched rows.

    If you search for SQL Server Logical QUery Processing you'll get plenty of results that explain how this works better than I can. Inside Microsoft SQL Server 2005: T-SQL Querying by Itzik Ben Gan, Lubor Kollar, and Dejan Sarka has a chapter on this as well

  • Thanks Jack for pointing in the right direction 🙂

    Things starting to make much more sense now. A little quote I found from one of the blogs...

    "The placement of predicates in an ON clause does not make a difference for INNER joins, they might cause a different result when OUTER joins are involved. This is because the predicates in the ON clause are applied to the table before the join, whereas the WHERE clause is semantically applied to the result of the join."

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply