Data Compare in Two Tables

  • Added missing T1 alias in first query

    ;WITH TABLE_ONE AS (

    SELECT ROW_NUMBER() OVER(ORDER BY INTFIELD, VARCHARFIELD, DATEFIELD) AS RN, T1.*

    FROM TBL1 T1

    ),

    MATCH_TWO AS (

    SELECT T1.*

    FROM TABLE_ONE AS T1 INNER JOIN TBL2 AS T2

    ON T1.INTFIELD = T2.INTFIELD AND

    T1.VARCHARFIELD = T2.VARCHARFIELD AND

    T1.DATEFIELD BETWEEN DATEADD(minute,-5,T2.DATEFIELD) AND DATEADD(minute,5,T2.DATEFIELD)

    )

    SELECT *

    FROM TABLE_ONE

    WHERE RN NOT IN (SELECT RN FROM MATCH_TWO)

    This time it returns the right records with my test data. The problem is this would be a ressource monster with real data.

    Just look at the estimated subtree cost ratio

    0.0066852 (Left Outer Join)

    0.0572076 (CTE)

    In both cases it does a full table scan on Tbl1 and Tbl2 (my test tables have no key/indexes) but with the CTE it also builds a Work Table and puts a lot of extra effort there.

    I am including my test script, try both our queries with statistics (time and IO) on and also compare execution plan, you will see what I mean.

    /*

    Create table Tbl1(

    ColumnA varchar(20),

    ColumnB datetime,

    ColumnC int

    )

    insert Tbl1

    select 'abc','20090120 08:17:59.120',1

    union

    select 'def','20090121 08:17:59.120',2

    union

    select 'ghi','20090122 08:17:59.120',3

    union

    select 'jkl','20090123 08:17:59.120',4

    Create table Tbl2(

    ColumnA varchar(20),

    ColumnB datetime,

    ColumnC int

    )

    insert Tbl2

    select 'abc','20090120 08:17:59.120',1

    union

    select 'def','20090121 08:22:59.120',2

    union

    select 'ghi','20090122 08:12:58.120',3

    union

    select 'jkl','20090123 08:27:59.120',4

    */

  • Maxim Picard (1/22/2009)


    The problem is this would be a ressource monster with real data.

    Just look at the estimated subtree cost ratio

    0.0066852 (Left Outer Join)

    0.0572076 (CTE)

    In both cases it does a full table scan on Tbl1 and Tbl2 (my test tables have no key/indexes) but with the CTE it also builds a Work Table and puts a lot of extra effort there.

    I am including my test script, try both our queries with statistics (time and IO) on and also compare execution plan, you will see what I mean.

    I ran both and WHERE NOT EXISTS and EXCEPT. The Left Join and WHERE NOT EXISTS are the best in performance and, IMO, readability, and maintainability.

  • Dear All

    It appears the problem has generated quite an interest. I am therefore attaching some sample data for Tbl1 and Tbl2. These are csv files, but as .csv files cannot be uploaded I have changed the ext to .doc. So if there is any problem change .doc to .csv ext type

    Thanks and Regards every one

Viewing 3 posts - 16 through 17 (of 17 total)

You must be logged in to reply to this topic. Login to reply