Duplicate records searching

  • There are two tables A and B. which have the same data structure.

    Both of them have the same data structure and about 50 fields.

    A has 100,000 records and B has 1,000 records.

    How to create a script to find out if some records in B are in A already(duplicate records)?

    That is, records in 50 fields should be the the same in both of them.

  • select * from B

    intersect

    select * from A;

    Normally would not use the * as I would prefer to explicitly name the columns.

  • I cant think of any way other than putting all those 50 columns in a WHERE clause.

    Lets wait for the heavyweights to throw their thoughts on this

  • ColdCoffee (5/7/2012)


    I cant think of any way other than putting all those 50 columns in a WHERE clause.

    Lets wait for the heavyweights to throw their thoughts on this

    Might be quicker to hash the columns and compare against that. Possibly not - would need testing

    I'm a DBA.
    I'm not paid to solve problems. I'm paid to prevent them.

  • ColdCoffee (5/7/2012)


    I cant think of any way other than putting all those 50 columns in a WHERE clause.

    Lets wait for the heavyweights to throw their thoughts on this

    Uh, intersect operator. From BOL:

    INTERSECT returns any distinct values that are returned by both the query on the left and right sides of the INTERSECT operand.

  • ColdCoffee (5/7/2012)


    I cant think of any way other than putting all those 50 columns in a WHERE clause.

    Lets wait for the heavyweights to throw their thoughts on this

    Ah also - I'm assuming you have a valid unique primary key here. So you'd hash the rest of the columns, then the delta would be join on id, compare on hash. You'll also have to cast any non character columns as you can only hash against char values. For improved performance you might want to store the hashes on the target table and index appropriately

    I'm a DBA.
    I'm not paid to solve problems. I'm paid to prevent them.

  • ColdCoffee (5/7/2012)


    I cant think of any way other than putting all those 50 columns in a WHERE clause.

    Lets wait for the heavyweights to throw their thoughts on this

    Err- Lynn is pretty good. Was that just a timing of post issue?

    I'm a DBA.
    I'm not paid to solve problems. I'm paid to prevent them.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply