Comparing if two sets of rows is different

  • Hi folks

    I need some more inspiration, on how I can compare a set of data with another set of data and returning the sets, which isn't identical.

    I have found two solutions, but there might be better ways of doing this. I need it to perform on large datasets.

    CREATE TABLE A (ID INT, Firstname SYSNAME, LastName SYSNAME, hashcode AS isnull(binary_checksum(Firstname,LastName),0))

    CREATE TABLE B (ID INT, Firstname SYSNAME, LastName SYSNAME, hashcode AS isnull(binary_checksum(Firstname,LastName),0))

    INSERT A VALUES (1, 'Michael', 'Sondergaard')

    INSERT A VALUES (1, 'Niels', 'Holger')

    INSERT A VALUES (1, 'Michael', 'Quist')

    INSERT B VALUES (1, 'Michael', 'Vestergaard')

    INSERT B VALUES (1, 'Niels', 'Holger')

    INSERT B VALUES (1, 'Michael', 'Quist')

    INSERT A VALUES (2, 'Ruben', 'Andersen')

    INSERT A VALUES (2, 'Martin', 'Buhl')

    INSERT A VALUES (2, 'Jesper', 'Jensen')

    INSERT B VALUES (2, 'Ruben', 'Andersen')

    INSERT B VALUES (2, 'Martin', 'Buhl')

    INSERT B VALUES (2, 'Jesper', 'Jensen')

    As you can see in this test. the set 1 in table A is different from set 1 in table B. Because of the last name for Michael. Set 2 is the same in both table A and B

    I have found the following solutions. Solutions 1 seems to be an non performer on large datasets However solution 2 have a theoretically possiblity of 2 sets being different, but reported as equal

    Solution #1

    SELECT ID

    FROM

    (

    SELECT * FROM A

    EXCEPT

    SELECT * FROM B

    UNION

    SELECT * FROM B

    EXCEPT

    SELECT * FROM A

    ) X

    Solution #2

    SELECT ID = X.ID

    FROM

    (SELECT ID, SetHashCode = CHECKSUM_AGG(HashCode) FROM A GROUP BY ID) X

    JOIN

    (SELECT ID, SetHashCode = CHECKSUM_AGG(HashCode) FROM B GROUP BY ID) Y ON

    X.ID = Y.ID AND

    X.SetHashCode != Y.SetHashCode

  • Something like this might run better for you if you only have a few columns to match (like the example) and your id cannot be null.

    SELECT distinct isnull(a.id, b.id)

    FROM A FULL OUTER JOIN B on A.id = B.id

    and a.hashcode = b.hashcode

    and a.firstname = b.firstname

    and a.lastname = b.lastname

    WHERE a.id is null or b.id is null

  • It looks to me as if your first solution should have a UNION ALL in it instead of a UNION. This should speed the query up.

    Unless you have a specific reason, you should not need the outer query.

    SELECT * FROM A

    EXCEPT

    SELECT * FROM B

    UNION ALL

    SELECT * FROM B

    EXCEPT

    SELECT * FROM A

  • Thanks to you all.

    Jamie I like the query plan for your version. The Id should have been a primary key, so your sql statement should hold up

  • Thanks to you all.

    Jamie I like the query plan for your version. The Id should have been a primary key, so your sql statement should hold up

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply