May 26, 2009 at 3:29 am
Hi folks
I need some more inspiration, on how I can compare a set of data with another set of data and returning the sets, which isn't identical.
I have found two solutions, but there might be better ways of doing this. I need it to perform on large datasets.
CREATE TABLE A (ID INT, Firstname SYSNAME, LastName SYSNAME, hashcode AS isnull(binary_checksum(Firstname,LastName),0))
CREATE TABLE B (ID INT, Firstname SYSNAME, LastName SYSNAME, hashcode AS isnull(binary_checksum(Firstname,LastName),0))
INSERT A VALUES (1, 'Michael', 'Sondergaard')
INSERT A VALUES (1, 'Niels', 'Holger')
INSERT A VALUES (1, 'Michael', 'Quist')
INSERT B VALUES (1, 'Michael', 'Vestergaard')
INSERT B VALUES (1, 'Niels', 'Holger')
INSERT B VALUES (1, 'Michael', 'Quist')
INSERT A VALUES (2, 'Ruben', 'Andersen')
INSERT A VALUES (2, 'Martin', 'Buhl')
INSERT A VALUES (2, 'Jesper', 'Jensen')
INSERT B VALUES (2, 'Ruben', 'Andersen')
INSERT B VALUES (2, 'Martin', 'Buhl')
INSERT B VALUES (2, 'Jesper', 'Jensen')
As you can see in this test. the set 1 in table A is different from set 1 in table B. Because of the last name for Michael. Set 2 is the same in both table A and B
I have found the following solutions. Solutions 1 seems to be an non performer on large datasets However solution 2 have a theoretically possiblity of 2 sets being different, but reported as equal
Solution #1
SELECT ID
FROM
(
SELECT * FROM A
EXCEPT
SELECT * FROM B
UNION
SELECT * FROM B
EXCEPT
SELECT * FROM A
) X
Solution #2
SELECT ID = X.ID
FROM
(SELECT ID, SetHashCode = CHECKSUM_AGG(HashCode) FROM A GROUP BY ID) X
JOIN
(SELECT ID, SetHashCode = CHECKSUM_AGG(HashCode) FROM B GROUP BY ID) Y ON
X.ID = Y.ID AND
X.SetHashCode != Y.SetHashCode
May 27, 2009 at 8:55 am
Something like this might run better for you if you only have a few columns to match (like the example) and your id cannot be null.
SELECT distinct isnull(a.id, b.id)
FROM A FULL OUTER JOIN B on A.id = B.id
and a.hashcode = b.hashcode
and a.firstname = b.firstname
and a.lastname = b.lastname
WHERE a.id is null or b.id is null
May 27, 2009 at 11:38 am
It looks to me as if your first solution should have a UNION ALL in it instead of a UNION. This should speed the query up.
Unless you have a specific reason, you should not need the outer query.
SELECT * FROM A
EXCEPT
SELECT * FROM B
UNION ALL
SELECT * FROM B
EXCEPT
SELECT * FROM A
May 28, 2009 at 1:19 am
Thanks to you all.
Jamie I like the query plan for your version. The Id should have been a primary key, so your sql statement should hold up
May 28, 2009 at 1:19 am
Thanks to you all.
Jamie I like the query plan for your version. The Id should have been a primary key, so your sql statement should hold up
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply