There are times when we need to compare two tables and figure out if the data matches. I often see a number of ways of doing this suggested, most are quite slow and inefficient. I’d quite like to share a quick and slightly dirty way of doing this using the CHECKSUM and CHECKSUM_AGG functions.
CHECKSUM()
Just a reminder that CHECKSUM() will generate a checksum for an entire row or selection of columns in the row.
CHECKSUM_AGG()
Will generate a checksum for a dataset.
Comparing Tables
With these two functions together it’s dead simple to compare the data in two tables. We need to create a checksum for the entire table, this can be done simply by first generating a checksum for each row and then using CHECKSUM_AGG() to give us an aggregated checksum for the table.
SELECT CHECKSUM_AGG(CHECKSUM(*)) FROM table_nameThe above will return a checksum for all the data in a table, run it for two or more tables and where the checksums match, you know that the data in those tables matches.
CHECKSUM and CHECKSUM_AGG both generate their checksums through hashing, that means that there is a VERY small chance that two different datasets could produce the same checksum.
Let’s See It In Action
For some reason I’ve got two copies of the Databases tables from the Undercover Catalogue that I want to check match.
SELECT CHECKSUM_AGG(CHECKSUM(*)) FROM SQLUndercover.Catalogue.Databases811887081
SELECT CHECKSUM_AGG(CHECKSUM(*)) FROM SQLUndercover.Catalogue.Databases2811887081
So from the above, we can see that the data in the two tables matches. Just for the sake of argument, let’s try deleting a row out of the table and see what happens to the checksum.
DELETE FROM SQLUndercover.Catalogue.Databases WHERE ServerName = 'DavidSQLTest01'Now if we rerun the checksum code, we get…
175501893
The checksums are different so that means that the data is now also different.
So there you have a very quick way to see if the data in two tables matches.
Please do be aware of the issue with hash collisions that I mentioned above, although the risk is very low. I might do a little investigation into CHECKSUM and hash collisions in a future post and see how prolific they really are.