December 4, 2014 at 2:32 am
Let's say I have a table with 15 columns (any simple type, so no text, varchar(max), xml,..) and 1.000.000 rows. I would like to calculate a hash of all the data in the table, preferably with a strong hashing function.
One way to get the hash (well, checksum in this example) is to use the checksum functionality:
select checksum_agg(binary_checksum(*))
from MyTable
But if I understand correctly the checksum_agg and binary_checksum aren't designed with collision resistance as a primary purpose, so the collisions can happen quite often.
This is my attempt to get a SHA1 hash of a whole table:
declare @hash varchar(40)
set @hash = '0000000000000000000000000000000000000000'
declare @bc int
declare c cursor local fast_forward for
select binary_checksum(*) bc
from MyTable
order by Id
open c
fetch next from c into @bc
while @@fetch_status = 0
begin
set @hash = hashbytes('sha1', @hash + convert(varchar, @bc))
fetch next from c into @bc
end
close c
deallocate c
select convert(varbinary(20), @hash)
This way I got rid of the checksum_agg function, but not binary_checksum. Any ideas? One way is to use CLR function, but I would rather use t-sql only.
Bonus Question: why doesn't the following code produce the same result as my cursor based code?
declare @hash varchar(40)
set @hash = '0000000000000000000000000000000000000000'
select @hash = hashbytes('sha1', @hash + convert(varchar, binary_checksum(*)))
from MyTable
order by Id
select convert(varbinary(20), @hash)
Edit: one option would be to build a dynamic query which would use hashbytes function from the concatenation of all the columns instead of binary_checksum(*)...
December 4, 2014 at 7:46 am
I wrote a CLR function instead:
public static class FingerPrint
{
[SqlFunction(DataAccess = DataAccessKind.Read,SystemDataAccess=SystemDataAccessKind.Read)]
public static string GetTableFingerprint(string table)
{
using (var ms = new MemoryStream())
using (var writer = new StreamWriter(ms))
using (var connection = new SqlConnection("context connection=true"))
using (var command = connection.CreateCommand())
{
command.CommandText = "select * from " + table + " order by Id";
connection.Open();
using (var reader = command.ExecuteReader())
{
var numberOfColumns = reader.FieldCount;
while (reader.Read())
{
var sb = new StringBuilder();
for (int i = 0; i < numberOfColumns; i++)
{
sb.Append(reader);
}
writer.Write(sb.ToString());
}
}
writer.Flush();
ms.Position = 0;
using (var sha = new SHA512Managed())
{
var hash = sha.ComputeHash(ms);
return BitConverter.ToString(hash).Replace("-", "");
}
}
}
}
Usage:
select dbo.GetTableFingerprint('Database.Schema.MyTable')
Viewing 2 posts - 1 through 1 (of 1 total)
You must be logged in to reply to this topic. Login to reply