Get hash of the whole table

  • Let's say I have a table with 15 columns (any simple type, so no text, varchar(max), xml,..) and 1.000.000 rows. I would like to calculate a hash of all the data in the table, preferably with a strong hashing function.

    One way to get the hash (well, checksum in this example) is to use the checksum functionality:

    select checksum_agg(binary_checksum(*))

    from MyTable

    But if I understand correctly the checksum_agg and binary_checksum aren't designed with collision resistance as a primary purpose, so the collisions can happen quite often.

    This is my attempt to get a SHA1 hash of a whole table:

    declare @hash varchar(40)

    set @hash = '0000000000000000000000000000000000000000'

    declare @bc int

    declare c cursor local fast_forward for

    select binary_checksum(*) bc

    from MyTable

    order by Id

    open c

    fetch next from c into @bc

    while @@fetch_status = 0

    begin

    set @hash = hashbytes('sha1', @hash + convert(varchar, @bc))

    fetch next from c into @bc

    end

    close c

    deallocate c

    select convert(varbinary(20), @hash)

    This way I got rid of the checksum_agg function, but not binary_checksum. Any ideas? One way is to use CLR function, but I would rather use t-sql only.

    Bonus Question: why doesn't the following code produce the same result as my cursor based code?

    declare @hash varchar(40)

    set @hash = '0000000000000000000000000000000000000000'

    select @hash = hashbytes('sha1', @hash + convert(varchar, binary_checksum(*)))

    from MyTable

    order by Id

    select convert(varbinary(20), @hash)

    Edit: one option would be to build a dynamic query which would use hashbytes function from the concatenation of all the columns instead of binary_checksum(*)...

  • I wrote a CLR function instead:

    public static class FingerPrint

    {

    [SqlFunction(DataAccess = DataAccessKind.Read,SystemDataAccess=SystemDataAccessKind.Read)]

    public static string GetTableFingerprint(string table)

    {

    using (var ms = new MemoryStream())

    using (var writer = new StreamWriter(ms))

    using (var connection = new SqlConnection("context connection=true"))

    using (var command = connection.CreateCommand())

    {

    command.CommandText = "select * from " + table + " order by Id";

    connection.Open();

    using (var reader = command.ExecuteReader())

    {

    var numberOfColumns = reader.FieldCount;

    while (reader.Read())

    {

    var sb = new StringBuilder();

    for (int i = 0; i < numberOfColumns; i++)

    {

    sb.Append(reader);

    }

    writer.Write(sb.ToString());

    }

    }

    writer.Flush();

    ms.Position = 0;

    using (var sha = new SHA512Managed())

    {

    var hash = sha.ComputeHash(ms);

    return BitConverter.ToString(hash).Replace("-", "");

    }

    }

    }

    }

    Usage:

    select dbo.GetTableFingerprint('Database.Schema.MyTable')

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply