An in-depth look at change detection in SQL Server - Part 02

  • Comments posted to this topic are about the item An in-depth look at change detection in SQL Server - Part 02

    Thanks & Regards,
    Nakul Vachhrajani.
    http://nakulvachhrajani.com

    Follow me on
    Twitter: @sqltwins

  • About HASHBYTES (from http://msdn.microsoft.com/en-us/library/ms174415.aspx):

    1. input size is limited to 8000

    2. output size depends on used algorithm, for MD5 you can use CAST(HASHBYTES('MD5', (EmployeeName + CityName)) AS BINARY(16))

    3. beware addition NULL values

  • While the return type of hashbytes might be 8000 bytes, that's more for future proofing, the actual value length will be:

    Allowed input values are limited to 8000 bytes. The output conforms to the algorithm standard: 128 bits (16 bytes) for MD2, MD4, and MD5; 160 bits (20 bytes) for SHA and SHA1; 256 bits (32 bytes) for SHA2_256, and 512 bits (64 bytes) for SHA2_512.

    from http://msdn.microsoft.com/en-us/library/ms174415.aspx

    so your absolute longest using one of the currently supported hashes, is going to be 64 bytes, so your actual storage should be 66 bytes. And you could save those two bytes by doing a cast to a fixed length binary since you will know what algorithm you will be using.

  • Hi

    It is dangerous to use hashbytes this way:

    DataHashBytes AS HASHBYTES('MD5', (EmployeeName + CityName))

    When EmployeeName = '' and CityName = 'Paris' and someone changes it to EmployeeName = 'Paris' and CityName = '' the hash stays the same. The solution (not very elegant but still a solution) is to use a delimiter, e.g.:

    DataHashBytes AS HASHBYTES('MD5', (EmployeeName + char(0) + CityName))

    The article is very useful, thank you.

  • I read this excellent article and have to confess there is one case which I don't fully comprehend.

    "BINARY_CHECKSUM fails to detect symmetric changes"

    What is a symmetric change?

    What other strings could be used in your example?

    thx

  • Nakul,

    I have one question, I'm using SQL server 2016 and I have a table with 150 columns with multiple data types
    I implemented hashbytes (SHA1) for Merge statement (Insert / Update)
    I notice that sometimes (I have around 3 millions rows) it not detect changes
    (for example it not detect 7 rows that need to update out of 3 million)
    Is there any limitation for how many columns you can have in hashbytes function?

    Thanks,
    Oded Dror

  • Is there any particular reason to use SHA1 instead of MD5? I would recommend changing the algorithm and then validating if the changes are being detected correctly or not. SQL 2016 comes with a lot of the older algorithms marked as "Deprecated" and will not be supported in a future version of SQL Server.

    Thanks & Regards,
    Nakul Vachhrajani.
    http://nakulvachhrajani.com

    Follow me on
    Twitter: @sqltwins

  • Nakul Vachhrajani - Wednesday, December 6, 2017 9:42 PM

    Is there any particular reason to use SHA1 instead of MD5? I would recommend changing the algorithm and then validating if the changes are being detected correctly or not. SQL 2016 comes with a lot of the older algorithms marked as "Deprecated" and will not be supported in a future version of SQL Server.

    Nakul,

    Thank you
    Oded Dror

  • It's an old article that I hadn't seen before.  I like it a lot and just used it instead of me having to do a long winded explanation to someone as the what and why to use the HashBytes part of it all.

    Thank you for the time you spent researching and writing it up Nakul.

     

    • This reply was modified 5 months, 4 weeks ago by  Jeff Moden.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply