How to use HASHBYTES function in sql server for multiple columns

  • Jason A. Long - Tuesday, September 25, 2018 9:41 AM

    Jeff,
    I know that SHA1 has depricated for password hashing & certificate signing (or anything else related to security) but is there any documentation that it has been depricated as a legitimate parameter value in the HASHBYTES function?
    I ask because I don't see any prohibitions or warning about it on BOL HASHBYTES (Transact-SQL)

    Sorry for butting in, but the deprication is documented.
    According to HASHBYTES (Transact-SQL) 

    Beginning with SQL Server 2016 (13.x), all algorithms other than SHA2_256, and SHA2_512 are deprecated. Older algorithms (not recommended) will continue working, but they will raise a deprecation event.

  • I appreciate the information about which algorithm to use. But any help on the actual issue which I am facing currently will be really helpful.

  • DesNorton - Wednesday, September 26, 2018 12:23 PM

    Sorry for butting in, but the deprication is documented.
    According to HASHBYTES (Transact-SQL) 

    Beginning with SQL Server 2016 (13.x), all algorithms other than SHA2_256, and SHA2_512 are deprecated. Older algorithms (not recommended) will continue working, but they will raise a deprecation event.

    Yes.  Search for the term "Sha1" (without the quotes) at the following URL
    https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=2ahUKEwiZ-NyKq9ndAhUvVt8KHU5zDscQFjAAegQICRAB&url=https%3A%2F%2Fdocs.microsoft.com%2Fen-us%2Fsql%2Fdatabase-engine%2Fdeprecated-database-engine-features-in-sql-server-2016&usg=AOvVaw0wal0sUj6lgPN6X5AIM1sC

    Note that the documentation is somewhat incorrect in saying that SHA1 doesn't work in 2016.  It does still work but I wouldn't use it on a bet even for something like on this thread just because you don't know how long a solution may last or when MS may decide to drop the bomb because it IS a security issue.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • vikasjagadale8 - Wednesday, September 26, 2018 12:38 PM

    I appreciate the information about which algorithm to use. But any help on the actual issue which I am facing currently will be really helpful.

    Build some dynamic SQL that will generate the HASHBYTES concatenation code for you.  Have it read from sys.columns for the table.

    Note that I don't recommend having the dynamic SQL to do this run all the time.  Just have it generate the code for you and use the generated code.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • DesNorton - Wednesday, September 26, 2018 12:23 PM

    Jason A. Long - Tuesday, September 25, 2018 9:41 AM

    Jeff,
    I know that SHA1 has depricated for password hashing & certificate signing (or anything else related to security) but is there any documentation that it has been depricated as a legitimate parameter value in the HASHBYTES function?
    I ask because I don't see any prohibitions or warning about it on BOL HASHBYTES (Transact-SQL)

    Sorry for butting in, but the deprication is documented.
    According to HASHBYTES (Transact-SQL) 

    Beginning with SQL Server 2016 (13.x), all algorithms other than SHA2_256, and SHA2_512 are deprecated. Older algorithms (not recommended) will continue working, but they will raise a deprecation event.

    Good catch Des... They normally do a better job of highlighting that sort of thing... Not to mention the fact that they're using depricated values in the examples.

  • vikasjagadale8 - Wednesday, September 26, 2018 12:38 PM

    I appreciate the information about which algorithm to use. But any help on the actual issue which I am facing currently will be really helpful.

    Look eye.

    The 3rd post, in the CREATE TABLE where concat_ws is used to create the computed column.

  • Jeff Moden - Wednesday, September 26, 2018 12:54 PM

    DesNorton - Wednesday, September 26, 2018 12:23 PM

    Sorry for butting in, but the deprication is documented.
    According to HASHBYTES (Transact-SQL) 

    Beginning with SQL Server 2016 (13.x), all algorithms other than SHA2_256, and SHA2_512 are deprecated. Older algorithms (not recommended) will continue working, but they will raise a deprecation event.

    Yes.  Search for the term "Sha1" (without the quotes) at the following URL
    https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=2ahUKEwiZ-NyKq9ndAhUvVt8KHU5zDscQFjAAegQICRAB&url=https%3A%2F%2Fdocs.microsoft.com%2Fen-us%2Fsql%2Fdatabase-engine%2Fdeprecated-database-engine-features-in-sql-server-2016&usg=AOvVaw0wal0sUj6lgPN6X5AIM1sC

    Note that the documentation is somewhat incorrect in saying that SHA1 doesn't work in 2016.  It does still work but I wouldn't use it on a bet even for something like on this thread just because you don't know how long a solution may last or when MS may decide to drop the bomb because it IS a security issue.

    Thanks Jeff. Your Googles must be working better than mine... Good find sir! 🙂

  • The SHA1 deprecation isn't applicable here.

    There are two uses for hashing in CS.
    1. Security - hide values, but provide a deterministic way of determining equality/inequality
    2. Bucketing similar values together.

    This is the second case, which isn't a security issue, but since MS could remove the algorithm, why create work later? Use SHA2 now.

  • Steve Jones - SSC Editor - Wednesday, September 26, 2018 7:51 PM

    The SHA1 deprecation isn't applicable here.

    There are two uses for hashing in CS.
    1. Security - hide values, but provide a deterministic way of determining equality/inequality
    2. Bucketing similar values together.

    This is the second case, which isn't a security issue, but since MS could remove the algorithm, why create work later? Use SHA2 now.

    SHA2 is depricated as well... Acording the BOL, everything but SHA_256 & SHA_516 are depricated.

    [highlight="yellow"]Identifies the hashing algorithm to be used to hash the input. This is a required argument with no default. The single quotation marks are required. Beginning with SQL Server 2016 (13.x), all algorithms other than SHA2_256, and SHA2_512 are deprecatedall algorithms other than SHA2_256, and SHA2_512 are deprecated. Older algorithms (not recommended) will continue working, but they will raise a deprecation event.[/highlight]



    Both SHA_256 & SHA_512 are massive overkill (both in terms of size on disk and computational requirements) for change tracking.

  • Jason A. Long - Thursday, September 27, 2018 9:22 AM

    SHA2 is depricated as well... Acording the BOL, everything but SHA_256 & SHA_516 are depricated.

    [highlight="yellow"]Identifies the hashing algorithm to be used to hash the input. This is a required argument with no default. The single quotation marks are required. Beginning with SQL Server 2016 (13.x), all algorithms other than SHA2_256, and SHA2_512 are deprecatedall algorithms other than SHA2_256, and SHA2_512 are deprecated. Older algorithms (not recommended) will continue working, but they will raise a deprecation event.[/highlight]



    Both SHA_256 & SHA_512 are massive overkill (both in terms of size on disk and computational requirements) for change tracking.

    SHA_256 and SHA_512 use SHA2. They're just the size. SHA3 hasn't been added, which is strange, given they're in the .NET namespace

  • Steve Jones - SSC Editor - Thursday, September 27, 2018 9:28 AM

    Jason A. Long - Thursday, September 27, 2018 9:22 AM

    SHA2 is depricated as well... Acording the BOL, everything but SHA_256 & SHA_516 are depricated.

    [highlight="yellow"]Identifies the hashing algorithm to be used to hash the input. This is a required argument with no default. The single quotation marks are required. Beginning with SQL Server 2016 (13.x), all algorithms other than SHA2_256, and SHA2_512 are deprecatedall algorithms other than SHA2_256, and SHA2_512 are deprecated. Older algorithms (not recommended) will continue working, but they will raise a deprecation event.[/highlight]



    Both SHA_256 & SHA_512 are massive overkill (both in terms of size on disk and computational requirements) for change tracking.

    SHA_256 and SHA_512 use SHA2. They're just the size. SHA3 hasn't been added, which is strange, given they're in the .NET namespace

    Gotcha... My mistake. 🙂

  • vikasjagadale8 - Wednesday, September 26, 2018 12:38 PM

    I appreciate the information about which algorithm to use. But any help on the actual issue which I am facing currently will be really helpful.

    Hi,

    I had a similar issue a while back and found this: https://www.sqlservercentral.com/Forums/FindPost1726024.aspx .

    Review the string that is created.  AFAIK it's deterministic, but the string and therefore the hash would change if column names changed.  And depending on your data you'd need to make sure the string isn't truncated. 

    If these aren't issues for you then perhaps this approach would work for you?

    HTH,
    Scott

  • Jason A. Long - Thursday, September 27, 2018 9:22 AM

    Both SHA_256 & SHA_512 are massive overkill (both in terms of size on disk and computational requirements) for change tracking.

    Whilst I don't necessarily disagree with that as a statement, anything other than SHA2_256 or SHA2_512 will raise a deprecation event on SQL 2016 or above and potentially be removed in a future version of SQL Server.

  • Scott In Sydney - Thursday, September 27, 2018 6:49 PM

    vikasjagadale8 - Wednesday, September 26, 2018 12:38 PM

    I appreciate the information about which algorithm to use. But any help on the actual issue which I am facing currently will be really helpful.

    Hi,

    I had a similar issue a while back and found this: https://www.sqlservercentral.com/Forums/FindPost1726024.aspx .

    Review the string that is created.  AFAIK it's deterministic, but the string and therefore the hash would change if column names changed.  And depending on your data you'd need to make sure the string isn't truncated. 

    If these aren't issues for you then perhaps this approach would work for you?

    HTH,
    Scott

    Not sure why column name changes would affect the hash as it is the contents of the columns that are being hashed, not the name of the columns.  Also hashbytes does have an 8000 byte limit on input for SQL Server 2014 and older.

  • Lynn Pettis - Friday, September 28, 2018 8:37 AM

    Not sure why column name changes would affect the hash as it is the contents of the columns that are being hashed, not the name of the columns.  Also hashbytes does have an 8000 byte limit on input for SQL Server 2014 and older.

    The way I've worked around the 8K/4K limit in the passed is to create a ITVFN that chops it up, hashes the segments and then hashes the concatenation of the output hashes. I'll have a look if I can find the code (because I'm a bit lazy and travel tired), straight forward and should be easy to write.
    😎 
    The hashbyte function must be deterministic, otherwise it would be an oxymoron!

Viewing 15 posts - 16 through 30 (of 30 total)

You must be logged in to reply to this topic. Login to reply