September 24, 2018 at 1:31 pm
I have a requirement wherein I have to create a hash value that consists of all columns of a table. With CHECKSUM this can be done easily, but CHECKSUM is not recommended, per Microsoft: "If at least one of the values in the expression list changes, the list checksum will probably change. However, this is not guaranteed. Therefore, to detect whether values have changed, we recommend the use of CHECKSUM only if your application can tolerate an occasional missed change. Otherwise, consider using HashBytes instead. With a specified MD5 hash algorithm, the probability that HashBytes will return the same result, for two different inputs, is much lower compared to CHECKSUM."
HASHBYTES accepts only two parameters (algorithm type, input to hash).
Now the problem is that even though HASHBYTES is more reliable than CHECKSUM, there doesn't seem to be an easy way to use it across multiple columns.
An example using CHECKSUM:
create table dbo.chksum_demo1
(
    id int not null,
    name varchar(25),
    address varchar(250),
    HashValue as checksum(id, name, address),
    constraint PK_chksum_demo1 primary key (id)
);
How can we do the above using HASHBYTES instead of CHECKSUM?
September 24, 2018 at 4:03 pm
Hi,
You will need to set up the HashValue column to use two parameters. The first is the algorithm (e.g. SHA1); the second is the value to be hashed. I suggest concatenating all your values with a suitable separator, to ensure that the same characters shifted into an adjacent column are not hashed the same. I have used "|" in the example:
create table dbo.hash_demo1
(
    id int not null,
    name varchar(25),
    address varchar(250),
    HashValue as hashbytes('SHA1', concat_ws('|', id, name, address)),
    constraint PK_hash_demo1 primary key (id)
);
insert dbo.hash_demo1( id , name , address)
values (1, 'bob', '123 A street'),
(2, 'mark', '564 A street'),
(3, 'ricky', '75 A street'),
(4, 'ricky7', '5 A street')
;
select * from dbo.hash_demo1;
update dbo.hash_demo1
set address = '70 A street'
where id =1;
select * from dbo.hash_demo1;
The example also shows the hash value changing automatically.
Bevan
September 24, 2018 at 8:52 pm
Bevan Keighley - Monday, September 24, 2018 4:03 PM
Just a quick observation... According to BOL, CONCAT_WS is only available in SQL Server 2017 and Azure... If the OP is using 2016 they will need to use CONCAT or +...
September 25, 2018 at 12:24 am
I suggest persisting the computed column, since the function is deterministic.
😎
An example with both concatenation functions:
-- SQL SERVER 2017 OR LATER
CREATE TABLE DBO.HASH_DEMO_2017
(
ID INT NOT NULL CONSTRAINT PK_DBO_HASH_DEMO_2017 PRIMARY KEY CLUSTERED
, [NAME] VARCHAR(25) NOT NULL
, [ADDRESS] VARCHAR(250) NOT NULL
, HASHVALUE AS HASHBYTES ('SHA1',CONCAT_WS('|',ID,NAME,ADDRESS)) PERSISTED
)
;
-- SQL SERVER 2012 OR LATER
CREATE TABLE DBO.HASH_DEMO_2012
(
ID INT NOT NULL CONSTRAINT PK_DBO_HASH_DEMO_2012 PRIMARY KEY CLUSTERED
, [NAME] VARCHAR(25) NOT NULL
, [ADDRESS] VARCHAR(250) NOT NULL
, HASHVALUE AS HASHBYTES ('SHA1',CONCAT([ID],'|',[NAME],'|',[ADDRESS])) PERSISTED
)
;
September 25, 2018 at 12:40 am
Bevan Keighley - Monday, September 24, 2018 4:03 PM
Thanks, Bevan, it worked. But I had one doubt about "concatenating all your values with a suitable separator to ensure the same value in an adjacent column is not hashed the same. I have used '|' in the example".
Does that mean that if the separator is not added and two columns have the same value, the hash value won't be different? Also, I assumed the hash changes incrementally: say an existing record has hash value 1001, and after an update two of its columns have changed and have the same value — wouldn't the hash value become something like 1004?
September 25, 2018 at 7:59 am
vikasjagadale8 - Tuesday, September 25, 2018 12:40 AM
That's not how hashing works... It's not a "change counter". A hash value is a completely unpredictable value that is based solely on the data being hashed.
So, let's say that you're hashing with SHA1 and the original string is "Hello World."... the hash value is going to be 0xB701146CF2C1262A6385C8B1FB1DB98F05820499.
If you change the period to a question mark (Hello World?) the new hash value is going to be 0x19B2E855874FDA4939883FBDE1714A77C3BBF0B8.
If you then change it back to the original value, the hash value will go back to its original value as well.
See below...
DECLARE @string VARCHAR(15) = 'Hello World.';
SELECT @string, HASHBYTES('SHA1', @string);
SET @string = REPLACE(@string, '.', '?');
SELECT @string, HASHBYTES('SHA1', @string);
SET @string = REPLACE(@string, '?', '.');
SELECT @string, HASHBYTES('SHA1', @string);
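On the separator question specifically: without a delimiter, two rows with different column values can concatenate to the same string and therefore hash identically. A sketch using the thread's own sample rows (ricky / 75 A street vs. ricky7 / 5 A street); CONCAT_WS needs SQL Server 2017+, as noted above:

```sql
-- No separator: both rows concatenate to 'ricky75 A street', so the hashes collide.
SELECT HASHBYTES('SHA1', CONCAT('ricky',  '75 A street')) AS no_sep_row3,
       HASHBYTES('SHA1', CONCAT('ricky7', '5 A street'))  AS no_sep_row4;

-- With '|' as a separator the inputs differ ('ricky|75 A street' vs 'ricky7|5 A street'),
-- so the hashes differ too.
SELECT HASHBYTES('SHA1', CONCAT_WS('|', 'ricky',  '75 A street')) AS sep_row3,
       HASHBYTES('SHA1', CONCAT_WS('|', 'ricky7', '5 A street'))  AS sep_row4;
```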
September 25, 2018 at 8:38 am
Jason A. Long - Tuesday, September 25, 2018 7:59 AM
Further on Jason's point: since the values are concatenated, implicitly converted to strings, and divided by a delimiter, a change in any column will change the hashed value at the row level. Individual column values only contribute to the row hash; they cannot be individually identified, as the granularity of the hash product is the row, not the column.
😎
Again, I strongly recommend that you persist the computed column; that allows you to index it and greatly improves the performance of any comparative operations.
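To illustrate the kind of comparative operation that benefits, here is a sketch that finds changed rows by joining on the key and comparing hashes (the staging table name is made up for the example):

```sql
-- Hypothetical staging copy of dbo.hash_demo1 with the same computed HashValue column.
-- An index on HashValue lets the engine compare rows without reading every data column.
SELECT s.id
FROM   staging.hash_demo1 AS s
JOIN   dbo.hash_demo1     AS t ON t.id = s.id
WHERE  t.HashValue <> s.HashValue;  -- contents differ for these ids
```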
September 25, 2018 at 8:48 am
And stop using SHA1. It has been deprecated for some time now, and your code will fail sometime in the not-so-distant future.
--Jeff Moden
Change is inevitable... Change for the better is not.
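If you do move off SHA1, the earlier example needs only the algorithm literal changed; a sketch using SHA2_256, which HASHBYTES has supported since SQL Server 2012:

```sql
CREATE TABLE dbo.hash_demo_sha2
(
    id        INT NOT NULL CONSTRAINT PK_hash_demo_sha2 PRIMARY KEY,
    name      VARCHAR(25),
    address   VARCHAR(250),
    -- Same pattern as before, with SHA2_256 in place of the deprecated SHA1.
    HashValue AS HASHBYTES('SHA2_256', CONCAT(id, '|', name, '|', address)) PERSISTED
);
```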
September 25, 2018 at 9:10 am
Jeff Moden - Tuesday, September 25, 2018 8:48 AM
Or extremely distant future. Never know with Microsoft. Look how long text, ntext, and image have been deprecated.
Also, I would say that which hash algorithm you should or shouldn't use actually depends on how you are using it, as well as on what is supported on the various systems you support. There are people out there still using SQL Server 2000 and SQL Server 2005, just as examples.
September 25, 2018 at 9:28 am
Eirikur Eiriksson - Tuesday, September 25, 2018 12:24 AM
Actually, IIRC, a deterministic computed column can be indexed without being persisted. Also, I think that has to be SQL Server 2016 or newer; I seem to recall a QotD on this a while back.
Then again, I could be wrong.
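For what it's worth, the index itself is created like any other; a sketch against Eirikur's 2017 example table (whether the column must first be persisted depends on determinism and the session SET options):

```sql
-- Index the computed hash column directly; comparisons can then seek on the hash.
CREATE NONCLUSTERED INDEX IX_HASH_DEMO_2017_HASHVALUE
    ON DBO.HASH_DEMO_2017 (HASHVALUE);
```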
September 25, 2018 at 9:41 am
Lynn Pettis - Tuesday, September 25, 2018 9:10 AM
I've got to go with Lynn on this one... Application context is an important consideration. If you're hashing for security purposes, then you need to salt the original value and use SHA2_512.
If all you're doing is a bit of change detection, then salting is pointless and SHA2_512 is massive overkill.
Given that the different algorithms have different storage and computational demands, the need to avoid collisions should be balanced against storage and performance demands.
Jeff,
I know that SHA1 has been deprecated for password hashing and certificate signing (or anything else related to security), but is there any documentation that it has been deprecated as a legitimate parameter value in the HASHBYTES function?
I ask because I don't see any prohibition or warning about it in BOL: HASHBYTES (Transact-SQL)
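On the storage side, DATALENGTH shows each algorithm's digest size, which is what you pay per row for a persisted hash column:

```sql
-- Digest sizes in bytes: MD5 = 16, SHA1 = 20, SHA2_256 = 32, SHA2_512 = 64.
SELECT DATALENGTH(HASHBYTES('MD5',      'Hello World.')) AS md5_bytes,
       DATALENGTH(HASHBYTES('SHA1',     'Hello World.')) AS sha1_bytes,
       DATALENGTH(HASHBYTES('SHA2_256', 'Hello World.')) AS sha2_256_bytes,
       DATALENGTH(HASHBYTES('SHA2_512', 'Hello World.')) AS sha2_512_bytes;
```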
September 26, 2018 at 10:07 am
Eirikur Eiriksson - Tuesday, September 25, 2018 8:38 AM
Thanks for the detailed explanation. Making the computed column PERSISTED makes sense. But after I made it persisted, I got the error below while inserting records into the table:
INSERT failed because the following SET options have incorrect settings: 'ANSI_WARNINGS'. Verify that SET options are correct for use with indexed views and/or indexes on computed columns and/or filtered indexes and/or query notifications and/or XML data type methods and/or spatial index operations.
I then did some googling around this and found that setting the properties below to ON before creating the table should solve the issue. But it didn't work:
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
SET ARITHABORT ON
SET CONCAT_NULL_YIELDS_NULL ON
SET NUMERIC_ROUNDABORT ON
SET QUOTED_IDENTIFIER ON
create table dbo.hashbytes_demo
(
id char(32) not null,
name varchar(25),
address varchar(250),
Systemmodstamp datetime2,
HashValue as Hashbytes('MD5', CONCAT('|',id, name, address,Systemmodstamp)),
CONSTRAINT PK_hashbytes_demo PRIMARY KEY (Id)
)
Now the error is: CREATE TABLE failed because the following SET options have incorrect settings: 'NUMERIC_ROUNDABORT'. Verify that SET options are correct for use with indexed views and/or indexes on computed columns and/or filtered indexes and/or query notifications and/or XML data type methods and/or spatial index operations.
What am I missing here?
September 26, 2018 at 10:08 am
Lynn Pettis - Tuesday, September 25, 2018 9:28 AM
Thanks for the example; I am using SQL Server 2014.
September 26, 2018 at 10:31 am
vikasjagadale8 - Wednesday, September 26, 2018 10:07 AM
Check this out: https://docs.microsoft.com/en-us/sql/relational-databases/views/create-indexed-views?view=sql-server-2017
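The short version of what the linked page requires for indexes on computed columns and indexed views: six options ON and NUMERIC_ROUNDABORT OFF. The script above sets NUMERIC_ROUNDABORT ON, which is the setting the error message is complaining about:

```sql
-- Required session settings (per the linked docs); note NUMERIC_ROUNDABORT is OFF.
SET ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS,
    ARITHABORT, CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER ON;
SET NUMERIC_ROUNDABORT OFF;
```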
September 26, 2018 at 12:14 pm
Jason A. Long - Tuesday, September 25, 2018 9:41 AM
I don't believe there's any issue with SHA1 for change detection. For security, it's cryptographically insecure.