March 13, 2019 at 11:33 am
All,
Thank you for looking into my question. I am facing an issue with a difference in checksum calculations for the same data between an on-prem SQL Server 2016 instance and the Azure version.
Here is the scenario.
I have an on-prem SQL 2016 database and a SQL Azure database. The schema is exactly the same between these two databases. I updated the Azure database using the SQL 2016 database as the source.
After updating, I was expecting the checksum calculation between the tables for the updated rows to be the same, but it is not.
Has anyone seen this issue before?
The source and destination have numeric and char data types only.
March 13, 2019 at 12:58 pm
You need to check meticulously to be sure that your method of calculating CHECKSUM is 100% identical in both environments, and that the data is 100% identical as well. Please post both queries that do the CHECKSUM in each environment, and maybe someone here will see something that you didn't happen to see. All of us occasionally need a second set of eyes on something.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
March 13, 2019 at 1:05 pm
Thanks for the response.
Here is the query. I copied and pasted the same query into both instances.
SELECT CONVERT(VARBINARY(32),
           HASHBYTES('SHA2_256',
               CONCAT([AGENCY_NUM], LTRIM(RTRIM([CARRIER_CODE])), [TS_D1],
                      [EXT_FLAG], [FM_USERID], [TS_D2], [TOTAL_NUM_TXNS],
                      [NUM_TXN], LTRIM(RTRIM([TXNS]))))) AS Checksum
This is the table definition. It is also the same between source and destination.
[AGENCY_NUM] [numeric](10, 0) NOT NULL,
[CARRIER_CODE] [char](4) NOT NULL,
[TS_D1] [numeric](19, 0) NOT NULL,
[EXT_FLAG] [numeric](5, 0) NOT NULL,
[FM_USERID] [numeric](5, 0) NULL,
[TS_D2] [numeric](19, 0) NULL,
[TOTAL_NUM_TXNS] [numeric](5, 0) NULL,
[NUM_TXN] [numeric](5, 0) NULL,
[TXNS] [varchar](4000) NULL,
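A quick note on what the hash is actually computed over: CONCAT implicitly converts every argument to a string and treats NULL as an empty string, so HASHBYTES here sees the concatenated string forms of the columns, not their native binary forms. A minimal illustration with made-up values:

-- CONCAT converts each argument to a string; NULL contributes nothing.
SELECT CONCAT(CONVERT(numeric(10,0), 12345),          -- becomes '12345'
              LTRIM(RTRIM(CONVERT(char(4), 'AB'))),   -- becomes 'AB'
              CONVERT(numeric(5,0), NULL)) AS concat_result,  -- '12345AB'
       HASHBYTES('SHA2_256',
                 CONCAT(CONVERT(numeric(10,0), 12345),
                        LTRIM(RTRIM(CONVERT(char(4), 'AB'))),
                        CONVERT(numeric(5,0), NULL))) AS hash_of_concat;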
March 13, 2019 at 1:07 pm
Are you sure that the results of HASHBYTES will never exceed VARBINARY(32)? I've never used that function, so I have no idea what it does or how it works, beyond the obvious that it computes a hash value.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
March 13, 2019 at 1:14 pm
Now that I've looked that up, I'm thinking that your varchar(4000) column might be the pain point. Any row with a couple of thousand bytes in it is trouble for hashing without going to a larger hash value. I'd be concerned about a higher chance of duplicate hashes at those kinds of lengths. Have you tried using SHA2_512 and VARBINARY(64)?
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
March 13, 2019 at 1:27 pm
Yeah. That column with 4000 bytes is the issue. If I take that out, the checksum value matches between the instances. I even tried the following, but still no luck.
CONVERT(VARBINARY(MAX), HASHBYTES('SHA2_512',
    CONCAT([AGENCY_NUM], LTRIM(RTRIM([CARRIER_CODE])), [TS_D1], [EXT_FLAG],
           [FM_USERID], [TS_D2], [TOTAL_NUM_TXNS], [NUM_TXN],
           LTRIM(RTRIM([TXNS]))))) AS Checksum
March 13, 2019 at 1:46 pm
Honestly, a checksum value of only 64 bytes for 4,000 bytes of data just isn't realistic or practical. What is the business case for the checksum value?
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
March 14, 2019 at 9:28 am
I completely agree with you. My business scenario demands it. The destination gets updated/inserted every night to keep historical data. I have to update the destination if the checksum values are different between the records.
March 14, 2019 at 1:07 pm
Hmm... I've used this even with inputs larger than 4K and never had an issue.
From the docs: "For SQL Server 2014 (12.x) and earlier, allowed input values are limited to 8000 bytes."
Otherwise there is no limit.
If the hash is different, then something in the data is different, even if it looks the same.
I would convert the individual columns to hex and compare those manually to see if there are differences.
Then do the same to the CONCAT output.
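A minimal sketch of that kind of comparison, assuming both copies are reachable from one query (the table names and the AGENCY_NUM join key below are made up; style 2 of CONVERT renders varbinary as hex without the 0x prefix):

-- Hypothetical names: dbo.Agency_Txns_Src / dbo.Agency_Txns_Dst, keyed on AGENCY_NUM.
SELECT s.AGENCY_NUM,
       CONVERT(varchar(max), CONVERT(varbinary(max), s.TXNS), 2) AS src_txns_hex,
       CONVERT(varchar(max), CONVERT(varbinary(max), d.TXNS), 2) AS dst_txns_hex
FROM dbo.Agency_Txns_Src AS s
JOIN dbo.Agency_Txns_Dst AS d
  ON d.AGENCY_NUM = s.AGENCY_NUM
WHERE CONVERT(varbinary(max), s.TXNS) <> CONVERT(varbinary(max), d.TXNS);

The same CONVERT(varchar(max), CONVERT(varbinary(max), ...), 2) wrapper can then be applied to the CONCAT output that feeds HASHBYTES.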
March 14, 2019 at 1:14 pm
Siva Ramasamy - Thursday, March 14, 2019 9:28 AM
I completely agree with you. My business scenario demands it. The destination gets updated/inserted every night to keep historical data. I have to update the destination if the checksum values are different between the records.
Hmmm... wondering if you might have to use CHECKSUM solely on the other columns, and go with direct comparison on the varchar(4000) column. Definitely a huge PITA and considerable resource hog, but what other choice do you have?
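Something along these lines, perhaps (same assumed table names as in the sketch above; a binary compare on the varchar column avoids surprises from trailing spaces and collation):

SELECT d.AGENCY_NUM
FROM dbo.Agency_Txns_Src AS s
JOIN dbo.Agency_Txns_Dst AS d
  ON d.AGENCY_NUM = s.AGENCY_NUM
WHERE CHECKSUM(s.CARRIER_CODE, s.TS_D1, s.EXT_FLAG, s.FM_USERID,
               s.TS_D2, s.TOTAL_NUM_TXNS, s.NUM_TXN)
   <> CHECKSUM(d.CARRIER_CODE, d.TS_D1, d.EXT_FLAG, d.FM_USERID,
               d.TS_D2, d.TOTAL_NUM_TXNS, d.NUM_TXN)
   OR CONVERT(varbinary(max), s.TXNS) <> CONVERT(varbinary(max), d.TXNS);  -- direct compare of the 4000-byte column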
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
March 14, 2019 at 1:15 pm
Maybe two separate hashes: (1) all the other columns and (2) the 4K column alone?
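A minimal sketch of that split, reusing the columns from the posts above (the table name is assumed; run the same query on both sides and compare each hash separately):

SELECT HASHBYTES('SHA2_256',
           CONCAT([AGENCY_NUM], LTRIM(RTRIM([CARRIER_CODE])), [TS_D1], [EXT_FLAG],
                  [FM_USERID], [TS_D2], [TOTAL_NUM_TXNS], [NUM_TXN])) AS OtherColsHash,
       HASHBYTES('SHA2_256', LTRIM(RTRIM([TXNS])))                    AS TxnsHash
FROM dbo.Agency_Txns_Src;  -- assumed name; note HASHBYTES returns NULL when [TXNS] is NULL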
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
March 14, 2019 at 6:31 pm
Thank you. That did not work either.
I went for a different approach and started using change tracking instead of checksum.
Thanks.
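For anyone landing on this thread later, a minimal sketch of what the change-tracking approach can look like (database, table, and key names are assumed; change tracking requires a primary key on the table):

-- Enable change tracking at the database and table level.
ALTER DATABASE MyStagingDb
SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON);

ALTER TABLE dbo.Agency_Txns
ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON);

-- On each nightly run, pull only the rows changed since the version saved last time.
DECLARE @last_sync_version bigint = 0;  -- assumed: persisted from the previous run
SELECT ct.AGENCY_NUM,             -- primary key columns come back from CHANGETABLE
       ct.SYS_CHANGE_OPERATION    -- I = insert, U = update, D = delete
FROM CHANGETABLE(CHANGES dbo.Agency_Txns, @last_sync_version) AS ct;

SELECT CHANGE_TRACKING_CURRENT_VERSION() AS version_to_save_for_next_run;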
March 16, 2019 at 8:10 am
Siva Ramasamy - Wednesday, March 13, 2019 11:33 AM
Can you post the DDL and DML for both sources, the hash statement, and some values where you are seeing different results, please?
😎
If you are correct, this is a bug!
March 16, 2019 at 10:42 am
Eirikur Eiriksson - Saturday, March 16, 2019 8:10 AM
If you are correct, this is a bug!
Maybe not... we don't actually know that the OP "updated" the data correctly. The difference in the HASHBYTES might be screaming that it wasn't.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 16, 2019 at 10:54 am
Jeff Moden - Saturday, March 16, 2019 10:42 AM
Maybe not... we don't actually know that the OP "updated" the data correctly. The difference in the HASHBYTES might be screaming that it wasn't.
Yeah... I did ask him to confirm by standard means:
"I would convert the individual columns to hex and compare those manually to see if there are differences. Then do the same to the CONCAT output."