March 3, 2015 at 7:01 am
Hi,
I am in the process of developing T-SQL code to identify changes in data.
I have read about BINARY_CHECKSUM and HASHBYTES. Some people say HASHBYTES is better than BINARY_CHECKSUM because the chance of collision is lower.
But if we consider the following, collisions can occur with HASHBYTES too. My question is: what is the best way to compare data to identify changes (I can't configure CDC)?
select HASHBYTES('SHA','121'+'34'), HASHBYTES('SHA','12'+'134'),BINARY_CHECKSUM('121','34'),BINARY_CHECKSUM('12','134');
March 3, 2015 at 7:12 am
I think your implementation of the concatenation is wrong for both functions.
'121' + '34' and '12' + '134' both equal '12134', which is the real problem with your calculation.
Whether I use HASHBYTES or either of the checksum functions, I always put a separator between column values, which pretty much guarantees what I'm looking for as far as duplicates go.
'121' + '|' + '34' and '12' + '|' + '134' are not the same, and would not produce an accidental collision due to incorrect concatenation.
I like the checksum functions a little better myself, since they can take a list of columns/values as arguments.
In a rough example like that I also worry about getting an implicit integer conversion for '121' + '34', which I want to avoid as well.
WITH MyCTE(C1, C2)
AS
(
    SELECT '121', '34' UNION ALL
    SELECT '12',  '134'
)
SELECT
    BINARY_CHECKSUM(C1, C2),
    BINARY_CHECKSUM(C1, '|', C2)
FROM MyCTE;
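The same separator trick works with HASHBYTES, since it takes a single string argument rather than a column list. A minimal sketch against the same two rows (assuming the concatenation, not the data, is the source of the collision):

```sql
-- Without a separator the two concatenations are both '12134' and the
-- hashes collide; with a '|' separator the inputs differ, so the hashes do too.
WITH MyCTE(C1, C2) AS
(
    SELECT '121', '34' UNION ALL
    SELECT '12',  '134'
)
SELECT
    HASHBYTES('SHA', C1 + C2)       AS NoSeparator,   -- identical for both rows
    HASHBYTES('SHA', C1 + '|' + C2) AS WithSeparator  -- distinct per row
FROM MyCTE;
```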
Lowell
March 3, 2015 at 7:46 am
Thank you.
I was confused because on the internet people say BINARY_CHECKSUM cannot guarantee to find differences, while HASHBYTES is much more accurate.
As these examples show, it is really about the data and how you compose the input. Every hashing algorithm can have collisions.
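For the change-detection use case itself, one common pattern is to store a hash per row and compare it against a freshly computed one. A rough sketch only; the table and column names here are hypothetical, and it assumes SQL Server 2012+ for CONCAT and SHA2_256:

```sql
-- Hypothetical change detection: find staging rows whose column values
-- no longer match the hash stored when the target row was last loaded.
SELECT s.BusinessKey
FROM   Staging s
JOIN   Target  t
  ON   t.BusinessKey = s.BusinessKey
WHERE  HASHBYTES('SHA2_256',
           CONCAT(s.Col1, '|', s.Col2, '|', s.Col3))  -- separator between columns
    <> t.RowHash;                                     -- hash persisted at load time
```

The separator matters here for exactly the reason discussed above: without it, different column splits of the same concatenated string would hash identically.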