March 3, 2015 at 7:01 am
Hi,
I am in the process of developing T-SQL code to identify changes in data.
I have read about BINARY_CHECKSUM and HASHBYTES. Some people say HASHBYTES is better than BINARY_CHECKSUM because the chance of collision is lower.
But if we consider the following, collisions can occur with HASHBYTES too. My question is: what is the best way to compare data to identify changes (I can't configure CDC)?
select HASHBYTES('SHA','121'+'34'), HASHBYTES('SHA','12'+'134'),BINARY_CHECKSUM('121','34'),BINARY_CHECKSUM('12','134');
March 3, 2015 at 7:12 am
I think your implementation of the concatenation is wrong for both functions.
'121' + '34' and '12' + '134' both equal '12134', which is the real problem with your calculation.
Whether I use HASHBYTES or either of the checksum functions, I always put a separator between column values, which pretty much guarantees what I'm looking for as far as duplicates go.
'121' + '|' + '34' and '12' + '|' + '134' are not the same, and would not produce an accidental collision due to incorrect concatenation.
I like the checksum functions a little better myself, since they can take a list of columns/values as arguments.
In a rough example like that I also worry about getting an implicit integer conversion for '121' + '34', which I want to avoid as well.
WITH MyCTE(C1, C2)
AS
(
    SELECT '121', '34' UNION ALL
    SELECT '12',  '134'
)
SELECT
    BINARY_CHECKSUM(C1, C2),
    BINARY_CHECKSUM(C1, '|', C2)
FROM MyCTE;
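The same separator trick works with HASHBYTES, since it takes a single string argument rather than a column list. A minimal sketch against the same two rows (assuming the concatenation, not the data, is the source of the collision):

```sql
-- Without a separator the two concatenations are both '12134' and the
-- hashes collide; with a '|' separator the inputs differ, so the hashes do too.
WITH MyCTE(C1, C2) AS
(
    SELECT '121', '34' UNION ALL
    SELECT '12',  '134'
)
SELECT
    HASHBYTES('SHA', C1 + C2)       AS NoSeparator,   -- identical for both rows
    HASHBYTES('SHA', C1 + '|' + C2) AS WithSeparator  -- distinct per row
FROM MyCTE;
```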
Lowell
March 3, 2015 at 7:46 am
Thank you.
I was confused because on the internet people say BINARY_CHECKSUM cannot guarantee to find differences, while HASHBYTES is much more accurate.
As these examples show, it is really about the data and how you compose the input. Every hashing algorithm can have collisions.
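For the change-detection use case itself, one common pattern is to store a hash per row and compare it against a freshly computed one. A rough sketch only; the table and column names here are hypothetical, and it assumes SQL Server 2012+ for CONCAT and SHA2_256:

```sql
-- Hypothetical change detection: find staging rows whose column values
-- no longer match the hash stored when the target row was last loaded.
SELECT s.BusinessKey
FROM   Staging s
JOIN   Target  t
  ON   t.BusinessKey = s.BusinessKey
WHERE  HASHBYTES('SHA2_256',
           CONCAT(s.Col1, '|', s.Col2, '|', s.Col3))  -- separator between columns
    <> t.RowHash;                                     -- hash persisted at load time
```

The separator matters here for exactly the reason discussed above: without it, different column splits of the same concatenated string would hash identically.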