February 6, 2007 at 2:26 pm
Hi all
Does anyone have an explanation as to why Binary_checksum(98000.00) would be the same as Binary_Checksum(980000.00). It is not the same if it is an Integer. I know there are disclaimers for case sensitive strings, but I would expect numbers to be processed correctly...
Thanks
Michael..
February 7, 2007 at 11:26 am
Here's what the BOL says:
"BINARY_CHECKSUM(*), computed on any row of a table, returns the same value as long the row is not subsequently modified. BINARY_CHECKSUM(*) will return a different value for most, but not all, changes to the row, and can be used to detect most row modifications."
So, I would say that the function works as advertised. The checksum is not guaranteed to be unique across any possible values that it is supplied, but that's true of any checksum.
It should probably work better, but it doesn't. In any case, where you would use a hash value to compare two pieces of data, you still have to check the actual data after a checksum match.
The short answer is that the checksum over numeric values appears to only uses the mantissa, and not the exponent in its calculations.
February 7, 2007 at 12:13 pm
That is what I would call 'False Advertising'.....I am planning to change my process to use a Hashing algorithm via HASHBYTES.
Thanks for the info....
February 7, 2007 at 12:25 pm
By the way, the explanation does not stand for string data.
The following also produces the same checksum...
SELECT CHECKSUM('aa'),
CHECKSUM('bQ')
IN SHORT......DO NOT USE for the purpose intended......
It would be almost funny if it was not so sad....
February 7, 2007 at 12:39 pm
I'll add that I experimented with the checksum functions and am disappointed. Something like CRC would be much better, and even a simple shift-and-add would work much better.
SELECT
BINARY_CHECKSUM(0x0101)
17
SELECT BINARY_CHECKSUM(0x010101)
273
As you can see, the formula seems to be multiply by 16 and add. This works poorly for many strings, I've noticed, resulting in many collisions. If they used multiply by 17 it would work much better as a hash function.
However, I too find it strange that the checksum is the same for the two values mentioned, considering that their binary representations are substantially different. I guess the function must scale the argument to eliminate the mantissa before hashing the bytes.
SELECT
CAST(98000.00 AS VARBINARY)
0x0702000140899500
SELECT
CAST(980000.00 AS VARBINARY)
0x08020001805CD705
SELECT
BINARY_CHECKSUM(98000.00)
-1810540209
SELECT
BINARY_CHECKSUM(980000.00)
-1810540209
SELECT
BINARY_CHECKSUM(CAST(98000.00 AS VARBINARY))
119560197
SELECT
BINARY_CHECKSUM(CAST(980000.00 AS VARBINARY))
-2113318539
Hashing seems to proceed from least-significant byte to most-significant byte....I can't quite seem to decode the algorithm this moment, though.
SELECT
BINARY_CHECKSUM(CAST(4294967296*4294967296*4294967296 AS DECIMAL(36,2)))
1000000000
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply