October 21, 2019 at 9:21 pm
I am trying to use CHECKSUM() to streamline a WHERE clause that detects when a row in one table is different from a row in a second joined table.
Today I was surprised to discover that changing the sign of a decimal value does NOT result in CHECKSUM() calculating a different value.
DECLARE @SomeValue1 decimal(18,8)
DECLARE @SomeValue2 decimal(18,8)
SET @SomeValue1 = 3
SET @SomeValue2 = -3
IF CHECKSUM(@SomeValue1) = CHECKSUM(@SomeValue2) BEGIN
PRINT CAST(@SomeValue1 AS varchar) + ' equals ' + CAST(@SomeValue2 AS varchar)
END
This outputs:
3.00000000 equals -3.00000000
I have read BOL's mention of "noncomparable types", but I don't see a mention of the sign of a decimal value being disregarded by CHECKSUM()
If the value is cast to a varchar, float or int, CHECKSUM() works as expected: if the sign of the value changes, a different result is calculated by CHECKSUM().
I did find mention of this behavior here: https://dba-presents.com/index.php/databases/sql-server/166-forget-about-binary-checksum-in-sql-server
This seems like a significant problem with CHECKSUM() and a significant omission from BOL.
Are there other better approaches to use when comparing a list of nullable columns? Does CHECKSUM() serve any useful purpose?
October 21, 2019 at 10:16 pm
Are there other better approaches to use when comparing a list of nullable columns? Does CHECKSUM() serve any useful purpose?
If you want your comparison to be bulletproof, you need to compare all of the columns. The various hashing algorithms available all introduce the possibility of collisions, however slight that possibility may be.
My favourite way of comparing nullable columns uses either of EXCEPT and INTERSECT ... there is no need to include an IsNull() check if you do this.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
October 22, 2019 at 10:35 am
checksum has bitten me so many times (it was there when I started), binary checksum is a bit better, hashbytes is even better - but as phil says "hashing algorithms have the potential for collision" - and trust me, it occurs more often that you expect
MVDBA
October 22, 2019 at 11:40 am
It's very easy to find values that have the same checksums. Eg.: SELECT CHECKSUM('AB'),CHECKSUM('BR') returns 2159 for both values. This is the expected behaviour.
BOL says:
If at least one of the values in the expression list changes, the list checksum will probably change. However, this is not guaranteed. Therefore, to detect whether values have changed, we recommend use of CHECKSUM only if your application can tolerate an occasional missed change.
October 22, 2019 at 2:08 pm
Thanks, Phil: EXCEPT and INTERSECT look like the way to go. I recall reading about them but had never actually used them.
October 22, 2019 at 2:24 pm
Thanks, Phil: EXCEPT and INTERSECT look like the way to go. I recall reading about them but had never actually used them.
Here is the general pattern for a MERGE using INTERSECT:
MERGE sometable WITH (HOLDLOCK) trg
USING someothertable src
ON trg.pk = src.pk
WHEN MATCHED AND NOT EXISTS
(
SELECT src.col1, src.col2 INTERSECT SELECT trg.col1, trg.col2
) THEN
UPDATE SET trg.col1 = src.col1
,trg.col2 = src.col2
,trg.ModifiedAt = GETDATE()
WHEN NOT MATCHED BY TARGET THEN
INSERT
(
pk
,col1
,col2
,CreatedAt
)
VALUES
(src.pk, src.col1, src.col2, GETDATE());
--WHEN NOT MATCHED BY SOURCE THEN DELETE;
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
October 22, 2019 at 2:45 pm
The only thing I use checksums and hashes for is a quick check to prove things are not the same. If the checksum or hash indicates they are, then further proof is required to prove that they actually are the same.
If you use them in that manner, you can greatly increase the performance of your checks because if they are different, they are guaranteed to have different underlying values (if they underlying values were all consumed in the same order). After that initial high speed check, then run the lower speed checks of INTERSECT and/or EXCEPT to check on the greatly reduced set of things that appear to be dupes.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply