checksum reliability

  • Platform: MSSQL 2008 R2

    What: performance problem

    problem description: accessing records in a table with varchar(7500) column using a varchar(7500) local variable

    example:

    create table dbo.some_table

    (Package int,

    Package_key varchar(7500))

    ....some inserts

    select statement:

    declare @some_key varchar(7500) = '12345,67890,45454,3443,4543545,'

    select count(1) from dbo.some_table where Package_key = @some_key

    --this performs slowly, which is expected, for index cannot be created on Package_key column

    Solution:

    1.Alter_table dbo.some_table add cs_key as CHECKSUM(Package_key);

    2.Create index cs_key_ind on dbo.some_table(cs_key)

    3.Rewrite the select:

    select count(1) from dbo.some_table where cs_key = checksum(@some_key)

    So far so good. What “scares” me is the thing I’ve read here:

    http://msdn.microsoft.com/en-us/library/ms189788.aspx

    it says:

    “/*Use the index in a SELECT query. Add a second search

    condition to catch stray cases where checksums match,

    but the values are not the same.*/

    SELECT *

    FROM Production.Product

    WHERE CHECKSUM(N'Bearing Ball') = cs_Pname

    AND Name = N'Bearing Ball';

    GO”

    My question is simple; how accurate or should I say, reliable checksum really is?

    Ideas, explanations, confirmations are highly appreciated.

  • Reading this, I kind of get the idea CHECKSUM is bad to use...

    http://www.sqlservercentral.com/blogs/steve_jones/2009/06/01/sql-server-encryption-hashing-collisions/

    HashBytes might work for you though. Alternatively, could you build a Full Text index on the column?

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply