help to fix hash procedure

  • hi, i create a procedure to get hash code from entire table's contents. it works fine in 99% , but in some cases it returns the same number for different data.

    in this code the ord value is swapped for two records, bat the hash is the same.

    is there a way to fix the procedure ?


    SET ANSI_NULLS ON
    GO

    SET QUOTED_IDENTIFIER ON
    GO

    CREATE TABLE [dbo].[TestTable](
    [ID] [int] NOT NULL,
    [DocID] [varchar](10) NOT NULL,
    [Ord] [int] NULL,
    CONSTRAINT [PK_TestTable] PRIMARY KEY CLUSTERED
    (
    [ID] ASC,
    [DocID] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]
    GO

    SET ANSI_NULLS ON
    GO

    SET QUOTED_IDENTIFIER ON
    GO

    CREATE PROCEDURE [dbo].[GetCRC]
    (
    @Table nvarchar(100)
    )
    AS
    BEGIN

    Declare @CRC nvarchar(100)
    Declare @CRCResult nvarchar(100)
    Declare @SQL nvarchar(500)
    Declare @ParamDefinition nvarchar(100)

    Set @ParamDefinition = N'@CRC nvarchar(100) OUTPUT';
    Set @SQL = 'Select @CRC = CHECKSUM_AGG(BINARY_CHECKSUM(*)) From ' + @Table

    EXECUTE sp_executesql @SQL , @ParamDefinition, @CRC = @CRC output;
    Set @CRCResult = IsNull(@CRC,'0')

    Select IsNull(@CRCResult,'0') as CRC, @Table as TableName

    END

    GO

    Insert into TestTable (ID,DocID,Ord)
    values(1, 'aaa', 1),
    (1, 'bbb', 2),
    (1, 'ccc', 3),
    (1, 'ddd', 4),
    (1, 'eee', 5),
    (1, 'fff', 6),
    (1, 'ggg', 7)

    exec [dbo].[GetCRC] 'TestTable'

    update TestTable set ord=2 where id=1 and DocID='aaa'
    update TestTable set ord=1 where id=1 and DocID='bbb'

    exec [dbo].[GetCRC] 'TestTable'?

     

  • There is no guarantee that CHECKSUM_AGG() will return a unique value for every different recordset, so 'fixing' the proc is not really an option unless you are prepared to use additional hashing methods to reduce the likelihood of collisions.

    But the likelihood will never be zero.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • yes, using another approach is a valid option, i need, at least, to reduce collisions.

  • Maybe use a checksum + some other value in the row? A PK perhaps?

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply