Duplicate key error

  • Ok, this error is driving me crazy.

    Obviously, I fully understand what a duplicate key error is, but I do not understand why I am getting it.

    I do have the following table at work:

    CREATE TABLE [dbo].[InCache](

    [database_id] [int] NOT NULL,

    [schema_name] [char](3) NULL,

    [object_id] [bigint] NOT NULL,

    [object_name] [nvarchar](100) NULL,

    [last_execution_time] [smalldatetime] NOT NULL,

    [SampleDate] [smalldatetime] NOT NULL,

    CONSTRAINT [PK_database_id_schema_ObjectID] PRIMARY KEY CLUSTERED

    (

    [database_id] ASC,

    [object_id] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    The purpose of this table is basically keep track of all the store procedures we call. We have a lot of store procedures that we no longer use (in theory) so this table helps to identify those. The table is being fed via sproc, as follows:

    SET NOCOUNT ON;

    DECLARE @MetricDate datetime=GETDATE();

    MERGE INTO dbo.InCache DESTINATION

    USING(

    SELECT

    database_id

    ,OBJECT_SCHEMA_NAME(object_id,database_id) AS [SCHEMA_NAME]

    ,object_id

    ,OBJECT_NAME(object_id,database_id) AS [OBJECT_NAME]

    ,MAX(last_execution_time) last_execution_time

    ,MAX(@MetricDate) AS SampleDate

    --,execution_count

    FROM sys.dm_exec_procedure_stats s

    WHERE s.database_id >4 AND s.database_id<>32767

    GROUP BY database_id, OBJECT_SCHEMA_NAME(object_id,database_id), object_id , OBJECT_NAME(object_id,database_id)

    ) AS SRC ON SRC.[OBJECT_NAME]=DESTINATION.[OBJECT_NAME] AND SRC.database_id=Destination.database_id

    WHEN MATCHED

    AND DESTINATION.last_execution_time <> SRC.last_execution_time THEN

    UPDATE SET

    DESTINATION.last_execution_time = SRC.last_execution_time

    WHEN NOT MATCHED THEN

    INSERT ( database_id, [schema_name], [object_id], [object_name], [last_execution_time], [SampleDate]) VALUES (SRC.database_id, SRC.[schema_name], SRC.[object_id], SRC.[object_name], SRC.[last_execution_time], @MetricDate);

    The problem is that randomly, I am getting the following error:

    Violation of PRIMARY KEY constraint 'PK_database_id_schema_ObjectID'. Cannot insert duplicate key in object 'dbo.InCache'. The duplicate key value is (6, 656109478). [SQLSTATE 23000] (Error 2627) The statement has been terminated. [SQLSTATE 01000] (Error 3621). The step failed.

    My PK is on database id, schema id and object id, which should be unique on each database. So why I am getting above error?

    I have the suspicion that something is wrong on the main DMV inside the sproc, but I do not know what.

    What's wrong on the main DMV or why I am getting those duplicate key errors?

    :ermm:

  • Just for clarification, it looks like the primary key is just on database_id and object_id; schema_id is not included. I don't think that's the issue, though.

    One way I know this could happen is if a stored procedure is renamed. You're matching source and destination on database_id and object_name, instead of object_id. If a stored procedure gets renamed and put back in the plan cache, it will have the same object_id, but will not be matched on object_name, so the INSERT will fire.

    Unless there's a particular reason you're matching on database_id and object_name, I'd just switch to matching on database_id and object_id, which would avoid that problem.

    Cheers!

  • The DMV dm_exec_procedure_stats returns one row for each cached stored procedure plan. There can be multiple cached plans for a single procedure (database_id, object_id). The column 'plan_handle' is the plan identifier and so should be unique for each row in result.

    https://msdn.microsoft.com/en-us/library/cc280701.aspx

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (1/21/2016)


    The DMV dm_exec_procedure_stats returns one row for each cached stored procedure plan. There can be multiple cached plans for a single procedure (database_id, object_id). The column 'plan_handle' is the plan identifier and so should be unique for each row in result.

    https://msdn.microsoft.com/en-us/library/cc280701.aspx

    ?

    Means?

  • Jacob Wilkins (1/21/2016)


    Just for clarification, it looks like the primary key is just on database_id and object_id; schema_id is not included. I don't think that's the issue, though.

    One way I know this could happen is if a stored procedure is renamed. You're matching source and destination on database_id and object_name, instead of object_id. If a stored procedure gets renamed and put back in the plan cache, it will have the same object_id, but will not be matched on object_name, so the INSERT will fire.

    Unless there's a particular reason you're matching on database_id and object_name, I'd just switch to matching on database_id and object_id, which would avoid that problem.

    Cheers!

    Good point.

    No, we do not rename sprocs, but (do not ask me why) we drop and recreate, instead of recompile. So if we have MySproc_A ... and we added some changes, Developers basically drop MySproc_A and create a new one, with same name, that now includes the new changes. I can see that such case can potentially introduce problems, with different objects IDs but same name? But that's why I am using object_name and database_id on the MERGE command, instead of object id.

  • sql-lover (1/21/2016)


    Eric M Russell (1/21/2016)


    The DMV dm_exec_procedure_stats returns one row for each cached stored procedure plan. There can be multiple cached plans for a single procedure (database_id, object_id). The column 'plan_handle' is the plan identifier and so should be unique for each row in result.

    https://msdn.microsoft.com/en-us/library/cc280701.aspx

    ?

    Means?

    Looking at your MERGE more closely, I see now that you're grouping by database and object. I think the problem has to do with including OBJECT_ID in the group. Perhaps just group and merge on DATABASE_ID + OBJECT_NAME.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Because for procedures, the lookup for the plans in cache is database_id, object_id and some of the SET options.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Eric M Russell (1/21/2016)


    sql-lover (1/21/2016)


    Eric M Russell (1/21/2016)


    The DMV dm_exec_procedure_stats returns one row for each cached stored procedure plan. There can be multiple cached plans for a single procedure (database_id, object_id). The column 'plan_handle' is the plan identifier and so should be unique for each row in result.

    https://msdn.microsoft.com/en-us/library/cc280701.aspx

    ?

    Means?

    Looking at your MERGE more closely, I see now that you're grouping by database and object. I think the problem has to do with including OBJECT_ID in the group. Perhaps just group and merge on DATABASE_ID + OBJECT_NAME.

    I see your point, and is consistent with developer's behaviour here. I will take a look. I also changed the PK, is now on db_id, schema_id, object_name. Huge Pk 🙁 ... but certainly unique.

  • sql-lover (1/21/2016)


    Eric M Russell (1/21/2016)


    sql-lover (1/21/2016)


    Eric M Russell (1/21/2016)


    The DMV dm_exec_procedure_stats returns one row for each cached stored procedure plan. There can be multiple cached plans for a single procedure (database_id, object_id). The column 'plan_handle' is the plan identifier and so should be unique for each row in result.

    https://msdn.microsoft.com/en-us/library/cc280701.aspx

    ?

    Means?

    Looking at your MERGE more closely, I see now that you're grouping by database and object. I think the problem has to do with including OBJECT_ID in the group. Perhaps just group and merge on DATABASE_ID + OBJECT_NAME.

    I see your point, and is consistent with developer's behaviour here. I will take a look. I also changed the PK, is now on db_id, schema_id, object_name. Huge Pk 🙁 ... but certainly unique.

    If minimizing the size of the PK is important, perhaps it's a foreign key into other tables, then you can consider using a binary hash of object_name rather than the full character name. For example, HASHBYTES('MD5') returns a 16 byte hash, but you can re-cast it as BIGINT (8 bytes) or INT (4 bytes). However, the shorter the binary coversion, the greater the probability for collisions (resulting in duplicate key violation). If you choose to do this you'll want to experiment with how probable this can potentially be. Even a relatively complex database will only have a few hundred stored procedure names, so maybe it's not likely at all.

    SELECT OBJECT_NAME(object_id,database_id) AS [OBJECT_NAME]

    , HASHBYTES('MD5', OBJECT_NAME(object_id,database_id)) AS OBJECT_MD5

    , CAST( HASHBYTES('MD5', OBJECT_NAME(object_id,database_id)) AS BIGINT) AS OBJECT_INT8

    , CAST( HASHBYTES('MD5', OBJECT_NAME(object_id,database_id)) AS INT) AS OBJECT_INT4

    FROM sys.dm_exec_procedure_stats s

    WHERE s.database_id > 4 AND s.database_id <> 32767;

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (1/21/2016)


    sql-lover (1/21/2016)


    Eric M Russell (1/21/2016)


    sql-lover (1/21/2016)


    Eric M Russell (1/21/2016)


    The DMV dm_exec_procedure_stats returns one row for each cached stored procedure plan. There can be multiple cached plans for a single procedure (database_id, object_id). The column 'plan_handle' is the plan identifier and so should be unique for each row in result.

    https://msdn.microsoft.com/en-us/library/cc280701.aspx

    ?

    Means?

    Looking at your MERGE more closely, I see now that you're grouping by database and object. I think the problem has to do with including OBJECT_ID in the group. Perhaps just group and merge on DATABASE_ID + OBJECT_NAME.

    I see your point, and is consistent with developer's behaviour here. I will take a look. I also changed the PK, is now on db_id, schema_id, object_name. Huge Pk 🙁 ... but certainly unique.

    If minimizing the size of the PK is important, perhaps it's a foreign key into other tables, then you can consider using a binary hash of object_name rather than the full character name. For example, HASHBYTES('MD5') returns a 16 byte hash, but you can re-cast it as BIGINT (8 bytes) or INT (4 bytes). However, the shorter the binary coversion, the greater the probability for collisions (resulting in duplicate key violation). If you choose to do this you'll want to experiment with how probable this can potentially be. Even a relatively complex database will only have a few hundred stored procedure names, so maybe it's not likely at all.

    SELECT OBJECT_NAME(object_id,database_id) AS [OBJECT_NAME]

    , HASHBYTES('MD5', OBJECT_NAME(object_id,database_id)) AS OBJECT_MD5

    , CAST( HASHBYTES('MD5', OBJECT_NAME(object_id,database_id)) AS BIGINT) AS OBJECT_INT8

    , CAST( HASHBYTES('MD5', OBJECT_NAME(object_id,database_id)) AS INT) AS OBJECT_INT4

    FROM sys.dm_exec_procedure_stats s

    WHERE s.database_id > 4 AND s.database_id <> 32767;

    nah! ... lol ... the InCache table is my own table and resides on a DBA database... I maintained it on my own. But I'm "anal" with PK sizes.

    The job has been running for quite a while now, and no errors so far. So hopefully the new PK did the trick.

    I have not played around with the hashbyte function, I may do that on my own lab at home though, looks interesting.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply