Deadlock Problem Analysis

  • We are currently experiencing massive! deadlock problems in a newly delevoped system.

    Our setup is as follows:

    DBMS: MS SQL Server 2000 SP3

    C++ Client Applications concurrently use a DB via ADO. Performing DB operations simultanously (users hit the "SAVE" button at the same time) on two or more clients leads to deadlock situations on the DB almost every time.

    Two processes deadlock since each one is trying to obtain a shared-lock on

    a resource that is exclusively locked by the other process (which is due to

    a DELETE T-SQL statement fired via an ADO connection).

    Now, we have some problems to fully understand the deadlock situation:

    1) What operation is responsible for the request of the shared locks, after

       all, both processes are trying to delete some rows from the same table

       by invocing the Execute-method of an ADO command object

       ("p_ptrCommand->Execute(NULL, NULL, adExecuteNoRecords);") ?

    2) We are using the "READ UNCOMMITTED" transaction isolation level in all

       ADO DB connections, so shared locks should coexist with exclusive locks

       (at least in theory). Here is a sniplet form our ADO connection code:

          _bstr_t sbTransactionSet;

          sbTransactionSet = "SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED";

          m_pConnection->Execute(sbTransactionSet, &vtEmpty, adCmdUnknown);

       Have we misunderstood something here?

    We tried almost all deadlock avoidance tips we could find in literature/web

    but could not solve our problem yet.

    Has anyone some good ideas what is wrong with our application?

    Here is the deadlock report from the server logs:

    2004-02-11 13:23:25.81  spid4 ResType:LockOwner Stype:'OR' Mode: S SPID:53 ECID:0 Ec0x350ed4e0) Value:0x194

    2004-02-11 13:23:25.81  spid4 Victim Resource Owner:

    2004-02-11 13:23:25.81  spid4 ResType:LockOwner Stype:'OR' Mode: S SPID:54 ECID:0 Ec0x261f54e0) Value:0x194

    2004-02-11 13:23:25.81  spid4 Requested By:

    2004-02-11 13:23:25.81  spid4 Input Buf: RPC Event: sp_executesql;1

    2004-02-11 13:23:25.81  spid4 SPID: 53 ECID: 0 Statement Type: DELETE Line #: 1

    2004-02-11 13:23:25.81  spid4 Owner:0x1945f720 Mode: X        Flg:0x0 Ref:1 Life:02000000 SPID:53 ECID:0

    2004-02-11 13:23:25.81  spid4 Grant List::

    2004-02-11 13:23:25.81  spid4 KEY: 9:1595868752:1 (d200b80a219b) CleanCnt:1 Mode: X Flags: 0x0

    2004-02-11 13:23:25.81  spid4 Node:2

    2004-02-11 13:23:25.81  spid4 

    2004-02-11 13:23:25.81  spid4 ResType:LockOwner Stype:'OR' Mode: S SPID:53 ECID:0 Ec0x350ed4e0) Value:0x194

    2004-02-11 13:23:25.81  spid4 Requested By:

    2004-02-11 13:23:25.81  spid4 Input Buf: RPC Event: sp_executesql;1

    2004-02-11 13:23:25.81  spid4 SPID: 54 ECID: 0 Statement Type: DELETE Line #: 1

    2004-02-11 13:23:25.81  spid4 Owner:0x1945f980 Mode: X        Flg:0x0 Ref:1 Life:02000000 SPID:54 ECID:0

    2004-02-11 13:23:25.81  spid4 Grant List::

    2004-02-11 13:23:25.81  spid4 KEY: 9:1595868752:1 (9a00e0d35961) CleanCnt:1 Mode: X Flags: 0x0

    2004-02-11 13:23:25.81  spid4 Node:1

    2004-02-11 13:23:25.81  spid4 

    2004-02-11 13:23:25.81  spid4 Wait-for graph

    2004-02-11 13:23:25.81  spid4 

    2004-02-11 13:23:25.81  spid4 ...

    2004-02-11 13:23:30.81  spid4 ResType:LockOwner Stype:'OR' Mode: S SPID:55 ECID:0 Ec0x2b5c14e0) Value:0x194

    2004-02-11 13:23:30.81  spid4 Victim Resource Owner:

    2004-02-11 13:23:30.81  spid4 ResType:LockOwner Stype:'OR' Mode: S SPID:55 ECID:0 Ec0x2b5c14e0) Value:0x194

    2004-02-11 13:23:30.81  spid4 Requested By:

    2004-02-11 13:23:30.81  spid4 Input Buf: RPC Event: sp_executesql;1

    2004-02-11 13:23:30.81  spid4 SPID: 54 ECID: 0 Statement Type: DELETE Line #: 1

    2004-02-11 13:23:30.81  spid4 Owner:0x1945fa60 Mode: X        Flg:0x0 Ref:1 Life:02000000 SPID:54 ECID:0

    2004-02-11 13:23:30.81  spid4 Grant List::

    2004-02-11 13:23:30.81  spid4 KEY: 9:1595868752:1 (800006ffe624) CleanCnt:1 Mode: X Flags: 0x0

    2004-02-11 13:23:30.81  spid4 Node:2

    2004-02-11 13:23:30.81  spid4 

    2004-02-11 13:23:30.81  spid4 ResType:LockOwner Stype:'OR' Mode: S SPID:54 ECID:0 Ec0x261f54e0) Value:0x194

    2004-02-11 13:23:30.81  spid4 Requested By:

    2004-02-11 13:23:30.81  spid4 Input Buf: RPC Event: sp_executesql;1

    2004-02-11 13:23:30.81  spid4 SPID: 55 ECID: 0 Statement Type: DELETE Line #: 1

    2004-02-11 13:23:30.81  spid4 Owner:0x19469f00 Mode: X        Flg:0x0 Ref:1 Life:02000000 SPID:55 ECID:0

    2004-02-11 13:23:30.81  spid4 Grant List::

    2004-02-11 13:23:30.81  spid4 KEY: 9:1595868752:1 (85001af7e786) CleanCnt:1 Mode: X Flags: 0x0

    2004-02-11 13:23:30.81  spid4 Node:1

    2004-02-11 13:23:30.81  spid4 

    2004-02-11 13:23:30.81  spid4 Wait-for graph

    2004-02-11 13:23:30.81  spid4 

    2004-02-11 13:23:30.81  spid4 ...

  • Try the BeginTransaction property of the connection.  This will build an explicit transaction around the delete.  When the execute statements is finished then use the Rollback or Commit property.

    If this doesn't work I know of some promatic workarounds that work very nicely.

    Yours,

    Scott Rankin
    CoolSavings, Inc
    Database Administrator (DBA)

     
    PS.  Is all your SQL embedded in the front-end or are you using stored procedures?

  • if you are selescting from the table then use [with (nolock)] in Table hint to avoid making lock on the table

    advantage: faster ... no locks

    disadvantage: the data is not update up to this second you select because in this secnd some other user may insert or update a record that youy will not see it.


    Alamir Mohamed
    Alamir_mohamed@yahoo.com

  • Can you post the exact delete statement you are using?


    * Noel

  • Disable the page lock through sp_indexoption.

  • Sorry for the late reply ...

    Most of our modifing database interaction is handled via stored procedures. Using T-SQL directly for deleting data is only one variant we tried, we have also a stored procedure using explict transactions (see code at end of posting). However, the deadlock situation occurs also when we are using the stored procedure(s). We we also playing with the table indices and could solve the problem by adding an index on our test-server, but not on our (faster 2 processor) production server.

    Could you explain your workarounds a bit more?

    CREATE PROCEDURE SYSTEM.sp_INSTANCE_DELETE

    @nINSTANCE_ID bigint

     

    AS

    BEGIN

    DECLARE @nReturn int, @nValue_ID bigint

    if ( not exists( select * from  [SYSTEM].[INSTANCE] where INSTANCE_ID = @nINSTANCE_ID ) )

      return 7

    -- Check for all depending childs --

    if exists( select * from [SYSTEM].[INSTANCE] where ( PARENTINSTANCE_ID =  @nINSTANCE_ID ) AND(  PARENTINSTANCE_ID <> INSTANCE_ID) )

     return 14

    BEGIN TRAN T1

    DELETE from [SYSTEM].[INSTANCE_REF] WHERE INSTANCE_ID = @nINSTANCE_ID OR PARENT_INSTANCE_ID = @nINSTANCE_ID

    IF (@@ERROR > 0 )

     Begin

      ROLLBACK TRAN T1

      return @@ERROR  + 10000

     End

    DELETE from [SYSTEM].[INSTANCE] WHERE INSTANCE_ID = @nINSTANCE_ID

    IF (@@ERROR > 0 )

     Begin

      ROLLBACK TRAN T1

      return @@ERROR  + 10000

     End

    COMMIT TRAN T1

    return 0 

    END

    GO

  • Sorry for the late reply ...

    Here is the C++ code doing the DELETE:

     

          _CommandPtr pCmd = PrepareCommand("DELETE FROM SYSTEM.INSTANCE WHERE INSTANCE_ID = ? ", p_pInstance);

          DECIMAL Dec1;

          _variant_t vtINSTANCE_ID = TUDSLONGLONG2VARIANT(p_pInstance->GetID(), Dec1);

          AppendParameter2Cmd(pCmd, "INSTANCE_ID", adBigInt, adParamInput, 23, vtINSTANCE_ID);

         

          if (ExecutePreparedCommand(pCmd, p_pInstance))

          {

             // all values deleted at DB comes into deleted list

             new CudsPlaceHolder(p_pInstance->GetCPPType(), p_pInstance->GetID(), p_pDeletelist);  

             RETURNWITHDEBUG (true);

          }

          else

          {

             // do not add to deleted list

             RETURNWITHDEBUG (false);

          }

    where PrepareCommand does:

    _CommandPtr CudsMSSQLAccess:repareCommand(  char * p_szQuery, CudsError * p_pErrorInstance )

    {

      

       _CommandPtr pCmd = NULL;

       HRESULT hr = S_OK;

      

       if(!m_pConnection || !m_pConnection->IsConnected())

       {

          if(p_pErrorInstance)

          {

             p_pErrorInstance->InsertError(Cuds::eReactionAbort,

                                           Cuds::eTierDbAccess, 

                                           Cuds::eErrGeneralFatal,

                                           "connectionhandler not given or connection is down",

                                           "CudsMSSQLAccess:repareStoredProc");

          }

         

          throw "No Connection";

          return NULL;

       }

      

       if((hr = pCmd.CreateInstance(__uuidof(Command))) != S_OK)

       {

          if(p_pErrorInstance)

          {

             char szMessage[100];

             sprintf(szMessage, "error in call pCmd.CreateInstance %x", hr);

            

             p_pErrorInstance->InsertError(Cuds::eReactionAbort,

                                           Cuds::eTierDbAccess, 

                                           Cuds::eErrGeneralFatal,

                                           szMessage,

                                           "CudsMSSQLAccess:repareStoredProc");

          }

         

          throw "Invalid Handle";

          return NULL;

       } 

      

       try

       {

          pCmd->ActiveConnection = m_pConnection->GetConnection();

          pCmd->CommandType = adCmdText;

          pCmd->CommandText = p_szQuery;

       }

       catch(...)

       {

          if(p_pErrorInstance)

          {

             m_pConnection->SqlErrorHandler(p_pErrorInstance, p_szQuery);

          }

         

          return NULL;

       }

      

       return pCmd;

    }

     

    and ExecutePreparedCommand does:

    {

       HRESULT hr = S_OK;

       _variant_t vtEmpty ;

      

       if((!m_pConnection) || (!m_pConnection->IsConnected()))

       {

          if(p_pErrorInstance)

          {

             p_pErrorInstance->InsertError(Cuds::eReactionAbort,

                                           Cuds::eTierDbAccess, 

                                           Cuds::eErrGeneralFatal,

                                           "no connectionhandler given or connection is down",

                                           "CudsMSSQLAccess::ExecuteCommand");

          }

         

          return false;

       }

       try

       {

          /*

          _variant_t  vtEmpty (DISP_E_PARAMNOTFOUND, VT_ERROR);

          _variant_t  vtEmpty2(DISP_E_PARAMNOTFOUND, VT_ERROR);

         

          p_ptrCommand->Execute(&vtEmpty,&vtEmpty2,adCmdText);

          */

          // @todo check if this modification works

          p_ptrCommand->Execute(NULL, NULL, adExecuteNoRecords);

          return true;

       }

       catch(...)

       {

          if(p_pErrorInstance)

          {

             m_pConnection->SqlErrorHandler(p_pErrorInstance, p_ptrCommand->CommandText);

          }

         

          return false;

       }

       return false;

    }

  • Any delete triggers on the affected tables?  Have you tried using sp_who and sp_lock stored procedures to determine what process is causing the locking problems?  You may also want to try running DBCC OPENTRAN(db_id or db_name) to determine if you have  process that is not properly terminating a transaction.  We had an instance here where a stored procedure called from a Power Builder was not completing its transaction and causing all kinds of problems.  Once we identified the sp we were able to modify the Power Builder app to properly use transactions.  No more problems  

  • DELETE from [SYSTEM].[INSTANCE_REF] WHERE INSTANCE_ID = @nINSTANCE_ID OR PARENT_INSTANCE_ID = @nINSTANCE_ID

    I am assuming this is a Self referenced table, Right?

    if the answer to the above question is yes READ ON else disreagrd this

    Can you modify that to ?

    Begin Tran

    DELETE from [SYSTEM].[INSTANCE_REF]

    WHERE PARENT_INSTANCE_ID = @nINSTANCE_ID

    DELETE from [SYSTEM].[INSTANCE_REF]

    WHERE  INSTANCE_ID = @nINSTANCE_ID

    Commit Tran

    Of Course you can add error checking afterwards

    2. Can you use a different field to perform the delete?

     


    * Noel

  • I would consider doing what Noel suggests but I would do it explicitly in 2 transactions.  I believe that would reduce the locks being held.  I would have to guess that in your original sp SQL is locking based on BOTH criteria and holding those locks until both deletes are finished.  By using Noel's code and 2 transactions I think you will release locks based on the PARENT_INSTANCE_ID before locking based on INSTANCE_ID.

  • Jack,

    You  got the Idea


    * Noel

  • This may be a bit of a newbie type of answer, but from looking at the discussion up to this point, it sounds like the table is self-referencing. Do you have an index on the parent instance ID column? If not, wouldn't SQL Server have to scan the whole table to locate the affected rows? And if this were the case, a production system (presumably holding more data than the test system) might suffer more from the lack of an index. Is it possible that this sort of table scan scenario might be interpreted as a deadlock?

    Again, I am far from an expert, so take this with a few grains of salt.

     

  • Yes, the table _is_ self-referencing and adding an index to the parent-instance ID column reduced the deadlock problem but did not eliminate the problem completely on the 2-processor production system server.

  • Let's hope that we got the idea also

    We will check all of our currently used stored procedures and see where we can apply the principle of using _short_ transactions in order to reduce locking conflicts.

    Noel, what do you mean by "using another field for the delete"?

    Cheers,

       Stefan

  • Stefan.

    I had some problems like those a long time ago. It's a shame I can't remember what was the solution.

    What I meant by using another field is:

    Suppose you can maintain a metadata field in which you keep a sort of hierarchy information (if you are interested read this; that column helps A LOT when you want something like all childs from a parent down, etc)

    When you are deleting from a self referencing table because DRI has to be maintained  you are supposed to delete childs First and then the parent and for every statement that is checked . I believe I created an index on that metadata column and it was fast enough not error out

    HTH

     


    * Noel

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply