Deadlock Issue encountered while Replication and JobRead oocur simultaneously

Question

Post reply

Deadlock Issue encountered while Replication and JobRead oocur simultaneously

Keerthi Pavan Kumar

SSC Journeyman

Points: 94

March 8, 2006 at 2:24 am

#365457

Hi,

One our Reporting Application has gone to Production last week, and we are encountering the below issue only on Production Environment.

[This is going to be a lengthy mail J ]

Scenario:

Our Server has a Continuous Transactional Replication (Pull) from an Upstream System on 1 Database. This Upstream System does a Delete and Insert of records (5-6 Tables) within a transaction for every 2 Hrs into the Database which eventually gets replicated on subscriber. While this happening couple of our Jobs (Job Span: 3-5Hrs) are trying to read the Data from same database (/Tables) where Insert/Delete is in process.

Issue

In production environment we consistently faced that the Job has failed with below error

Executed as user: PARTNERS\_prtsit1. Transaction (Process ID 53) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. [SQLSTATE 40001] (Error 1205). The step failed.

We are unable to Repro this scenario in any of the other environments inspite of repeated tries.Server Configuration is provided below in mail.

Investigation Done:

Enabled TraceFlag for Error-1204, 1205 and captured the below log in ServerLog Files for one of the failure.

Date

Source

Message

3/6/06 11:16 PM

spid55

Configuration option 'show advanced options' changed from 1 to 1. Run the RECON

3/6/06 11:16 PM

spid55

Error: 15457

3/6/06 11:05 PM

spid4

ResType:LockOwner Stype:'OR' Mode: S SPID:55 ECID:0 Ec0x1D8935F0) Value:0x2af

3/6/06 11:05 PM

spid4

Victim Resource Owner:

3/6/06 11:05 PM

spid4

ResType:LockOwner Stype:'OR' Mode: S SPID:55 ECID:0 Ec0x1D8935F0) Value:0x2af

3/6/06 11:05 PM

spid4

Requested By:

3/6/06 11:05 PM

spid4

Input Buf: RPC Event: sp_MSins_Task_Dimension;1

3/6/06 11:05 PM

spid4

SPID: 53 ECID: 0 Statement Type: INSERT Line #: 7

3/6/06 11:05 PM

spid4

Owner:0x205acbc0 Mode: IX Flg:0x0 Ref:2 Life:02000000 SPID:53 ECID:0

3/6/06 11:05 PM

spid4

Grant List 3::

3/6/06 11:05 PM

spid4

PAG: 6:1:9041 CleanCnt:2 Mode: IX Flags: 0x2

3/6/06 11:05 PM

spid4

Node:2

3/6/06 11:05 PM

spid4

3/6/06 11:05 PM

spid4

ResType:LockOwner Stype:'OR' Mode: IX SPID:53 ECID:0 Ec0x6973B598) Value:0x20

3/6/06 11:05 PM

spid4

Requested By:

3/6/06 11:05 PM

spid4

Input Buf: Language Event: EXEC Weekly_Tue_NAMERICA

3/6/06 11:05 PM

spid4

SPID: 55 ECID: 0 Statement Type: SELECT INTO Line #: 422

3/6/06 11:05 PM

spid4

Owner:0x2afc0f20 Mode: S Flg:0x0 Ref:1 Life:00000000 SPID:55 ECID:0

3/6/06 11:05 PM

spid4

Grant List 0::

3/6/06 11:05 PM

spid4

TAB: 6:418816554 [] CleanCnt:2 Mode: S Flags: 0x0

3/6/06 11:05 PM

spid4

Node:1

3/6/06 11:05 PM

spid4

3/6/06 11:05 PM

spid4

Wait-for graph

3/6/06 11:05 PM

spid4

3/6/06 11:05 PM

spid4

...

We understood that from the above log that Replication Job has IX lock on Table and our Job has Shared lock on that table.

So SQL Server chose our Process as victim and hence terminated

Questions:

1) Is our understanding on the Log correct??

2) If correct then shouldn’t be process which has shared lock be in “waiting” mode rather than be killed by SQL Server?? ( We simulated on another environment where the query is blocked till Write process got completed). Is there any threshold on wait time??

3) What could be the cause of this behavior on a particular environment?? What is the suggested Fix??

Environment Details:

QA

Microsoft SQL Server 2000 - 8.00.997 (Intel X86)

Dec 17 2004 22:15:27

Enterprise Edition on Windows NT 5.2 (Build 3790: Service Pack 1)

OS: Windows 2003- SP1 IPAK NT5.25 (V3.0) and SP2

SQL2000: SP3 [IPAKSQL_8010(RTM 8.00.997.SP3a v1.3)

Machine: 8 Processor

SQLServer Processor Setting: All 8 Processors enabled for Usage

Parallelism Option: Use all available Processors

UAT

Microsoft SQL Server 2000 - 8.00.997 (Intel X86)

Dec 17 2004 22:15:27

Enterprise Edition on Windows NT 5.2 (Build 3790: Service Pack 1)

OS: Windows 2003- SP1 IPAK NT5.25 (V3.0) and SP2

SQL2000: SP3 [IPAKSQL_8010(RTM 8.00.997.SP3a v1.3)

Machine:8 Processor

SQLServer Processor Setting: All 8 Processors enabled for Usage

Parallelism Option: Use all available Processors

Prod

Microsoft SQL Server 2000 - 8.00.2040 (Intel X86)

May 13 2005 18:33:17

Enterprise Edition on Windows NT 5.2 (Build 3790: Service Pack 1)

OS: Windows 2003- SP1 IPAK NT5.25 (V3.0) and SP2

SQL2000: SP4 [IPAKSQL_8011(RTM 8.00.2040.SP4 v1.1)]

Machine:8 Processor

SQLServer Processor Setting: All 8 Processors enabled for Usage

Parallelism Option: Use all available Processors

Any suggestions would be of great Help. Thank You.

Regards,

Pavan Keerthi

SIT-Report Authoring Test

Microsoft India

Work: +91-040-55113008

Mail:v-pavake@microsoft.com

Kindest Regards,

Keerthi Pavan Kumar

Viewing 4 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

Mark Yelton SSCommitted Points: 1570 More actions · Answer 1

This might be kind of a "hokey" suggestion but what the heck. For the process doing the read, would it be possible to do a WITH(NOLOCK) on the table to avoid any locking what so ever?

Just a thought.

Mark

umar iqbal SSCommitted Points: 1703 More actions · Answer 2

I am having th e same issue. But its our subscriber where deadlock is happening. Any clue why and how to fix it?

:crazy: :alien:

Umar Iqbal

Erik Bitemo SSCrazy Points: 2062 More actions · Answer 3

Hi,

You have two choices:

1. Use the WITH (NOLOCK) clause mentioned above, this case you'll read out uncommitted transactions as well.

2. Use the WITH (READPAST) clause, this case you'll skip the locked rows. You mentioned that there are insert/delete operations on the table, so this second solution could be better for you since you'll read data which is surely in the database, like this import wouldn't exist.

For Umar: probably the first solution is the better. Your problem is that there's a shared lock normally on selected rows and the distribution agent waits for some rows while already locked others... With replication, you should always be aware of what kind of queries you run against the data while distribution agents run.

HTH,

Erik

-- Erik http://blog.rollback.hu