February 23, 2011 at 7:16 am
I hope that somebody will be able to help me analyze this problem.
We recently had a "deadlock" scenario running our ERP applications (JDE E1) on SQL Server 2005.
We found that we had a large number of blocks that had accumulated over a 16 hour period (overnight).
There was a single blocking chain, whose head process was in "sleeping" mode (AWAITING COMMAND).
In an attempt to analyze the blocking issue, I tried using the dm_tran_locks DMV.
However I was surprised to find that the DMV was populated with 21 rows ALL of which had request_status WAIT but no rows with GRANT status.
How can this be the case? If no locks have been granted, what are the lock requests waiting for?
Have I totally misunderstood the dm_tran_locks DMV?
Can anybody help?
Glenn
February 23, 2011 at 8:27 am
When you say deadlock situation, do you mean that you hit a deadlock error with one process chosen as a deadlock victim?
Or do you mean that you had a blocking situation that didn't clear? Blocking and deadlocking are very, very different. It's always best to be clear which one we're talking about.
As far as sys.dm_tran_locks goes, you should be seeing actual locks as well as waiting locks. If all you see is waiting, you should determine what is blocking all locks from being accepted by the system. Take a look at sys.dm_exec_requests and see if you can identify the head of the blocking chain. You can then determine what that process is holding locks on, what it's doing, what queries it's running, etc., using the other DMVs.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
February 23, 2011 at 8:38 am
Hi Grant
Thanks for the reply. I was using term "deadlock" loosely. There was no internal deadlock detected by SQL Server but we did indeed have a blocking chain and the head of the blocking chain was an Application process with status "sleeping" (AWAITING COMMAND). I assumed that we were seeing "Database-Application Deadlock", which cannot be spotted internally by SQL Server.
But what was really puzzling was the the head of the blocking chain (I was easily able to identify the process) held no locks (according to the dm_tran_locks DMV) - nor did any other process - all entries in the DMV were WAITing - no record had status GRANT. How is this possible?
Regards
Glenn
February 23, 2011 at 9:21 am
glennhill (2/23/2011)
Hi GrantThanks for the reply. I was using term "deadlock" loosely. There was no internal deadlock detected by SQL Server but we did indeed have a blocking chain and the head of the blocking chain was an Application process with status "sleeping" (AWAITING COMMAND). I assumed that we were seeing "Database-Application Deadlock", which cannot be spotted internally by SQL Server.
But what was really puzzling was the the head of the blocking chain (I was easily able to identify the process) held no locks (according to the dm_tran_locks DMV) - nor did any other process - all entries in the DMV were WAITing - no record had status GRANT. How is this possible?
Regards
Glenn
I'm not 100% certain how that would occur, but if that process was waiting, it too could be blocking the others. It's possible it was a parallel process and one part was waiting on the rest... not sure. But looking at sys.dm_exec_requests would tell you what the first process is waiting on, or possibly you'd need to join to sys.dm_os_waiting_tasks to see if it was a parallelism issue, I find that shows you blocking resources much better anyway. One way or the other, the head of the chain was holding locks.
BTW, that's still not a deadlock. Deadlock is very, very explicit within SQL Server. You're still talking about a blocking process. See this section of BOL.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
February 24, 2011 at 1:56 am
Thanks Grant.
Some details about the process at the head of the chain:-
- According to sysprocesses, it is not being blocked by anything (blocked=0).
- Its (status,command,lastwaittype) is ('sleeping', 'AWAITING COMMAND', 'MISCELLANEOUS').
- It doesn't appear in dm_os_waiting_tasks.
- It has no lock records (either Granted or Waiting) in dm_tran_locks.
Regards
Glenn
February 24, 2011 at 2:13 am
glennhill (2/23/2011)
However I was surprised to find that the DMV was populated with 21 rows ALL of which had request_status WAIT but no rows with GRANT status.
What resource were the requests waiting on?
February 24, 2011 at 2:18 am
The head of the blocking chain is not waiting on any resource - i.e. waitresource in sysprocesses is blank. The process is sleeping, awaiting command.
February 24, 2011 at 2:30 am
glennhill (2/24/2011)
The head of the blocking chain is not waiting on any resource - i.e. waitresource in sysprocesses is blank. The process is sleeping, awaiting command.
You said the 21 rows from the DMV had a status of WAIT. What resource were they waiting on? The DMV rows with a WAIT status usually show the resource each request is waiting on in other columns from the DMV.
February 24, 2011 at 3:32 am
Sorry, I misunderstood your question.
There are ...
7 PAGE waits
14 KEY waits
Not easy to map back this back to object names. The 'resource_associated_entity_id's no longer exist (as hobt_id in sys.partitions).
(I'm guessing this is because we have rebuilt all indexes since the problem occurred).
If anyone knows any other way to track back to the object, then input would be gratefully received.
But this is slightly off track from the original post.
My real question is how can the the head of the blocking chain be blocking anything when there do not appear to be a locks granted (or at least none recorded in dm_tran_locks)?
Or to put it another way, why do we have WAITing locks when there is no record of any GRANTed locks?
February 24, 2011 at 3:40 am
glennhill (2/24/2011)
Not easy to map back this back to object names. The 'resource_associated_entity_id's no longer exist (as hobt_id in sys.partitions).
Ok, no problem, but the full output from the locks DMV would have made it easier to understand some things that it is difficult to ask a direction question about. Never mind.
My real question is how can the the head of the blocking chain be blocking anything when there do not appear to be a locks granted (or at least none recorded in dm_tran_locks)?
The simple answer is that queries can wait on things other than locks. Memory grants, worker threads, and undetected parallel deadlocks are some examples. There are also circumstances where sys.dm_tran_locks may not be able to show full information because it is not practical for the query behind the DMV to determine what resource is being locked.
February 24, 2011 at 3:49 am
Thanks again for the quick reply.
I do have the full output from dm_tran_locks. But my suspicion is that the 'hobt_id's have changed in sys.partitions due to our index rebuild.
I still find it odd that the dm_tran_locks is specifically saying that it is waiting for page (and key) locks when there is no evidence that a page lock (or any other lock) has been granted to any other resource. If we can't entirely trust the dm_tran_locks DMV we will find it difficult to get to the bottom of this problem, which occurs intermittently.
Can you offer any other advice on how we should debug this problem next time it occurs?
Or do you think this is a case of getting Microsoft support?
Glenn
February 24, 2011 at 4:01 am
glennhill (2/24/2011)
I do have the full output from dm_tran_locks. But my suspicion is that the 'hobt_id's have changed in sys.partitions due to our index rebuild.
Yes, I read and understood that. Thing is, only you can see the full output - it might contain some clues (notwithstanding the changed HoBts). You have to understand that we cannot see your system, the blocking chain you followed, or the sysprocesses information.
I still find it odd that the dm_tran_locks is specifically saying that it is waiting for page (and key) locks when there is no evidence that a page lock (or any other lock) has been granted to any other resource. If we can't entirely trust the dm_tran_locks DMV we will find it difficult to get to the bottom of this problem, which occurs intermittently.
It's not that you can't trust it exactly, but you do have to bear in mind the limitations. The DMV does not provide a transactionally-consistent snapshot for example, so rapidly changing locks can produce an inconsistent-looking view of things. Without more information, it's tough to speculate exactly why you saw what you saw.
February 24, 2011 at 4:03 am
What other information would be helpful?
February 24, 2011 at 4:07 am
glennhill (2/24/2011)
Can you offer any other advice on how we should debug this problem next time it occurs?
I would look not just at sys.dm_tran_locks, but also at sys.dm_os_waiting_tasks. I would check what each connection, request, and execution context is doing, and what it is waiting on (whether it is a lock or not). I would look at the DMVs several times to see if the blocking were static, or whether things were making progress but resulting in a rolling-block. I probably wouldn't rely on sysprocesses much, because it is a backward-compatibility view with some approximate behaviours.
February 24, 2011 at 4:28 am
I do have a snapshot of dm_os_waiting_tasks at the time of the problem. The wait_type is LOCK for all 21 rows (LOCK_M_S, LOCK_M_U or LOCK_M_X).
The wait durations of all (8) requests that are directly waiting on the the head of the blocking chain were between 8 and 15 hours.
Not sure what the 'exec_context_id' is meant to tell us (BOL is not helpful) but the value is 0 in six cases, 10 and 14 in the other two.
Viewing 15 posts - 1 through 15 (of 22 total)
You must be logged in to reply to this topic. Login to reply