November 29, 2010 at 10:33 am
We were getting table-exclusive escalation on tables that can never be allowed to be exclusively locked, and that was blocking other queries.
The simple idea was to create a job to make an empty update on these tables and commits every few minutes, so there's something already holding a lock (IX-SHARED on the object) and not giving it up to a TAB-X request.
The code below (table & field names are anonymized) runs for 30 minutes total, looping to commit every 5 minutes. Any failure is trapped & simply rolls back (rolling back effectively nothing). I can't be updating actual rows that could block a valid reader, so the empty update looked the most safe way to go.
Now the problem: Every few weeks I get the job never finishing, the only thing that makes sense is the commit is blocked by other activity. When this happens then all these locks are held throughout the day, and carry into the night to block index maintenance. The job has been getting killed by the time I find out this has all happened, so the details of the actual lock conflicts are gone.
So, without the details of what actual locks existed yesterday, I'm looking for the locking theory that's causing problems. There was no maintenance activity happening during this time (other than maybe stats autoupdating). What kinds of CRUD activity would create locks that can block an IX-SHARED object lock from committing? If I did a rollback rather than commit, does a rollback get any preference in locking/blocking (more preference than a commit)?
Or maybe, is there a simpler and lower-level lock I can take that doesn't lock data, but still prevents escalation to the table?
Set @i=0;
WHILE @i<@LoopMax
BEGIN
Set @DelayTime = DATEADD(second,+@DelaySeconds,GetDate());
BEGIN TRY
BEGIN TRANSACTION;
UPDATE dbo.Table1 set FieldX=FieldX WHERE 0=1;
UPDATE dbo.Table2 set FieldX=FieldX WHERE 0=1;
UPDATE dbo.Table3 set FieldX=FieldX WHERE 0=1;
UPDATE dbo.Table4 set FieldX=FieldX WHERE 0=1;
UPDATE dbo.Table5 set FieldX=FieldX WHERE 0=1;WAITFOR TIME @DelayTime;
COMMIT TRANSACTION;
END TRY
BEGIN CATCH
Declare @ErrMsg nvarchar(2000);
Set @ErrMsg = CONVERT(nvarchar(10), ISNULL(ERROR_NUMBER(),0) )
+ ISNULL(ERROR_MESSAGE(),'');
Insert Into dbo.ErrorLog (CreateDate, Source, Message, IsLive)
SelectGetDate() AS CreateDate, 'GSNDBLockMgmt' AS Source,
@ErrMsg AS Message, 1 AS IsLive ;
ROLLBACK TRANSACTION
Print 'catch block';
END CATCH
SET @i=@i+1;
END
November 29, 2010 at 12:51 pm
MSDN Article "How to resolve blocking problems that are caused by lock escalation in SQL Server":
http://support.microsoft.com/kb/323630/en-us
Eddie Wuerch
MCM: SQL
November 29, 2010 at 1:03 pm
This article describes exactly what the code is trying to do (get an IX lock via an empty update), and may have been where the idea came from. This method was to prevent TAB escalation while not getting in the way of user activity.
The article notes this:
This batch does not modify any data or block other queries (unless the other query forces a table lock with the TABLOCK hint or if an administrator has disabled page or row locks by using an sp_indexoption stored procedure).
So is this implying that a TABLOCK can be forced to acquire in spite of the IX already in existence?
The core question is what can happen that wouldn't let an IX-Shared commit - row or page operations shouldn't cause it, it has to be an object-level action like escalation, object maintenance (stats update?), -- or a forced tablock.
November 29, 2010 at 1:12 pm
Tony++ (11/29/2010)
This article describes exactly what the code is trying to do (get an IX lock via an empty update), and may have been where the idea came from. This method was to prevent TAB escalation while not getting in the way of user activity.
Not exactly. If you'll note, the select itself doesn't hold a single row for update/exclusive, just keeps a shared lock open against the table. It will block IX_TAB from becoming X_Tab. Using a select instead of an update can be more stable as well over longer holds. No, I don't know why.
So is this implying that a TABLOCK can be forced to acquire in spite of the IX already in existence?
No, it's stating that if you use the TABLOCK hint in existing queries, this will screw them up.
The core question is what can happen that wouldn't let an IX-Shared commit - row or page operations shouldn't cause it, it has to be an object-level action like escalation, object maintenance (stats update?), -- or a forced tablock.
A few things could happen, but without having access at the time of the error and specific traces going, they're tough at best to track down. A good chunk of the transaction controls are left under the hood. You will probably have more success using the shared lock of the select than the exclusive of the update.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
November 29, 2010 at 1:34 pm
I'm going for the quick-hit items here. I haven't found the probable smoking gun, but at least the code change gives me the ability to say something was done about the problem... 😉
* I've made the change for select rather than update, thanks for the pointer on that. I can't say that I understand why the select IX should be different from the update IX, but it should be least the same stability, if not better.
* Under penalty of something nasty I haven't thought of yet, nobody gets to end this particular job until they've talked to me. This will let me see what's tangled up in the job before I kill it.
* I'll be checking in on Sunday evenings to see if the job is stuck waiting to commit, at least I can catch it before it impacts the overnight index maintenance.
When I do get this figured out I'll post the answer here. In the meantime I welcome more ideas.
November 29, 2010 at 1:55 pm
Tony++ (11/29/2010)
* I've made the change for select rather than update, thanks for the pointer on that. I can't say that I understand why the select IX should be different from the update IX, but it should be least the same stability, if not better.
The select will open a (shared) s_row/page lock, which will block the IX_tab, to clarify.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
December 3, 2010 at 7:09 am
It happened again yesterday, and now I'm really puzzled.
The job that started at 8:45AM was still running at 5:30AM this morning. It should have hit its final lock/commit cycle and been done by 9:15AM.
The proc was holding its locks as usual, and had other DML processes changing the data on the tables it's holding the locks on. The proc's sole purpose is to prevent the other DMLs from escalating to table locks, and it did its job there.
However, the lock job didn't hit a deadlock until around 12:20AM this morning, when it deadlocked with the nightly index maintenance job. Even after the deadlock cleared the lock job didn't end, and ***there was nothing blocking it***! I find no reason this should have happened.
So what happened in the 15 hours between 9:15AM and 12:20 if there was no blocking or deadlocking? What can prevent a commit from finishing other than blocking or deadlocking?
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply