May 20, 2010 at 4:38 am
Hey
I was wondering if anyone had any idea if there was a way to force a kill command/rolback that has stalled.
We have recently moved our database from SQL 2000 SP4 to SQL 2008 SP1 CU7. We had a system execute a stored procedure that contained a query against a linked server which hung, as a result we issued a Kill and duely waited for the rollback. That was three days ago and the SPID has been sat there with a wait of SOS_SCHEDULER_YIELD (193907554).
MS are refusing to assist us until we have completed their list of things with the only one outstanding being an update stats over the database which is going to take ages due to database size.
Has anyone else had an issue like this and can give some pointers at how to force the kill\rollback through.
Just to add that we are not expereincing heavy CPU (8 physical cores, 16 logical) load which is (as I understand after google/books etc.) the normal cause of this behaviour. The disk subsystem is twiddling its thumbs and we have got loads of memory free (the SQL Instance has 40GB)
Anything would be appreciated.
JQ
May 20, 2010 at 4:54 am
You cannot kill a rollback.
You could try restarting the instance if you think the rollback is stuck. Beware, the rollback will continue after the restart, while SQL's bringing the DB online. If you're running enterprise edition, the DB will be available while the rollback completes, if you're running Standard edition or lower, the DB with be in the recovering state and inaccessible.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
May 20, 2010 at 4:59 am
Cheers for that, I know I can't kill a rollback. Rather trying to force the damn thing to get a higher priority so that it actually actions. We are indeed running enterprise, however I am reluctant to restart the instance as I really do not want the db to come up in a suspect state.
Cheers Once again
🙂
May 20, 2010 at 6:44 am
No priority options exist. Either wait as it is now, or restart. I suspect something external from SQL's holding it up. Might be worth recycling MSDTC, see if that frees anything up.
DB will come up initially Recovering (not suspect) It'll run through restart-recovery and, since you're on enterprise, will come online after the roll-forward and complete the rollback once the DB is online.
Suspect state is caused by a rollback/rollforward that fails, usually due to damage to log/data file. This rollback hasn't failed, it's just hung. If it had failed, your DB would already be suspect.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
May 20, 2010 at 7:26 am
Thanks for the insight.
We suspected as much with the threads being hung, we have tried stopping and starting DTC but alas it has not worked.
It is looking more and more likely that we will have to bounce the instance over to the other node to try and clear this.
Many thanks for your advice.
🙂
May 21, 2010 at 5:43 am
Before you initiate fail-over, it might make sense to check a few things to give you some confidence that the ROLLBACK is truly 'stuck' and/or that recovery is not going to take too long.
Check sys.dm_exec_requests to see if the ROLLBACK is making progress or not (percent_complete)
Get the transaction_id that is 'stuck' from sys.dm_tran_session_transactions
Check sys.dm_tran_active_transactions for the state of that transaction_id
Check sys.dm_tran_database_transactions to see how much log has been generated by the transaction
It would be interesting to see the data from those tables, and to know whether the rolling-back transaction is keeping the log active or not. In any case, it makes sense to back up as much of the log as possible just before fail-over.
Paul
May 21, 2010 at 6:11 am
Pual
Those are some useful DMV's, cheers for the pointers.
In response to some of your queries;
Querying dm_exec_requests the current percent complete is sat at 0 and showing an estimated completion time of 0.
Querying dm_tran_active_transactions the current state of the transactions in question is 0 which after looking at BOL indecates that is has not yet initialised. Also the dtc_state is showing as 0 which does not seem to be a value listed in the BOL entry for the DMV too which is a bit strange.
Querying dm_tran_database_transactions shows that there are 0 log bytes used and that the trasnaction state is listed as 3 (BOL says initialised by not generated log records, which would tally). Also interesting is that the begin time is listed as NULL.
Cheers for the pointers, it has given me more avenues to look at. If you have any advice given the results that I have posted it would be appreciated.
JQ
May 21, 2010 at 6:21 am
So, it looks very much as if the transaction never really 'got off the ground' so to speak, and is now in an odd sort of state. My guess is that is good news from your perspective: it's probably not doing any harm as it is (aside from looking odd) and should have no effect on recovery time, if and when you do decide to fail-over.
I would just check DBCC OPENTRAN to check that the session in question is not listed as the oldest active transaction. I'm guessing that it won't be from the information given, and you probably would have mentioned if your active log was 40GB or so 🙂
Talking of active log, try a DBCC SQLPERF(LOGSPACE) to see log statistics for the database, or even DBCC LOGINFO for more detail (status 2 = active). Probably makes most sense to check those just after a successful log backup.
Paul
November 5, 2010 at 2:29 pm
I'm having this exact same issue. We're running Windows Server 2008 R2 64 bit and 64 bit SQL Server 2008 Standard SP1 CU8. We upgraded to CU8 last weekend and began experiencing these symptoms almost immediately. We have several stored procedures that perform updates to a linked server running SQL Server 2000 and this seems to be where the problem lies particularly. The only was to be rid of the processes in Rollback status is to recycle SQL Server.
November 30, 2010 at 5:16 am
We are also experiencing this issue on SQL 2008 R2 x64. The same SPROCs and processes that are hanging on SQL 2008 R2 ran successfully on SQL 2005 x64.
One specific instance was when a LiteSpeed log backup had executed. The server was under heavy load at the time and it caused the external LiteSpeed process to fail. The SPID in SQL had 4 threads and all four were maxing out a single CPU. Since we only have 8 cores, this resulted in a massive performance hit on that server (even in a KILLED/ROLLBACK state).
We applied Cumulative Update 4 to all of our SQL 2008 R2 servers, but that didn't prevent the issue from happening again.
Does anyone know if CU5 (or a standalone patch) for SQL 2008 R2 will address this issue? We obviously are not alone and I'm sure more people will experience this issue as they migrate to SQL 2008 R2.
November 30, 2010 at 5:46 am
Please post new questions in a new thread. Thanks
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply