March 22, 2010 at 9:07 am
I did a bad thing, I think.
Quick background: In order to populate a particular report addressing a variety of domain-specific exceptions in a manufacturing "work in process" business report based on a 3rd party ERP system database, I have a convoluted extraction and transformation of data that takes 15 steps, and about 6 minutes to run. My .NET client throws COM exceptions after some period, so I want this extraction to occur on demand, but asynchronously from the client. (f I choose a certain table from the output, with the client truncating it first, then when the table has records, I know the work is done.)
My big idea: the client can set a bit on a table with an update trigger, and the trigger would launch the stored procedure. (I suppose in hindsight, having a DTS or SSIS package pick up the process responsibility would be better... but that doesn't help today's issue)
It turns out, the client waits for the trigger to finish operation after all. After the .NET client threw it's exception, in a fit of impatience, I found my process in Enterprise Manager's process info and did right-click "End Process".
That was Friday afternoon; it's still present and appears to be active, though I suspect it's simply a zombie not doing anything. The Status says "runnable", the command states "KILLED/ROLLBACK", it's not waiting, and it has nonzero CPU, Physical I/O and Memory Usage statistics. If I view the command itself (right click the process, select Properties), it's the update "UPDATE processflag set flag=1".
This is on a SQL Sever 2000 Enterprise Edition version 8.00.194.
The following steps do not work:
If I try Detach database - I click Clear connections, then OK, then Error 3701, cannot detach ... it is currently in use .
If I try Take database offline - Error 5070 - Database state cannot be changed while other users are using...
If I try Database>Properties>Options>Restrict Access>Single User, same Error 5070 as try to take offline.
Any subsidiary task like deactivating or modifying the trigger, or querying the updated table, ends up locked on the zombie process, but can be killed without issue.
A google search on the terms "sql server process hang rollback an update trigger" and related search categories found nothing on SQL Server Central that directly addressed this situation, while it did find several other sites I haven't used before with two suggesting that the resolution is rebooting the server. If I was sure this would work and was the only way, I could arrange downtime for the server in off hours; I wouldn't want to do this if SQL Server is going to try to get back into this state after the reboot.
Do any of the experts here have other ideas (or can confirm that a soft or hard reboot will get this process off the list)?
Also, in general, and going back to the issue that led me down this rabbit-hole, what's the best way to "fire and forget" a set of processing as an asynchronous process from a client? Is having a DTS or SSIS package polling for work to do the best approach?
Thanks,
Keith
March 22, 2010 at 9:26 am
Rebooting the server will either result in it trying to finish the rollback, or labelling the database Suspect if it simply cannot finish the rollback. It probably won't solve what you're trying to fix. There's a slight chance that something is hung up in execution, in which case rebooting will fix it. But the probability of it causing even bigger problems is higher, so only do it as a last resort.
If you do have to do a forced reboot, you may have to recover the database from backups if it can't bring it back online. Do you have good backups and can you recover from them if you have to?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 22, 2010 at 9:35 am
GSquared,
Thank you for your advice and clarifying questions.
Fortunately, I happened to set up this particular database new for this report; it consists of working tables to support the data transformation, and some configured business-rule values that I have no problem with recreating if needed. It ends up selecting values from other databases, but all inserting and updating occurs within it's own tables. Heck, if I could drop the whole database and recreate, that would be fine.
So, if I lose this database, I don't lose anything irreplaceable. If the database comes up suspect, will SQL Server let me drop it or detach it?
If SQL restores the zombie process as a zombie after restart, would it be an even more bad thing to move the database files while the SQL Service is not running?
Thanks again,
Keith
March 22, 2010 at 9:53 am
If that's the case, that this can be lost without hurting the business at this point, then make sure your other databases on that server are all both currently okay and backed up, then go ahead and try a hard-kill if you need to (forced shutdown on the service). That will kill the transaction, and will probably result in having to drop and replace the database.
DO NOT DO THIS IF THERE ARE ACTIVE TRANSACTIONS IN ANY OF YOUR OTHER DATABASES!!!!!!
It's been a long time since I worked with SQL 2000, so you might want to get advice from someone who's worked with it more recently. From what I remember, you should be able to try doing a standard restart on the SQL service (from Windows Server Manager). That should result in a clean shut-down and restart. Try that.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 22, 2010 at 4:47 pm
Have you tried; KILL <SPID> WITH STATUSONLY to see if the transaction is 100% rolled back?
If it is then it's been my experience that the hung SPID is waiting to report back to the client but the connection is gone and so the SPID waits. Normally once it's rolled back the SPID will not block anything and a simple restart of the SQL Server Service during your maintenance window will clear the SPID.
However, if you restart the service and the SPID is not fully rolled back you will lose visibility into the SPID as it will begin rolling back under a system SPID. In this case it may cause your db to come up "In Recovery" until the SPID is rolled back. You can look in the application log of the server to see the estimated time remaining to recovery the db.
March 23, 2010 at 12:26 pm
EDogg,
Thanks for that idea. Kill 68 with statusonly reports:
SPID 68: transaction rollback in progress. Estimated rollback completion: 0%. Estimated time remaining: 0 seconds.
Either I assume that the 0% is an edge case, and represents completion; or I dunno what happens after the service gets restarted. At least if I lose this particular database, no permanent harm to the business.
After I get my downtime (probably this weekend), I'll post a followup to this thread with observations.
Thanks again Edogg and GSquared for your advice!
Keith
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply