September 8, 2009 at 6:47 pm
Hello, I was hoping somebody could confirm that the below is by SQL 2005 design.
A large update query was being run on a DB and the server crashed for other reasons.
When the server started up the User Database went into recovery state. Both SQL and SQL Agent servers appeared running in config manager.
It was possible to connect to all other User databases and do queries etc.
The issue was that no SQL Agent tasks were able to be done while the User database was being recoverd -- note this is not MSDB but a plain user database.
I could not create any SQL jobs, no scheduled jobs would run. The error would be like "this action can not be completed until SQL Agent is running" -- but it was running.
My guess is all services are OK but the SQL Server won't let SQL Agent actually do anything until the recovery was complete.
This was a pain because the recovery took 5 hours to complete and no scheduled jobs could run.
Could somebody confirm that this is normal and there was nothing that I could have done except wait.
thanks
September 8, 2009 at 7:22 pm
Yes and no.
No - SQL Agent is designed to 'lock up' until all user databases are recovered. This would not work very well in a log-shipping scenario for instance 🙂
My guess is that there was a cross-database transaction in progress at the time of the crash, which involved some resource in MSDB which SQL Agent needs in order to start or run jobs. Locks can be held during recovery to protect consistency until the transaction outcome is resolved by the normal process of redo and undo.
That is my guess - no more.
I am more intrigued by the fact that recovery took 5 hours - can you explain more about that?
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
September 8, 2009 at 7:43 pm
Thanks for the reply.
The query was on three tables in a DW and in only the one database.
It was running for over 5 hours before the crash so I can only assume it would take about that long to roll back!!!
I was watching the recovery % completed go up in the log and it was no go until the % reached 100 then maybe 10 more minutes to complete after than. The "recovery" status went off in the user database and full functionality was available again in SQL Agent.
I guess if the server had not crashed and the transaction has just rolled back then it still might have taken the same rollback time but SQL agent would not be affected i.e the database does not go into "recovery" status whenever a rollback occurs, it just might be locked etc
September 8, 2009 at 8:16 pm
It was running for five hours inside a single transaction? Really?! I guess you have your reasons, but that sounds quite scary.
Maybe you'll never know exactly what happened to the agent, but you could monitor locks next time the 5-hour process runs to see if that gives any clues. I think I would be tempted to spent effort looking at ways to make the 5-hour process faster and more easily recoverable.
If you want to share details about what it does, I'm sure I'm not the only one who would be interested...?
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply