I had an interesting problem recently I thought I'd share with you, might save you a headache one day. It started about 7am, I was just finishing up some quiet time tasks (I start at 5:30) when someone came by to tell me that a job had not completed. They left to run the process manually, I decided to take a look. Processes do fail occasionally and this one had a history of being, shall we say, finicky. When I checked the job history, I got this message:
Strange. I look to see if the agent is running, it shows up as 'starting'. Try to stop it, won't stop. Try to stop the SQL service, it won't stop either because the agent won't stop. Interesting, if annoying. I reboot the box, go back to checking other things.
Comes back up, same problem. Bad. This is actually the desktop edition running on a workstation, we use it to kick off a bunch of small but important tasks that run throughout the day. I know that they loaded a new version of the failed process the day before so my first thought is that they managed to overwrite something they shouldn't have. Our standard practice is that everything has an install package (.msi), so I check it to see, nothing looks strange.
Check the event log, error logs, etc, still not seeing anything. I see one db that is set to autoclose, I fix that, but I know that's not the problem. Check the web, see a note that running in fiber mode can cause issues. I doubt that it would be enabled, but I check anyway - no, not enabled. Seems to be a pretty common error judging by the number of hits on Google.
I'm still guessing the install package broke something, so I decide to run SP3 again, doesn't take long, maybe I'll luck out. No, not that lucky. At this point I'm an hour into it. I can reinstall SQL, continue to troubleshoot, or call for help. We've got some free calls available to us with PSS, so it's an easy choice to call.
PSS calls are almost always a good learning experience. I like to thing I'm a fair troubleshooter, but they do it every day and they focus on only a few segments of the product. We go through the details of what I've done so far, they have me run sp_configure again to check the fiber mode (their first guess as well). Next thing is to check the Outlook profile, apparently there have been cases where a corrupt profile would cause the agent to hang. Profile seems ok, I can open Outlook, send email. Delete the profile anyway. No change.
Reboot again. This time they have me go to a command prompt and run this:
C:\Program Files\Microsoft SQL Server\MSSQL\Binn>sqlagent -c -v > c:\agent.txt
First time, we do it, nothing happens. It creates a zero byte file. For those of you not familiar with the parameters, here is what they mean:
-c Indicates that SQL Server Agent is running from the command prompt and is independent of the Windows NT Service Control Manager. When -c is used, SQL Server Agent cannot be controlled from either the Services application in Control Panel or SQL Server Service Manager.
-v Indicates that SQL Server Agent runs in verbose mode and writes diagnostic information to the command-prompt window. The diagnostic information is the same as the information written to the SQL Server Agent error log.
> C:\Agent.txt Writes the output to a text file using the redirection operator.
Pretty strange. We try again leaving off the redirect to a file, then we get this:
Aha! That's why all the problems. When the agent runs, the message box is popping, but there is no way to see it. Running at the command line finally revealed it. As soon as I clicked ok, the agent started and jobs started running. As you can tell from the path, it's a resource file, shouldn't be a big deal. I grab a copy from a server, paste into the folder. Stop and start the agent, no problem. Elapsed time, about 3 hours total, an hour or so of that with PSS.
How did the file get deleted? Good question and one I haven't run down yet. I still suspect the install package, but testing that theory will have to wait a few days.
Will you ever encounter the same problem? Hopefully not. Even if you don't, maybe the idea of running agent from the command line will help you one day. Definitely you need to have an idea of when to call PSS. Steve Jones wrote Should you call Microsoft for support? (http://www.sqlservercentral.com/columnists/sjones/shouldyoucallmicrosoftforsupport.asp), you might want to look that over while you're thinking about it.