March 19, 2012 at 3:36 pm
Hello,
I inherited a couple of SQL Server 2005 boxes that run jobs through the night. There are lots of jobs and no wiggle room time-wise. Depending on the process involved, a failure can cause a significant delay. The server team is planning to move domain controllers soon, and I am trying to assess the risk of failure and what is most likely to be the cause if it happens.
The SQL Agent service on each server is configured to run under an AD service account. In addition, the same accounts exist on multiple servers and are used for running SQL Agent jobs, SSIS packages, linked servers, and probably a few other things that I don't know about yet. (I would rather not get into the wisdom of this, as it wasn't my design and it's water under the bridge.)
I understand that AD changes propagate but I believe this takes some time. A few weeks ago we had some unexplained job failures and SQL Agent hung. I know they were doing something with domain controllers but I did not know about it until after the fact. I read that AD changes can be a cause sometimes (along with a whole mess of other things) but I don't know if that actually caused those particular errors. That is the reason for my concern now.
If I understood >where< the risk is likely to occur, I would be better prepared to mitigate it (maybe?). At least I might be better able to explain it. So my question to the experts: is my risk greater with the jobs that connect to other servers with said AD account (like, say, a linked server or a job), or is the fact that the AD account is running the SQL Agent service the bigger problem?
I am a little out of my element with AD/Domain controllers so please let me know if I left anything out.
Thanks!
March 19, 2012 at 4:13 pm
It is a very normal setup to use AD accounts when a domain is available. Could you expound on the errors you got?
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
March 19, 2012 at 4:58 pm
On one evening the job failed with a login error (was unavailable). I don't have the exact error anymore. We have gotten this before, but generally it is because a database isn't available, and this wasn't the case (there is a drop/restore of a key DB that happens, and it has been delayed before).
The other error wasn't an error per se. Two jobs hung. One finished the step it was on (a series of SSIS packages) but didn't run the next step, which was a SQL step. The other looked like it stopped at a particular step but apparently kept going, because I saw evidence that the next steps finished. I did not get an error for either one, nor was there any clue in the event logs. If you tried to stop the job, you got an error. I saw no relevant open connections in SQL. Bouncing the server cleared them in history.
In both cases I would chalk it up to a fluke under other circumstances, but the timing with the DC work makes them suspicious. With the second error the data was delayed by several hours, which is unacceptable. Plus these are 3 am errors, which is a bummer.