July 14, 2010 at 11:46 am
I have an issue on one server where the scheduled jobs are not running when scheduled.
This server is an A/P cluster, and it's currently running on the secondary node.
SQL Server Agent service is running, under a domain service account.
Jobs CAN be run manually; they just aren't running when scheduled.
I haven't been able to determine why this is happening, and a Google search isn't helping. Any ideas on what to look into?
Thanks!
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
July 14, 2010 at 11:51 am
When you say secondary node, I'm not sure what you mean. In A/P, one node is live as the active node. Doesn't matter which one it is.
Did you have a failover? If you connect to the Virtual clustered node, do you see the jobs/agent/service accounts the same as you might see them from just connecting to the node?
July 14, 2010 at 12:05 pm
"Did you have a failover?"
Yes, the cluster did fail over.
"If you connect to the Virtual clustered node, do you see the jobs/agent/service accounts the same as you might see them from just connecting to the node?"
Yes. Everything appears to be up and running, but the scheduled jobs are not running from the currently active node, and haven't since the cluster failed over.
Wayne
July 14, 2010 at 12:30 pm
Interesting. Have you checked sp_who2, and do you see the SQL Agent connected?
Anything in the Agent log?
It almost sounds like the Agent didn't fail over. Have you tried restarting the Agent service from SSMS (as connected to the virtual server)?
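For the connection check, here is a minimal sketch as an alternative to scanning sp_who2 output by eye. It assumes SQL Server 2005 or later and that the Agent's sessions report a program name starting with "SQLAgent" (the usual default); run it while connected to the virtual server name.

```sql
-- List sessions opened by SQL Server Agent on the instance.
-- Assumption: Agent sessions report a program_name beginning with 'SQLAgent'.
SELECT  session_id,
        login_name,
        host_name,      -- client machine the Agent connected from
        program_name,
        login_time      -- compare against the time of the failover
FROM    sys.dm_exec_sessions
WHERE   program_name LIKE N'SQLAgent%'
ORDER BY login_time;
```

If this returns no rows, or the login times all predate the failover, that would fit the idea that the Agent never reconnected properly on the new node.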
July 14, 2010 at 1:39 pm
Well, Cluster Administrator is saying that the SQL Agent is running on the active node, but let me try restarting it to see how that affects things.
I also just found out that at least one job (it runs a package), when started manually, doesn't actually run.
Wayne
July 14, 2010 at 1:54 pm
Did the schedules within the jobs get disabled somehow?
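One way to check that for every job at once, rather than opening each one in SSMS, is a query against the standard msdb job tables. This is a sketch only; the column aliases are mine.

```sql
-- Show the enabled flag for each job and for each schedule attached to it.
-- Jobs with no schedule attached appear with NULL schedule columns.
SELECT  j.name            AS job_name,
        j.enabled         AS job_enabled,       -- 1 = enabled
        s.name            AS schedule_name,
        s.enabled         AS schedule_enabled,  -- 1 = enabled
        js.next_run_date,                       -- integer, YYYYMMDD
        js.next_run_time                        -- integer, HHMMSS
FROM    msdb.dbo.sysjobs AS j
LEFT JOIN msdb.dbo.sysjobschedules AS js
        ON js.job_id = j.job_id
LEFT JOIN msdb.dbo.sysschedules AS s
        ON s.schedule_id = js.schedule_id
ORDER BY j.name, s.name;
```

A job needs both flags set to 1 before the Agent will consider running it on a schedule.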
July 14, 2010 at 2:53 pm
Well, I'm usually loath to go around restarting things, but stopping and restarting the SQL Server Agent from Cluster Administrator fixed this. I'd really like to understand a bit more about what was going on, but the job I couldn't get to run manually has now run manually, and a scheduled job ran at its assigned time.
Thanks for your help!
Wayne
July 14, 2010 at 3:09 pm
Thanks for the update. I suspect it was a transient error where somehow the Agent didn't move over and properly connect from the passive node.
If you can schedule downtime, or get patches ready at some point, I'd schedule a job to execute in 10 minutes, then fail over and see if it still runs.
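A rough sketch of that test, using the documented msdb procedures. The job and schedule names (FailoverScheduleTest, FailoverScheduleTest_Once) are made up for illustration, and the variable initialisation syntax assumes SQL Server 2008 or later.

```sql
-- Create a throwaway job with a one-time schedule 10 minutes from now.
-- Job and schedule names are hypothetical; change them to suit.
DECLARE @run_at     datetime = DATEADD(MINUTE, 10, GETDATE());
DECLARE @start_date int = CONVERT(int, CONVERT(char(8), @run_at, 112)); -- YYYYMMDD
DECLARE @start_time int = DATEPART(HOUR,   @run_at) * 10000
                        + DATEPART(MINUTE, @run_at) * 100
                        + DATEPART(SECOND, @run_at);                    -- HHMMSS

EXEC msdb.dbo.sp_add_job
     @job_name = N'FailoverScheduleTest';

EXEC msdb.dbo.sp_add_jobstep
     @job_name  = N'FailoverScheduleTest',
     @step_name = N'Record node',
     @subsystem = N'TSQL',
     @command   = N'SELECT SERVERPROPERTY(''ComputerNamePhysicalNetBIOS'') AS physical_node, GETDATE() AS ran_at;';

EXEC msdb.dbo.sp_add_schedule
     @schedule_name     = N'FailoverScheduleTest_Once',
     @freq_type         = 1,            -- 1 = run once
     @active_start_date = @start_date,
     @active_start_time = @start_time;

EXEC msdb.dbo.sp_attach_schedule
     @job_name      = N'FailoverScheduleTest',
     @schedule_name = N'FailoverScheduleTest_Once';

-- Target the job at the local server so the Agent will pick it up.
EXEC msdb.dbo.sp_add_jobserver
     @job_name = N'FailoverScheduleTest';

-- After the failover and the scheduled time have passed, look for history rows.
SELECT  j.name, h.step_id, h.run_status, h.run_date, h.run_time, h.server, h.message
FROM    msdb.dbo.sysjobhistory AS h
JOIN    msdb.dbo.sysjobs AS j ON j.job_id = h.job_id
WHERE   j.name = N'FailoverScheduleTest'
ORDER BY h.instance_id DESC;
```

Run everything down to sp_add_jobserver before failing over, and the final SELECT afterwards. No history rows after the scheduled time would mean the schedule never fired on the new node; a run_status of 1 means the job succeeded.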
July 14, 2010 at 3:50 pm
This particular cluster is a "test" cluster, the first point in our migration path where a cluster is utilized. I need to fail it back to its normal node anyway, so I'll test that out.
Wayne
July 15, 2010 at 6:28 am
Well, the scheduled jobs still aren't running from the active node (which is normally the passive node, and is active only because of the failover). I'm pretty sure that if I fail it over to the node it normally runs on, it will work correctly. But I'd like to figure out what is going on with this node, where scheduled jobs aren't running.
Since this is a test cluster, things aren't as crucial as they would be otherwise, and I can take the time to figure out what is going wrong. Getting this knowledge from the test cluster could prove to be beneficial in the future. At least jobs are running when being kicked off manually.
Wayne
July 15, 2010 at 9:00 am
I'd be very curious if this worked when you failed back.
So, just to make it clear.
NodeA - Normally active node
NodeB - Normally passive node
- Jobs run on NodeA
- Failover to NodeB, jobs don't run
- Failback to NodeA - jobs run
I wonder if this is what actually happens, or have you tested it? I was thinking that it was the change that caused issues, but it could be the node. Might be an interesting bug if this is not what happens. If it is, I'd lean towards a setup error somewhere.
July 15, 2010 at 9:32 am
Steve Jones - Editor (7/15/2010) wrote:
"I'd be very curious if this worked when you failed back. So, just to make it clear.
NodeA - Normally active node
NodeB - Normally passive node
- Jobs run on NodeA
- Failover to NodeB, jobs don't run
- Failback to NodeA - jobs run"
This is correct.
"I wonder if this is what actually happens, or have you tested it? I was thinking that it was the change that caused issues, but it could be the node. Might be an interesting bug if this is not what happens. If it is, I'd lean towards a setup error somewhere."
I think I'm going to go ahead and fail it back to NodeA, and see how things work.
Then back to NodeB, and check again.
Wayne
July 28, 2010 at 7:10 am
Okay, back to this issue.
When the cluster is running off of Node A (the normally active node), scheduled jobs run just fine.
When the cluster is running off of Node B (the normally passive node), scheduled jobs do not run at all.
I have checked that all of the services are running under the same domain login on both nodes.
The jobs are enabled, and the schedules within the jobs are enabled.
Any other ideas of things to check?
Wayne
July 28, 2010 at 9:04 am
No errors in the jobs? Is it that they just don't run?
Is it all jobs? Can you schedule a job from the passive node and will it run?
Is the Agent service running? Is there a connection in SQL Server from the Agent?
Sorry if you've answered, I didn't see these covered.
It seems as though you have some issue with connecting/reconnecting. If you check the actual installation of the Agent on node B, are you sure that the Agent is set up and working for that particular node? Does it have the correct login?
I'm semi-stumped here. I think you might need to call PSS to diagnose this, though I might schedule this for a time when you can bounce the system if needed.