What would keep scheduled jobs from running?

WayneS

SSC Guru

Points: 95461
July 14, 2010 at 11:46 am

#224144

I have an issue on one server where the scheduled jobs are not running when scheduled.
This server is an A/P cluster, and it's currently running on the secondary node.
SQL Server Agent service is running, under a domain service account.
Jobs CAN be run manually; they just aren't running when scheduled.
I haven't been able to determine why this is happening, and a Google search isn't helping. Any ideas on what to look into?
Thanks!
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

Steve Jones - SSC Editor SSC Guru Points: 736760 · Answer 1

When you say secondary node, I'm not sure what you mean. In an A/P cluster, one node is live as the active node; it doesn't matter which one it is.

Did you have a failover? If you connect to the Virtual clustered node, do you see the jobs/agent/service accounts the same as you might see them from just connecting to the node?

WayneS SSC Guru Points: 95461 · Answer 2

Did you have a failover?

Yes, the cluster did fail over.

If you connect to the Virtual clustered node, do you see the jobs/agent/service accounts the same as you might see them from just connecting to the node?

Yes. Everything appears to be up and running, but the scheduled jobs are not running on the currently active node, and haven't since the cluster failed over.

Wayne

Steve Jones - SSC Editor SSC Guru Points: 736760 · Answer 3

Interesting. Have you checked sp_who2? Do you see SQL Agent connected?

Anything in the Agent log?

It almost sounds like the Agent didn't fail over. Have you tried restarting the Agent service from SSMS (as connected to the virtual server)?
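A minimal sketch of the sp_who2 check, assuming you copy the ProgramName column from sp_who2 (or program_name from sys.dm_exec_sessions) into a list; the helper names and sample values are hypothetical. SQL Server Agent's own sessions normally identify themselves with a program name starting with "SQLAgent" (for example "SQLAgent - Generic Refresher"), so finding none on the active node would support the idea that the Agent never reconnected after the failover.

```python
# Query to run against the virtual server (SSMS or any driver).
# sys.dm_exec_sessions and its program_name column exist in SQL Server 2005+.
AGENT_SESSIONS_QUERY = """
SELECT session_id, login_name, program_name
FROM sys.dm_exec_sessions
WHERE program_name LIKE 'SQLAgent%';
"""


def agent_program_names(program_names):
    """Return the program names that look like SQL Server Agent sessions."""
    return [name for name in program_names
            if name and name.strip().startswith("SQLAgent")]


def agent_connected(program_names):
    """True if at least one session appears to belong to SQL Server Agent."""
    return bool(agent_program_names(program_names))


if __name__ == "__main__":
    # Hypothetical ProgramName values copied from sp_who2 output.
    sample = [
        "Microsoft SQL Server Management Studio",
        "SQLAgent - Generic Refresher",
        "SQLAgent - Job invocation engine",
        None,
    ]
    print(agent_connected(sample))  # True for this sample
```

If this returns False on the active node while Cluster Administrator reports the Agent resource online, the Agent log is the next place to look.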

WayneS SSC Guru Points: 95461 · Answer 4

Well, Cluster Administrator says that SQL Server Agent is running on the active node, but let me try restarting it to see how that affects things.

I also just found out that at least one job (it runs a package) doesn't actually run when started manually.

Wayne

Steve Jones - SSC Editor SSC Guru Points: 736760 · Answer 5

Does this apply? http://blogs.msdn.com/b/karthick_pk/archive/2009/01/14/unable-to-start-sqlserver-agent-resource-on-cluster-after-upgrading-to-9-00-3186-or-higher.aspx

homebrew01 SSC Guru Points: 55549 · Answer 6

Did the schedules within the job get disabled somehow?
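To check every job at once, something like the sketch below could be run over the rows of a query joining msdb.dbo.sysjobs, msdb.dbo.sysjobschedules and msdb.dbo.sysschedules. It assumes the rows arrive as dictionaries keyed by the query's column aliases (hypothetical names chosen here). msdb stores dates as YYYYMMDD integers and times as HHMMSS integers, and a next_run_date of 0 generally means no next run has been calculated, which is worth flagging alongside the enabled flags.

```python
from datetime import datetime

SCHEDULE_QUERY = """
SELECT j.name    AS job_name,
       j.enabled AS job_enabled,
       s.name    AS schedule_name,
       s.enabled AS schedule_enabled,
       js.next_run_date,
       js.next_run_time
FROM msdb.dbo.sysjobs AS j
JOIN msdb.dbo.sysjobschedules AS js ON js.job_id = j.job_id
JOIN msdb.dbo.sysschedules AS s ON s.schedule_id = js.schedule_id;
"""


def decode_msdb_datetime(date_int, time_int):
    """Convert msdb's YYYYMMDD / HHMMSS integers to a datetime; None if the date is 0."""
    if not date_int:
        return None
    return datetime(date_int // 10000, (date_int // 100) % 100, date_int % 100,
                    time_int // 10000, (time_int // 100) % 100, time_int % 100)


def schedule_problems(rows):
    """List (job_name, schedule_name, reason) for anything that would block a scheduled run."""
    problems = []
    for row in rows:
        key = (row["job_name"], row["schedule_name"])
        if not row["job_enabled"]:
            problems.append(key + ("job disabled",))
        if not row["schedule_enabled"]:
            problems.append(key + ("schedule disabled",))
        if decode_msdb_datetime(row["next_run_date"], row["next_run_time"]) is None:
            problems.append(key + ("no next run date",))
    return problems
```

Comparing the output on each node after a failover would show whether the Agent on the failing node is computing next run times at all.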

WayneS SSC Guru Points: 95461 · Answer 7

Well, I'm usually loath to go around restarting things, but stopping and restarting SQL Server Agent from Cluster Administrator fixed this. I'd really like to understand a bit more about what was going on, but the job I was having trouble running manually now runs, and a scheduled job ran at its assigned time.

Thanks for your help!

Wayne

Steve Jones - SSC Editor SSC Guru Points: 736760 · Answer 8

Thanks for the update. I suspect it was a transient error where somehow the Agent didn't move over and properly connect from the passive node.

If you can schedule downtime, or get patches ready at some point, I'd schedule a job to execute in 10 minutes, then fail over and see if it runs.
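The failover test above can be set up with a one-time schedule. This sketch builds a call to msdb.dbo.sp_add_jobschedule with @freq_type = 1 (run once) starting ten minutes from now; the job name and schedule naming scheme are hypothetical, and it assumes `now` comes from the SQL Server host's local clock, since Agent schedules use server local time. Review the generated T-SQL before running it.

```python
from datetime import datetime, timedelta


def msdb_date_time(moment):
    """Encode a datetime as msdb's (YYYYMMDD, HHMMSS) integer pair."""
    return (moment.year * 10000 + moment.month * 100 + moment.day,
            moment.hour * 10000 + moment.minute * 100 + moment.second)


def one_time_schedule_sql(job_name, now, minutes_ahead=10):
    """Build T-SQL that adds a run-once schedule `minutes_ahead` minutes after `now`."""
    start_date, start_time = msdb_date_time(now + timedelta(minutes=minutes_ahead))
    safe_job = job_name.replace("'", "''")  # escape quotes for the N'...' literal
    return (
        "EXEC msdb.dbo.sp_add_jobschedule\n"
        f"    @job_name = N'{safe_job}',\n"
        f"    @name = N'Failover test {start_date}_{start_time:06d}',\n"
        "    @freq_type = 1,  -- 1 = run once\n"
        f"    @active_start_date = {start_date},\n"
        f"    @active_start_time = {start_time};"
    )


if __name__ == "__main__":
    # Hypothetical job name; `now` should reflect the server's local time.
    print(one_time_schedule_sql("Failover test job", datetime.now()))
```

Fail the cluster over before the ten minutes are up, then check the job history on the new active node to see whether the run happened.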

WayneS SSC Guru Points: 95461 · Answer 9

This particular cluster is a "test" cluster: the first point in our migration path where a cluster is used. I need to fail it back to its normal mode anyway, so I'll test that out.

Wayne

WayneS SSC Guru Points: 95461 · Answer 10

Well, the scheduled jobs still aren't running on the active node (normally the passive node; it became active after the failover). I'm pretty sure that if I fail it over to the node it normally runs on, it will work correctly. But I'd like to figure out why scheduled jobs aren't running on this node.

Since this is a test cluster, things aren't as crucial as they would be otherwise, and I can take the time to figure out what is going wrong. Getting this knowledge from the test cluster could prove beneficial in the future. At least jobs run when kicked off manually.

Wayne

Steve Jones - SSC Editor SSC Guru Points: 736760 · Answer 11

I'd be very curious if this worked when you failed back.

So, just to make it clear.

NodeA - Normally active node

NodeB - Normally passive node

- Jobs run on NodeA

- Failover to NodeB, jobs don't run

- Failback to NodeA - jobs run

I wonder if this is what happens, or whether you have tested it. I was thinking that the failover itself caused the issues, but it could be the node. It might be an interesting bug if this is not what happens. If it is, I'd lean towards a setup error somewhere.

WayneS SSC Guru Points: 95461 · Answer 12

Steve Jones - Editor (7/15/2010)
I'd be very curious if this worked when you failed back.
So, just to make it clear.
NodeA - Normally active node
NodeB - Normally passive node
- Jobs run on NodeA
- Failover to NodeB, jobs don't run
- Failback to NodeA - jobs run

This is correct.

I wonder if this is what works, or have you tested it. I was thinking that it was the change that caused issues, but it could be the node. Might be an interesting bug if this is not what happens. If it is, I'd lean towards a setup error somewhere.

I think I'm going to go ahead and fail it back to NodeA, and see how things work.

Then back to NodeB, and check again.

Wayne

WayneS SSC Guru Points: 95461 · Answer 13

Okay, back to this issue.

When the cluster is running on Node A (the normally active node), scheduled jobs run just fine.

When the cluster is running on Node B (the normally passive node), scheduled jobs do not run at all.

I have checked that all of the services are running under the same domain login on both nodes.

The jobs are enabled, and the schedules within the jobs are enabled.

Any other ideas of things to check?

Wayne

Steve Jones - SSC Editor SSC Guru Points: 736760 · Answer 14

No errors in the jobs? Is it that they just don't run?

Is it all jobs? Can you schedule a job from the passive node and will it run?

Is the Agent service running? Is there a connection in SQL Server from the Agent?

Sorry if you've answered, I didn't see these covered.

It seems as though you have some issue with connecting/reconnecting. If you check the actual installation of the Agent on Node B, are you sure that the Agent is set up and working for that particular node? Does it have the correct login?

I'm semi-stumped here. I think you might need to call PSS (Microsoft Product Support Services) to diagnose this, though I might schedule this for a time when you can bounce the system if needed.
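The questions about errors versus jobs that just don't run can be answered from msdb.dbo.sysjobhistory. The sketch below assumes rows from a query joining sysjobhistory to sysjobs, keyed by hypothetical column aliases, plus a failover time you read from the cluster log. Rows with step_id = 0 are the job outcome rows, and run_status 0 means failed. Jobs with no outcome row at all since the failover would point to the scheduler never firing rather than to errors in the job steps.

```python
from datetime import datetime

HISTORY_QUERY = """
SELECT j.name AS job_name, h.step_id, h.run_status, h.run_date, h.run_time
FROM msdb.dbo.sysjobhistory AS h
JOIN msdb.dbo.sysjobs AS j ON j.job_id = h.job_id
WHERE h.step_id = 0;  -- job outcome rows only
"""

# run_status codes as documented for msdb.dbo.sysjobhistory.
RUN_STATUS = {0: "Failed", 1: "Succeeded", 2: "Retry", 3: "Canceled", 4: "In Progress"}


def decode_msdb_datetime(date_int, time_int):
    """Convert msdb integers YYYYMMDD and HHMMSS into a datetime (None if date is 0)."""
    if not date_int:
        return None
    return datetime(date_int // 10000, (date_int // 100) % 100, date_int % 100,
                    time_int // 10000, (time_int // 100) % 100, time_int % 100)


def summarize_since(rows, job_names, since):
    """Return (never_ran, failed): jobs with no outcome since `since`,
    and jobs with at least one failed outcome since `since`."""
    outcomes = {name: [] for name in job_names}
    for row in rows:
        if row["step_id"] != 0:
            continue  # skip individual step rows; keep job outcome rows
        ran_at = decode_msdb_datetime(row["run_date"], row["run_time"])
        if ran_at is None or ran_at < since:
            continue  # ran before the failover (or has no usable date)
        if row["job_name"] in outcomes:
            outcomes[row["job_name"]].append(RUN_STATUS.get(row["run_status"], "Unknown"))
    never_ran = sorted(name for name, seen in outcomes.items() if not seen)
    failed = sorted(name for name, seen in outcomes.items() if "Failed" in seen)
    return never_ran, failed


if __name__ == "__main__":
    failover_at = datetime(2010, 7, 14, 9, 0, 0)  # hypothetical failover time
    sample_rows = [
        {"job_name": "Nightly backup", "step_id": 0, "run_status": 1,
         "run_date": 20100714, "run_time": 83000},   # before the failover
        {"job_name": "Load package", "step_id": 0, "run_status": 0,
         "run_date": 20100714, "run_time": 100000},  # after the failover, failed
    ]
    print(summarize_since(sample_rows, ["Nightly backup", "Load package"], failover_at))
    # (['Nightly backup'], ['Load package'])
```

A long never_ran list with an empty failed list on Node B would match what Wayne describes: the jobs are fine, but nothing is starting them.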