I recently came across a really odd issue with the SQL Agent, there were two agent jobs attached to one schedule one of the jobs was enabled and the other was not enabled – all pretty normal at this point but here is where it got weird.
The disabled job was running on schedule regardless!
In a previous blog post: Duplicate Agent jobs – A good reason not to meddle with Msdb I explained a situation where someone was updating msdb tables manually rather than using the supplied system stored procedures such as msdb.dbo.sp_update_job, It would seem that this was not the only occasion where I would find myself in the midst of the meddlers’ medley.
This time around the meddler decided to disable the job using an update statement against the msdb.dbo.sysjobs table setting enabled from 1 to 0, you have no idea how long it took me to work that out!! but this lead me onto discovering more about the SQL server agent and its general behaviour.
Before we delve into some specifics about the Agent keep in mind that:
The SQL Agent works for SQL Server
Below are some key tasks the SQL Agent carries out when it starts up, this list is not exhaustive it is just the key tasks I witnessed when I monitored the activity during startup.
- SQL Agent – subsystems refresher connects to SQL server
- SQL Agent – subsystems refresher forces a refresh of its subsystems.
EXECUTE msdb.dbo.sp_enum_sqlagent_subsystems_internal @syssubsytems_refresh_needed = 1
- SQL Agent – Temporary worker starts the Job activity monitor.
EXECUTE master.dbo.xp_sqlagent_monitor 'START', N'', 1
- SQL Agent – Generic refresher gets the latest job information to load into its cache.
EXECUTE msdb.dbo.sp_sqlagent_refresh_job
- SQL Agent – Generic refresher creates a new job activity row the latest Agent session.
exec sp_executesql N'EXECUTE [msdb].[dbo].[sp_sqlagent_create_jobactivity] @session_id = @P1, @job_id = NULL',N'@P1 int'
- SQL Agent – Generic refresher gets all the schedule info using a query to load into its cache, some examples are below:
SELECT schedule_id, schedule_uid, name, enabled, freq_type, freq_interval, freq_subday_type, freq_subday_interval, freq_relative_interval, freq_recurrence_factor, active_start_date, active_end_date, active_start_time, active_end_time FROM msdb.dbo.sysschedules sch WHERE (EXISTS(SELECT * FROM sysjobschedules as jobsch WHERE sch.schedule_id = jobsch.schedule_id))
SELECT job_id, schedule_id, next_run_date, next_run_time FROM sysjobschedules ORDER BY job_id
- SQLAgent – Generic Refresher updates the next scheduled run time information per job , example:
exec sp_executesql N'EXECUTE [msdb].[dbo].[sp_sqlagent_update_jobactivity_next_scheduled_date] @session_id = @P1, @job_id = @P2, @is_system = @P3, @last_run_date = NULL, @last_run_time = NULL',N'@P1 int,@P2 uniqueidentifier,@P3 int',10027,'B79033DD-99E9-44CF-AC40-72FC3B73FDF1',0
- SQL Agent – Generic refresher gets all alert info to load into its cache.
EXECUTE msdb.dbo.sp_help_alert @order_by = N'event_id DESC, severity ASC, message_id ASC, database_name DESC', @legacy_format = 1
With all this information cached it is now up to SQL to notify the SQL Agent of any changes made to jobs, schedules and alerts via the Execution of supplied system stored procedures such as msdb.dbo.sp_update_job, part of the stored procedures commands is to notify the agent of the changes – remember I said before that the Agent works for SQL? without accurate information the Agent will simply work with the cache it has regardless of what may have been updated by a user within the msdb tables (this only includes the information that is cached by the agent)
Here is an example of how NOT to disable an Agent job
Please do not run this in production this is just for demonstration only
I created a job for testing purposes called ‘Testjob’ oh how original
I set a schedule of every 30 seconds.
Now lets do things the wrong way by updating sysjobs directly (Not using sp_update_job), I am not including the code as I do not want anyone to copy and paste it
I made the update at 21:00 , just to prove to you that SQL has no clue about my update here is a glimpse at the sysjobs table.
SELECT name, date_created, date_modified FROM sysjobs WHERE enabled = 0
The tiny variance between date_created and date_modified is the time it takes between creating the job and the attaching the schedule to the job, these are carried out as two separate statements.
So no mention of me disabling the job at 21:00.
So lets have a look at the job history:
As you can see its still running regardless, that is because I didn’t use the system stored procedures supplied that notifies the Agent of the change, remember earlier I mentioned about the Agent working for SQL?
The Agent hasn’t been instructed by SQL, it hasn’t been notified so as far as the Agent is concerned , nothing has changed and it is to continue with the instructions it had provided from the last refresh.
Here is the RIGHT way to do it:
EXEC msdb.dbo.sp_update_job @job_name = 'TestJob',@enabled = 0;
I ran these statements immediately after the above:
SELECT GETDATE() as DisabledDatetime; SELECT name, date_created, date_modified FROM sysjobs WHERE enabled = 0;
Here are the results:
Here you can see that the DisabledDateTime has been reflected in the date_modified column in sysjobs albeit a few milliseconds difference, so what about the job history, has it stopped running?
It sure has!
And the reason for that is the system stored procedure uses internal commands to notify the SQL agent of the change and in turn the SQL Agent re caches information for the job in my case it executed the following (it uses job_id instead of job_name as this is SQL’s preferred method)
exec sp_executesql N'EXECUTE msdb.dbo.sp_sqlagent_refresh_job @job_id = @P1',N'@P1 uniqueidentifier','3F3F4712-C8E1-4FEE-8468-4BB9F9D097BC'
This procedure returns 21 columns, That’s 21 columns of information that the Agent Caches about a single job the columns are as follows:
job_id
name
enabled
start_step_id
owner
notify_level_eventlog
notify_level_email
notify_level_netsend
notify_level_page
notify_email_operator_id
notify_netsend_operator_id
notify_page_operator_id delete_level
has_step
version_number
last_run_date
last_run_time
originating_server
description
agent_account
is_system
A change to any one of these columns will not be picked up by the agent unless you use the GUI or sp_update_job, its really easy to use and very easy to script out statements for bulk changes if required – for example:
Disable jobs where the job name is LIKE SQLUndercover
SELECT 'EXEC msdb.dbo.sp_update_job @job_name = '''+name+''',@enabled = 0;' FROM msdb.dbo.sysjobs WHERE name LIKE '%SQLUndercover%'
Enable Jobs where the job name is LIKE SQLUndercover
SELECT 'EXEC msdb.dbo.sp_update_job @job_name = '''+name+''',@enabled = 1;' FROM msdb.dbo.sysjobs WHERE name LIKE '%SQLUndercover%'
Of course there are more changes you could make this way it does not have to be exclusive to enabling and disabling.
Thanks for reading