After my article publication in SQLServerCentral.com http://www.sqlservercentral.com/columnists/lPeysakhovich/controllingunusuallylongrunningjobs.asp
I received few emails from the readers who informed me about misleading part in the article.
“In addition, changes can be done to help analyze those job steps which are still running but duration of the step is already unusually long. It can be achieved by adding additional criteria to the select statement for the field run_status.
Simply exchange line
and soh.run_status = 1 -- successful jobs
with line
and ( soh.run_status = 1 or soh.run_status = 4 ) -- successful and in
progress jobs “
I am giving my apologies to the public, because I didn’t check this
statement but simply rely on the Microsoft documentation in book on line (BOL).
In reality, SQL Server is not using this flag to measure the job progress. This
is will be one of the many reasons for me to check all the statements and BOL
next time. But for sake of error correction and to finalize the previous article
I would like to show the way of checking a situation to define if job is
running.
In addition, there is an article written by Gregory A. Larsen. http://www.databasejournal.com/features/mssql/article.php/10894_3491201_2
His article shows how to find that any job is running. But to give
the answer "Is job running?" for the particular job the article is
using undocumented stored procedure xp_sqlagent_enum_jobs. Microsoft SQL Server
has always had nifty undocumented functions that people are tempting to use.
However, Microsoft has always recommended not using those functions because they
can change in future releases. Using undocumented features rarely burning DBA.
But with SQL Server 2005 it may be the issue.
The usage of system stored procedure sp_help_job which returns
information about jobs that are used by SQLServerAgent service to perform
automated activities in Microsoft® SQL Server™ has some limitations if
you need to find the answer for the particular job. (Read details in Gregory A.
Larsen article. See the link above). Is there any way to find the
state of a particular job without usage of undocumented stored procedure? This
is the article about.
Let’s assume that a job has few steps. Those steps are written in system
table sysjobsteps in database msdb. When a job is running SQL Server is writing
each ended step into msdb..sysjobhistory. By getting the last step, time, and
status of the step it is possible to get the status of the job. First, we need
to know the last time the job was running. Programmatically, you can analyze the
job schedule, but it is not very reliable way because sometimes the job can be
enforced to run manually. The best way is to look for the keyword '(Job
outcome)' in sysjobhistory table because this is a keyword of the last entered
step for the job in the table and
the entry should have the maximum value in the column instance_id. Instance_id
is numeric column in the table sysjobhistory and gets always (max + 1 ) value.
Next statement will return all the steps the job already finished during the
current run.
STATEMENT 1
select * from dbo.sysjobhistory sjh inner join sysjobs sj on sj.job_id = sjh.job_id where sj.name = 'job1' and sjh.instance_id > (select max(instance_id) from sysjobhistory sjh1 where sjh1.step_name = '(Job outcome)' and sjh1.job_id = sj.job_id )
As always, there are some obstacles and issues. If the statement returns no
rows there are two possible logical outcomes. Or job is done, or job is running
the very first step. Again the dilemma - how to know what is true? In the case you really need to know the answer reliably, my
suggestion would be to add the first
step for a job as a quick running fake step. For example, step will
consists of one statement SELECT
getdate(). This step whether failed or succeeded will go to the next job
step and table
sysjobhistory will get the first step entry indicating that the job is
started. This row will be the outcome of the query above and let us know that
the job is started and running. For all SQL statements below there is an
assumption that all jobs have a fake initial step. Otherwise, the behavior of those
statements may be different if job is running step 1.
Next issue is to get the step where job is right now. Based on the statement
above we know that the job is running and the steps that is successfully
finished or failed. The determination of the next running step is related to the
output values from table sysjobsteps. Based on SQL Server BOL (verified) next
definitions can be made.
[@on_success_action =] success_action
Is the action to perform if the step succeeds. success_action
is tinyint, and can be one of these values.
Value | Description (action) |
1 (default) | Quit with success |
2 | Quit with failure |
3 | Go to next step |
4 | Go to step on_success_step_id |
[@on_success_step_id =] success_step_id
Is the ID of the step in this job to execute if the step
succeeds and success_action is 4. success_step_id is int,
with a default of 0.
[@on_fail_action =] fail_action
Is the action to perform if the step fails. fail_action is
tinyint, and can be one of these values.
Value | Description (action) |
1 | Quit with success |
2 (default) | Quit with failure |
3 | Go to next step |
4 | Go to step on_fail_step_id |
[@on_fail_step_id =] fail_step_id
Is the ID of the step in this job to execute if the step fails and fail_action is 4. fail_step_id is int, with a default of 0.
Next batch allows to return information about the next running step.
BEGIN
declare @step_id int, @run_status int, @next_runstep int, @jobname varchar(128), @jobid uniqueidentifier
declare @step_action int , @next_step_id int
set @jobname = 'job1'
-- get the job id
select @jobid = job_id from sysjobs where name = @jobname
-- Find the last running step
select @step_id = max(step_id) from dbo.sysjobhistory sjh where job_id = @jobid
and sjh.instance_id > (select max(instance_id) from sysjobhistory sjh1
where sjh1.step_name = '(Job outcome)' and sjh1.job_id = @jobid )
-- Get the step status
select @run_status = run_status
from sysjobhistory sjh
where job_id = @jobid and step_id = @step_id
and instance_id > (select max(instance_id)
from sysjobhistory sjh1
where sjh1.step_name = '(Job outcome)' and sjh1.job_id = @jobid)
-- get the action for the last finished step
select @step_action = CASE
WHEN @run_status = 1 THEN on_success_action
ELSE on_fail_action
END ,
@next_step_id = CASE
WHEN @run_status = 1 THEN on_success_step_id
ELSE on_fail_step_id
END
from msdb.dbo.sysjobsteps
where job_id = @job_id and step_id = @step_id
-- Based on the status find the next step job is running
select @next_runstep = CASE
WHEN @step_action in (1,2) THEN @step_id
-- this is the last step
WHEN @step_action = 3 THEN @step_id + 1
-- next step
WHEN @step_action = 4 THEN @next_step_id
-- Go to step on_fail_step_id
ELSE -1
-- unknown step
END
SELECT 'Step number ' + cast(@next_runstep as varchar) + ' is running for the job ' + @jobname
END
Now,
based on the batch above, we can find the actual step job is running and get an
answer if job is really running. It is looks logically to create a stored
procedure if you need the information about the step the job is running.
If
the question is only “Whether the job is running?” then the best choice is
to use the first statement (STATEMENT 1) and based on the outcome (returned
number of rows) get the answer. If there are no rows returned then the job is
not running. Otherwise job is running.
What
if the requirement is to find whether the any job is running or not? For
example, before the server automated weekly maintenance started it may necessary
to have this information.
select sj.job_id, sj.name, sjh.step_name from dbo.sysjobhistory sjh
inner join sysjobs sj on sj.job_id = sjh.job_id
where sjh.instance_id > (select max(instance_id) from sysjobhistory sjh1
where sjh1.step_name = '(Job outcome)'
and sjh1.job_id = sjh.job_id
)
And the only issue left with the jobs that run the first time. They don’t have '(Job outcome)' row(s) yet. Next query will combine together both scenarios for one particular job.
IF exists ( select * from dbo.sysjobhistory sjh
inner join sysjobs sj on sj.job_id = sjh.job_id
where sj.name = 'job3'
and step_name = '(Job outcome)' )
select * from dbo.sysjobhistory sjh
inner join sysjobs sj
on sj.job_id = sjh.job_id
where sj.name = 'job3'
and sjh.instance_id > (select max(instance_id)
from sysjobhistory sjh1
where sjh1.step_name = '(Job outcome)' and sjh1.job_id = sj.job_id )
ELSE
select * from dbo.sysjobhistory sjh
inner join sysjobs sj on sj.job_id = sjh.job_id
where sj.name = 'job3'
Or to see any running job steps:
select sj.job_id, sj.name, sjh.step_name
from dbo.sysjobhistory sjh
inner join sysjobs sj on sj.job_id = sjh.job_id
where sjh.instance_id > (select max(instance_id) from sysjobhistory sjh1
where sjh1.step_name = '(Job outcome)'
and sjh1.job_id = sjh.job_id )
union
select sj.job_id, sj.name, sjh.step_name from dbo.sysjobhistory sjh
inner join sysjobs sj on sj.job_id = sjh.job_id
where sjh.step_name <> '(Job outcome)'
and sjh.instance_id = ( select max(instance_id)
from sysjobhistory sjh1
where sjh1.job_id = sjh.job_id )
Conclusion
As an example, even software companies (surprise!) have the outdated
information in the documentation. So, always check your statements to avoid the
code multifunction that may lead to some degree disaster.