April 16, 2010 at 8:14 am
We have a few important replication processes running and I need to be informed immediately if such jobs stops for whatever reason with whatever status. I have Googled the net and looked into the job-tables in msdb, but sofar have not been able to find out how to make an alert on such a job, or have SCOM check these jobs. It seems there is no status for the sp_help_job which tells me a job is running. The jobmonitor clearly has something to look at and tell us if a job is executing or not. There seems to be no value for 'run_status' in tablesysjobhistory which tells a job is 'Executing'.
In sysjobactivity supposedly the attributes start_execution_date and stop_execution_date should tell you if a job is executing, but this is not reliable: A few weeks ago we were told a replication process was not running for three days already. Checks showed there was record for said job with an open stop_execution_date... Further analysis told us, that last year this server had crashed and was restarted, which seemingly resulted in an incorrect cleaning of the sysjobactivity table.
I could include a check on how many records there are in the sysjobactivity table with an open stop_execution_date, but I would hope there is a more simple, elegant solution.
Any ideas?
TIA,
FreeHansje
Greetz,
Hans Brouwer
April 16, 2010 at 11:53 am
Declare @retValue int
EXEC @retValue = msdb.dbo.sp_start_job @job_name = 'packagename'
If @retValue = 1
Raiserror ('Failed to start [packagename]', 16, 1, null)
April 16, 2010 at 12:42 pm
this might be pretty informative to see what raplication is doing. I have this being run every 5 mins and emails me when latency is above a predefined threshold.
SELECT dl.publisher_db,
dl.subscriber_server,
dl.subscriber_db,
dl.distribution_agent,
CASE dl.runstatus
WHEN 1 THEN 'Start'
WHEN 2 THEN 'Succeed'
WHEN 3 THEN 'In progress'
WHEN 4 THEN 'Idle'
WHEN 5 THEN 'Retry'
WHEN 6 THEN 'Fail'
END AS DistributionRunStatus,
dl.distribution_latency,
CASE rl.runstatus
WHEN 1 THEN 'Start'
WHEN 2 THEN 'Succeed'
WHEN 3 THEN 'In progress'
WHEN 4 THEN 'Idle'
WHEN 5 THEN 'Retry'
WHEN 6 THEN 'Fail'
END AS ReaderRunStatus,
rl.reader_latency,
( dl.distribution_latency + Isnull(rl.reader_latency, 0) ) AS total_latency
FROM (SELECT agent_id,
publisher_db,
Upper(ss.datasource) AS subscriber_server,
subscriber_db AS subscriber_db,
[name] AS distribution_agent,
DH.runstatus,
CASE
WHEN DH.runstatus = 1 THEN Datediff(SS, DH.TIME, Getdate())
WHEN DH.runstatus = 3 THEN DH.delivery_latency
ELSE 0
END AS distribution_latency
FROM (SELECT DHMAX.*
FROM distribution.dbo.MSdistribution_history DHMAX WITH(NOLOCK)
INNER JOIN (SELECT agent_id,
MAX([time]) AS TIME
FROM distribution.dbo.MSdistribution_history DH WITH(NOLOCK)
GROUP BY agent_id) MTDH
ON DHMAX.agent_id = MTDH.agent_id
AND DHMAX.TIME = MTDH.TIME) DH
JOIN distribution.dbo.MSdistribution_agents DA WITH(NOLOCK)
ON DH.agent_id = DA.id
JOIN MASTER.dbo.sysservers ss WITH(NOLOCK)
ON da.subscriber_id = ss.srvid) AS DL
LEFT OUTER JOIN (SELECT agent_id,
la.publisher_db,
lh.runstatus,
CASE
WHEN lh.runstatus = 1 THEN Datediff(SS, LH.TIME, Getdate())
WHEN lh.runstatus = 3 THEN lh.delivery_latency
ELSE 0
END AS reader_latency
FROM (SELECT LHMAX.*
FROM distribution.dbo.MSlogreader_history LHMAX WITH(NOLOCK)
INNER JOIN (SELECT agent_id,
MAX([time]) AS [time]
FROM distribution.dbo.MSlogreader_history WITH(NOLOCK)
GROUP BY agent_id) MTLH
ON LHMAX.agent_id = MTLH.agent_id
AND LHMAX.TIME = MTLH.TIME) LH
JOIN distribution.dbo.mslogreader_agents LA WITH(NOLOCK)
ON lH.agent_id = lA.id) AS RL
ON DL.publisher_db = RL.publisher_db
ORDER BY dl.publisher_db,
dl.distribution_agent
April 16, 2010 at 2:34 pm
I use this script, think I may of got it here or just profiled it on replication monitor, can't remember, I am concerned with failures, don't like having the client tell us something is down. Collect the details via ssis punt into a table and if select count(*)>= send an email with the query to check it to the dba on call dl.
use distribution;
SELECT
MSdistribution_agents.name, MSdistribution_agents.publisher_db,
MSdistribution_agents.publication,
MSdistribution_agents.subscriber_db,
case when runstatus = 1 then 'Start'
when runstatus = 2 then 'Succeed'
when runstatus = 3 then 'In progress'
when runstatus = 4 then 'Idle'
when runstatus = 5 then 'Retry'
when runstatus = 5 then 'Fail'
end as RunStatus,
MSdistribution_history.[time], MSdistribution_history.duration,
MSdistribution_history.delivered_transactions, MSdistribution_history.delivered_commands,
MSdistribution_history.error_id,
MSdistribution_history.comments
FROM
MSdistribution_agents (nolock)
INNER JOIN MSdistribution_history (nolock)
ON MSdistribution_agents.id = MSdistribution_history.agent_id
where
MSdistribution_history.[time] >= dateadd(mi, -30, getdate())
and
MSdistribution_history.error_id > 0
order by MSdistribution_history.[time] desc
April 21, 2010 at 1:37 am
Tnx for answering all.
I am uncertain on what the value of run_status tells me. If status is 4(Idle), does this mean that the job is active, but currently there is no activity? I come to this conclusion because of the values 2 and 6(Succeed and Fail).
And Andrew, I'm trying to understand what restrictions you use in the WHERE clause: do you only ask for history records older then 30 minutes? Is that an arbitrary number?
Tnx again
Greetz,
Hans Brouwer
April 21, 2010 at 8:05 am
I am only interested in failures, I have a job collect any failures every 15 minutes, the where clause is so I don't miss any failures if there is a network blip when running the ssis package. We toyed with looking at latency, but it can get mangled depending on load, user or actual data loads, and the usual database maintenance. You can change the where clause to fit your needs, we just can't have our clients telling us something is not working.
Andrew
April 21, 2010 at 10:00 am
here's a proc that I use to get the status of a job. I put it in the master db, that's why the naming convention
Create procedure [sp_IsJobRunning]
@ps_JobName nvarchar(128),
@pn_JobState int out,
@pn_JobOutcome int out
as
--XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
--X DESCRIPTION: Procedure used to determine if a SQL Agent job is running
--X
--X ASSUMPTIONS: Need execute permissions on master..xp_sqlagent_enum_jobs
--X
--X PARAMETERS: @ps_JobName - Name of job to find running
--X @pn_JobState - state of the job
--X Valid Options for Job_State are
--X 0 - Not Idle or Suspended
--X 1 - Executing Job
--X 2 - Waiting For Thread
--X 3 - Between Retries
--X 4 - Idle
--X 5 - Suspended
--X 6 - WaitingForStepToFinish
--X 7 - PerformingCompletionActions
--X @pn_JobOutcome - Outcome status of job
--X Valid options for Last_Run_Outcome are
--X 0 - Failed
--X 1 - Successful
--X 2 - Not Valid
--X 3 - Canceled
--X 4 - Executing
--X 5 - Undetermined State
--X
--X RETURNS: @pn_JobState and @pn_JobOutcome
--XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
begin
set nocount on
Declare @vt_JobResults TABLE (
job_id UNIQUEIDENTIFIER NOT NULL,
last_run_date INT NOT NULL,
last_run_time INT NOT NULL,
next_run_date INT NOT NULL,
next_run_time INT NOT NULL,
next_run_schedule_id INT NOT NULL,
requested_to_run INT NOT NULL, -- BOOL
request_source INT NOT NULL,
request_source_id sysname COLLATE database_default NULL,
running INT NOT NULL, -- BOOL
current_step INT NOT NULL,
current_retry_attempt INT NOT NULL,
job_state INT NOT NULL)
declare @vn_job_id UNIQUEIDENTIFIER
-- Get the Job ID. This is needed to see if the job is actually running
select @vn_job_id = job_id
from msdb..sysjobs
where [name] = @ps_JobName
-- Populate temp table with results of all jobs
insert into @vt_JobResults
exec master..xp_sqlagent_enum_jobs 1,''
-- Check to see if the job is already running
select @pn_jobstate = job_state, @pn_JobOutcome = last_run_outcome
from @vt_JobResults r
inner join msdb.dbo.sysjobservers so
on r.job_id = so.job_id
where r.job_id = @vn_job_id
end
For better, quicker answers, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
September 1, 2010 at 9:41 am
did you receive an answer for the runstatus = '4'? running or not? thanks
September 3, 2010 at 3:03 am
Nope, no answewr on that yet.
Greetz,
Hans Brouwer
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply