Recently, I covered the need to understand job failure frequency and knowing the value of your SQL Agent jobs. You can read the specifics in the article – here.
Sometimes, just knowing the frequency of the job failure is good enough. Sometimes, more information is helpful. Having more information is particularly helpful when meeting with the business to discuss the validity of the job in question.
What do you do in times like this? The most basic answer to that question is – get more data. But that barely scratches the surface. The real question being asked there is how do you go about gathering that data?
There are two methods to gather the data – the hard way and the easy way. Do you like to work hard? Or would you rather work more efficiently?
Efficiency Matters
As was discussed in the previous article, I prefer to do things just a little bit less manually where possible. The consistency of a script matters, but it also is just so much faster than doing things the hard, manual, iterative way. So let’s build a little bit on the script from the previous article.
WITH jobhist AS (SELECT jh.job_id , jh.run_date , jh.run_status , jh.step_id , ROW_NUMBER() OVER (PARTITION BY jh.job_id, jh.run_status ORDER BY jh.run_date DESC) AS rownum FROM dbo.sysjobhistory jh WHERE jh.step_id = 0) , agglastsixty AS (SELECT sjh.job_id , sjh.run_status , COUNT(*) AS RunCount FROMdbo.sysjobhistory sjh INNER JOIN jobhist jh ON jh.job_id = sjh.job_id AND jh.run_status = sjh.run_status AND jh.step_id = sjh.step_id WHERECONVERT(DATE, CONVERT(VARCHAR, sjh.run_date)) > DATEADD( d , -60 , CONVERT(DATE, CONVERT(VARCHAR, jh.run_date)) ) AND jh.rownum = 1 GROUP BYsjh.job_id , sjh.run_status) , aggtotal AS (SELECT sjh.job_id , sjh.run_status , COUNT(*) AS RunCount FROMdbo.sysjobhistory sjh INNER JOIN jobhist jh ON jh.job_id = sjh.job_id AND jh.run_status = sjh.run_status AND jh.step_id = sjh.step_id WHEREjh.rownum = 1 GROUP BYsjh.job_id , sjh.run_status) SELECT j.name AS JobName , sc.name AS CategoryName -- , sp.name AS OwnerName -- , j.owner_sid , j.date_created , j.enabled , CONVERT(DATE, CONVERT(VARCHAR, oa.run_date)) AS RunDate , CASE oa.run_status WHEN 0 THEN 'Failed' WHEN 1 THEN 'Succeeded' WHEN 2 THEN 'Retry' WHEN 3 THEN 'Canceled' WHEN 4 THEN 'In Progress' END AS run_status , als.RunCount AS Last60StatusCount , agt.RunCount AS TotalStatusCount , js.subsystem AS JobStepSubsystem , js.command , CASE WHEN js.subsystem = 'SSIS' AND js.command LIKE '%DECRYPT%' THEN LTRIM(RTRIM(SUBSTRING( js.command , CHARINDEX('/DECRYPT', js.command, 1) + 9 , CHARINDEX('/', js.command, CHARINDEX('/DECRYPT', js.command, 1) + 1) - CHARINDEX('/DECRYPT', js.command, 1) - 9 ) ) ) ELSE 'N/A' END AS PkgPassword , CASE WHEN js.subsystem = 'SSIS' AND js.command LIKE '%FILE%' THEN LTRIM(RTRIM(SUBSTRING( js.command , CHARINDEX('/FILE', js.command, 1) + 9 , CHARINDEX('""', js.command, CHARINDEX('/FILE', js.command, 1) + 1) - CHARINDEX('/FILE', js.command, 1) - 9 ) ) ) ELSE 'N/A' END AS PkgPath , spr.name AS ProxyName , spr.credential_id FROMdbo.sysjobs j LEFT JOINjobhist oa ON oa.job_id = j.job_id LEFT OUTER JOIN agglastsixty als ON als.job_id = oa.job_id AND als.run_status = oa.run_status LEFT OUTER JOIN aggtotal agt ON agt.job_id = oa.job_id AND agt.run_status = oa.run_status INNER JOINsys.server_principals sp ON j.owner_sid = sp.sid INNER JOINdbo.syscategories sc ON j.category_id = sc.category_id INNER JOINdbo.sysjobsteps js ON js.job_id = j.job_id LEFT JOINdbo.sysproxies spr ON js.proxy_id = spr.proxy_id WHEREoa.rownum = 1 --AND oa.run_status = 0 ORDER BYRunDate DESC;
And here is a sample of the output.
With this script, I have the ability to quick show which step is failing, what the command is for that step, what kind of process is running on that step, any passwords (in the event of an SSIS password), and of course the failure frequency. This is golden information at the fingertips. There is no need to click through the GUI to gather this information. You can get it quickly and easily in one fell swoop.
The Wrap
An important part of any DBAs job is to ensure database related jobs are running prim and proper. Sometimes that just doesn’t happen. When jobs are being overlooked, it is useful to be able to gather data related to consistency of job success or failure. This script will help you in your investigation efforts. In addition, I also recommend this article in your agent job audit efforts.
This has been another post in the back to basics series. Other topics in the series include (but are not limited to): Backups, backup history and user logins.