Assistance in analyzing job history

  • I am trying to identify jobs which are long running compared to the baseline, i have created a sql query which "kinda" does what i am looking for but i am not 100% confident that it is accurate. Below is the script. In the script i am first creating my working list of jobs, jobs that have run atleast 4 times, so that i can compare current vs history. Then i am also comparing avg per day vs the execution for the same day of the week. Does this make sense?

    -- This script will compare current day's run againt past 4 weeks execution on same day

    USE msdb;

    SELECT DISTINCT

    j.name ,

    COUNT(*) ExecCounts

    Into #TempJobList

    FROM msdb.dbo.sysjobhistory h

    JOIN msdb.dbo.sysjobs j

    ON j.job_id = h.job_id

    WHERE j.enabled = 1

    GROUP BY j.name

    HAVING COUNT(*) > 4

    ORDER BY j.name

    SELECT

    d.job_id ,

    d.jobname ,

    avgDurationMinutes = AVG(d.durationMinutes) ,

    daydate = CONVERT(CHAR(10), startdatetime, 101)

    ,

    --datename(dw,startdatetime) DayOFWeek,

    count(*) ExecCount

    INTO #Temp1

    FROM (

    SELECT j.job_id ,

    jobname = j.name ,

    startdatetime = CONVERT (DATETIME, RTRIM(run_date))

    + ( run_time * 9 + run_time % 10000 * 6 + run_time % 100

    * 10 ) / 216e4 ,

    durationMinutes = ( CAST(SUBSTRING(( RIGHT('0000000'

    + CONVERT(VARCHAR(7), run_duration),

    7) ), 1, 3) AS INT)

    * 60 * 60

    + CAST(SUBSTRING(( RIGHT('0000000'

    + CONVERT(VARCHAR(7), run_duration),

    7) ), 4, 2) AS INT)

    * 60

    + CAST(SUBSTRING(( RIGHT('0000000'

    + CONVERT(VARCHAR(7), run_duration),

    7) ), 6, 2) AS INT) )

    / 60. ,

    enddatetime = DATEADD(ss,

    ( CAST(SUBSTRING(( RIGHT('0000000'

    + CONVERT(VARCHAR(7), run_duration),

    7) ), 1, 3) AS INT)

    * 60 * 60

    + CAST(SUBSTRING(( RIGHT('0000000'

    + CONVERT(VARCHAR(7), run_duration),

    7) ), 4, 2) AS INT)

    * 60

    + CAST(SUBSTRING(( RIGHT('0000000'

    + CONVERT(VARCHAR(7), run_duration),

    7) ), 6, 2) AS INT) ),

    ( CONVERT (DATETIME, RTRIM(run_date))

    + ( run_time * 9 + run_time

    % 10000 * 6 + run_time % 100

    * 10 ) / 216e4 )) ,

    retries_attempted

    FROM sysjobs j ( NOLOCK )

    JOIN sysjobhistory h

    ON h.job_id = j.job_id

    AND h.step_id = 0 -- look only at the job outcome step for the total job runtime

    AND j.enabled = 1 -- jobs only enabled

    AND h.run_status = 1 -- give jobs only that successfully executed

    --AND j.name IN ( 'DBMAINT - LOGSHIP - RESTORE DoorAccess' ) -- Set the jobname here

    ) d

    WHERE CONVERT(DATE, startdatetime) = CONVERT(DATE, GETDATE() -1 ) -- 1 day old to have 24 hours of data

    AND d.jobname in ( select name from #TempJobList)

    GROUP BY d.jobname ,d.job_id ,

    CONVERT(CHAR(10), startdatetime, 101)

    ORDER BY d.jobname ,

    CAST(CONVERT(CHAR(10), startdatetime, 101) AS DATETIME) DESC

    SELECT

    d.job_id ,

    d.jobname ,

    avgDurationMinutes = AVG(d.durationMinutes) ,

    daydate = CONVERT(CHAR(10), startdatetime, 101)

    ,

    --datename(dw,startdatetime) DayOFWeek,

    count(*) ExecCount

    INTO #Temp2

    FROM (

    SELECT j.job_id ,

    jobname = j.name ,

    startdatetime = CONVERT (DATETIME, RTRIM(run_date))

    + ( run_time * 9 + run_time % 10000 * 6 + run_time % 100

    * 10 ) / 216e4 ,

    durationMinutes = ( CAST(SUBSTRING(( RIGHT('0000000'

    + CONVERT(VARCHAR(7), run_duration),

    7) ), 1, 3) AS INT)

    * 60 * 60

    + CAST(SUBSTRING(( RIGHT('0000000'

    + CONVERT(VARCHAR(7), run_duration),

    7) ), 4, 2) AS INT)

    * 60

    + CAST(SUBSTRING(( RIGHT('0000000'

    + CONVERT(VARCHAR(7), run_duration),

    7) ), 6, 2) AS INT) )

    / 60. ,

    enddatetime = DATEADD(ss,

    ( CAST(SUBSTRING(( RIGHT('0000000'

    + CONVERT(VARCHAR(7), run_duration),

    7) ), 1, 3) AS INT)

    * 60 * 60

    + CAST(SUBSTRING(( RIGHT('0000000'

    + CONVERT(VARCHAR(7), run_duration),

    7) ), 4, 2) AS INT)

    * 60

    + CAST(SUBSTRING(( RIGHT('0000000'

    + CONVERT(VARCHAR(7), run_duration),

    7) ), 6, 2) AS INT) ),

    ( CONVERT (DATETIME, RTRIM(run_date))

    + ( run_time * 9 + run_time

    % 10000 * 6 + run_time % 100

    * 10 ) / 216e4 )) ,

    retries_attempted

    FROM sysjobs j ( NOLOCK )

    JOIN sysjobhistory h

    ON h.job_id = j.job_id

    AND h.step_id = 0 -- look only at the job outcome step for the total job runtime

    AND j.enabled = 1

    AND h.run_status = 1

    --WHERE j.name IN ( 'testjob' ) -- Set the jobname here if checking for a particular job

    ) d

    WHERE DATEDIFF(day, startdatetime, GETDATE()) < 30-- check for past 30 days

    AND CONVERT(DATE, startdatetime) != CONVERT(DATE, GETDATE() -1 ) -- ignore current date

    GROUP BY d.jobname ,d.job_id ,

    CONVERT(CHAR(10), startdatetime, 101)

    ORDER BY d.jobname ,

    CAST(CONVERT(CHAR(10), startdatetime, 101) AS DATETIME) DESC

    SELECT t1.jobname ,

    t1.avgDurationMinutes CurrentavgDurationMinutes ,

    t1.daydate CurrentDate ,

    t2.avgDurationMinutes BaselineavgDurationMinutes ,

    t2.daydate BaselineDate ,

    ( ( t1.avgDurationMinutes * 100 ) / t2.avgDurationMinutes - 100 ) [WentUpBy%] ,

    DATENAME(dw, t1.daydate) DayOfWeek ,

    [Occurs] = CASE c.freq_type

    WHEN 1 THEN 'Once'

    WHEN 4 THEN 'Daily'

    WHEN 8 THEN 'Weekly'

    WHEN 16 THEN 'Monthly'

    WHEN 32 THEN 'Monthly relative'

    WHEN 64 THEN 'When SQL Server Agent starts'

    WHEN 128 THEN 'Start whenever the CPU(s) become idle'

    ELSE ''

    END,

    'Units freq_subday_interval' = CASE WHEN c.freq_subday_type = 1

    THEN 'Once'

    WHEN c.freq_subday_type = 2

    THEN 'Seconds'

    WHEN c.freq_subday_type = 4

    THEN 'Minutes'

    WHEN c.freq_subday_type = 8

    THEN 'Hours'

    END

    FROM #Temp1 t1

    JOIN #Temp2 t2

    ON t1.jobname = t2.jobname

    AND DATENAME(dw, t1.daydate) = DATENAME(dw, t2.daydate) -- Checking againts job run for the same day of the week

    JOIN msdb.dbo.sysjobschedules s

    ON t1.job_id = s.job_id

    JOIN msdb.dbo.sysschedules c

    ON s.schedule_id = c.schedule_id

    WHERE t1.avgDurationMinutes > ( t2.avgDurationMinutes * 1.3 )

    AND t1.avgDurationMinutes > 1

    ORDER BY

    t1.jobname,

    t1.avgDurationMinutes DESC ,

    t2.daydate DESC

    DROP TABLE #Temp1

    DROP TABLE #Temp2

    Drop Table #TempJobList

  • So.... what makes you doubt the accruacy? Be aware that if anyone with the necessary permissions manages to delete any of the job history, then yes, it could be inaccurate, and there'd be no easy way to fix that. Also, how will re-runs of such jobs affect the results? What happens when one job ends up waiting on resources for an hour or two, but eventually finishes? Those kinds of things are only easily detected if you track all the details and not just any form of summary.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • sgmunson (11/19/2015)


    So.... what makes you doubt the accruacy? Be aware that if anyone with the necessary permissions manages to delete any of the job history, then yes, it could be inaccurate, and there'd be no easy way to fix that. Also, how will re-runs of such jobs affect the results? What happens when one job ends up waiting on resources for an hour or two, but eventually finishes? Those kinds of things are only easily detected if you track all the details and not just any form of summary.

    Not concerned about if anyone deletes the history ( almost certain no one would 🙂 ). Jobs are averaged per day, so multiple execution is fine.

    I guess i wanted to know how would others do for what i am looking for, i am sure there is a better way to do this than what i did.

  • curious_sqldba (11/19/2015)


    sgmunson (11/19/2015)


    So.... what makes you doubt the accruacy? Be aware that if anyone with the necessary permissions manages to delete any of the job history, then yes, it could be inaccurate, and there'd be no easy way to fix that. Also, how will re-runs of such jobs affect the results? What happens when one job ends up waiting on resources for an hour or two, but eventually finishes? Those kinds of things are only easily detected if you track all the details and not just any form of summary.

    Not concerned about if anyone deletes the history ( almost certain no one would 🙂 ). Jobs are averaged per day, so multiple execution is fine.

    I guess i wanted to know how would others do for what i am looking for, i am sure there is a better way to do this than what i did.

    I really don't see any obvious issues with your method, but then, I haven't dived into the details either... The only thing I would worry about is lost history data, and that you are only storing averages instead of individual job executions. You might want to look at the outliers and you'd need the detail records to be able to do that. Given that you have no concerns about losing history data, then all I might worry about is carrying forward the detail records and not just the averages. Maybe export those to a spreadsheet on a regular basis?

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply