December 19, 2016 at 11:12 pm
Comments posted to this topic are about the item "CPU Spikes Caused by Periodic Scheduled Jobs".
Igor Micev
My blog: www.igormicev.com
December 19, 2016 at 11:35 pm
Two things are going to cause CPU spikes from scheduled jobs:
1. Poor design that forces a job to run too frequently.
2. The delusion that a task's function is so critical that it must run that frequently.
Once they have been implemented, it is extremely difficult to convince anyone that either one needs to be corrected.
‘When people want to believe something bad enough
facts and logic never prove to be difficult obstacles.’
David Baldacci: The Whole Truth
Rick
Disaster Recovery = Backup ( Backup ( Your Backup ) )
December 20, 2016 at 12:23 am
skeleton567 (12/19/2016)
Two things are going to cause CPU spikes from scheduled jobs:
1. Poor design that forces a job to run too frequently.
2. The delusion that a task's function is so critical that it must run that frequently.
Once they have been implemented, it is extremely difficult to convince anyone that either one needs to be corrected.
‘When people want to believe something bad enough
facts and logic never prove to be difficult obstacles.’
David Baldacci: The Whole Truth
The jobs must run every minute because it's a very busy and fast environment. We've got updates a couple of times per minute.
Igor Micev
My blog: www.igormicev.com
December 20, 2016 at 6:47 am
Great insight, thanks.
December 20, 2016 at 6:59 am
Igor Micev (12/20/2016)
skeleton567 (12/19/2016)
Two things are going to cause CPU spikes from scheduled jobs:
1. Poor design that forces a job to run too frequently.
2. The delusion that a task's function is so critical that it must run that frequently.
Once they have been implemented, it is extremely difficult to convince anyone that either one needs to be corrected.
‘When people want to believe something bad enough
facts and logic never prove to be difficult obstacles.’
David Baldacci: The Whole Truth
The jobs must run every minute because it's a very busy and fast environment. We've got updates a couple of times per minute.
OK, I think you just illustrated Baldacci's point AND my second point above, so I refer you to my first point. Job initiation and termination can be VERY expensive, so you need to attack that part of the design. This is a classic case for thinking OUTSIDE the box. Get over the 'we've always done it this way' thing.
Rick
Disaster Recovery = Backup ( Backup ( Your Backup ) )
December 20, 2016 at 7:32 am
skeleton567 (12/20/2016)
Igor Micev (12/20/2016)
skeleton567 (12/19/2016)
Two things are going to cause CPU spikes from scheduled jobs:
1. Poor design that forces a job to run too frequently.
2. The delusion that a task's function is so critical that it must run that frequently.
Once they have been implemented, it is extremely difficult to convince anyone that either one needs to be corrected.
‘When people want to believe something bad enough
facts and logic never prove to be difficult obstacles.’
David Baldacci: The Whole Truth
The jobs must run every minute because it's a very busy and fast environment. We've got updates a couple of times per minute.
OK, I think you just illustrated Baldacci's point AND my second point above, so I refer you to my first point. Job initiation and termination can be VERY expensive, so you need to attack that part of the design. This is a classic case for thinking OUTSIDE the box. Get over the 'we've always done it this way' thing. I don't know the particular aspects of your running job, and I have never used it, but I would be looking at Service Broker and message queuing for a possible alternative.
Rick
Disaster Recovery = Backup ( Backup ( Your Backup ) )
December 20, 2016 at 8:01 am
Nice article, thanks for sharing. We have a number of systems that have some high “peakedness” for CPU. It hasn’t become an issue yet, but I’ll look into this approach you’ve outlined.
In relation to skeleton567's comment(s), I would suggest a different approach to your job processing. I consider Agent a “batch” system meant to deal with larger operations. Maintenance plans and such make sense in Agent, as do larger “functions” of your database application. If you are scheduling a 1 (or 5) minute job to process application data, I think you could consider other approaches.
I’ve been spending a lot of time lately with Service Broker (SSB) and the concept of asynchronous triggers. The design approach is to use a trigger on the table (in your case, the one being updated sub-minute) that first INSERTs data into another table, then sends a message to an SSB queue. Depending on the data involved (size, complexity, etc.) you could put the data in the SSB message itself. Then your “processing” code that currently runs in Agent can be run from within an “activation” procedure.
If you’re interested in this approach, these two sites, www.davewentzel.com and http://www.sqlnotes.info, are probably the best I’ve found so far.
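To make the pattern concrete, here is a minimal, untested sketch of that asynchronous-trigger setup. Every object name in it (dbo.Orders, dbo.OrdersStaging, the //Demo/* Broker objects, dbo.ProcessRowChanged) is invented for illustration, it assumes Service Broker is enabled on the database, and the real work would go where the comment sits in the activation procedure.

-- One-time setup (assumes Service Broker is already enabled on the database).
CREATE MESSAGE TYPE [//Demo/RowChanged] VALIDATION = WELL_FORMED_XML;
CREATE CONTRACT [//Demo/RowChangedContract]
    ([//Demo/RowChanged] SENT BY INITIATOR);
CREATE QUEUE dbo.RowChangedQueue;
CREATE SERVICE [//Demo/RowChangedService]
    ON QUEUE dbo.RowChangedQueue ([//Demo/RowChangedContract]);
GO

-- Trigger on the frequently updated table: stage the change, then notify SSB.
CREATE TRIGGER dbo.trg_Orders_AfterUpdate ON dbo.Orders
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    IF NOT EXISTS (SELECT 1 FROM inserted) RETURN;

    -- 1) INSERT the changed keys into a work table.
    INSERT INTO dbo.OrdersStaging (OrderId, ChangedAt)
    SELECT OrderId, SYSDATETIME()
    FROM inserted;

    -- 2) Send a lightweight message to the queue.
    DECLARE @handle UNIQUEIDENTIFIER,
            @msg    XML = (SELECT OrderId
                           FROM inserted
                           FOR XML PATH('Row'), ROOT('Changes'), TYPE);

    BEGIN DIALOG CONVERSATION @handle
        FROM SERVICE [//Demo/RowChangedService]
        TO SERVICE '//Demo/RowChangedService'
        ON CONTRACT [//Demo/RowChangedContract]
        WITH ENCRYPTION = OFF;

    SEND ON CONVERSATION @handle
        MESSAGE TYPE [//Demo/RowChanged] (@msg);
END;
GO

-- Activation procedure: runs the work the per-minute Agent job does today.
CREATE PROCEDURE dbo.ProcessRowChanged
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @handle UNIQUEIDENTIFIER,
            @mtype  SYSNAME,
            @body   XML;

    WAITFOR (
        RECEIVE TOP (1)
                @handle = conversation_handle,
                @mtype  = message_type_name,
                @body   = CAST(message_body AS XML)
        FROM dbo.RowChangedQueue
    ), TIMEOUT 5000;

    IF @mtype = N'//Demo/RowChanged'
    BEGIN
        -- ... process the rows captured in dbo.OrdersStaging here ...
        END CONVERSATION @handle;
    END
    ELSE IF @mtype = N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog'
        END CONVERSATION @handle;
END;
GO

-- Hook the procedure to the queue so it fires as messages arrive.
ALTER QUEUE dbo.RowChangedQueue
    WITH ACTIVATION (
         STATUS = ON,
         PROCEDURE_NAME = dbo.ProcessRowChanged,
         MAX_QUEUE_READERS = 1,
         EXECUTE AS OWNER);
GO

With activation, the processing runs only when a message actually arrives, instead of waking an Agent job every minute whether or not anything changed.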
Beer's Law: Absolutum obsoletum
"if it works it's out-of-date"
December 20, 2016 at 8:47 am
skeleton567 (12/20/2016)
Igor Micev (12/20/2016)
skeleton567 (12/19/2016)
Two things are going to cause CPU spikes from scheduled jobs:
1. Poor design that forces a job to run too frequently.
2. The delusion that a task's function is so critical that it must run that frequently.
Once they have been implemented, it is extremely difficult to convince anyone that either one needs to be corrected.
‘When people want to believe something bad enough
facts and logic never prove to be difficult obstacles.’
David Baldacci: The Whole Truth
The jobs must run every minute because it's a very busy and fast environment. We've got updates a couple of times per minute.
OK, I think you just illustrated Baldacci's point AND my second point above, so I refer you to my first point. Job initiation and termination can be VERY expensive, so you need to attack that part of the design. This is a classic case for thinking OUTSIDE the box. Get over the 'we've always done it this way' thing.
"Job initiation and termination can be VERY expensive" - even if I accept that this is true, the article shows how to reduce the spikes caused by that expensive initiation and termination.
Anyway, that design is imposed by the application developers, so it would be a bit difficult for me to make them change it. All I can do for them currently is reduce the spikes caused by their decision to go that way. I'll keep this in mind...
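For illustration, that kind of spreading out can be done by offsetting the start time of each per-minute schedule so the jobs stop firing on the same second. A rough sketch with invented schedule names (not necessarily the article's exact steps):

USE msdb;
GO
-- Hypothetical schedule names; each is a "recurs every 1 minute" schedule.
-- @active_start_time is HHMMSS, and every later occurrence keeps that
-- second offset, so the three jobs no longer start at the same instant.
EXEC dbo.sp_update_schedule
     @name = N'EveryMinute_JobA',
     @active_start_time = 0;     -- fires at hh:mm:00

EXEC dbo.sp_update_schedule
     @name = N'EveryMinute_JobB',
     @active_start_time = 20;    -- fires at hh:mm:20

EXEC dbo.sp_update_schedule
     @name = N'EveryMinute_JobC',
     @active_start_time = 40;    -- fires at hh:mm:40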
Thanks.
Igor Micev
My blog: www.igormicev.com
December 20, 2016 at 8:55 am
DEK46656 (12/20/2016)
Nice article, thanks for sharing. We have a number of systems that have some high “peakedness” for CPU. It hasn’t become an issue yet, but I’ll look into this approach you’ve outlined.
In relation to skeleton567's comment(s), I would suggest a different approach to your job processing. I consider Agent a “batch” system meant to deal with larger operations. Maintenance plans and such make sense in Agent, as do larger “functions” of your database application. If you are scheduling a 1 (or 5) minute job to process application data, I think you could consider other approaches.
I’ve been spending a lot of time lately with Service Broker (SSB) and the concept of asynchronous triggers. The design approach is to use a trigger on the table (in your case, the one being updated sub-minute) that first INSERTs data into another table, then sends a message to an SSB queue. Depending on the data involved (size, complexity, etc.) you could put the data in the SSB message itself. Then your “processing” code that currently runs in Agent can be run from within an “activation” procedure.
If you’re interested in this approach, these two sites, www.davewentzel.com and http://www.sqlnotes.info, are probably the best I’ve found so far.
Agree, the design could be improved by using SSB.
Igor Micev
My blog: www.igormicev.com
December 20, 2016 at 9:07 pm
Spreading out jobs to avoid spikes is a good idea BUT... the real fact is that even after all you did, you're still taking spikes to 40% of ALL the CPUs. Someone needs to fix that code.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 21, 2016 at 12:51 am
Jeff Moden (12/20/2016)
Spreading out jobs to avoid spikes is a good idea BUT... the real fact is that even after all you did, you're still taking spikes to 40% of ALL the CPUs. Someone needs to fix that code.
Hi Jeff,
You're right. The difference is that now the spikes are smaller and narrower.
Igor Micev
My blog: www.igormicev.com
December 21, 2016 at 8:52 am
Let me suggest a much simpler way of avoiding the CPU spike caused by running multiple jobs at once. Simply combine them into one job, with the contents of all of the current jobs converted to steps in the single job. Spacing out the start times certainly helps with the spikes, but there is still a possibility of the jobs overlapping if one takes longer than anticipated. Individual job steps run sequentially, with no possibility of overlap.
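To illustrate, a combined job along those lines could be built roughly like this. The job, step, schedule, and procedure names are invented placeholders; each @command would hold whatever the separate jobs currently execute.

USE msdb;
GO
EXEC dbo.sp_add_job @job_name = N'Combined per-minute processing';

EXEC dbo.sp_add_jobstep
     @job_name = N'Combined per-minute processing',
     @step_name = N'Step 1 - former Job A',
     @subsystem = N'TSQL',
     @command = N'EXEC dbo.usp_FormerJobA;',
     @on_success_action = 3;     -- 3 = go to the next step

EXEC dbo.sp_add_jobstep
     @job_name = N'Combined per-minute processing',
     @step_name = N'Step 2 - former Job B',
     @subsystem = N'TSQL',
     @command = N'EXEC dbo.usp_FormerJobB;',
     @on_success_action = 3;

EXEC dbo.sp_add_jobstep
     @job_name = N'Combined per-minute processing',
     @step_name = N'Step 3 - former Job C',
     @subsystem = N'TSQL',
     @command = N'EXEC dbo.usp_FormerJobC;',
     @on_success_action = 1;     -- 1 = quit reporting success

-- One schedule for the whole job: every day, every 1 minute.
EXEC dbo.sp_add_jobschedule
     @job_name = N'Combined per-minute processing',
     @name = N'Every minute',
     @freq_type = 4,             -- daily
     @freq_interval = 1,
     @freq_subday_type = 4,      -- units = minutes
     @freq_subday_interval = 1;

EXEC dbo.sp_add_jobserver @job_name = N'Combined per-minute processing';

Because the steps run one after another within a single job, only one of the former jobs is ever executing at a time.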
December 21, 2016 at 9:04 am
gfish@teamnorthwoods.com (12/21/2016)
Simply combine them into one job, with the contents of all of the current jobs converted to steps in the single job.
That does make it much more difficult to temporarily suspend a job.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 21, 2016 at 9:05 am
Igor Micev (12/21/2016)
Jeff Moden (12/20/2016)
Spreading out jobs to avoid spikes is a good idea BUT... the real fact is that even after all you did, you're still taking spikes to 40% of ALL the CPUs. Someone needs to fix that code.
Hi Jeff,
You're right. The difference is that now the spikes are smaller and narrower.
Understood, but they also occur over a longer period of time, possibly causing other problems for that longer period. It is a tradeoff.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 21, 2016 at 11:16 am
gfish@teamnorthwoods.com (12/21/2016)
Let me suggest a much simpler way of avoiding the CPU spike caused by running multiple jobs at once. Simply combine them into one job, with the contents of all of the current jobs converted to steps in the single job. Spacing out the start times certainly helps with the spikes, but there is still a possibility of the jobs overlapping if one takes longer than anticipated. Individual job steps run sequentially, with no possibility of overlap.
Now that is what I referred to as thinking outside the box. I think this is the best solution proposed so far in this discussion - what we used to call the KISS method: Keep It Simple, Stupid. I don't remember for sure from my active days, but I don't think a job that is already running will be started again. And especially if this is that original task that runs every minute, skipping a minute won't hurt a thing, as long as you don't tell anybody.
Rick
Disaster Recovery = Backup ( Backup ( Your Backup ) )