April 2, 2017 at 5:45 pm
Hi there,
I'm looking for suggestions for statistics I could monitor to discover what could be causing the cpu "blunted spikes" (seen the visual attached CPUSpikes.png).
Some known information:
Many thanks,
Laurence
April 2, 2017 at 8:33 pm
Have you checked for simple but lengthy blocking and deadlocks?
I'd also be concerned about the nearly flat 30% CPU usage you're seeing.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 2, 2017 at 10:08 pm
Hey Jeff,
Thank you very kindly for your reply.
We have some monitoring software that alerts us to deadlocks and long running queries (that are being blocked) and their blocker as soon as they occur. Additionally, before this spike occurs, there are just about never any deadlocks or blocking.
What's worse, a really simple query like select top 1 * from Table with(nolock) (enum table with only 20 records) will appear to hang (or wait) until the spike has subsided. Also, you cannot connect to the server during this spike.
I have done the following:
Other improvements have also been made, like ensuring there is no fragmentation in the log file etc...
Many thanks,
Laurence
April 3, 2017 at 12:46 am
Sounds like you're taking all the right steps. There are a couple of things that may be an issue. The first is whatever monitoring software you're using. I've seen such things have this effect before although not as sporadically as what you say. Also, I've seen virus checkers that will do this if you haven't told them to exclude MDF/NDF/LDF and other files associated with SQL Server.
About the only other thing that I can suggest without personally watching your system is to get a copy of Adam Machanics WhoIsActive stored procedure and run it when you see one of the spikes occur.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 3, 2017 at 1:54 am
Hi Jeff,
Thank you for your reply and help again.
I will check for anti-virus software tomorrow. There might be something automatic that Amazon do, that we don't know about.
We have configured sp_whoisactive on the server too. The thing is, when the spikes occur, you can't run sp_whoisactive; even the profiler seems to hang, and no queries that incurred a long wait time (low CPU time, but with a long duration) seem problematic.
Again, thanks for taking the time to assist me,
Laurence
April 3, 2017 at 6:52 am
I'm not sure what the interval represented in the graph is, but I've seen something similar when our DR backup software kicks in. Our spikes aren't anywhere near 100% CPU usage, but they give the same appearance, and database applications appear to "hang" for a few seconds.
Like yours, our service disruption is intermittent (and thankfully growing rarer). I'd ask Amazon about it, maybe it's something on their end?
Our server is local, not cloud, but the behavior is remarkably similar to what you're describing.
April 3, 2017 at 8:46 am
laurence.proctor - Monday, April 3, 2017 1:54 AMThe thing is, when the spikes occur, you can't run sp_whoisactive; even the profiler seems to hang, and no queries that incurred a long wait time (low CPU time, but with a long duration) seem problematic.
Yowch! That's just crazy. Roger (above) may be on to something.
Shifting gears, once you figure out the blunted spikes, I'd start looking into the nearly flat line 30% cpu usage. That seems pretty high even on a busy system.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 3, 2017 at 3:14 pm
Hey Roger,
Thanks for the reply.
We currently don't have any specific backup software - we simply use Ola Hallengren's backup code. Also, we only carry out 15-minute interval transaction log backups (which range from 10 MB to 2 GB - we saw spikes yesterday, and our log files were on average 80 MB in size - the spikes did not correlate to the backup times). To add to this, we do carry out daily full backups, which are subsequently uploaded to S3.
Hey Jeff,
I have checked, and we don't have anti-virus software on our server.
What I am doing today is collecting Wait statistics (using Paul Randal's scripts) to see if there is a correlation between the spikes and any waiting we might incur.
I am also going to try and use PSDiag to try and get as much info as possible.
Again, thank you to all who have commented, replied, and assisted.
Regards,
Laurence
April 3, 2017 at 4:52 pm
Hey Jeff,
I forgot to add; the monitoring software is simply an SQL product I developed that only runs queries against the database (that I have checked to ensure are not carrying out large scans, or incurring expensive operations). In all of my tests, this software is less than 1% of the overall consumption on a daily basis.
And by the way, I really appreciate your assistance.
Hey all,
I just wanted to report my findings this morning.
I started collecting Wait Stats (using Paul's great scripts), and found the following:
This was no different when we faced the spikes / not.
Additionally, we saw spikes at the 8:42.20, and our backups had not even been triggered yet. The difference below is that the Spikes are not blunt, but rather peaking and staying at max CPU. I have been seeing this a lot lately (as well as the blunt spikes). Additionally, the spikes mainly seem to occur when we are processing large loads of data (through the SQL Broker).
Any further ideas?
Regards,
Laurence
April 4, 2017 at 9:52 pm
Does anybody have any further ideas?
Thank you to all who have assisted - It is much appreciated.
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply