How to find the needle in the haystack

Question

How to find the needle in the haystack

tcronin 95651

SSCertifiable

Points: 6735
More actions
April 8, 2015 at 9:33 am

#299741

Have a production sql 2008r2 server, supports about 600 users, at 8:28 this morning everything fine. At 8:33 CPU goes from 12% to 99% (all sql server). Look at my Idera DM, no blocking, no deadlocks no runaway query. RUn sp_whoisactive and a couple standard what is killing my CPU queries. All the stuff consuming the CPU are the standard stored procs and sql requests that always run. I look at Idera DM and using the historical data provided by the CPU alerts I see the same sp's and sql runnng at 8:28 as at 8:33. The difference is all of them are taking thousands of milliseconds longer. No runaway cartesan joins, no massive update stock nothing. Nothing in the sql logs After 20 minutes of this we restarted the sql server and of course all cleared up. This is a 32 core quad box with 128 gig of RAM. Anyone have any ideas what else I could have looked for?

Viewing 15 posts - 1 through 15 (of 30 total)

You must be logged in to reply to this topic. Login to reply

Grant Fritchey SSC Guru Points: 398913 More actions · Answer 1

Are there external processes running on the server like virus software? Do you have a minimum for memory? Assuming an external process started using up either CPU or memory, your system could have been starved of it and nothing would really show up within SQL Server.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

tcronin 95651 SSCertifiable Points: 6735 More actions · Answer 2

tcronin 95651

SSCertifiable

Points: 6735

April 8, 2015 at 11:49 am

#1789646

The CPU usages was all sql, yes have memory maxed.

Eirikur Eiriksson SSC Guru Points: 182920 More actions · Answer 3

tcronin 95651 (4/8/2015)
The CPU usages was all sql, yes have memory maxed.

Quick question, any RDP session onto the box at the time?

😎

tcronin 95651 SSCertifiable Points: 6735 More actions · Answer 4

tcronin 95651

SSCertifiable

Points: 6735

April 8, 2015 at 12:01 pm

#1789649

just me but not doing anything

Eirikur Eiriksson SSC Guru Points: 182920 More actions · Answer 5

tcronin 95651 (4/8/2015)
just me but not doing anything

Reason for asking is that I've seen "local resources" such as printers (incompatible drivers), virus scanning software bound to the profile etc. cause this kind of issue, as a rule of the thumb, no RDP on to a pressured system unless absolutely necessary. Having said this, I don't want to jump to the conclusion that the RDP session caused the issue, anything showing in the logs, any other activities on the box such as SSIS/SSRS/SSAS? Could it be a SAN fault?

😎

tcronin 95651 SSCertifiable Points: 6735 More actions · Answer 6

no ssis or srss, SAN guys looked all OK, never seen a sql server ever do this

Grant Fritchey SSC Guru Points: 398913 More actions · Answer 7

It's a pretty good rule to not RDP into production servers unless you absolutely have to. Most of the SQL Server and database administration can be done remotely from SSMS.

tcronin 95651 (4/8/2015)
The CPU usages was all sql, yes have memory maxed.

How do you mean? You have SQL Server set to a max on memory, but less than the max on the server so that the OS also has resources? Or you set SQL Server to max memory on the system? And then, you have a min memory setting. Not the max, but the minimum. If it's set to the default, if the OS needs resources, it can take them away from SQL Server.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

tcronin 95651 SSCertifiable Points: 6735 More actions · Answer 8

tcronin 95651

SSCertifiable

Points: 6735

April 8, 2015 at 12:26 pm

#1789661

with 128 gig sql max is at 118.

Grant Fritchey SSC Guru Points: 398913 More actions · Answer 9

I don't suppose you have before and after wait statistics on the server? It'd be good to know specifically what was causing stuff to run slow.

sp_whoisactive didn't point out anything that looked abnormal beyond some long execution times?

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

tcronin 95651 SSCertifiable Points: 6735 More actions · Answer 10

From a resource metric the idera tools saves great snapshots of all resources and detail of what was running. The fact that all the queries became resource hogs at the same time is the confusing one, usually you will find one spid or group that is the issue, in this case they all became hogs at the same time. The only oddity is that 2 days before there were issues with on of the san drives that house the indexes for this database, only bad for a few minutes and then all is good

Alexander Suprun SSCertifiable Points: 6253 More actions · Answer 11

And you already checked that before and after query plans are the same?

What about wait stats during that time? Any abnormality?

Alex Suprun

tcronin 95651 SSCertifiable Points: 6735 More actions · Answer 12

cxpackets were a little high but not to catastrophic. This is a 3rd party app that I can index and help tune some of their sp's but kind of at their mercy. 7000 tables, 55K fields, so it will take a while

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 13

We've had similar problems in the past. It turned out to be tape backups (BERemote.exe to be specific) hitting the backups on the disks, which were in the wrong place.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Kevin SSCarpal Tunnel Points: 4962 More actions · Answer 14

Is it possible that the database or log file was growing at the time?