How to find the needle in the haystack

  • Have a production sql 2008r2 server, supports about 600 users, at 8:28 this morning everything fine. At 8:33 CPU goes from 12% to 99% (all sql server). Look at my Idera DM, no blocking, no deadlocks no runaway query. RUn sp_whoisactive and a couple standard what is killing my CPU queries. All the stuff consuming the CPU are the standard stored procs and sql requests that always run. I look at Idera DM and using the historical data provided by the CPU alerts I see the same sp's and sql runnng at 8:28 as at 8:33. The difference is all of them are taking thousands of milliseconds longer. No runaway cartesan joins, no massive update stock nothing. Nothing in the sql logs After 20 minutes of this we restarted the sql server and of course all cleared up. This is a 32 core quad box with 128 gig of RAM. Anyone have any ideas what else I could have looked for?

  • Are there external processes running on the server like virus software? Do you have a minimum for memory? Assuming an external process started using up either CPU or memory, your system could have been starved of it and nothing would really show up within SQL Server.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • The CPU usages was all sql, yes have memory maxed.

  • tcronin 95651 (4/8/2015)


    The CPU usages was all sql, yes have memory maxed.

    Quick question, any RDP session onto the box at the time?

    😎

  • just me but not doing anything

  • tcronin 95651 (4/8/2015)


    just me but not doing anything

    Reason for asking is that I've seen "local resources" such as printers (incompatible drivers), virus scanning software bound to the profile etc. cause this kind of issue, as a rule of the thumb, no RDP on to a pressured system unless absolutely necessary. Having said this, I don't want to jump to the conclusion that the RDP session caused the issue, anything showing in the logs, any other activities on the box such as SSIS/SSRS/SSAS? Could it be a SAN fault?

    😎

  • no ssis or srss, SAN guys looked all OK, never seen a sql server ever do this

  • It's a pretty good rule to not RDP into production servers unless you absolutely have to. Most of the SQL Server and database administration can be done remotely from SSMS.

    tcronin 95651 (4/8/2015)


    The CPU usages was all sql, yes have memory maxed.

    How do you mean? You have SQL Server set to a max on memory, but less than the max on the server so that the OS also has resources? Or you set SQL Server to max memory on the system? And then, you have a min memory setting. Not the max, but the minimum. If it's set to the default, if the OS needs resources, it can take them away from SQL Server.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • with 128 gig sql max is at 118.

  • I don't suppose you have before and after wait statistics on the server? It'd be good to know specifically what was causing stuff to run slow.

    sp_whoisactive didn't point out anything that looked abnormal beyond some long execution times?

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • From a resource metric the idera tools saves great snapshots of all resources and detail of what was running. The fact that all the queries became resource hogs at the same time is the confusing one, usually you will find one spid or group that is the issue, in this case they all became hogs at the same time. The only oddity is that 2 days before there were issues with on of the san drives that house the indexes for this database, only bad for a few minutes and then all is good

  • And you already checked that before and after query plans are the same?

    What about wait stats during that time? Any abnormality?


    Alex Suprun

  • cxpackets were a little high but not to catastrophic. This is a 3rd party app that I can index and help tune some of their sp's but kind of at their mercy. 7000 tables, 55K fields, so it will take a while

  • We've had similar problems in the past. It turned out to be tape backups (BERemote.exe to be specific) hitting the backups on the disks, which were in the wrong place.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Is it possible that the database or log file was growing at the time?

Viewing 15 posts - 1 through 15 (of 30 total)

You must be logged in to reply to this topic. Login to reply