Erratic SP Performance

  • One of our Sql Servers is experiencing sluggish performance. I've been monitoring performance primarily by running sql traces. It seems the slowdowns are occurring during peak usage during business hours. One of the traces I run monitors for stored procs and other sql batches that take 100 milliseconds or more to run. (Most of the interaction on the server is via stored procs.)

    The numbers below are from an SP that takes a single input parameter and joins three tables. I chose it because it was one of the "simpler" SP's and is executed frequently throughout the day. Since it showed up often in my 100+ ms trace, I started another trace, filtering on just the one SP by name and nothing else.

    I'm wondering what's going on within the server when it takes 388 ms to run, versus the other times when it runs in 2 ms or less. Also, sometimes CPU cycles are zero, while other times it is in the teens.

    Duration

    SPID milliseconds Reads Writes CPU

    71 2 110 0 0

    71 2 110 0 16

    65 2 110 0 15

    71 3 110 0 16

    71 1 107 0 0

    71 1 107 0 0

    71 1 107 0 0

    71 388 107 0 0

    71 2 107 0 0

    58 388 110 0 0

    58 0 113 0 0

    58 0 113 0 0

    51 0 107 0 0

    72 0 110 0 0

    72 0 113 0 0

  • You might be running into either blocking/locking issues, or into parallelism issues.

    The easiest way to narrow down what to target is to track wait states on the server. You can do that through DMVs (search "sql server wait states" in Bing/Google), or you can get a third party tool that will help narrow it down for you. I use Confio Ignite for that, but Foglight and a number of other products work equally well.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I've been looking at sys.dm_os_wait_stats and cross-referencing http://blogs.msdn.com/b/psssql/archive/2009/11/03/the-sql-server-wait-type-repository.aspx. I see quite a few actions that include "You should be able to safely ignore".

    Here are the wait stats from the server. Is there an easy/better way to post tabular data in these forums?

    wait_typewaiting_tasks_countwait_time_msmax_wait_time_mssignal_wait_time_msAvgWait

    FT_IFTS_SCHEDULER_IDLE_WAIT32442627063626003891617980982

    CLR_MANUAL_EVENT119661781846133051788514890

    LAZYWRITER_SLEEP137316175445643174757974092731277

    CLR_AUTO_EVENT3751740223788503242710464059

    ONDEMAND_TASK_QUEUE7158888628150295552022698375

    FT_IFTSHC_MUTEX8157860912148950452119732614

    SLEEP_TASK684712713140622899198649180219

    DISPATCHER_QUEUE_SEMAPHORE637210713301434798331306716813

    CHECKPOINT_QUEUE5851038195606512521048177469

    LOGMGR_QUEUE16905890129155394088993965265

    XE_TIMER_EVENT296188951075304428895107530040

    SQLTRACE_INCREMENTAL_FLUSH_SLEEP219078863019012783364045

    REQUEST_FOR_DEADLOCK_SEARCH17528884256705563884256705044

    BROKER_TO_FLUSH403914416108621131285691093

    BROKER_EVENTHANDLER4390293463536904919757336

    CXPACKET53927342498505870851323248534

    OLEDB625309428626915680150

    TRACEWRITE72229269526253538691283

    PAGEIOLATCH_SH92847750226701185218535

    LATCH_EX489693405007311634856348

    BROKER_TASK_STOP7223670249102652785083

    SOS_SCHEDULER_YIELD12205512455068121011552172

    BROKER_RECEIVE_WAITFOR211988405988390599420

    ASYNC_NETWORK_IO10415268161896334125

    IO_COMPLETION360581930705725245

    WRITELOG2064416502557246077

    BACKUPIO2575906350491924580

    SLEEP_BPOOL_FLUSH6188293694177334

    BACKUPTHREAD316727184598518

    LATCH_SH4499255465321675

    LCK_M_SCH_M1322242539701710

    PREEMPTIVE_OS_FILEOPS184920956533011

    PAGEIOLATCH_EX219020799193999

    BACKUPBUFFER5653170702912713

    LCK_M_X2131601154606580

    PREEMPTIVE_OS_AUTHENTICATIONOPS107041289756101

    MSQL_DQ4120191021803004

    ASYNC_IO_COMPLETION921042222892113

    MSQL_XP1254628339605

    PAGELATCH_UP989962445372550

    PREEMPTIVE_OS_GETPROCADDRESS1254596839604

    PREEMPTIVE_OS_GETFILEATTRIBUTES1611567639703

    SQLTRACE_FILE_READ_IO_COMPLETION1724406397025

    PREEMPTIVE_OS_DECRYPTMESSAGE2060382238501

    PAGELATCH_EX7051032854215290

    LOGBUFFER1003100884031

    PREEMPTIVE_OS_LOOKUPACCOUNTSID1369256338701

    PAGEIOLATCH_UP834243660112

    LCK_M_IX412416367258

    PREEMPTIVE_OS_DELETESECURITYCONTEXT2132206453800

    PREEMPTIVE_OS_CREATEFILE1147204139001

    PREEMPTIVE_OS_CRYPTACQUIRECONTEXT1928170938900

    PAGELATCH_SH1051713133594900

    SQLTRACE_FILE_BUFFER161291406280

    THREADPOOL411205386029

    PREEMPTIVE_OS_QUERYREGISTRY14661193300

    CMEMTHREAD1334010543875840

    PREEMPTIVE_OS_DTCOPS11043104301043

    PREEMPTIVE_OS_WAITFORSINGLEOBJECT142390130000

    PREEMPTIVE_OS_SETFILEVALIDDATA35784251702

    DTC78043922114

    EXECSYNC21467802971440

    PREEMPTIVE_OS_GETVOLUMEPATHNAME64473429001

    PREEMPTIVE_OS_CRYPTIMPORTKEY170649836600

    PREEMPTIVE_OS_REPORTEVENT21347738902

    PREEMPTIVE_OS_GENERICOPS4841839508

    PREEMPTIVE_OS_AUTHORIZATIONOPS2412277900

    SQLTRACE_FILE_WRITE_IO_COMPLETION10681433010

    CXROWSET_SYNC23341322380

    PREEMPTIVE_OS_DISCONNECTNAMEDPIPE1238119800

    IMPPROV_IOWAIT261152304

    LCK_M_SCH_S17777077

    SOS_RESERVEDMEMBLOCKLIST245684330

    PREEMPTIVE_OS_REVERTTOSELF166345500

    QUERY_EXECUTION_INDEX_SORT_EVENT_OPEN2837361

    PREEMPTIVE_OS_GETDISKFREESPACE35731300

    CLR_CRST8111160

    PREEMPTIVE_COM_QUERYINTERFACE2101005

    PREEMPTIVE_DTC_ENLIST78201

    FT_IFTS_RWLOCK1016130

    PREEMPTIVE_TRANSIMPORT75100

    SNI_CRITICAL_SECTION555110

    RESOURCE_SEMAPHORE_MUTEX183110

    SOS_DISPATCHER_MUTEX162120

    PREEMPTIVE_DTC_BEGINTRANSACTION72100

    SOS_LOCALALLOCATORLIST22111

    PREEMPTIVE_CREATEPARAM12202

    PREEMPTIVE_OS_MOVEFILE62100

    PREEMPTIVE_OLEDBOPS92100

    WRITE_COMPLETION61100

    PREEMPTIVE_DTC_COMMITREQUESTDONE71100

    PREEMPTIVE_DTC_PREPAREREQUESTDONE71100

    PREEMPTIVE_OS_FLUSHFILEBUFFERS11101

    TRANSACTION_MUTEX61100

  • Track the per-process waits (sys.dm_os_waiting_tasks), the cumulative is fairly useless as a single snapshot, especially with all the ignorable waits included.

    See if you can track, when that query runs slower than usual, what wait type that query is experiencing.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Have you traced the SP:Recompile event? If so, then is there a correlation between an increase in recompiles to some activity in the database, like following the reload of tables?

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • GilaMonster (2/27/2012)


    Track the per-process waits (sys.dm_os_waiting_tasks), the cumulative is fairly useless as a single snapshot, especially with all the ignorable waits included.

    See if you can track, when that query runs slower than usual, what wait type that query is experiencing.

    Thanks for the advice, Gail. I had noticed CXPACKET waits previously when I was looking at the output of dm_os_wait_stats. But when I did the math, the average wait was really low.

    I discovered some occasional CXPACKET waits when I looked at sys.dm_os_waiting_tasks. The wait durations varied. some were insignificant, some were significant, others were ridiculously high.

    In the past I had tried varying levels of MAXDOP, but impact wasn't discernible. I decided to change MAXDOP again and I set it to 2, based on this from MS (and the fact that hyper-threading is enabled on the server): http://support.microsoft.com/kb/329204

    I don't see any improvement in performance, though.

  • This smells like somethings we recently ran into, we actually had 3 things going on and solving both fixed it for us. Three questions:

    1. Is this being run by a .NET application?

    2. What types of arguments does the stored proc take?

    3. How close are you to being at 80% or more memory utilization on the SQL box?

  • Brian.cs (2/27/2012)


    This smells like somethings we recently ran into, we actually had 3 things going on and solving both fixed it for us. Three questions:

    1. Is this being run by a .NET application?

    2. What types of arguments does the stored proc take?

    3. How close are you to being at 80% or more memory utilization on the SQL box?

    1. Often, but not always.

    2. The problem is happening with many SP's. The one I've referenced takes a single input param--an INT.

    3. The physical server currently shows roughly 57% memory utilization.

  • Since the reads / writes and Cpu are inline , i dont think this is a parameter sniffing or recompile issue.

    Could be blocking ??

    Could also be a slow down on the client consuming the data. The duration includes time taken to send the data to the client , if it doesnt move fast enough you get an ASYNC_NETWORK_IO wait. These are shown in your snapshot , but hard to tell if that is natural for your system or not.

    Also you mention traces, server side traces right ? You are not using profiler to monitor a live system are you.



    Clear Sky SQL
    My Blog[/url]

  • I suspect this is blocking or latch wait-related. CXPacket would knock the CPU up as well as the duration.

    Personally I'd leave MAXDOP alone without some significant analysis first.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Dave Ballantyne (2/28/2012)


    Could also be a slow down on the client consuming the data. The duration includes time taken to send the data to the client , if it doesnt move fast enough you get an ASYNC_NETWORK_IO wait. These are shown in your snapshot , but hard to tell if that is natural for your system or not.

    I considered this too. I didn't see many occurences of ASYNC_NETWORK_IO when I repeatedly queried sys.dm_os_waiting_tasks. If I did, the wait duration was so low I disregarded it.

    Also you mention traces, server side traces right ? You are not using profiler to monitor a live system are you.

    Correct, server side traces only.

  • The reason I was asking about ASP.NET has to do with a bug in transaction scope, and if you don't know how to look at it then it's really hard to realize is happening: http://blogs.msdn.com/b/dbrowne/archive/2010/06/03/using-new-transactionscope-considered-harmful.aspx

    That was a big part of our issue. Can you run SQL Profiler, filter based on the application that's running the procs, and see if any of the connections are being set to transaction isolation Serializable? It's a shot in the dark, but in our apps it caused behavior similar to what you're describing.

  • Brian.cs (2/28/2012)


    The reason I was asking about ASP.NET has to do with a bug in transaction scope, and if you don't know how to look at it then it's really hard to realize is happening: http://blogs.msdn.com/b/dbrowne/archive/2010/06/03/using-new-transactionscope-considered-harmful.aspx

    That was a big part of our issue. Can you run SQL Profiler, filter based on the application that's running the procs, and see if any of the connections are being set to transaction isolation Serializable? It's a shot in the dark, but in our apps it caused behavior similar to what you're describing.

    Hmmm. Very interesting. I know that the performance issues are prevalent in the apps (as opposed to executing an offending SP in an SSMS window). I also know that the apps do use MSDTC, which requires a specific configuration on the server. From your experience, maybe you can help me with a couple of questions related to the article on blogs.msdn.com:

    1. If our apps aren't using ideal settings for transaction isolation level and/or transaction timeout, shouldn't I be noticing frequent timeouts and/or deadlocks? That's not happening on our system.

    2. Does the TransactionScope issue apply to all versions of the .Net Framework? (Our apps are compiled for v 2.0)

    I've never run a trace to check a connection for its trx isolation level. It should be easy to figure out, though. I'll give it a try. Thx for the tip! 🙂

    UPDATE: According to sys.dm_exec_requests, the transaction_isolation_level is 2 (ReadCommitted). The scenario from the article does not appear to apply to our situation.

  • One final update (for now)...

    I rebooted the server during a maintenance window Monday night. Performance has resumed to normal. Users are no longer complaining, I can no longer duplicate the previous performance issues manually, and I can no longer see evidence of poor performance via traces.

    Good news/bad news, right? Performance is great :-), but I haven't found a permanent/long term solution to the problem. Eventually it will rear its ugly head again. 🙁

  • Dave Mason (2/28/2012)


    One final update (for now)...

    I rebooted the server during a maintenance window Monday night. Performance has resumed to normal. Users are no longer complaining, I can no longer duplicate the previous performance issues manually, and I can no longer see evidence of poor performance via traces.

    Good news/bad news, right? Performance is great :-), but I haven't found a permanent/long term solution to the problem. Eventually it will rear its ugly head again. 🙁

    With that thought in mind and taking a shot in the dark, what are the initial size and growth settings of TempDB?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply