February 27, 2012 at 9:04 am
One of our SQL Servers is experiencing sluggish performance. I've been monitoring primarily by running SQL traces, and the slowdowns seem to occur during peak usage in business hours. One of the traces I run monitors for stored procs and other SQL batches that take 100 milliseconds or more to run. (Most of the interaction on the server is via stored procs.)
The numbers below are from an SP that takes a single input parameter and joins three tables. I chose it because it is one of the "simpler" SPs and is executed frequently throughout the day. Since it showed up often in my 100+ ms trace, I started another trace, filtering on just that one SP by name and nothing else.
I'm wondering what's going on within the server when it takes 388 ms to run, versus the other times when it runs in 2 ms or less. Also, sometimes CPU cycles are zero, while other times they are in the teens.
SPID   Duration (ms)   Reads   Writes   CPU
71 2 110 0 0
71 2 110 0 16
65 2 110 0 15
71 3 110 0 16
71 1 107 0 0
71 1 107 0 0
71 1 107 0 0
71 388 107 0 0
71 2 107 0 0
58 388 110 0 0
58 0 113 0 0
58 0 113 0 0
51 0 107 0 0
72 0 110 0 0
72 0 113 0 0
February 27, 2012 at 9:15 am
You might be running into either blocking/locking issues, or into parallelism issues.
The easiest way to narrow down what to target is to track wait states on the server. You can do that through DMVs (search "sql server wait states" in Bing/Google), or you can get a third party tool that will help narrow it down for you. I use Confio Ignite for that, but Foglight and a number of other products work equally well.
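As a starting point, something along these lines will surface the top waits (the exclusion list below is illustrative, not exhaustive):

```sql
-- Top waits by total wait time, excluding some commonly ignorable types.
SELECT TOP (20)
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    signal_wait_time_ms,
    wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('LAZYWRITER_SLEEP', 'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
                        'CHECKPOINT_QUEUE', 'SLEEP_TASK', 'BROKER_TASK_STOP',
                        'XE_TIMER_EVENT', 'REQUEST_FOR_DEADLOCK_SEARCH')
ORDER BY wait_time_ms DESC;
```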
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
February 27, 2012 at 9:46 am
I've been looking at sys.dm_os_wait_stats and cross-referencing http://blogs.msdn.com/b/psssql/archive/2009/11/03/the-sql-server-wait-type-repository.aspx. I see quite a few wait types described as "You should be able to safely ignore".
Here are the wait stats from the server. Is there an easy/better way to post tabular data in these forums?
wait_type   waiting_tasks_count   wait_time_ms   max_wait_time_ms   signal_wait_time_ms   AvgWait
FT_IFTS_SCHEDULER_IDLE_WAIT32442627063626003891617980982
CLR_MANUAL_EVENT119661781846133051788514890
LAZYWRITER_SLEEP137316175445643174757974092731277
CLR_AUTO_EVENT3751740223788503242710464059
ONDEMAND_TASK_QUEUE7158888628150295552022698375
FT_IFTSHC_MUTEX8157860912148950452119732614
SLEEP_TASK684712713140622899198649180219
DISPATCHER_QUEUE_SEMAPHORE637210713301434798331306716813
CHECKPOINT_QUEUE5851038195606512521048177469
LOGMGR_QUEUE16905890129155394088993965265
XE_TIMER_EVENT296188951075304428895107530040
SQLTRACE_INCREMENTAL_FLUSH_SLEEP219078863019012783364045
REQUEST_FOR_DEADLOCK_SEARCH17528884256705563884256705044
BROKER_TO_FLUSH403914416108621131285691093
BROKER_EVENTHANDLER4390293463536904919757336
CXPACKET53927342498505870851323248534
OLEDB625309428626915680150
TRACEWRITE72229269526253538691283
PAGEIOLATCH_SH92847750226701185218535
LATCH_EX489693405007311634856348
BROKER_TASK_STOP7223670249102652785083
SOS_SCHEDULER_YIELD12205512455068121011552172
BROKER_RECEIVE_WAITFOR211988405988390599420
ASYNC_NETWORK_IO10415268161896334125
IO_COMPLETION360581930705725245
WRITELOG2064416502557246077
BACKUPIO2575906350491924580
SLEEP_BPOOL_FLUSH6188293694177334
BACKUPTHREAD316727184598518
LATCH_SH4499255465321675
LCK_M_SCH_M1322242539701710
PREEMPTIVE_OS_FILEOPS184920956533011
PAGEIOLATCH_EX219020799193999
BACKUPBUFFER5653170702912713
LCK_M_X2131601154606580
PREEMPTIVE_OS_AUTHENTICATIONOPS107041289756101
MSQL_DQ4120191021803004
ASYNC_IO_COMPLETION921042222892113
MSQL_XP1254628339605
PAGELATCH_UP989962445372550
PREEMPTIVE_OS_GETPROCADDRESS1254596839604
PREEMPTIVE_OS_GETFILEATTRIBUTES1611567639703
SQLTRACE_FILE_READ_IO_COMPLETION1724406397025
PREEMPTIVE_OS_DECRYPTMESSAGE2060382238501
PAGELATCH_EX7051032854215290
LOGBUFFER1003100884031
PREEMPTIVE_OS_LOOKUPACCOUNTSID1369256338701
PAGEIOLATCH_UP834243660112
LCK_M_IX412416367258
PREEMPTIVE_OS_DELETESECURITYCONTEXT2132206453800
PREEMPTIVE_OS_CREATEFILE1147204139001
PREEMPTIVE_OS_CRYPTACQUIRECONTEXT1928170938900
PAGELATCH_SH1051713133594900
SQLTRACE_FILE_BUFFER161291406280
THREADPOOL411205386029
PREEMPTIVE_OS_QUERYREGISTRY14661193300
CMEMTHREAD1334010543875840
PREEMPTIVE_OS_DTCOPS11043104301043
PREEMPTIVE_OS_WAITFORSINGLEOBJECT142390130000
PREEMPTIVE_OS_SETFILEVALIDDATA35784251702
DTC78043922114
EXECSYNC21467802971440
PREEMPTIVE_OS_GETVOLUMEPATHNAME64473429001
PREEMPTIVE_OS_CRYPTIMPORTKEY170649836600
PREEMPTIVE_OS_REPORTEVENT21347738902
PREEMPTIVE_OS_GENERICOPS4841839508
PREEMPTIVE_OS_AUTHORIZATIONOPS2412277900
SQLTRACE_FILE_WRITE_IO_COMPLETION10681433010
CXROWSET_SYNC23341322380
PREEMPTIVE_OS_DISCONNECTNAMEDPIPE1238119800
IMPPROV_IOWAIT261152304
LCK_M_SCH_S17777077
SOS_RESERVEDMEMBLOCKLIST245684330
PREEMPTIVE_OS_REVERTTOSELF166345500
QUERY_EXECUTION_INDEX_SORT_EVENT_OPEN2837361
PREEMPTIVE_OS_GETDISKFREESPACE35731300
CLR_CRST8111160
PREEMPTIVE_COM_QUERYINTERFACE2101005
PREEMPTIVE_DTC_ENLIST78201
FT_IFTS_RWLOCK1016130
PREEMPTIVE_TRANSIMPORT75100
SNI_CRITICAL_SECTION555110
RESOURCE_SEMAPHORE_MUTEX183110
SOS_DISPATCHER_MUTEX162120
PREEMPTIVE_DTC_BEGINTRANSACTION72100
SOS_LOCALALLOCATORLIST22111
PREEMPTIVE_CREATEPARAM12202
PREEMPTIVE_OS_MOVEFILE62100
PREEMPTIVE_OLEDBOPS92100
WRITE_COMPLETION61100
PREEMPTIVE_DTC_COMMITREQUESTDONE71100
PREEMPTIVE_DTC_PREPAREREQUESTDONE71100
PREEMPTIVE_OS_FLUSHFILEBUFFERS11101
TRANSACTION_MUTEX61100
February 27, 2012 at 10:32 am
Track the per-process waits (sys.dm_os_waiting_tasks); the cumulative view is fairly useless as a single snapshot, especially with all the ignorable waits included.
See if you can track, when that query runs slower than usual, what wait type that query is experiencing.
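A minimal sketch of that kind of snapshot (run it repeatedly, or from a polling job, while the query is slow):

```sql
-- What each currently waiting user task is waiting on right now.
SELECT
    wt.session_id,
    wt.wait_type,
    wt.wait_duration_ms,
    wt.blocking_session_id,
    wt.resource_description
FROM sys.dm_os_waiting_tasks AS wt
WHERE wt.session_id > 50        -- user sessions only
ORDER BY wt.wait_duration_ms DESC;
```

A non-null blocking_session_id points straight at a blocker; repeated CXPACKET rows point at parallelism.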
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 27, 2012 at 12:30 pm
Have you traced the SP:Recompile event? If so, is there a correlation between an increase in recompiles and some activity in the database, like following the reload of tables?
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
February 27, 2012 at 5:13 pm
GilaMonster (2/27/2012)
Track the per-process waits (sys.dm_os_waiting_tasks), the cumulative is fairly useless as a single snapshot, especially with all the ignorable waits included. See if you can track, when that query runs slower than usual, what wait type that query is experiencing.
Thanks for the advice, Gail. I had noticed CXPACKET waits previously when I was looking at the output of dm_os_wait_stats. But when I did the math, the average wait was really low.
I discovered some occasional CXPACKET waits when I looked at sys.dm_os_waiting_tasks. The wait durations varied: some were insignificant, some were significant, and others were ridiculously high.
In the past I had tried varying levels of MAXDOP, but the impact wasn't discernible. I decided to change MAXDOP again and set it to 2, based on this from MS (and the fact that hyper-threading is enabled on the server): http://support.microsoft.com/kb/329204
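For the record, the change I applied was along these lines:

```sql
-- Requires 'show advanced options'; the value 2 follows the KB guidance
-- for hyper-threaded servers referenced above.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 2;
RECONFIGURE;
```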
I don't see any improvement in performance, though.
February 27, 2012 at 6:52 pm
This smells like something we recently ran into; we actually had three things going on, and solving them fixed it for us. Three questions:
1. Is this being run by a .NET application?
2. What types of arguments does the stored proc take?
3. How close are you to being at 80% or more memory utilization on the SQL box?
February 27, 2012 at 8:25 pm
Brian.cs (2/27/2012)
This smells like something we recently ran into; we actually had three things going on, and solving them fixed it for us. Three questions:
1. Is this being run by a .NET application?
2. What types of arguments does the stored proc take?
3. How close are you to being at 80% or more memory utilization on the SQL box?
1. Often, but not always.
2. The problem is happening with many SPs. The one I've referenced takes a single input parameter, an INT.
3. The physical server currently shows roughly 57% memory utilization.
February 28, 2012 at 12:14 am
Since the reads/writes and CPU are in line, I don't think this is a parameter sniffing or recompile issue.
Could it be blocking?
It could also be a slowdown on the client consuming the data. The duration includes the time taken to send the data to the client; if the client doesn't consume it fast enough, you get an ASYNC_NETWORK_IO wait. These are shown in your snapshot, but it's hard to tell whether that is natural for your system or not.
Also, you mention traces: server-side traces, right? You're not using Profiler to monitor a live system, are you?
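You can check what's attached with sys.traces, e.g.:

```sql
-- is_rowset = 1 means a rowset consumer (i.e. Profiler attached);
-- id = 1 is normally the default trace.
SELECT id, status, path, is_rowset
FROM sys.traces;
```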
February 28, 2012 at 2:29 am
I suspect this is blocking or latch wait-related. CXPACKET would knock the CPU up as well as the duration.
Personally I'd leave MAXDOP alone without some significant analysis first.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 28, 2012 at 6:56 am
Dave Ballantyne (2/28/2012)
It could also be a slowdown on the client consuming the data. The duration includes the time taken to send the data to the client; if the client doesn't consume it fast enough, you get an ASYNC_NETWORK_IO wait. These are shown in your snapshot, but it's hard to tell whether that is natural for your system or not.
I considered this too. I didn't see many occurrences of ASYNC_NETWORK_IO when I repeatedly queried sys.dm_os_waiting_tasks. When I did, the wait duration was so low I disregarded it.
Also, you mention traces: server-side traces, right? You're not using Profiler to monitor a live system, are you?
Correct, server-side traces only.
February 28, 2012 at 9:22 am
The reason I was asking about ASP.NET has to do with a bug in TransactionScope, and if you don't know how to look for it, it's really hard to realize it's happening: http://blogs.msdn.com/b/dbrowne/archive/2010/06/03/using-new-transactionscope-considered-harmful.aspx
That was a big part of our issue. Can you run SQL Profiler, filter based on the application that's running the procs, and see if any of the connections are being set to transaction isolation Serializable? It's a shot in the dark, but in our apps it caused behavior similar to what you're describing.
February 28, 2012 at 11:30 am
Brian.cs (2/28/2012)
The reason I was asking about ASP.NET has to do with a bug in TransactionScope, and if you don't know how to look for it, it's really hard to realize it's happening: http://blogs.msdn.com/b/dbrowne/archive/2010/06/03/using-new-transactionscope-considered-harmful.aspx
That was a big part of our issue. Can you run SQL Profiler, filter based on the application that's running the procs, and see if any of the connections are being set to transaction isolation Serializable? It's a shot in the dark, but in our apps it caused behavior similar to what you're describing.
Hmmm. Very interesting. I know that the performance issues are prevalent in the apps (as opposed to executing an offending SP in an SSMS window). I also know that the apps do use MSDTC, which requires a specific configuration on the server. From your experience, maybe you can help me with a couple of questions related to the article on blogs.msdn.com:
1. If our apps aren't using ideal settings for transaction isolation level and/or transaction timeout, shouldn't I be noticing frequent timeouts and/or deadlocks? That's not happening on our system.
2. Does the TransactionScope issue apply to all versions of the .Net Framework? (Our apps are compiled for v 2.0)
I've never run a trace to check a connection for its trx isolation level. It should be easy to figure out, though. I'll give it a try. Thx for the tip! 🙂
UPDATE: According to sys.dm_exec_requests, the transaction_isolation_level is 2 (ReadCommitted). The scenario from the article does not appear to apply to our situation.
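For anyone following along, the check was something like:

```sql
-- 1 = ReadUncommitted, 2 = ReadCommitted, 3 = RepeatableRead,
-- 4 = Serializable, 5 = Snapshot
SELECT session_id, transaction_isolation_level
FROM sys.dm_exec_requests
WHERE session_id > 50;
```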
February 28, 2012 at 2:35 pm
One final update (for now)...
I rebooted the server during a maintenance window Monday night. Performance has returned to normal. Users are no longer complaining, I can no longer duplicate the previous performance issues manually, and I can no longer see evidence of poor performance via traces.
Good news/bad news, right? Performance is great :-), but I haven't found a permanent/long term solution to the problem. Eventually it will rear its ugly head again. 🙁
February 28, 2012 at 10:58 pm
Dave Mason (2/28/2012)
One final update (for now)... I rebooted the server during a maintenance window Monday night. Performance has returned to normal. Users are no longer complaining, I can no longer duplicate the previous performance issues manually, and I can no longer see evidence of poor performance via traces.
Good news/bad news, right? Performance is great :-), but I haven't found a permanent/long term solution to the problem. Eventually it will rear its ugly head again. 🙁
With that thought in mind and taking a shot in the dark, what are the initial size and growth settings of TempDB?
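You can pull those straight from the metadata, e.g.:

```sql
-- Sizes are in 8 KB pages; growth is in pages unless is_percent_growth = 1.
SELECT name,
       size * 8 / 1024 AS size_mb,
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS varchar(10)) + '%'
            ELSE CAST(growth * 8 / 1024 AS varchar(10)) + ' MB'
       END AS growth,
       physical_name
FROM tempdb.sys.database_files;
```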
--Jeff Moden
Change is inevitable... Change for the better is not.