February 27, 2012 at 9:04 am
One of our SQL Servers is experiencing sluggish performance. I've been monitoring primarily by running SQL traces, and the slowdowns seem to occur during peak usage in business hours. One of the traces I run monitors for stored procs and other SQL batches that take 100 milliseconds or more to run. (Most of the interaction on the server is via stored procs.)
The numbers below are from an SP that takes a single input parameter and joins three tables. I chose it because it is one of the "simpler" SPs and is executed frequently throughout the day. Since it showed up often in my 100+ ms trace, I started another trace, filtering on just that one SP by name and nothing else.
I'm wondering what's going on within the server when it takes 388 ms to run, versus the other times when it runs in 2 ms or less. Also, sometimes CPU cycles are zero, while other times they are in the teens.
SPID   Duration (ms)   Reads   Writes   CPU
71 2 110 0 0
71 2 110 0 16
65 2 110 0 15
71 3 110 0 16
71 1 107 0 0
71 1 107 0 0
71 1 107 0 0
71 388 107 0 0
71 2 107 0 0
58 388 110 0 0
58 0 113 0 0
58 0 113 0 0
51 0 107 0 0
72 0 110 0 0
72 0 113 0 0
February 27, 2012 at 9:15 am
You might be running into either blocking/locking issues, or into parallelism issues.
The easiest way to narrow down what to target is to track wait states on the server. You can do that through DMVs (search "sql server wait states" in Bing/Google), or you can get a third party tool that will help narrow it down for you. I use Confio Ignite for that, but Foglight and a number of other products work equally well.
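As a starting point, something along these lines will surface the top waits (the exclusion list below is illustrative, not exhaustive):

```sql
-- Top waits by total wait time, excluding some commonly ignorable types.
SELECT TOP (20)
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    signal_wait_time_ms,
    wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('LAZYWRITER_SLEEP', 'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
                        'CHECKPOINT_QUEUE', 'SLEEP_TASK', 'BROKER_TASK_STOP',
                        'XE_TIMER_EVENT', 'REQUEST_FOR_DEADLOCK_SEARCH')
ORDER BY wait_time_ms DESC;
```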
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
February 27, 2012 at 9:46 am
I've been looking at sys.dm_os_wait_stats and cross-referencing http://blogs.msdn.com/b/psssql/archive/2009/11/03/the-sql-server-wait-type-repository.aspx. I see quite a few wait types described as "You should be able to safely ignore".
Here are the wait stats from the server. Is there an easy/better way to post tabular data in these forums?
wait_type   waiting_tasks_count   wait_time_ms   max_wait_time_ms   signal_wait_time_ms   AvgWait
FT_IFTS_SCHEDULER_IDLE_WAIT32442627063626003891617980982
CLR_MANUAL_EVENT119661781846133051788514890
LAZYWRITER_SLEEP137316175445643174757974092731277
CLR_AUTO_EVENT3751740223788503242710464059
ONDEMAND_TASK_QUEUE7158888628150295552022698375
FT_IFTSHC_MUTEX8157860912148950452119732614
SLEEP_TASK684712713140622899198649180219
DISPATCHER_QUEUE_SEMAPHORE637210713301434798331306716813
CHECKPOINT_QUEUE5851038195606512521048177469
LOGMGR_QUEUE16905890129155394088993965265
XE_TIMER_EVENT296188951075304428895107530040
SQLTRACE_INCREMENTAL_FLUSH_SLEEP219078863019012783364045
REQUEST_FOR_DEADLOCK_SEARCH17528884256705563884256705044
BROKER_TO_FLUSH403914416108621131285691093
BROKER_EVENTHANDLER4390293463536904919757336
CXPACKET53927342498505870851323248534
OLEDB625309428626915680150
TRACEWRITE72229269526253538691283
PAGEIOLATCH_SH92847750226701185218535
LATCH_EX489693405007311634856348
BROKER_TASK_STOP7223670249102652785083
SOS_SCHEDULER_YIELD12205512455068121011552172
BROKER_RECEIVE_WAITFOR211988405988390599420
ASYNC_NETWORK_IO10415268161896334125
IO_COMPLETION360581930705725245
WRITELOG2064416502557246077
BACKUPIO2575906350491924580
SLEEP_BPOOL_FLUSH6188293694177334
BACKUPTHREAD316727184598518
LATCH_SH4499255465321675
LCK_M_SCH_M1322242539701710
PREEMPTIVE_OS_FILEOPS184920956533011
PAGEIOLATCH_EX219020799193999
BACKUPBUFFER5653170702912713
LCK_M_X2131601154606580
PREEMPTIVE_OS_AUTHENTICATIONOPS107041289756101
MSQL_DQ4120191021803004
ASYNC_IO_COMPLETION921042222892113
MSQL_XP1254628339605
PAGELATCH_UP989962445372550
PREEMPTIVE_OS_GETPROCADDRESS1254596839604
PREEMPTIVE_OS_GETFILEATTRIBUTES1611567639703
SQLTRACE_FILE_READ_IO_COMPLETION1724406397025
PREEMPTIVE_OS_DECRYPTMESSAGE2060382238501
PAGELATCH_EX7051032854215290
LOGBUFFER1003100884031
PREEMPTIVE_OS_LOOKUPACCOUNTSID1369256338701
PAGEIOLATCH_UP834243660112
LCK_M_IX412416367258
PREEMPTIVE_OS_DELETESECURITYCONTEXT2132206453800
PREEMPTIVE_OS_CREATEFILE1147204139001
PREEMPTIVE_OS_CRYPTACQUIRECONTEXT1928170938900
PAGELATCH_SH1051713133594900
SQLTRACE_FILE_BUFFER161291406280
THREADPOOL411205386029
PREEMPTIVE_OS_QUERYREGISTRY14661193300
CMEMTHREAD1334010543875840
PREEMPTIVE_OS_DTCOPS11043104301043
PREEMPTIVE_OS_WAITFORSINGLEOBJECT142390130000
PREEMPTIVE_OS_SETFILEVALIDDATA35784251702
DTC78043922114
EXECSYNC21467802971440
PREEMPTIVE_OS_GETVOLUMEPATHNAME64473429001
PREEMPTIVE_OS_CRYPTIMPORTKEY170649836600
PREEMPTIVE_OS_REPORTEVENT21347738902
PREEMPTIVE_OS_GENERICOPS4841839508
PREEMPTIVE_OS_AUTHORIZATIONOPS2412277900
SQLTRACE_FILE_WRITE_IO_COMPLETION10681433010
CXROWSET_SYNC23341322380
PREEMPTIVE_OS_DISCONNECTNAMEDPIPE1238119800
IMPPROV_IOWAIT261152304
LCK_M_SCH_S17777077
SOS_RESERVEDMEMBLOCKLIST245684330
PREEMPTIVE_OS_REVERTTOSELF166345500
QUERY_EXECUTION_INDEX_SORT_EVENT_OPEN2837361
PREEMPTIVE_OS_GETDISKFREESPACE35731300
CLR_CRST8111160
PREEMPTIVE_COM_QUERYINTERFACE2101005
PREEMPTIVE_DTC_ENLIST78201
FT_IFTS_RWLOCK1016130
PREEMPTIVE_TRANSIMPORT75100
SNI_CRITICAL_SECTION555110
RESOURCE_SEMAPHORE_MUTEX183110
SOS_DISPATCHER_MUTEX162120
PREEMPTIVE_DTC_BEGINTRANSACTION72100
SOS_LOCALALLOCATORLIST22111
PREEMPTIVE_CREATEPARAM12202
PREEMPTIVE_OS_MOVEFILE62100
PREEMPTIVE_OLEDBOPS92100
WRITE_COMPLETION61100
PREEMPTIVE_DTC_COMMITREQUESTDONE71100
PREEMPTIVE_DTC_PREPAREREQUESTDONE71100
PREEMPTIVE_OS_FLUSHFILEBUFFERS11101
TRANSACTION_MUTEX61100
February 27, 2012 at 10:32 am
Track the per-process waits (sys.dm_os_waiting_tasks); the cumulative view is fairly useless as a single snapshot, especially with all the ignorable waits included.
See if you can track, when that query runs slower than usual, what wait type that query is experiencing.
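A minimal sketch of that kind of snapshot (run it repeatedly, or from a polling job, while the query is slow):

```sql
-- What each currently waiting user task is waiting on right now.
SELECT
    wt.session_id,
    wt.wait_type,
    wt.wait_duration_ms,
    wt.blocking_session_id,
    wt.resource_description
FROM sys.dm_os_waiting_tasks AS wt
WHERE wt.session_id > 50        -- user sessions only
ORDER BY wt.wait_duration_ms DESC;
```

A non-null blocking_session_id points straight at a blocker; repeated CXPACKET rows point at parallelism.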
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 27, 2012 at 12:30 pm
Have you traced the SP:Recompile event? If so, is there a correlation between an increase in recompiles and some activity in the database, like following the reload of tables?
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
February 27, 2012 at 5:13 pm
GilaMonster (2/27/2012)
Track the per-process waits (sys.dm_os_waiting_tasks), the cumulative is fairly useless as a single snapshot, especially with all the ignorable waits included. See if you can track, when that query runs slower than usual, what wait type that query is experiencing.
Thanks for the advice, Gail. I had noticed CXPACKET waits previously when I was looking at the output of dm_os_wait_stats. But when I did the math, the average wait was really low.
I discovered some occasional CXPACKET waits when I looked at sys.dm_os_waiting_tasks. The wait durations varied: some were insignificant, some were significant, and others were ridiculously high.
In the past I had tried varying levels of MAXDOP, but the impact wasn't discernible. I decided to change MAXDOP again and set it to 2, based on this from MS (and the fact that hyper-threading is enabled on the server): http://support.microsoft.com/kb/329204
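For the record, the change I applied was along these lines:

```sql
-- Requires 'show advanced options'; the value 2 follows the KB guidance
-- for hyper-threaded servers referenced above.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 2;
RECONFIGURE;
```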
I don't see any improvement in performance, though.
February 27, 2012 at 6:52 pm
This smells like something we recently ran into; we actually had three things going on, and solving them fixed it for us. Three questions:
1. Is this being run by a .NET application?
2. What types of arguments does the stored proc take?
3. How close are you to being at 80% or more memory utilization on the SQL box?
February 27, 2012 at 8:25 pm
Brian.cs (2/27/2012)
This smells like something we recently ran into; we actually had three things going on, and solving them fixed it for us. Three questions:
1. Is this being run by a .NET application?
2. What types of arguments does the stored proc take?
3. How close are you to being at 80% or more memory utilization on the SQL box?
1. Often, but not always.
2. The problem is happening with many SPs. The one I've referenced takes a single input parameter, an INT.
3. The physical server currently shows roughly 57% memory utilization.
February 28, 2012 at 12:14 am
Since the reads/writes and CPU are in line, I don't think this is a parameter sniffing or recompile issue.
Could it be blocking?
It could also be a slowdown on the client consuming the data. The duration includes the time taken to send the data to the client; if the client doesn't consume it fast enough, you get an ASYNC_NETWORK_IO wait. These are shown in your snapshot, but it's hard to tell whether that is natural for your system or not.
Also, you mention traces: server-side traces, right? You're not using Profiler to monitor a live system, are you?
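You can check what's attached with sys.traces, e.g.:

```sql
-- is_rowset = 1 means a rowset consumer (i.e. Profiler attached);
-- id = 1 is normally the default trace.
SELECT id, status, path, is_rowset
FROM sys.traces;
```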
February 28, 2012 at 2:29 am
I suspect this is blocking or latch wait-related. CXPACKET would knock the CPU up as well as the duration.
Personally I'd leave MAXDOP alone without some significant analysis first.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 28, 2012 at 6:56 am
Dave Ballantyne (2/28/2012)
It could also be a slowdown on the client consuming the data. The duration includes the time taken to send the data to the client; if the client doesn't consume it fast enough, you get an ASYNC_NETWORK_IO wait. These are shown in your snapshot, but it's hard to tell whether that is natural for your system or not.
I considered this too. I didn't see many occurrences of ASYNC_NETWORK_IO when I repeatedly queried sys.dm_os_waiting_tasks. When I did, the wait duration was so low I disregarded it.
Also, you mention traces: server-side traces, right? You're not using Profiler to monitor a live system, are you?
Correct, server-side traces only.
February 28, 2012 at 9:22 am
The reason I was asking about ASP.NET has to do with a bug in TransactionScope, and if you don't know how to look for it, it's really hard to realize it's happening: http://blogs.msdn.com/b/dbrowne/archive/2010/06/03/using-new-transactionscope-considered-harmful.aspx
That was a big part of our issue. Can you run SQL Profiler, filter based on the application that's running the procs, and see if any of the connections are being set to transaction isolation Serializable? It's a shot in the dark, but in our apps it caused behavior similar to what you're describing.
February 28, 2012 at 11:30 am
Brian.cs (2/28/2012)
The reason I was asking about ASP.NET has to do with a bug in TransactionScope, and if you don't know how to look for it, it's really hard to realize it's happening: http://blogs.msdn.com/b/dbrowne/archive/2010/06/03/using-new-transactionscope-considered-harmful.aspx
That was a big part of our issue. Can you run SQL Profiler, filter based on the application that's running the procs, and see if any of the connections are being set to transaction isolation Serializable? It's a shot in the dark, but in our apps it caused behavior similar to what you're describing.
Hmmm. Very interesting. I know that the performance issues are prevalent in the apps (as opposed to executing an offending SP in an SSMS window). I also know that the apps do use MSDTC, which requires a specific configuration on the server. From your experience, maybe you can help me with a couple of questions related to the article on blogs.msdn.com:
1. If our apps aren't using ideal settings for transaction isolation level and/or transaction timeout, shouldn't I be noticing frequent timeouts and/or deadlocks? That's not happening on our system.
2. Does the TransactionScope issue apply to all versions of the .Net Framework? (Our apps are compiled for v 2.0)
I've never run a trace to check a connection for its trx isolation level. It should be easy to figure out, though. I'll give it a try. Thx for the tip! 🙂
UPDATE: According to sys.dm_exec_requests, the transaction_isolation_level is 2 (ReadCommitted). The scenario from the article does not appear to apply to our situation.
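For anyone following along, the check was something like:

```sql
-- 1 = ReadUncommitted, 2 = ReadCommitted, 3 = RepeatableRead,
-- 4 = Serializable, 5 = Snapshot
SELECT session_id, transaction_isolation_level
FROM sys.dm_exec_requests
WHERE session_id > 50;
```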
February 28, 2012 at 2:35 pm
One final update (for now)...
I rebooted the server during a maintenance window Monday night. Performance has returned to normal. Users are no longer complaining, I can no longer duplicate the previous performance issues manually, and I can no longer see evidence of poor performance via traces.
Good news/bad news, right? Performance is great :-), but I haven't found a permanent/long term solution to the problem. Eventually it will rear its ugly head again. 🙁
February 28, 2012 at 10:58 pm
Dave Mason (2/28/2012)
One final update (for now)... I rebooted the server during a maintenance window Monday night. Performance has returned to normal. Users are no longer complaining, I can no longer duplicate the previous performance issues manually, and I can no longer see evidence of poor performance via traces.
Good news/bad news, right? Performance is great :-), but I haven't found a permanent/long term solution to the problem. Eventually it will rear its ugly head again. 🙁
With that thought in mind and taking a shot in the dark, what are the initial size and growth settings of TempDB?
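You can pull those straight from the metadata, e.g.:

```sql
-- Sizes are in 8 KB pages; growth is in pages unless is_percent_growth = 1.
SELECT name,
       size * 8 / 1024 AS size_mb,
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS varchar(10)) + '%'
            ELSE CAST(growth * 8 / 1024 AS varchar(10)) + ' MB'
       END AS growth,
       physical_name
FROM tempdb.sys.database_files;
```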
--Jeff Moden
Change is inevitable... Change for the better is not.