February 6, 2013 at 8:51 am
Hello
I have an interesting (at least to me) case of a stored procedure that is intermittently running long. These are the step I have taken to troubleshoot and am looking for some guidance on where else I might look.
The behavior of the procedure is as follows. When the procedure has not run for a long time (several hours 8+), it takes up to 30+ seconds.
Subsequent runs of the proc after that time run in < .050 seconds. The only inserts being done are into table variables with pretty small (<100 rows). No updates/deletes.
It returns only one row of XML type data.
Sql version 10.50.2500.0
I have reviewed the Execution plan. Nothing interesting there. No indication of missing indexes. No scans of any sort. All lookups are seeks on Indexes or Custered Indexes.
I have rebuilt all the indexes on the tables that are touched by the stored procedure.
I have updated the statistics on the affected tables as well.
Auto update statistics is turned on.
The stored procedure does access the temp db through table variables but I do have 25 MDF files for the tempDB. However, other similar databases on this server do not exhibit the same behavior with this stored procedure.
I have checked the perfmon stats on the server
CPU remains pretty constant throughout the day between 30% and 40%.
Disk queue runs between 0 and 1 all day long with no significant spikes.
Network utilization is pretty constant at 20% through out the day.
The Latch wait time shows to be around 1ms during the time where the procedure ran long.
The Buffer Page life expectancy is a little low averaging around 15min, but other similar databases on this server do not exhibit the same behavior with this stored procedure.
MAX DOP is set at 4 for the server.
Disk space has plenty of free space.
The machine itself is a powerhouse.
Processors: 4
Total Mem: 68,705,669,120 ( 68.71 GB )
Processor Designation: CPU1 Intel(R) Xeon(R) CPU E7540 @ 2.00GHz
Power Plan: Balanced
Core: 6 Processor: 12 Calculated: 48
The disk drives are Equilogic Sumo.
Any suggestions.
Thanks in advance
Steve
February 6, 2013 at 10:52 am
February 7, 2013 at 7:42 am
Recompilation would not take execution time from .05s to 30s unless there are some hellaciously complex queries in the proc. Can you post the code? sanitized to protect the guilty of course.
Can you also post an Actual Execution plan?
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
February 7, 2013 at 7:55 am
Try using SET STATISTICS IO ON next time you run it for the first time in a few hours. Save the results, and then run it again and compare the second results with the original. If your physical reads have decreased, you are seeing pages being flushed out of the cache when they haven't been used for a few hours. You may be reading too many pages in order to produce the query results - perhaps your index fill factors are too low, maybe your clustering keys are too wide, or you could just be missing the index that would make the query run more efficiently.
John
February 7, 2013 at 8:46 am
In early testing for the procedure I did find an index that was missing. That did not fix the issue completely, it only made it less recurrent. As far as width of the clustering keys, they are only single column int values. Fill factors are 90 by default, so I would not anticipate that being an issue, but I will review that.
I'll check the IO setting you recommended, however... if I am "reading too many pages" I am not entirely certain how I would correct that. I am familiar with some queries that read thousands of rows and basically throw away all that work to produce only a small subset of what was touched, however that issue would not produce the wildly variant nature of the execution times. I would anticipate that the execution times would be consistently high.
I should also mention that no data is ever removed from the tables and the index keys, once inserted, never change. They are not high volume insert/update tables, probably only taking less than 1K insert/update statements a day.
Steve
February 7, 2013 at 8:59 am
Steve
I understand. You may still have a missing index, I suppose. In any case, do the SET STATISTICS IO ON thing as I suggested - you'll then be able to rule out or in whether it's a problem of reading from disk vs reading from cache.
John
February 7, 2013 at 12:12 pm
They physical reads did fluctuate from 0 on a fast run to 148 on the slow run. While I would not think that would produce such a large change, it certainly is worth investigating.
Thx
Steve
February 13, 2013 at 9:47 pm
Thanks for all the comments.
I moved the database to one of our less utilized servers and the problem disappeared. The procedure now runs between 16 and 200ms all the time. Based on the fact that the Buffer Page Life Expectancy was extremely low and that this really only occurred during higher volume times, I am going to chalk this up to a resource issue based on the physical read information I got from the IO diagnostic. Although this feels like BPLE was probably only a contributor.
Steve
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply