June 26, 2012 at 12:50 pm
Environment: SQL Server 2005 SP4 on a two-node, active/passive cluster
DB Size: 335 GB and growing. The DB is highly transactional, so we rebuild/reorganize the indexes on a weekly basis and have recently run UPDATE STATISTICS WITH FULLSCAN.
We have a stored procedure which SELECTs the top 50 rows from 6 large tables joined together. This SP is called several times per second and returns results in under 1 second.
THE PROBLEM: randomly and sporadically, the SP takes 4-5 minutes to return results, which causes timeouts and issues in the application. The quickest way we've found to resolve the issue is to execute sp_recompile against the SP. If we don't catch the issue in time, it usually corrects itself after a while anyway. This indicates to me that the optimizer is choosing a less-than-optimal execution plan for the SP.
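For reference, the workaround we run looks like this (the procedure name is a placeholder for our actual SP):

-- Placeholder procedure name; sp_recompile marks the cached plan so the
-- procedure gets a fresh plan compiled on its next execution.
EXEC sp_recompile N'dbo.usp_GetTop50';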
My question is: why is this happening? What would cause the optimizer to stop using the cached plan, or cause the cached plan to suddenly become suboptimal?
June 26, 2012 at 1:12 pm
It sounds like it might be bad parameter sniffing. When the query is running correctly, get a copy of the plan out of cache and store it on the side as a .sqlplan file. Then, when it goes bad, capture another copy of the plan out of cache and compare the two. If they're different, take a look at the compile-time values of the parameters (visible in the SELECT operator's properties); those will tell you about the data that was used to create the plan.
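If it helps, here's a rough sketch of pulling the cached plan XML on 2005 so you can save it off; the procedure name is a placeholder:

-- A sketch, assuming 'dbo.usp_GetTop50' stands in for your procedure.
-- Save the resulting query_plan XML as a .sqlplan file for later comparison.
SELECT qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE st.dbid = DB_ID()
  AND st.objectid = OBJECT_ID('dbo.usp_GetTop50');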
The optimizer is dependent on statistics. Most tables have data that's fairly evenly distributed, but sometimes the data becomes skewed: a few values with lots of rows, or a few values with very few rows, different from the rest of the data covered by the statistics. When one of those skewed values is passed as a parameter, its specificity lets the optimizer "sniff" the parameter value and build a plan that's optimal for that value. But a plan that's optimal for the skewed data doesn't work for the majority of the data in the table. That's how you get bad parameter sniffing and poor performance.
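You can check for that kind of skew by looking at the statistics histogram directly; the table and statistics names here are just placeholders:

-- Placeholder table and statistics names; the histogram shows how rows
-- are distributed across the leading column's values.
DBCC SHOW_STATISTICS ('dbo.BigTable', 'IX_BigTable_SomeColumn') WITH HISTOGRAM;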
Plans age out of cache as data changes. For tables with more than 500 rows, statistics become outdated once roughly 500 plus 20% of the rows have been modified, which can cause statistics to get updated (unless you're running 2008 R2 SP1 or greater, in which case you can set trace flag 2371 so that the threshold shrinks dynamically for large tables). When statistics get updated, you get a recompile on all queries that use them. Index rebuilds also produce updated statistics, and with them query recompiles.
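On 2005 you can get a rough sense of how close a table is to that threshold from rowmodctr (it's approximate, and sys.sysindexes is deprecated, but it works; the table name is a placeholder):

-- Approximate modification counts since the last statistics update (SQL 2005).
SELECT OBJECT_NAME(i.id) AS table_name,
       i.name AS index_or_stats_name,
       i.rowcnt,
       i.rowmodctr,
       STATS_DATE(i.id, i.indid) AS last_stats_update
FROM sys.sysindexes AS i
WHERE i.id = OBJECT_ID('dbo.BigTable')
  AND i.indid > 0;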
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning