December 4, 2017 at 4:56 am
DB version: Microsoft SQL Server 2008 (SP1) - 10.0.2573.0 (X64)
RCSI enabled
Hi,
Last week one of our production jobs was hanging for several hours. I eventually killed it, but I was able to monitor the situation in detail in the meantime.
One specific table refresh query (TRUNCATE / INSERT INTO SELECT), which usually takes 20 seconds, sat in a "runnable" state doing nothing for several hours.
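The refresh is roughly this shape (the table and column names here are just placeholders, not the real ones):
TRUNCATE TABLE dbo.TargetTable;

INSERT INTO dbo.TargetTable (Col1, Col2, Col3)
SELECT s.Col1, s.Col2, s.Col3
FROM dbo.SourceTable AS s
JOIN dbo.OtherSource AS o
  ON o.Id = s.Id;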
I thought it was due to some contention, so I looked everywhere, but there were no locks, no waits, no CPU pressure, and no tempdb issues.
When I saw that there were several execution plans for this SQL, I suspected something was up. Indeed, after running DBCC FREEPROCCACHE and restarting the job, it went back to normal speed, with a single execution plan.
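For reference, this is what I ran (on 2008, DBCC FREEPROCCACHE also accepts a specific plan_handle if you only want to evict one plan; the handle value below is just a placeholder):
-- Clears the entire plan cache (what I ran)
DBCC FREEPROCCACHE;

-- Or evict a single cached plan by its handle (placeholder value)
DBCC FREEPROCCACHE (0x060005001AC1234500000000000000000000000000000000);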
The refresh query is old and has been stable for a few years. Stats are updated every week. The query is raw SQL and does not involve procedures or parameters.
Could you help me understand what happened? I'm usually in favor of letting SQL Server decide by itself and against query hints, but I don't know how to prevent this in the future.
I understand that large variations in table data size can influence such behavior, but the source table volume is stable. There is, however, some odd information in the execution plan I captured:
"Hanging" Plan properties:
Cached plan size: 56 KB
CompileCPU: 7
Estimated Number of Rows: 1
Estimated Operator Cost: 0%
Estimated Subtree Cost: 0.031225
Optimization Level: FULL
Reason for Early Termination of Statement Optimization: Good Enough Plan Found
"Speedy" Plan properties
Cached plan size: 136 KB
CompileCPU: 16
Estimated Number of Rows: 2407180
Estimated Operator Cost: 0%
Estimated Subtree Cost: 245.896
Optimization Level: FULL
Note: there are also many Parallelism (Gather Streams) operators in this plan.
The source tables for the INSERT INTO SELECT are indeed large and are never empty. Why was a 1-row estimate produced, and what is this "Reason for Early Termination of Statement Optimization"?
Thanks for your help and let me know if you need any additional info.
December 15, 2017 at 9:53 am
Was it in a "RUNNABLE" or "RUNNING" state, or both? The estimated row count does look odd. It's a little hard to say for certain what might have happened without looking at the plan. I would have guessed parameter sniffing, stale stats, or something along those lines, but you said there are no parameters and you update stats regularly.
You do have one option to make sure the speedy plan always gets used: sp_create_plan_guide_from_handle, which forces the cached plan via a plan guide. You'll need to get the plan_handle during the next run, and the plan will have to be in the cache when you do this. Under the covers, all this does is create an XML plan guide.
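A rough sketch of how that could look (the text filter, guide name, and plan handle are only examples):
-- Find the plan_handle for the statement while the good plan is in cache
SELECT qs.plan_handle, qs.statement_start_offset, st.text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.text LIKE N'%INSERT INTO dbo.TargetTable%';

-- Pin that plan via an XML plan guide (name and handle are placeholders)
EXEC sp_create_plan_guide_from_handle
    @name = N'RefreshQueryGuide',
    @plan_handle = 0x060005001AC1234500000000000000000000000000000000,
    @statement_start_offset = NULL;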
December 15, 2017 at 10:07 am
Even though stats are updated "every week", it could still be a stats problem. It could also be that the data finally reached a "tipping point" and caused code that doesn't necessarily scale well (despite previous "good performance") to choose a less than optimal plan.
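If you want to double-check, something like this (the table name is only an example) shows when each statistic on a table was last updated:
SELECT s.name AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'dbo.SourceTable');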
--Jeff Moden
Change is inevitable... Change for the better is not.
December 15, 2017 at 2:41 pm
sqlgrease - Friday, December 15, 2017 9:53 AM
Thanks for your response. To be precise, the statuses were a bit confusing: the query status was "running" in sys.sysprocesses and "runnable" in sys.dm_exec_requests.
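For anyone comparing the two views, this is roughly what I was looking at (session id 123 is just a placeholder):
-- Legacy compatibility view
SELECT spid, status, waittype, lastwaittype, cpu
FROM sys.sysprocesses
WHERE spid = 123;

-- DMV
SELECT session_id, status, wait_type, last_wait_type, cpu_time
FROM sys.dm_exec_requests
WHERE session_id = 123;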
Thanks for your tip about sp_create_plan_guide_from_handle. I will look into it.
December 15, 2017 at 3:25 pm
Jeff Moden - Friday, December 15, 2017 10:07 AM
Hi, I agree, this is the most probable scenario. But I'm still unsure whether the plan properties (Estimated Number of Rows: 1 / Reason for Early Termination of Statement Optimization: Good Enough Plan Found) are symptoms or the cause...