July 4, 2013 at 2:37 pm
Hi all,
I have a query that takes 2 seconds to compile and run (first time round) when the server is restarted. When the server has been up and running for a day or two, I will try and run the query and all of a sudden it will take 10 minutes. Nearly all of this time is spent compiling the query, and the execution plan has changed all of a sudden and I don't understand why. This system is not yet live so is not subject to change in data, the databases are refreshed from live every morning at 7am. This is eventually going to be the live system and was supposed to be improved performance for the end users, but at the moment we are stuck on this long running query issue.
There are NO wait times while the query is running, no significant cost in the plan, it is all compile time. The server is a virtual server with 8 cores and 12gb of memory, sql capped at 10gb it has SAN disk. there is still available memory for the OS (300mb according to perfmon). Thought this might be external memory pressure and was hoping to see available mb at 0 at the time when the performance deteriorated but this was not the case. I am out of ideas and the deadline is approaching, any thoughts will be appreciated.
Find the fast and slow query plans attached.
It's a right old conundrum
Thanks
Rich
July 4, 2013 at 5:26 pm
richard.austin (7/4/2013)
Hi all,I have a query that takes 2 seconds to compile and run (first time round) when the server is restarted. When the server has been up and running for a day or two, I will try and run the query and all of a sudden it will take 10 minutes. Nearly all of this time is spent compiling the query, and the execution plan has changed all of a sudden and I don't understand why. This system is not yet live so is not subject to change in data, the databases are refreshed from live every morning at 7am. This is eventually going to be the live system and was supposed to be improved performance for the end users, but at the moment we are stuck on this long running query issue.
There are NO wait times while the query is running, no significant cost in the plan, it is all compile time. The server is a virtual server with 8 cores and 12gb of memory, sql capped at 10gb it has SAN disk. there is still available memory for the OS (300mb according to perfmon). Thought this might be external memory pressure and was hoping to see available mb at 0 at the time when the performance deteriorated but this was not the case. I am out of ideas and the deadline is approaching, any thoughts will be appreciated.
Find the fast and slow query plans attached.
It's a right old conundrum
Thanks
Rich
There are 2 things that I can think of off the top of my head that might be causing this. Parameter sniffing (which a recompile of the proc will usually fix) or data being added to indexes which only an index maintenance plan would fix (along with a rebuild of stats if only doing a reorg instead of a rebuild).
--Jeff Moden
Change is inevitable... Change for the better is not.
July 5, 2013 at 3:44 am
Thanks for your response. I am running the same query each time and not changing the parameters so don't think parameter sniffing is the problem. This query has been running fine for a year on another server which is less specced (one cpu and 2gb memory, but it is the only application using this particulalr server).
I have rebuilt the indexes but to no avail.
I have tried running the query with query hints on the join (hash, loop and merge hints) and the query returns in a couple of seconds. The optimiser is having a hard time compiling the query, but this only started happening yesterday when the query plan changed.
July 5, 2013 at 4:50 am
The optimiser was timing out because of a logic inconsistency in your WHERE clause. Try this:
SELECT c.Title
, c.Surname
, c.[Known As]
, c.Gender
, c.[Start Date]
, c.[Leaving Date]
, c.[Post Name]
, c.Category
, c.[Key Unit Name 1]
, c.[Cost To]
, c.FTE
, lcg.[Grade Name]
, c.[Payroll Name]
FROM [SelectHR].[Employee].[Current and Leaver Positions with Organisation As At Evaluation Date] c
INNER JOIN [SelectHR].[Person].[Laser Current Grade] lcg
ON ((c.[Person Number] = lcg.[Person Number])
AND (c.[Appointment Number] = lcg.[Appointment Number]))
AND (c.[Career Number] = lcg.[Career Number])
WHERE (c.[Start Date] < {ts '2013-06-30 12:55:40'}
AND c.[Leaving Date] IS NULL
AND c.[Payroll Name] = N'CCF')
OR
(c.[Start Date] < {ts '2013-06-30 12:55:40'}
AND c.[Leaving Date] >= {ts '2013-06-01 12:55:36'}
AND c.[Payroll Name] = N'CCF')
Compare the WHERE clause to yours.
Compare the results to yours - they are likely to be very different (the rowcount will be much less).
Post the actual execution plan of the new query if and when you have validated the results as correct.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 5, 2013 at 7:40 am
Thank you, but i am running your revised query and it too is taking a long time. It is worth mentioning that I have since added a join hint to the query (tried all three hash, merge and loop) and the query returned in seconds. There is something that is causing the sql optimiser to take a long time to compile an inefficient plan.
This is just one example of a query ran by a third party application, so I cannot change the actual code that they run.
July 5, 2013 at 8:00 am
richard.austin (7/5/2013)
Thank you, but i am running your revised query and it too is taking a long time. It is worth mentioning that I have since added a join hint to the query (tried all three hash, merge and loop) and the query returned in seconds. There is something that is causing the sql optimiser to take a long time to compile an inefficient plan.This is just one example of a query ran by a third party application, so I cannot change the actual code that they run.
Can you post the actual plan for the quick version using the join hint? That's interesting...
You should refer them to the WHERE clause of the query. Here it is again:
WHERE (c."Start Date"<{ts '2013-06-30 12:55:40'}
AND c."Leaving Date" IS NULL
AND c."Payroll Name"=N'CCF'
OR c."Start Date"<{ts '2013-06-30 12:55:40'}
AND c."Leaving Date">={ts '2013-06-01 12:55:36'}
AND c."Payroll Name"=N'CCF')
Bet you can't guess what it's supposed to mean.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 5, 2013 at 8:57 am
I should mention that the objects being referenced in the initial query are views and not tables. These views use joins too, some quite complex which explains the size of the query plan. See attached for query plan of query executed with join hint.
It is totally different.
July 8, 2013 at 1:13 am
The join hint restricts the number of possible plans which the optimiser has to evaluate to the point where it has time to pick a suitable plan, hence with the hint you're no longer getting a timeout. It's still a horrible plan - you're reading the same tables again and again. If you post up the view definitions, it might be possible to work through them to generate a cleaner query with less reads of the same tables.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 8, 2013 at 3:58 am
Both plans are timing out, but one in a second and one in 10 minutes.
<QueryPlan DegreeOfParallelism="0" MemoryGrant="5968" CachedPlanSize="1632" CompileTime="659453" CompileCPU="649780" CompileMemory="715616">
<QueryPlan DegreeOfParallelism="0" MemoryGrant="7296" CachedPlanSize="1720" CompileTime=" 1183" CompileCPU=" 1055" CompileMemory=" 52024">
It is possible the slow one is due to a request for automatic statistics maintenance. If this is the case you could try setting Auto Update Statistics Asynchronously.
July 8, 2013 at 12:03 pm
ChrisM@Work (7/8/2013)
The join hint restricts the number of possible plans which the optimiser has to evaluate to the point where it has time to pick a suitable plan, hence with the hint you're no longer getting a timeout. It's still a horrible plan - you're reading the same tables again and again. If you post up the view definitions, it might be possible to work through them to generate a cleaner query with less reads of the same tables.
Thanks Chris, I agree the plans are horrible, and some of the joins in the views are horrendous, unfortunately it is a 3rd party application and I am stuck with the queries that are being ran, also the queries work fine on another instance (which is the actual current live system). They also work fine on the slow system when the instance has been restarted.
Sean Pearce (7/8/2013)
Both plans are timing out, but one in a second and one in 10 minutes.<QueryPlan DegreeOfParallelism="0" MemoryGrant="5968" CachedPlanSize="1632" CompileTime="659453" CompileCPU="649780" CompileMemory="715616">
<QueryPlan DegreeOfParallelism="0" MemoryGrant="7296" CachedPlanSize="1720" CompileTime=" 1183" CompileCPU=" 1055" CompileMemory=" 52024">
It is possible the slow one is due to a request for automatic statistics maintenance. If this is the case you could try setting Auto Update Statistics Asynchronously.
Thanks Sean, what do you mean both plans are timing out? I do get results on both systems. Thank you for your suggestion, I tried setting auto update stats asynchronously, and I also tried turning off auto stats updates but didn't see any results immediately.
I am going to try installing the latest cumulative update for sql server 2008r2 SP2 just in case, at the very least it will rule this out. Though I can't see my specific problem listed in any of the cumulative updates.
July 8, 2013 at 2:26 pm
i see what you mean now, thanks for that what on earth could be causing the optimiser to take ten minutes and only then timeout! ridiculous. hopefully cumulative update will sort this, it can't be normal behaviour.
July 9, 2013 at 1:10 pm
Sorted! Your comments led me to the following website, I couldn't understand why the query was taking so long, and then timing out.
http://mssqlwiki.com/2012/10/07/optimizer-timeout-or-optimizer-memory-abort/
quote from the site below
"We can avoid optimizer from timing out and picking bad plan by enabling trace flag -T8780. This increases the time limit before the timeout occurs."
I decided to give this a shot today and guess what, that global trace flag was already on! I certainly didn't start it but my guess it is it is off when the service is restarted, and something is turning this trace flag on. I turned it off and hey presto, sql gives up finding the best plan quickly and runs the query in a couple of seconds.
I am currently running a trace to catch anything that turns this global trace flag on.
Thanks again, I would not have cracked it with your comments.
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply