September 12, 2011 at 10:26 am
Tx, and now these 2?
SELECT* FROM sys.dm_exec_query_optimizer_info
Don't do any of the recommendations, it's just very rough guidelines for us.
SELECT
optz.counter
, optz.occurrence
, CONVERT(DECIMAL(18,2), optz.value) AS AvgValue
, conf.name
, conf.value
, conf.value_in_use
, Uptime.DaysUptime AS [Days Server UPTIME & Last Stats Reset]
, CASE WHEN Uptime.DaysUptime < 45 THEN 'You may not have very meaningful stats because of a recent restart' ELSE NULL END AS [Stats Warning]
, CASE WHEN optz.value < conf.value THEN 'Cost threshold for parallelism might be too low' ELSE NULL END AS [Cost Threshold Warning]
, CASE WHEN conf.value_in_use <> conf.value THEN 'Server needs to be restarted for the setting to take effect' ELSE NULL END [Restart Warning]
FROM
sys.dm_exec_query_optimizer_info optz
CROSS JOIN sys.configurations conf
OUTER APPLY(
SELECT
CONVERT(DECIMAL(18 , 2) , DATEDIFF(hh , create_date , GETDATE()) / 24.0) AS DaysUptime
FROM
sys.databases
WHERE
name = 'tempdb'
) Uptime
WHERE
optz.counter = 'final cost'
AND conf.name = 'cost threshold for parallelism'
GO
September 12, 2011 at 10:27 am
FYI, for me it recommends 4, but as I said, I set to 6.
September 12, 2011 at 10:29 am
Bob Fazio (9/12/2011)
http://weblogs.sqlteam.com/tarad/archive/2010/11/08/max-degree-of-parallelism-server-side-setting.aspxFYI, for me it recommends 4, but as I said, I set to 6.
Ok, as I said, not even junior at tuning that big of a server. So now that you obviously took care of that... we can move on :-).
September 12, 2011 at 10:33 am
counteroccurrencevalue
optimizations690881
elapsed time690880.00212443550254748
final cost690887.09694068392164
trivial plan517621
tasks17326623.287082996652
no plan0NULL
search 063101
search 0 time63110.00500285216289019
search 0 tasks6311900.215496751703
search 1110041
search 1 time110160.00153567538126362
search 1 tasks11016464.412490922295
search 2121
search 2 time130.000307692307692308
search 2 tasks13141.846153846154
gain stage 0 to stage 110.00356762375769612
gain stage 1 to stage 20NULL
timeout271
memory limit exceeded0NULL
insert stmt94481
delete stmt4931
update stmt17041
merge stmt0NULL
contains subquery4151
unnest failed5581
tables690881.56905685502547
hints801
order hint11
join hint0NULL
view reference123571
remote query931
maximum DOP690880.000101320055581288
maximum recursion level0NULL
indexed views loaded0NULL
indexed views matched0NULL
indexed views used0NULL
indexed views updated0NULL
dynamic cursor request0NULL
fast forward cursor request241
counteroccurrenceAvgValuenamevaluevalue_in_useDays Server UPTIME & Last Stats ResetStats WarningCost Threshold WarningRestart Warning
final cost690897.10cost threshold for parallelism112.83You may not have very meaningful stats because of a recent restartCost threshold for parallelism might be too lowNULL
Just so you know, I'm pretty green with a system this big as well. Its been running on here for a year, but we hadn't seen even a hint of an issue until about a month ago.
September 12, 2011 at 10:35 am
Bob Fazio (9/12/2011)
counteroccurrencevalue
optimizations690881
elapsed time690880.00212443550254748
final cost690887.09694068392164
trivial plan517621
tasks17326623.287082996652
no plan0NULL
search 063101
search 0 time63110.00500285216289019
search 0 tasks6311900.215496751703
search 1110041
search 1 time110160.00153567538126362
search 1 tasks11016464.412490922295
search 2121
search 2 time130.000307692307692308
search 2 tasks13141.846153846154
gain stage 0 to stage 110.00356762375769612
gain stage 1 to stage 20NULL
timeout271
memory limit exceeded0NULL
insert stmt94481
delete stmt4931
update stmt17041
merge stmt0NULL
contains subquery4151
unnest failed5581
tables690881.56905685502547
hints801
order hint11
join hint0NULL
view reference123571
remote query931
maximum DOP690880.000101320055581288
maximum recursion level0NULL
indexed views loaded0NULL
indexed views matched0NULL
indexed views used0NULL
indexed views updated0NULL
dynamic cursor request0NULL
fast forward cursor request241
counteroccurrenceAvgValuenamevaluevalue_in_useDays Server UPTIME & Last Stats ResetStats WarningCost Threshold WarningRestart Warning
final cost690897.10cost threshold for parallelism112.83You may not have very meaningful stats because of a recent restartCost threshold for parallelism might be too lowNULL
Just so you know, I'm pretty green with a system this big as well. Its been running on here for a year, but we hadn't seen even a hint of an issue until about a month ago.
Ignore those, wrong server
September 12, 2011 at 10:36 am
I wouldn't take any decision based on those stats.
The cost threshold is however at 1 with the default value being 5. This might be something I'd consider but I like to see ideally 1 full month of data. 2 days is just way to short for this.
Is this server mostly olap or oltp?
September 12, 2011 at 10:37 am
counteroccurrencevalue
optimizations197374501
elapsed time197366460.0063553188317777
final cost1973664745.9971011158534
trivial plan73155451
tasks12421103556.496919637491
no plan0NULL
search 014025791
search 0 time14140600.0132889099472444
search 0 tasks14140601250.81451847871
search 1110162501
search 1 time112608820.003472970856102
search 1 tasks11260882449.037774305778
search 222741
search 2 time62730.155596046548701
search 2 tasks627313871.7344173442
gain stage 0 to stage 1112290.147846452882183
gain stage 1 to stage 22230.00432500828511371
timeout5672551
memory limit exceeded0NULL
insert stmt100578821
delete stmt1048871
update stmt23912821
merge stmt32721
contains subquery5018291
unnest failed2621091
tables197374501.45986132960438
hints330341
order hint6851
join hint141
view reference11182211
remote query88060781
maximum DOP197374500.183442187313964
maximum recursion level0NULL
indexed views loaded0NULL
indexed views matched0NULL
indexed views used0NULL
indexed views updated0NULL
dynamic cursor request22221
fast forward cursor request69701
counteroccurrenceAvgValuenamevaluevalue_in_useDays Server UPTIME & Last Stats ResetStats WarningCost Threshold WarningRestart Warning
final cost1973666346.00cost threshold for parallelism55205.08NULLCost threshold for parallelism might be too lowNULL
September 12, 2011 at 10:40 am
Ninja's_RGR'us (9/12/2011)
I wouldn't take any decision based on those stats.The cost threshold is however at 1 with the default value being 5. This might be something I'd consider but I like to see ideally 1 full month of data. 2 days is just way to short for this.
Is this server mostly olap or oltp?
September 12, 2011 at 10:41 am
Almost exclusive OLTP.
September 12, 2011 at 10:48 am
Bob Fazio (9/12/2011)
Almost exclusive OLTP.
Unless Grant has something else in mind I'd go with Cost Treshold at 25-35. I'm going more on the low end here to get more buffer for the "olap" queries which do benefit from parallelism.
Another thing you might play with is actually giving more cpus to those olap queries. Maybe 8 instead of 4. the peak would be higher, but shorter and still leaving 40 other cores to work on the "normal stuff".
However at this point I'd really like to be looking into this =>
You could also corelate the times you have in the perform counters to those traces. At this point it's just a matter of find and tune. I really think you have more of a targeted problem with a few queries rather than a full server setup issue (especially if it ran ok for months).
September 12, 2011 at 11:03 am
This can take a guess at the avg, min & max cost of the olap queries.
this uses the cached plan so it won't be 100% accurate but for guesstimate avg it might do the trick.
This takes 1 min to process 8K plans on my prod server and it trashes the cpus while doing so. You might want to be around if you start this during the day in case users complain or run it at night (when they CAN'T complain :-D).
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
WITH XMLNAMESPACES
(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
query_plan AS CompleteQueryPlan,
n.value('(@StatementText)[1]', 'VARCHAR(4000)') AS StatementText,
n.value('(@StatementOptmLevel)[1]', 'VARCHAR(25)') AS StatementOptimizationLevel,
n.value('(@StatementOptmEarlyAbortReason)[1]', 'VARCHAR(25)') AS StatementOptmEarlyAbortReason,
n.value('(@StatementSubTreeCost)[1]', 'DECIMAL(18, 6)') AS StatementSubTreeCost,
n.value('(QueryPlan/@CompileCPU)[1]', 'DECIMAL(18, 0)') AS CompileCPU,
n.value('(QueryPlan/@CompileTime)[1]', 'DECIMAL(18, 0)') AS CompileTime,
n.value('(QueryPlan/@CompileMemory)[1]', 'DECIMAL(18, 0)') AS CompileMemory,
--Available only in the actual plan
--n.value('(QueryPlan/@DegreeOfParallelism)[1]', 'DECIMAL(18, 0)') AS DegreeOfParallelism,
n.query('.').exist('//MissingIndexes') AS HasMissingIndexes,
n.query('.') AS ParallelSubTreeXML,
ecp.usecounts,
ecp.size_in_bytes
INTO #X
FROM sys.dm_exec_cached_plans AS ecp
CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS eqp
CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS qn(n)
WHERE n.query('.').exist('//RelOp[@PhysicalOp="Parallelism"]') = 1
--WHERE LOWER(n.value('(@StatementText)[1]', 'VARCHAR(4000)')) LIKE '%rpt_equipementsetaccessoires_bureaux%'
SELECT * FROM #X
SELECT
--Works only with actual plans
--SUM(usecounts * DegreeOfParallelism) / SUM(usecounts) AS Weigthed_Avg_DOP,
SUM(usecounts * StatementSubTreeCost) / SUM(usecounts) AS Weigthed_Avg_SubTreeCost,
MIN(StatementSubTreeCost) AS Min_StatementSubTreeCost,
MAX(StatementSubTreeCost) AS Max_StatementSubTreeCost
FROM
#X
--DROP TABLE #X
September 12, 2011 at 11:12 am
I appreciate the knowledge and the time. I will state that since the changes Friday, we have only 1 query that still runs in parallel and that query most definitely should continue to run in parallel.
I intend to do a full review of our environment and try and identify area's of improvement. I intend to let the configuration sit a while (Unless the problem arises again) before I make any more changes to the system.
September 12, 2011 at 11:15 am
Bob Fazio (9/12/2011)
I appreciate the knowledge and the time. I will state that since the changes Friday, we have only 1 query that still runs in parallel and that query most definitely should continue to run in parallel.I intend to do a full review of our environment and try and identify area's of improvement. I intend to let the configuration sit a while (Unless the problem arises again) before I make any more changes to the system.
Ok, here's a nice 60 minutes blitz script when taking over a server.
Of course it takes 2.5X that to explain it :-D.
September 12, 2011 at 1:38 pm
There is also a patch we are strongly looking at. http://support.microsoft.com/default.aspx?scid=kb;EN-US;976700
The reason why we are leaning towards the patch, is because I think everyone agrees, one bad query shouldn't take down a server of this (or really any) size.
It's also included in SP1, which we might also apply.
September 12, 2011 at 3:15 pm
Bob Fazio (9/12/2011)
There is also a patch we are strongly looking at. http://support.microsoft.com/default.aspx?scid=kb;EN-US;976700The reason why we are leaning towards the patch, is because I think everyone agrees, one bad query shouldn't take down a server of this (or really any) size.
It's also included in SP1, which we might also apply.
Have you proven this is your problem?
I promise you I have seem a SINGLE simple query take down entire servers. At least 3 times over 8 years.
I'd really start with Gail's article & trace before guessing with patches.
Viewing 15 posts - 16 through 30 (of 33 total)
You must be logged in to reply to this topic. Login to reply