Performance Problem that I can't pin down

Question

Post reply

Performance Problem that I can't pin down

Viewing 15 posts - 16 through 30 (of 33 total)

You must be logged in to reply to this topic. Login to reply

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 1

Tx, and now these 2?

SELECT* FROM sys.dm_exec_query_optimizer_info

Don't do any of the recommendations, it's just very rough guidelines for us.

SELECT

optz.counter

, optz.occurrence

, CONVERT(DECIMAL(18,2), optz.value) AS AvgValue

, conf.name

, conf.value

, conf.value_in_use

, Uptime.DaysUptime AS [Days Server UPTIME & Last Stats Reset]

, CASE WHEN Uptime.DaysUptime < 45 THEN 'You may not have very meaningful stats because of a recent restart' ELSE NULL END AS [Stats Warning]

, CASE WHEN optz.value < conf.value THEN 'Cost threshold for parallelism might be too low' ELSE NULL END AS [Cost Threshold Warning]

, CASE WHEN conf.value_in_use <> conf.value THEN 'Server needs to be restarted for the setting to take effect' ELSE NULL END [Restart Warning]

FROM

sys.dm_exec_query_optimizer_info optz

CROSS JOIN sys.configurations conf

OUTER APPLY(

SELECT

CONVERT(DECIMAL(18 , 2) , DATEDIFF(hh , create_date , GETDATE()) / 24.0) AS DaysUptime

FROM

sys.databases

WHERE

name = 'tempdb'

) Uptime

WHERE

optz.counter = 'final cost'

AND conf.name = 'cost threshold for parallelism'

GO

Bob Fazio SSChampion Points: 10794 More actions · Answer 2

http://weblogs.sqlteam.com/tarad/archive/2010/11/08/max-degree-of-parallelism-server-side-setting.aspx

FYI, for me it recommends 4, but as I said, I set to 6.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 3

Bob Fazio (9/12/2011)
http://weblogs.sqlteam.com/tarad/archive/2010/11/08/max-degree-of-parallelism-server-side-setting.aspx
FYI, for me it recommends 4, but as I said, I set to 6.

Ok, as I said, not even junior at tuning that big of a server. So now that you obviously took care of that... we can move on :-).

Bob Fazio SSChampion Points: 10794 More actions · Answer 4

counteroccurrencevalue

optimizations690881

elapsed time690880.00212443550254748

final cost690887.09694068392164

trivial plan517621

tasks17326623.287082996652

no plan0NULL

search 063101

search 0 time63110.00500285216289019

search 0 tasks6311900.215496751703

search 1110041

search 1 time110160.00153567538126362

search 1 tasks11016464.412490922295

search 2121

search 2 time130.000307692307692308

search 2 tasks13141.846153846154

gain stage 0 to stage 110.00356762375769612

gain stage 1 to stage 20NULL

timeout271

memory limit exceeded0NULL

insert stmt94481

delete stmt4931

update stmt17041

merge stmt0NULL

contains subquery4151

unnest failed5581

tables690881.56905685502547

hints801

order hint11

join hint0NULL

view reference123571

remote query931

maximum DOP690880.000101320055581288

maximum recursion level0NULL

indexed views loaded0NULL

indexed views matched0NULL

indexed views used0NULL

indexed views updated0NULL

dynamic cursor request0NULL

fast forward cursor request241

counteroccurrenceAvgValuenamevaluevalue_in_useDays Server UPTIME & Last Stats ResetStats WarningCost Threshold WarningRestart Warning

final cost690897.10cost threshold for parallelism112.83You may not have very meaningful stats because of a recent restartCost threshold for parallelism might be too lowNULL

Just so you know, I'm pretty green with a system this big as well. Its been running on here for a year, but we hadn't seen even a hint of an issue until about a month ago.

Bob Fazio SSChampion Points: 10794 More actions · Answer 5

Bob Fazio (9/12/2011)
counteroccurrencevalue optimizations690881 elapsed time690880.00212443550254748 final cost690887.09694068392164 trivial plan517621 tasks17326623.287082996652 no plan0NULL search 063101 search 0 time63110.00500285216289019 search 0 tasks6311900.215496751703 search 1110041 search 1 time110160.00153567538126362 search 1 tasks11016464.412490922295 search 2121 search 2 time130.000307692307692308 search 2 tasks13141.846153846154 gain stage 0 to stage 110.00356762375769612 gain stage 1 to stage 20NULL timeout271 memory limit exceeded0NULL insert stmt94481 delete stmt4931 update stmt17041 merge stmt0NULL contains subquery4151 unnest failed5581 tables690881.56905685502547 hints801 order hint11 join hint0NULL view reference123571 remote query931 maximum DOP690880.000101320055581288 maximum recursion level0NULL indexed views loaded0NULL indexed views matched0NULL indexed views used0NULL indexed views updated0NULL dynamic cursor request0NULL
fast forward cursor request241
counteroccurrenceAvgValuenamevaluevalue_in_useDays Server UPTIME & Last Stats ResetStats WarningCost Threshold WarningRestart Warning
final cost690897.10cost threshold for parallelism112.83You may not have very meaningful stats because of a recent restartCost threshold for parallelism might be too lowNULL
Just so you know, I'm pretty green with a system this big as well. Its been running on here for a year, but we hadn't seen even a hint of an issue until about a month ago.

Ignore those, wrong server

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 6

I wouldn't take any decision based on those stats.

The cost threshold is however at 1 with the default value being 5. This might be something I'd consider but I like to see ideally 1 full month of data. 2 days is just way to short for this.

Is this server mostly olap or oltp?

Bob Fazio SSChampion Points: 10794 More actions · Answer 7

counteroccurrencevalue

optimizations197374501

elapsed time197366460.0063553188317777

final cost1973664745.9971011158534

trivial plan73155451

tasks12421103556.496919637491

no plan0NULL

search 014025791

search 0 time14140600.0132889099472444

search 0 tasks14140601250.81451847871

search 1110162501

search 1 time112608820.003472970856102

search 1 tasks11260882449.037774305778

search 222741

search 2 time62730.155596046548701

search 2 tasks627313871.7344173442

gain stage 0 to stage 1112290.147846452882183

gain stage 1 to stage 22230.00432500828511371

timeout5672551

memory limit exceeded0NULL

insert stmt100578821

delete stmt1048871

update stmt23912821

merge stmt32721

contains subquery5018291

unnest failed2621091

tables197374501.45986132960438

hints330341

order hint6851

join hint141

view reference11182211

remote query88060781

maximum DOP197374500.183442187313964

maximum recursion level0NULL

indexed views loaded0NULL

indexed views matched0NULL

indexed views used0NULL

indexed views updated0NULL

dynamic cursor request22221

fast forward cursor request69701

counteroccurrenceAvgValuenamevaluevalue_in_useDays Server UPTIME & Last Stats ResetStats WarningCost Threshold WarningRestart Warning

final cost1973666346.00cost threshold for parallelism55205.08NULLCost threshold for parallelism might be too lowNULL

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 8

Ninja's_RGR'us (9/12/2011)
~~I wouldn't take any decision based on those stats.~~
~~The cost threshold is however at 1 with the default value being 5. This might be something I'd consider but I like to see ideally 1 full month of data. 2 days is just way to short for this.~~
Is this server mostly olap or oltp?

Bob Fazio SSChampion Points: 10794 More actions · Answer 9

Bob Fazio

SSChampion

Points: 10794

September 12, 2011 at 10:41 am

#1380202

Almost exclusive OLTP.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 10

Bob Fazio (9/12/2011)
Almost exclusive OLTP.

Unless Grant has something else in mind I'd go with Cost Treshold at 25-35. I'm going more on the low end here to get more buffer for the "olap" queries which do benefit from parallelism.

Another thing you might play with is actually giving more cpus to those olap queries. Maybe 8 instead of 4. the peak would be higher, but shorter and still leaving 40 other cores to work on the "normal stuff".

However at this point I'd really like to be looking into this =>

http://www.simple-talk.com/sql/performance/finding-the-causes-of-poor-performance-in-sql-server,-part-1

http://www.simple-talk.com/sql/performance/finding-the-causes-of-poor-performance-in-sql-server,-part-2

You could also corelate the times you have in the perform counters to those traces. At this point it's just a matter of find and tune. I really think you have more of a targeted problem with a few queries rather than a full server setup issue (especially if it ran ok for months).

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 11

This can take a guess at the avg, min & max cost of the olap queries.

this uses the cached plan so it won't be 100% accurate but for guesstimate avg it might do the trick.

This takes 1 min to process 8K plans on my prod server and it trashes the cpus while doing so. You might want to be around if you start this during the day in case users complain or run it at night (when they CAN'T complain :-D).

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

WITH XMLNAMESPACES

(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')

SELECT

query_plan AS CompleteQueryPlan,

n.value('(@StatementText)[1]', 'VARCHAR(4000)') AS StatementText,

n.value('(@StatementOptmLevel)[1]', 'VARCHAR(25)') AS StatementOptimizationLevel,

n.value('(@StatementOptmEarlyAbortReason)[1]', 'VARCHAR(25)') AS StatementOptmEarlyAbortReason,

n.value('(@StatementSubTreeCost)[1]', 'DECIMAL(18, 6)') AS StatementSubTreeCost,

n.value('(QueryPlan/@CompileCPU)[1]', 'DECIMAL(18, 0)') AS CompileCPU,

n.value('(QueryPlan/@CompileTime)[1]', 'DECIMAL(18, 0)') AS CompileTime,

n.value('(QueryPlan/@CompileMemory)[1]', 'DECIMAL(18, 0)') AS CompileMemory,

--Available only in the actual plan

--n.value('(QueryPlan/@DegreeOfParallelism)[1]', 'DECIMAL(18, 0)') AS DegreeOfParallelism,

n.query('.').exist('//MissingIndexes') AS HasMissingIndexes,

n.query('.') AS ParallelSubTreeXML,

ecp.usecounts,

ecp.size_in_bytes

INTO #X

FROM sys.dm_exec_cached_plans AS ecp

CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS eqp

CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS qn(n)

WHERE n.query('.').exist('//RelOp[@PhysicalOp="Parallelism"]') = 1

--WHERE LOWER(n.value('(@StatementText)[1]', 'VARCHAR(4000)')) LIKE '%rpt_equipementsetaccessoires_bureaux%'

SELECT * FROM #X

SELECT

--Works only with actual plans

--SUM(usecounts * DegreeOfParallelism) / SUM(usecounts) AS Weigthed_Avg_DOP,

SUM(usecounts * StatementSubTreeCost) / SUM(usecounts) AS Weigthed_Avg_SubTreeCost,

MIN(StatementSubTreeCost) AS Min_StatementSubTreeCost,

MAX(StatementSubTreeCost) AS Max_StatementSubTreeCost

FROM

#X

--DROP TABLE #X

Bob Fazio SSChampion Points: 10794 More actions · Answer 12

I appreciate the knowledge and the time. I will state that since the changes Friday, we have only 1 query that still runs in parallel and that query most definitely should continue to run in parallel.

I intend to do a full review of our environment and try and identify area's of improvement. I intend to let the configuration sit a while (Unless the problem arises again) before I make any more changes to the system.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 13

Bob Fazio (9/12/2011)
I appreciate the knowledge and the time. I will state that since the changes Friday, we have only 1 query that still runs in parallel and that query most definitely should continue to run in parallel.
I intend to do a full review of our environment and try and identify area's of improvement. I intend to let the configuration sit a while (Unless the problem arises again) before I make any more changes to the system.

Ok, here's a nice 60 minutes blitz script when taking over a server.

Of course it takes 2.5X that to explain it :-D.

http://www.youtube.com/watch?v=40_Ilr-Mysk

Bob Fazio SSChampion Points: 10794 More actions · Answer 14

There is also a patch we are strongly looking at. http://support.microsoft.com/default.aspx?scid=kb;EN-US;976700

The reason why we are leaning towards the patch, is because I think everyone agrees, one bad query shouldn't take down a server of this (or really any) size.

It's also included in SP1, which we might also apply.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 15

Bob Fazio (9/12/2011)
There is also a patch we are strongly looking at. http://support.microsoft.com/default.aspx?scid=kb;EN-US;976700
The reason why we are leaning towards the patch, is because I think everyone agrees, one bad query shouldn't take down a server of this (or really any) size.
It's also included in SP1, which we might also apply.

Have you proven this is your problem?

I promise you I have seem a SINGLE simple query take down entire servers. At least 3 times over 8 years.

I'd really start with Gail's article & trace before guessing with patches.

Performance Problem that I can't pin down

Cookies on SQLServerCentral