April 12, 2016 at 11:45 am
I have this query to find the cost of plans that use parallelism, but not sure how to proceed as far as setting the cost threshold?? QA and prod are still at the default of 5, but I find 984 plans in production with cost related to parallelism greater than 50.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
WITH XMLNAMESPACES
(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT top (150)
query_plan AS CompleteQueryPlan,
n.value('(@StatementText)[1]', 'VARCHAR(4000)') AS StatementText,
n.value('(@StatementOptmLevel)[1]', 'VARCHAR(25)') AS StatementOptimizationLevel,
n.value('(@StatementSubTreeCost)[1]', 'VARCHAR(128)') AS StatementSubTreeCost,
n.query('.') AS ParallelSubTreeXML,
ecp.usecounts,
ecp.size_in_bytes
FROM sys.dm_exec_cached_plans AS ecp
CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS eqp
CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS qn(n)
WHERE n.query('.').exist('//RelOp[@PhysicalOp="Parallelism"]') = 1
and cast(n.value('(@StatementSubTreeCost)[1]', 'VARCHAR(128)') as float)>50
order by cast(n.value('(@StatementSubTreeCost)[1]', 'VARCHAR(128)') as float) desc
April 12, 2016 at 12:45 pm
The default of 5 for cost threshold for parallelism is almost universally too low. I don't know of a formula for setting a higher value, but I know pretty much every machine in existence needs to adjust it. Same for MAXDOP's default of 0. Quite a few bad defaults in a SQL Server instance, come to think of it!! :w00t:
I usually go with 15 for an OLTP box and 40 for an OLAP box for CTFP. Physical core count on each NUMA node as a max for MAXDOP. You can override the latter with a query option, which is nice when you want a particular query to run faster (or SLOWER!!) than it otherwise might given the current configuration.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
April 12, 2016 at 1:00 pm
Thanks Guru. The trick is "proving" to management that we need to increase the value, AND that it won't hurt anything. I have a number of QA environments, just not sure about the proving part. ( maxdop is already at 8 ) Also the prod load is a mix of oltp / olap (report type queries ) if I had to guess I'd say 75% OLTP
This is our production setup: 4 numa nodes
number_of_physical_cpus - 4number_of_cores_per_cpu-12 total_number_of_cores- 48 number_of_virtual_cpus 64 cpu_category x64
QA: 281632x64
TheSQLGuru (4/12/2016)
The default of 5 for cost threshold for parallelism is almost universally too low. I don't know of a formula for setting a higher value, but I know pretty much every machine in existence needs to adjust it. Same for MAXDOP's default of 0. Quite a few bad defaults in a SQL Server instance, come to think of it!! :w00t:I usually go with 15 for an OLTP box and 40 for an OLAP box for CTFP. Physical core count on each NUMA node as a max for MAXDOP. You can override the latter with a query option, which is nice when you want a particular query to run faster (or SLOWER!!) than it otherwise might given the current configuration.
April 12, 2016 at 1:27 pm
TheSQLGuru (4/12/2016)
The default of 5 for cost threshold for parallelism is almost universally too low. I don't know of a formula for setting a higher value, but I know pretty much every machine in existence needs to adjust it. Same for MAXDOP's default of 0. Quite a few bad defaults in a SQL Server instance, come to think of it!! :w00t:I usually go with 15 for an OLTP box and 40 for an OLAP box for CTFP. Physical core count on each NUMA node as a max for MAXDOP. You can override the latter with a query option, which is nice when you want a particular query to run faster (or SLOWER!!) than it otherwise might given the current configuration.
Agreed that 5 is very low, find 15 quite low to for OLTP, how did you come to that conclusion? Normally would have OLTP boxes higher than OLAP given the OLAP results are cached. Also the difference between VMs and Tins is something one must factor into the equation.
😎
April 12, 2016 at 1:30 pm
April 12, 2016 at 2:31 pm
Eirikur Eiriksson (4/12/2016)
TheSQLGuru (4/12/2016)
The default of 5 for cost threshold for parallelism is almost universally too low. I don't know of a formula for setting a higher value, but I know pretty much every machine in existence needs to adjust it. Same for MAXDOP's default of 0. Quite a few bad defaults in a SQL Server instance, come to think of it!! :w00t:I usually go with 15 for an OLTP box and 40 for an OLAP box for CTFP. Physical core count on each NUMA node as a max for MAXDOP. You can override the latter with a query option, which is nice when you want a particular query to run faster (or SLOWER!!) than it otherwise might given the current configuration.
Agreed that 5 is very low, find 15 quite low to for OLTP, how did you come to that conclusion? Normally would have OLTP boxes higher than OLAP given the OLAP results are cached. Also the difference between VMs and Tins is something one must factor into the equation.
😎
I almost never want my OLAP queries cached if that is what you are saying.
And I rarely have a client with a properly indexed (or coded) app, so some of those OLTP queries really do need parallelism and also pretty much everyone runs a "mixed-mode" server where they do reports on same system (usually same database #FACEPALM). 🙂
So no magic formula - just a number I pulled out of my butt. 😀 I do monitor for (and teach my clients how to monitor for) issues with too much or too little parallelism regardless of their settings.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
April 12, 2016 at 2:57 pm
Sounds like, based on our hardware and oltp/reporting mix in the same database, I can just try CTFP 25 in QA and see how it goes. Of course, QA never gets anywhere the kind of load prod does.
I'll start by getting a count of plans using parallelism, with cost above 50 and usecount <30
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
WITH XMLNAMESPACES
(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT count(*)
FROM sys.dm_exec_cached_plans AS ecp
CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS eqp
CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS qn(n)
WHERE n.query('.').exist('//RelOp[@PhysicalOp="Parallelism"]') = 1
and cast(n.value('(@StatementSubTreeCost)[1]', 'VARCHAR(128)') as float)>50
and ecp.usecounts<30
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
WITH XMLNAMESPACES
(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
query_plan AS CompleteQueryPlan,
n.value('(@StatementText)[1]', 'VARCHAR(4000)') AS StatementText,
n.value('(@StatementOptmLevel)[1]', 'VARCHAR(25)') AS StatementOptimizationLevel,
n.value('(@StatementSubTreeCost)[1]', 'VARCHAR(128)') AS StatementSubTreeCost,
n.query('.') AS ParallelSubTreeXML,
ecp.usecounts,
ecp.size_in_bytes
FROM sys.dm_exec_cached_plans AS ecp
CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS eqp
CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS qn(n)
WHERE n.query('.').exist('//RelOp[@PhysicalOp="Parallelism"]') = 1
and cast(n.value('(@StatementSubTreeCost)[1]', 'VARCHAR(128)') as float)>50
and ecp.usecounts<30
order by cast(n.value('(@StatementSubTreeCost)[1]', 'VARCHAR(128)') as float) desc
April 14, 2016 at 11:35 am
To prove the need to raise cost threshold for parallelism ( CTP ), I took a query from the prod plan cache with a cost of 35. It has been using the maxdop of 8. 187 records retrieved in about a second.
With the query hint option ( maxdop 1 ), it takes about the same amount of time and also shows StatementSubTreeCost="35.122"
DegreeOfParallelism="0" NonParallelPlanReason="MaxDOPSetToOne"
So this doesn't appear to be proving any need to raise CTP ??
I can try some with a cost of ~10 I suppose. With cxpacket wait at 50% of waits, I thought we needed adjustments, but I realize not all cxpacket wait is bad.
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply