July 31, 2020 at 4:30 pm
Sorry for the newb question, but as I explore the Cost Threshhold for Parallelism and Max Degree of Parallelism configurations in SQL Server and talk with colleagues, I'm getting more confused.
My understanding is that the SQL Server engine will recognize larger queries expected to take a lot of time/resources, and will automatically make use of parallelism, or processing different parts of the query through multiple processors. Someone I'm speaking with seems to think it is pointless to set these values for certain applications that "don't support multithreading," but I would have thought any large query could be subjected to parallelism. I would greatly appreciate any help in having some of this explained to me.
July 31, 2020 at 5:41 pm
multi-threading is completely unrelated to how SQL server internally uses multiple cores.
a multi threaded operation is how the applciation does it's thing, SQL server is a black box as far as the app is concerned.
SQL server evaluates every query and assigns it a cost value; that number is used to determine whether the SQL engine would divide the work across multiple cores, and then assemble the data together and send it back.
part of that logic is to use an appropriate number of threads. if i have 64 cores, i do not want every query trying to use all 64 cores, potentially, so a maxdop of 4 or 8 is usually what i set up, unless it's a data warehouse situation, where even more cores and heavier processing takes place.
the default value of cost threshhold for parallelism is 5, a throwback to the SQL 2000 days;
that means even relatively simple queries could use multiple threads; for simple queries, that means they could actually be a bit slower than if it was using only one thread, because of the divide+process+reassemble results
most DBA's use a cost threshold for parallelism of at least 50 until they know a bit more about the serve.r
my query below is what i use to analyze. it gets me means and medians and averages of the current workload in the cache; my idea is i want the 50% easiest queries to be single threaded, and anything above that 50% to use parallelism.
on one of my servers, for example, i get a value of 69 for my average, so i would set that server to 69 or 70 instead of my typical value of 50.
DECLARE @ServerLastRestarted DATETIME
SELECT @ServerLastRestarted=create_date FROM sys.databases WHERE name = 'tempdb';
IF OBJECT_ID('tempdb..[#SubtreeCost]') IS NOT NULL
DROP TABLE [#SubtreeCost]
DECLARE @Value VARCHAR(30);
SELECT @Value = CONVERT(VARCHAR(50), [Value]) FROM sys.[configurations] WHERE name = 'cost threshold for parallelism';
;WITH XMLNAMESPACES
(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
TRY_CAST(n.value('(@StatementSubTreeCost)[1]', 'VARCHAR(128)') AS DECIMAL(18,2)) AS StatementSubtreeCost
INTO #SubtreeCost
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS qp
CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS qn(n)
WHERE n.query('.').exist('//RelOp[@PhysicalOp="Parallelism"]') = 1;
--SELECT StatementSubtreeCost
--FROM #SubtreeCost
--ORDER BY 1;
SELECT
@ServerLastRestarted AS ServerLastRestarted,
AVG(StatementSubtreeCost) AS AverageSubtreeCost,
COUNT(*) AS StatementCount,@Value AS CurrentCostThreshold
FROM #SubtreeCost;
SELECT @ServerLastRestarted AS ServerLastRestarted,
((SELECT TOP 1 StatementSubtreeCost
FROM
(
SELECT TOP 50 PERCENT StatementSubtreeCost
FROM #SubtreeCost
ORDER BY StatementSubtreeCost ASC
) AS A
ORDER BY StatementSubtreeCost DESC
)
+
(SELECT TOP 1 StatementSubtreeCost
FROM
(
SELECT TOP 50 PERCENT StatementSubtreeCost
FROM #SubtreeCost
ORDER BY StatementSubtreeCost DESC
) AS A
ORDER BY StatementSubtreeCost ASC))
/2 AS MEDIAN,
@Value AS CurrentCostThreshold;
SELECT TOP 1 @ServerLastRestarted AS ServerLastRestarted,StatementSubtreeCost AS MODE,COUNT(*) AS StatementCount,@Value AS CurrentCostThreshold
FROM #SubtreeCost
GROUP BY StatementSubtreeCost
ORDER BY COUNT(1) DESC;
Lowell
July 31, 2020 at 6:08 pm
Hi Lowell. Thank you for the very helpful response (and for that Cost Threshhold for Parallelism query)!
I'd be curious to hear a bit more about your approach for choosing a suitable MAXDOP, especially when there are fewer cores involved, say 2-4. The know the default is 0, indicating that all cores can be involved for a given query.
August 1, 2020 at 2:08 am
Well for MaxDop recommendations, I've been using this pair of scripts which have calculations that someone else came up with.
this script takes into account numa nodes and microsofts recommendations, and the updated takes into account that in 2017 and above, you can have MaxDop at the database level now.
http://dbamastery.com/performance-tuning/maxdop-calculator/
https://github.com/dbamaster/DBA-Mastery/tree/master/MAXDOP%20Calculator
Lowell
August 4, 2020 at 5:52 pm
Thanks again Lowell. I'll give those scripts a look!
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply