June 4, 2020 at 9:45 am
Morning All,
Is it possible to determine how many threads are spawned by a SPID in SQL Server 2017?
We are running out of threads (every now and then) during load times through the night. I am trying to track down, and subsequently prove, which particular query is the worst offender.
Cheers
Alex
June 4, 2020 at 3:31 pm
That sounds like a maxdop and cost threshold configuration issue.
But if you execute sp_who2 and look for multiple copies of the same SPID, that'll show you how many threads are being created for that one query. Downside to this approach is that it only tells you the number of current threads. So you'd need to run that while your load times are high. It is not a historical check, it is a point-in-time check.
However, you could toss it into a job and insert the results into a table and review it at a future date.
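If you want a rough sketch of that idea, something like this could run from a job every few minutes during the load window. The table name dbo.ThreadSnapshot is made up for the example, and I'm using sys.dm_os_tasks instead of sp_who2 so the counts are easy to insert:
-- One-time setup: a logging table (name is illustrative only)
CREATE TABLE dbo.ThreadSnapshot
(
    capture_time DATETIME2 NOT NULL DEFAULT (SYSDATETIME()),
    session_id   SMALLINT  NOT NULL,
    task_count   INT       NOT NULL
);
GO
-- Job step: count the active tasks (threads) per session right now and log the snapshot
INSERT INTO dbo.ThreadSnapshot (session_id, task_count)
SELECT  t.session_id,
        COUNT(*) AS task_count
FROM    sys.dm_os_tasks AS t
WHERE   t.session_id IS NOT NULL
GROUP BY t.session_id;
Querying that table after a bad night should show which sessions were holding the most threads at each capture; you would still need to tie a session back to its query text, but it narrows things down.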
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there are negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
June 8, 2020 at 3:36 pm
Thanks Brian for the reply.
Is it possible to use the Query Store to historically identify parallel plans, or problematic parallel plans?
June 8, 2020 at 4:17 pm
I imagine you could. I am currently not using Query Store on my servers (too old... need to do upgrades), but my understanding is that it stores the execution plan and you could look through that for parallel processes. Only downside is it is in XML, so you'd need to do some comparisons in XML which is not efficient in SQL Server.
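If you do go that route, here is a rough sketch of what the XML check against the Query Store views might look like (it assumes Query Store is enabled on the database in question, and I have not been able to test it myself):
-- Rough sketch: flag Query Store plans whose showplan XML contains a parallel operator
SELECT  q.query_id,
        qt.query_sql_text,
        p.plan_id,
        p.last_execution_time
FROM    sys.query_store_plan AS p
JOIN    sys.query_store_query AS q
        ON q.query_id = p.query_id
JOIN    sys.query_store_query_text AS qt
        ON qt.query_text_id = q.query_text_id
-- query_plan is stored as nvarchar(max), so convert it to XML first (TRY_CAST in case the odd plan won't convert)
CROSS APPLY (SELECT TRY_CAST(p.query_plan AS XML) AS plan_xml) AS x
WHERE   x.plan_xml.value('declare namespace s="http://schemas.microsoft.com/sqlserver/2004/07/showplan";
            max(//s:RelOp/@Parallel)', 'float') > 0
ORDER BY p.last_execution_time DESC;
I believe sys.query_store_plan also has an is_parallel_plan column, which might save you the XML shredding entirely, but check that on your version first.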
That would allow you to identify parallel plans, but I think identifying problematic parallel plans would be tricky. What do you consider to be a problematic parallel plan? One that runs long? What do you define as "long"? I know my systems have some things that run long such as our full ETL load which can take over an hour to complete (we need to optimize it, but it works and runs during downtime, so it is not a high priority to get fixed). But if we had a report (for example) that took a minute to get the data, that is far too long.
I think first, it is good to figure out what you are trying to find. Here, I think you are looking specifically for queries that run parallel. Parallel queries are not necessarily a bad thing though. Sometimes parallel queries are exactly what you want for performance purposes. Other times, parallel queries hurt performance. You may benefit the most by looking at your MAX DOP setting and your cost threshold for parallelism.
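For reference, checking those two settings is a quick read-only query (it changes nothing):
SELECT  name,
        value,
        value_in_use
FROM    sys.configurations
WHERE   name IN ('max degree of parallelism', 'cost threshold for parallelism');
The defaults (0 for MAXDOP and 5 for cost threshold) are often worth revisiting, but the right values depend heavily on your workload.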
That being said, do you have any non-SQL Server things running on that server (including SSRS, SSAS, and SSIS)? The reason I ask is it MIGHT not be SQL Server eating all the threads. It could be IIS (if that is on there), antivirus, SSRS, SSIS, SSAS, or any number of things. I just want to make sure you are investigating the correct thing. I know how easy it can be to not see the forest for the trees. I've been in that boat many times and it has bitten me more times than I care to admit.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there are negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
June 9, 2020 at 2:56 pm
You raise a good point about other things running on the server, and yes, we do have SSAS -- but it's usually idle when these problems occur. It only has 3 cubes that refresh at a very quiet time, and we've never encountered a problem that overlaps that refresh window (which takes 3 minutes).
Regarding what I am trying to find out by identifying historical parallel threads -- it wasn't so much to identify problematic ones (despite my phrasing in the first reply) as to quantify the impact of changing Cost Threshold for Parallelism.
Cheers
Alex
June 9, 2020 at 4:15 pm
Quick google and I found this blog post:
https://blog.sqlauthority.com/2010/07/24/sql-server-find-queries-using-parallelism-from-cached-plan/
Pinal Dave has a query there that will pull all parallel plans from the plan cache (not from the query store). I would add in:
ORDER BY last_execution_time DESC
to his query, simply because you want to know the last time the plan was used. And you may want to remove the TOP, but maybe not. Or you may want to put a date limit in: if you know that you run out of threads on Tuesdays at 2:00 AM, you could look at plans with a last execution time between 1:00 AM and 3:00 AM on 09/06/2020 (day/month/year), then change the cost threshold and compare against the following Tuesday.
The full query to run:
SELECT TOP 10
        p.*,
        q.*,
        qs.*,
        cp.plan_handle
FROM    sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS p
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS q
JOIN    sys.dm_exec_query_stats AS qs
        ON qs.plan_handle = cp.plan_handle
WHERE   cp.cacheobjtype = 'Compiled Plan'
        AND p.query_plan.value('declare namespace p="http://schemas.microsoft.com/sqlserver/2004/07/showplan";
            max(//p:RelOp/@Parallel)', 'float') > 0
ORDER BY qs.last_execution_time DESC
OPTION (MAXDOP 1);
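And if you want the date limit mentioned above, something along these lines could go into the WHERE clause (the dates are just placeholders for whatever window you care about):
    AND qs.last_execution_time >= '20200609 01:00'  -- placeholder: 9 June 2020, 1:00 AM
    AND qs.last_execution_time <  '20200609 03:00'  -- placeholder: 9 June 2020, 3:00 AM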
Other values that will be useful for you to look at from the above query are last_elapsed_time, min_elapsed_time, max_elapsed_time, last_grant_kb, min_grant_kb, max_grant_kb, last_used_grant_kb, min_used_grant_kb, and max_used_grant_kb. You do want to make sure that changing your cost threshold didn't cause spikes in elapsed time, memory grants, or memory used. The above query will also tell you the last_used_threads count (plus its min and max) and the last_reserved_threads count (plus its min and max).
You MAY want to tweak the query to not use *'s, as it brings back a lot of data you may not use (see the cut-down version below), but it runs pretty quickly on my test system (4 seconds with that ORDER BY added in).
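For example, a cut-down version along these lines (the column list is just a suggestion, and the *_grant_kb and *_threads columns need SQL Server 2016 or later, which is fine on 2017):
SELECT TOP 10
        q.text AS query_text,
        cp.plan_handle,
        qs.last_execution_time,
        qs.last_elapsed_time,
        qs.min_elapsed_time,
        qs.max_elapsed_time,
        qs.last_grant_kb,
        qs.min_grant_kb,
        qs.max_grant_kb,
        qs.last_used_grant_kb,
        qs.min_used_grant_kb,
        qs.max_used_grant_kb,
        qs.last_used_threads,
        qs.min_used_threads,
        qs.max_used_threads,
        qs.last_reserved_threads,
        qs.min_reserved_threads,
        qs.max_reserved_threads,
        p.query_plan
FROM    sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS p
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS q
JOIN    sys.dm_exec_query_stats AS qs
        ON qs.plan_handle = cp.plan_handle
WHERE   cp.cacheobjtype = 'Compiled Plan'
        AND p.query_plan.value('declare namespace p="http://schemas.microsoft.com/sqlserver/2004/07/showplan";
            max(//p:RelOp/@Parallel)', 'float') > 0
ORDER BY qs.last_execution_time DESC
OPTION (MAXDOP 1);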
Sorry I didn't suggest it sooner, but I was not aware that SQL actually tracked that info.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there are negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.