March 28, 2016 at 11:31 am
I have a complex query with multiple joins and conditions. My test server has only one drive for data. Same for production.
The estimated execution plan inserts parallelism operators (Distribute Streams and Repartition Streams) at almost every step. My tests showed that without parallelism (I experimented with the cost threshold in the server configuration) it executes about 20% faster, even though the total estimated cost was about 3 times higher than with parallelism.
Questions:
1. With just a single drive, where all data files are placed on the same drive, how is SQL Server supposed to parallelize my query?
2. The execution cost, even though it is not measured in units of time, should be somehow proportional to the actual execution time, am I correct?
3. Would it be good practice to raise the cost threshold for parallelism to some high number, like 1000? This is what I actually did on my test server.
Thanks
March 28, 2016 at 12:18 pm
SQL Guy 1 (3/28/2016)
I have a complex query with multiple joins and conditions. My test server has only one drive for data. Same for production. The estimated execution plan inserts parallelism operators (Distribute Streams and Repartition Streams) at almost every step. My tests showed that without parallelism (I experimented with the cost threshold in the server configuration) it executes about 20% faster, even though the total estimated cost was about 3 times higher than with parallelism.
If it's faster running serial, then that's what you want. If one CPU is getting it done 20% faster than multiple CPUs, it's a no-brainer. I often force the optimizer to use a serial execution plan (using MAXDOP 1) when a parallel one does not show a substantial performance gain. A parallel plan generally uses all the CPUs, and I would rather not see them tied up unless I'm getting lots of bang for my buck.
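For reference, forcing a serial plan is a per-query hint rather than a server setting. A minimal sketch (the table and column names here are placeholders, not from this thread):

```sql
-- Sketch: force a serial plan for a single query via a query hint.
-- dbo.Customers is a placeholder table name.
SELECT c.Region, COUNT(*) AS CustomerCount
FROM dbo.Customers AS c
GROUP BY c.Region
OPTION (MAXDOP 1);   -- limits this statement to a single thread
</sql>
```

The hint affects only this statement, so the rest of the workload can still parallelize normally.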
1. With just a single drive, where all data files are placed on the same drive, how is SQL Server supposed to parallelize my query?
Parallelism is about CPUs, not I/O. If you look at the actual execution plan you can see how many threads were involved. Not spreading your data and log files across multiple drives is a problem, though. You want to make sure that SQL Server has enough I/O throughput to do what it needs. That's a different topic, but one I suggest you address.
2. The execution cost, even though it is not measured in units of time, should be somehow proportional to the actual execution time, am I correct?
The cost is an estimate that the optimizer uses when determining the best possible plan. Sometimes lower cost does line up with lower execution time, but the reality is that the optimizer is making an educated guess based on statistics/cardinality estimates, available indexes, etc.
3. Would it be good practice to raise the cost threshold for parallelism to some high number, like 1000? This is what I actually did on my test server.
The default of 5 is too low - that I know for sure, and it should be raised IMHO. To what value depends on a number of things, e.g. what else is happening on that specific system/SQL instance. For this I recommend you do some research on what the best option is.
-- Itzik Ben-Gan 2001
March 28, 2016 at 1:22 pm
Thanks Alan, for detailed explanation.
I always thought that parallelism was about I/O, with a CPU dedicated to each I/O stream. If it's about CPU, it looks like in my case it spends more resources on distributing and repartitioning streams than it gains from them.
About data and log files: they are not on the same drive. All data is on one drive, and the logs are on another.
And about the value for the cost threshold for parallelism, you recommend doing some research. Can I ask what I should check? This is a server-wide setting; apparently I need to check some sys DMVs?
Thanks
March 28, 2016 at 2:39 pm
SQL Guy 1 (3/28/2016)
And about the value for the cost threshold for parallelism, you recommend doing some research. Can I ask what I should check?
I am not an expert on what the threshold for parallelism should be; to be perfectly honest, I should probably do some more research myself. I generally set it to 50 - a default I have been using since reading this article. From there, when testing my queries, I'll often change my plans from serial to parallel and vice versa and see which is faster; at 5 (the default) I used to see parallel plans slowing down my queries. Doing this helps me understand whether I'm frequently getting serial plans when I should be getting a parallel plan, or vice versa. To force a parallel plan I use trace flag 8649 or Adam Machanic's make_parallel function. To force a serial plan I use OPTION (MAXDOP 1). I set the threshold up or down based on that kind of testing.
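An A/B test of that kind might be sketched like this (dbo.Orders is a placeholder table, and note that trace flag 8649 is undocumented, so keep it to test servers):

```sql
SET STATISTICS TIME ON;   -- report CPU and elapsed time for each run

-- Run 1: forced serial plan
SELECT o.CustomerID, SUM(o.TotalDue) AS Total
FROM dbo.Orders AS o           -- placeholder table
GROUP BY o.CustomerID
OPTION (MAXDOP 1);

-- Run 2: forced parallel plan (undocumented trace flag; test use only)
SELECT o.CustomerID, SUM(o.TotalDue) AS Total
FROM dbo.Orders AS o
GROUP BY o.CustomerID
OPTION (QUERYTRACEON 8649);
```

Comparing the elapsed times from STATISTICS TIME for the two runs tells you which shape of plan your query actually prefers.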
Again, I'm no expert on this topic but, while doing a little research, I came across this blog post by Paul White which I found helpful.
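As for what to check: one place to start is the plan cache, since each cached plan carries the optimizer's estimated subtree cost - the number the threshold is compared against. A hedged sketch (the columns worth looking at will vary by system):

```sql
-- Sketch: cached statements ordered by estimated subtree cost, to see
-- where a candidate cost threshold would fall across the workload.
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (20)
    qp.query_plan.value('(//StmtSimple/@StatementSubTreeCost)[1]', 'float') AS est_subtree_cost,
    qs.execution_count,
    qs.total_worker_time / qs.execution_count AS avg_cpu_us,
    SUBSTRING(st.text, 1, 100) AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY est_subtree_cost DESC;
```

Queries whose estimated cost sits just above or below your candidate threshold are the ones worth timing serial vs. parallel.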
-- Itzik Ben-Gan 2001
March 29, 2016 at 3:18 am
SQL Guy 1 (3/28/2016)
I always thought that parallelism was about I/O, with a CPU dedicated to each I/O stream.
Not at all. Mostly because the query processor doesn't know what a disk is. It processes data from memory only.
If it's about CPU, it looks like in my case it spends more resources on distributing and repartitioning streams than it gains from them.
There's always overhead from parallelism, but if the query runs 20% faster in parallel, then it's receiving benefits from parallelism.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 29, 2016 at 8:44 am
The default value of the cost threshold for parallelism is very low (5), but 1000 seems excessively high. Depending on the system, I've seen the value moved to somewhere between 25 and 50. It really depends on your system and at what point you get benefits from parallel execution.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning