August 4, 2014 at 9:39 am
Hi all, hoping one of the bright sparks on here can point me in the right direction.
I have 4 servers. 2 Physical and 2 VM.
Running SQL 2008R2SP2.
Physical I have 32Cores on Enterprise, and also 16 on Standard
VM 8 Cores on Standard.
Now, the juicy bit.
I have a query (I know, shocking). Anyway, on the 2 physical machines this gets a lovely parallel plan out of the box, runs in under 10 minutes. With 'High Performance' power settings that drops to around 3, but I digress.
On the VM boxes there is a bit of an issue. It won't pick a parallel plan. Stats are updated, indexes rebuilt, all to no avail.
What has made a difference though, is changing the query from:
select <massive field list> into #mytemptable from <11 way table join>
to
select * into #mytemptable from (select <massive field list> from <11 way table join> )x
Now, in my exploration of this I have found all sorts of information - stuff like vm settings to pass through numa, os power settings, hyperthreading etc. But for some reason without this simple query rejig it won't take. Yes, I've fiddled the cost thresholds, the maxdops etc, with no results. The cost for the query without parallelism is so far past the cost threshold -talking 10's thousands here, not just 5.
With the parallel plan on the vm it runs in about 5m, without it's about 25m, so it's quite key... but refactoring all the code is not going to be an easy job, and the physical boxes handle it anyway - even maxdopped down to 4.
Anyone have any idea why this would be the case? Even tried slapping the new CU on to see if that cleared it, but no joy.
August 4, 2014 at 10:56 am
Same query, but different data sets right? Which means radically different statistics? It helps explain why you might see differences in the plans selected. But, it all comes back to the costing model. There are mechanisms you can try to force parallelism on the plan. Adam Machanic and Paul White make a number of suggestions. You can read about them here. Hopefully that will prove helpful.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
August 5, 2014 at 3:00 am
Ah, no, I guess I wasn't clear.
It's the same data set on each server. Hence the confusion 🙂
August 5, 2014 at 4:01 am
The differences between Standard and Enterprise could be part of it. There are options within the optimizer that are different there. But, it probably comes back to the available memory and available processors and the costing engine. If the costs are estimated higher for a parallel plan, it's off the table. Changing the query changed the costing.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
August 5, 2014 at 7:18 am
That's what I thought Grant - but then once the standard was on the new physical box, it still works.
Trimming the physical box to numa 0 only has made it behave the same way, which is interesting. Ta.
August 5, 2014 at 8:24 am
Physical on maxdop 5 or less behaves the same as the vm. 6 or up parallel.
But only NUMA 0 (8 cores) also ends up not parallel.
Anyway, found it all a bit curious.
August 5, 2014 at 9:10 am
I'll bet if you compare costs by forcing plans to parallel using the trace flag, you'll see that, for whatever reason, the plans are higher in the systems where it's not choosing them. The thing about the costs is, they're all just calculations based on canned mechanisms. It doesn't actually measure the system it's on. It just looks at the memory and processors available and supplies the existing formula. Variations in the environments based on those settings, not the actual machines and their performance, could explain it. Not that it makes sense, but that it's explainable.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
August 5, 2014 at 9:19 am
Yes, I don't doubt it.
Any idea why using a single Numa node with 8 cores would throw away the parallel plan, whereas using both nodes, but limited to maxdop 6 on the server would use parallel?
August 5, 2014 at 9:20 am
Short answer, costing. Has to be something in the calculation engine that makes that determination. Not something I know. Adam might. Paul probably would.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
August 5, 2014 at 9:24 am
Further fiddling suggests it is happy to parallelize even with only one core per numa node... I always understood it preferred to keep it on a single node.
August 5, 2014 at 9:24 am
Thanks for the input, found it a bit puzzling tbh.
August 5, 2014 at 9:31 am
Looking into this in more detail over at http://msdn.microsoft.com/en-us/library/gg415714.aspx
This is a whitepaper Understanding and Controlling Parallel Query Processing in SQL Server
Writers: Don Pinto, Eric Hanson
Guessing it will leave me a little better informed than I was when I started this!
August 5, 2014 at 9:55 am
So... to summarise my findings:
Server - Physical, 2x8CPU, 128GB Ram
Parallelized:
MAXDOP 5 or above
and
(
Numa ALL Auto
or
Numa 1 or more Cores from BOTH nodes
)
Single thread:
MaxDop 4 (or less)
or
Single Numa Node (either - even with 8 Cores)
(note NUMA set in CPU affinity sql server settings)
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply