VM v Physical - Parallel Plan Problem

Question

VM v Physical - Parallel Plan Problem

RichB

SSCrazy Eights

Points: 9651
More actions
August 4, 2014 at 9:39 am

#293897

Hi all, hoping one of the bright sparks on here can point me in the right direction.
I have 4 servers. 2 Physical and 2 VM.
Running SQL 2008R2SP2.
Physical I have 32Cores on Enterprise, and also 16 on Standard
VM 8 Cores on Standard.
Now, the juicy bit.
I have a query (I know, shocking). Anyway, on the 2 physical machines this gets a lovely parallel plan out of the box, runs in under 10 minutes. With 'High Performance' power settings that drops to around 3, but I digress.
On the VM boxes there is a bit of an issue. It won't pick a parallel plan. Stats are updated, indexes rebuilt, all to no avail.
What has made a difference though, is changing the query from:
select <massive field list> into #mytemptable from <11 way table join>
to
select * into #mytemptable from (select <massive field list> from <11 way table join> )x
Now, in my exploration of this I have found all sorts of information - stuff like vm settings to pass through numa, os power settings, hyperthreading etc. But for some reason without this simple query rejig it won't take. Yes, I've fiddled the cost thresholds, the maxdops etc, with no results. The cost for the query without parallelism is so far past the cost threshold -talking 10's thousands here, not just 5.
With the parallel plan on the vm it runs in about 5m, without it's about 25m, so it's quite key... but refactoring all the code is not going to be an easy job, and the physical boxes handle it anyway - even maxdopped down to 4.
Anyone have any idea why this would be the case? Even tried slapping the new CU on to see if that cleared it, but no joy.

Viewing 13 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

Grant Fritchey SSC Guru Points: 399363 More actions · Answer 1

Same query, but different data sets right? Which means radically different statistics? It helps explain why you might see differences in the plans selected. But, it all comes back to the costing model. There are mechanisms you can try to force parallelism on the plan. Adam Machanic and Paul White make a number of suggestions. You can read about them here. Hopefully that will prove helpful.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

RichB SSCrazy Eights Points: 9651 More actions · Answer 2

Ah, no, I guess I wasn't clear.

It's the same data set on each server. Hence the confusion 🙂

Grant Fritchey SSC Guru Points: 399363 More actions · Answer 3

The differences between Standard and Enterprise could be part of it. There are options within the optimizer that are different there. But, it probably comes back to the available memory and available processors and the costing engine. If the costs are estimated higher for a parallel plan, it's off the table. Changing the query changed the costing.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

RichB SSCrazy Eights Points: 9651 More actions · Answer 4

That's what I thought Grant - but then once the standard was on the new physical box, it still works.

Trimming the physical box to numa 0 only has made it behave the same way, which is interesting. Ta.

RichB SSCrazy Eights Points: 9651 More actions · Answer 5

Physical on maxdop 5 or less behaves the same as the vm. 6 or up parallel.

But only NUMA 0 (8 cores) also ends up not parallel.

Anyway, found it all a bit curious.

Grant Fritchey SSC Guru Points: 399363 More actions · Answer 6

I'll bet if you compare costs by forcing plans to parallel using the trace flag, you'll see that, for whatever reason, the plans are higher in the systems where it's not choosing them. The thing about the costs is, they're all just calculations based on canned mechanisms. It doesn't actually measure the system it's on. It just looks at the memory and processors available and supplies the existing formula. Variations in the environments based on those settings, not the actual machines and their performance, could explain it. Not that it makes sense, but that it's explainable.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

RichB SSCrazy Eights Points: 9651 More actions · Answer 7

Yes, I don't doubt it.

Any idea why using a single Numa node with 8 cores would throw away the parallel plan, whereas using both nodes, but limited to maxdop 6 on the server would use parallel?

Grant Fritchey SSC Guru Points: 399363 More actions · Answer 8

Short answer, costing. Has to be something in the calculation engine that makes that determination. Not something I know. Adam might. Paul probably would.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

RichB SSCrazy Eights Points: 9651 More actions · Answer 9

Further fiddling suggests it is happy to parallelize even with only one core per numa node... I always understood it preferred to keep it on a single node.

RichB SSCrazy Eights Points: 9651 More actions · Answer 10

Thanks for the input, found it a bit puzzling tbh.

RichB SSCrazy Eights Points: 9651 More actions · Answer 11

Looking into this in more detail over at http://msdn.microsoft.com/en-us/library/gg415714.aspx

This is a whitepaper Understanding and Controlling Parallel Query Processing in SQL Server

Writers: Don Pinto, Eric Hanson

Guessing it will leave me a little better informed than I was when I started this!

RichB SSCrazy Eights Points: 9651 More actions · Answer 12

So... to summarise my findings:

Server - Physical, 2x8CPU, 128GB Ram

Parallelized:

MAXDOP 5 or above

and

(

Numa ALL Auto

or

Numa 1 or more Cores from BOTH nodes

)

Single thread:

MaxDop 4 (or less)

or

Single Numa Node (either - even with 8 Cores)

(note NUMA set in CPU affinity sql server settings)