August 19, 2013 at 3:32 am
Hello there, some of this post might be a bit rambling, but bear with me. If anyone wants further information let me know and I can dig it up.
One of our clients servers is showing a reasonable amount of SOS_SCHEDULER_YIELD waits. It's the highest wait type present in the system at around 22% of total waits. They are clearly visible when I run an intensive query and take a delta of the waits stats. I've seen 17 seconds of SOS_SCHEDULER_YIELD waits reported for a 15 second period, correct me if I'm wrong but this basically equates to at least an entire core being idle for the entire operation. We also see a small amount of ASYNC_NETWORK_IO, but this is expected with the particular revision of the software.
I'm trying to isolate why these waits are occurring, as it appears to be CPU contention slowing them down. There is plenty of headroom in the IO subsystem and we have about double the buffer pool that we realistically need. The vast majority of reads appear to be logcial, not physical. We don't see continually high CPU usage, it averages around 10-20% across the system in what is a 8 vCPU VMWare system. However, we've seen similar systems where increasing the amount of cores on the VM provides a noticeable performance benefit.
The problem I have here is that most of our clients run with 4 cores, and we then suggest an upgrade to 8 cores if they are experiencing performance issues. We simply shouldn't be seeing CPU contention on this system.
A DBA from one of our sites has emailed us the following advice:
"Just another quick thought –with VM’s we have found setting the processor affinity in MSSQL engine properties is a vital config for VM’s and MSSQL. This means that MSSQL VM’s will exclusively use the same processor for each thread rather than potentially bouncing it around like a yo yo across the available CPU resources that VMware manages at a much lower level, it can result in all sorts of odd cpu bottleneck issues and load issues like high cpu context switching, interrupt queues etc and so forth which are detrimental across the whole VM node."
Now I've read up on this and it seems like a very bad idea to be considering, however, in the context of SQL Server running on VMWare, does that statement have any weight to it?
For some background: At the back end the host is a dual 6 Core System with hyperthreading enabled. The SQL Server has 8 cores assigned and 2 other servers have 4 cores each assigned (total of 16), one of these servers (apart from the SQL server) might be applying reasonable pressure on the host. I'm inclined to suggest that we subscribe the hosts at 1:1 to eliminate competition for CPU resources on the host.
I also have a query about core licensing. If a server is only licensed for 4 cores, but has 8 vCPU's allocated does SQL apply licensing restraints and only utilise 4 cores? Or do all of them get used. On the system in question I'm seeing 8 active schedulers, which seems to indicate the latter. However, i'm not 100% positive they've gone for Core licensing as opposed to Server+CAL.
August 19, 2013 at 6:20 am
SOS_SCHEDULER_YIELD are generally a sign of CPU pressure, although in this case, it could be overallocation on the host.
VMware is meant to be very clever in the way it allocates CPU, especially across NUMA node boundaries, so it doesn't just throw CPU around any available core unless there's a good reason to. Be careful with setting affinity at the VMWare level - it could be a useful test, but make sure you understand about NUMA boundaries, it's relation to memory allocation etc. before changing that.
VMWare specifically recommends that you set the vCPU number to a multiple of the NUMA node size, which could well be 6 cores in your case (although obviously check this), so that could be causing a scheduling issue:
http://blogs.vmware.com/vsphere/2012/02/vspherenuma-loadbalancing.html
Also, which Service Pack and Cumulative Update are you running for 2012? There have been some fairly major performance issues fixed in the later CUs.
In regards to the licensing question, absolutely not. If you've allocated twice the number of cores you're licensed for, you're violating the license, even if you set affinity at the guest level to only 4 cores.
August 19, 2013 at 6:30 am
dr.blowfin (8/19/2013)
I've seen 17 seconds of SOS_SCHEDULER_YIELD waits reported for a 15 second period, correct me if I'm wrong but this basically equates to at least an entire core being idle for the entire operation.
Not idle. A scheduler yield wait occurs when the process has used too much CPU and voluntarily relinquishes the CPU in favour of another queued thread. If there is no other queued thread, then the thread that yielded will go straight back on the CPU and you'll see no wait.
A scheduler yield wait will never leave the processor idle
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 19, 2013 at 7:36 pm
Thanks for the reponses!
I have a bit of reading to do on the VMWare side of things, the config is starting to worry me. Does bad config/contention on the host explain why I don't see CPU usage running at a much higher average level on the server? To the naked eye (from within Windows) they all seem to have plenty of resources. It feels like there should be way more cycles being burnt by SQL server IMHO.
HowardW (8/19/2013)
Also, which Service Pack and Cumulative Update are you running for 2012? There have been some fairly major performance issues fixed in the later CUs.
It's currently version 11.0.3000.0.
GilaMonster
Not idle. A scheduler yield wait occurs when the process has used too much CPU and voluntarily relinquishes the CPU in favour of another queued thread. If there is no other queued thread, then the thread that yielded will go straight back on the CPU and you'll see no wait.
Ah ok thanks, I thought I might have been a bit mixed up there.:-D
So basically: the task exhausts it's quantum, then accrues SOS_SCHEDULER_YIELD until it gets to the front of the queue and runs?
and thanks guys for not asking about bad queries and the like, sketchy SQL is a known factor in all this but not something we're likely to address as it's an older version of our software. Potential infrastructure issues are way more of a big deal😉
August 19, 2013 at 8:18 pm
Actually after checking out vSphere the SQL server is only sharing the host with two of our test servers, there really shouldn't be much contention at all.
August 20, 2013 at 4:00 pm
dr.blowfin (8/19/2013)I also have a query about core licensing. If a server is only licensed for 4 cores, but has 8 vCPU's allocated does SQL apply licensing restraints and only utilise 4 cores? Or do all of them get used. On the system in question I'm seeing 8 active schedulers, which seems to indicate the latter. However, i'm not 100% positive they've gone for Core licensing as opposed to Server+CAL.
Yes, it is the latter. The only resources limits enforced by the executable is those that comes withe the edition. The license is only a piece of paper. But when you are audited, it's a damn important piece of paper.
So basically: the task exhausts it's quantum, then accrues SOS_SCHEDULER_YIELD until it gets to the front of the queue and runs?
Yes. You should see a lot of SOS_SCHEDULER_YIELD waits, because they occur over and over again on the system. However, as long as their is no CPU pressure, the total time should be about 0. But the number of waits is completely uninteresting.
[font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply