Understanding NUMA in SQL Server VMs

Last Updated on January 8, 2022 by John Morehouse

What really is NUMA and why do we as database administrators care?

NUMA stands for “Non-uniform Memory Access” and allows for increased CPU & memory performance. This is because a memory bank is physically adjacent to a CPU socket on the motherboard of the server. Each combination of memory and socket is considered a “node”. If demands of a request (think query) forces it to go outside the bounds of a given node then the request could encounter slower performance.

When dealing with SQL Server on virtual machines specifically, NUMA can be confusing and there are a couple of layers of NUMA. Let’s talk about the foundation first and work our way up. Keep in mind that these concepts are fairly hypervisor agnostic so it’s pretty much the same if you are running VMWare or Hyper-V.

Of course, if you are running bare metal machines still, you still have NUMA nodes however you wouldn’t have the Guest NUMA layer to deal with.

Host NUMA

Regardless if you are running a virtual machine on-premises or in the cloud, there is a physical host that supports the guest virtual machine. There can be multiple guest virtual machines for any given host. The host has a CPU scheduler that doles out CPU threads across all of the guest virtual machines that reside on itself. Since the physical host is going to have a motherboard with (most likely) multiple CPU sockets and adjacent memory banks, it can take advantage of the onboard NUMA architecture to ensure CPU time for any given request is as minimal as possible.

The real trick with the host NUMA configuration is ensure you now what the physical architecture is:

How many cores per CPU socket
How much memory is adjacent to each CPU socket

This information is critical to ensure guest virtual machines are configured correctly, especially when it comes to potential workload intensive applications such as SQL Server.

Guest NUMA

On hypervisors, such as VMWare and Hyper-V, both of which are NUMA aware, each virtual machine can be configured for a mix of sockets and cores. Usually this is designated by a “socket x core” nomenclature. For example,

1×4 = 1 socket with 4 cores (4 cores total)
1×8 = 1 socket with 8 cores (8 cores total)
2×4 = 2 sockets with 4 cores each (8 cores total)

You get the idea.

Ideally, you want the guest virtual machines Socket/Core configuration to reside within a single NUMA node of the host machine. This is where knowing what architecture the host machine has is important. By configuring the VM to “fit” within the NUMA node of those, the requests for CPU cycle time from the VM will remain on a given NUMA node thus allowing for optimal performance.

This is also true for memory configurations. If the host has 1TB of physical memory on it with 4 sockets, then each numa node would have 250GB. If a virtual machine is configured with a memory amount greater than 250GB, then the memory allocation has to reside outside the bounds of one NUMA node of the host.

So the general rule of thumb, and your mileage may vary, is to configure your virtual machines from both CPU and memory perspectives to fit within a NUMA node of the underlying host.

SQL Server & NUMA

Thankfully, SQL Server is NUMA aware which means that it can determine whether or not the underlying guest virtual machine has NUMA enabled. SQL Server can also implement what is called “soft numa”. Soft NUMA, in my opinion, is like software RAID. In the olden times of hardware, you had to have a physical RAID controller in the drive chases in order to apply any type of RAID across the hard drives. As technology evolved, eventually you could do RAID configurations by utilizing software alone. In the same context, SQL Server can create soft NUMA nodes from the basis of the physical NUMA nodes of the underlying infrastructure.

If you are running SQL Server 2016 or higher, this is done for you automatically. When SQL Server starts, if it detects more than eight cores per socket, it will automatically create soft NUMA nodes for you. Keep in mind that it doesn’t matter if the cores are using hyper-threading or not.

Soft NUMA nodes can help with performance and scalability for SQL Server and while you can disable this behavior within SQL Server, it’s generally recommended that you don’t to do this for this reason alone.

Summary

NUMA is a critical piece of Harvard architecture that can drastically affect your SQL Server workloads. Understanding the foundation of how it works and why it is important is critical when deploying SQL Server in a virtualized environment. Make sure to look at your existing systems and make any adjustments necessary to take full advantage of this awesome piece of technology.

The post Understanding NUMA in SQL Server VMs first appeared on John Morehouse.