Hyper-converged compute architectures, meaning clusters of physical servers with self-contained storage that take the place of a traditional SAN, are growing in popularity. These architectures provide a solid and simple way to deliver scale-out virtualized infrastructures with a minimum of management overhead. However, one concern of mine with SQL Server and these platforms has popped up recently, and I wanted to talk you through it.
Note: I am in no way telling anyone NOT to evaluate or purchase hyper-converged compute platforms. I actually really like the concept and feel that it is a very solid architecture for many use cases in business. I’ve even worked with some of the largest hyper-converged architecture companies on their SQL Server initiatives.
The challenge that I pose is not with the use of SQL Server on these platforms. It’s with SQL Server licensing on these platforms. I’ll explain after a review of the platform design.
The storage architectures of a number of the solutions on the market today are similar. Take some physical storage, either spindle disks or SSDs, and add it to each physical server. Build a scale-out file system that spans these servers and replicates data between nodes to prevent data loss if one or more physical servers were to fail. Group the servers into a VM-level compute cluster, and present this storage layer to the compute cluster as a shared storage solution. At this point, you can start deploying virtual machines. The local SSDs keep the I/O performing quite well, and if spindle media are present, the capacity of the storage can grow quickly to scale with you.
Now, add data savings technologies into the mix. These usually take the form of compression, deduplication, or both, performed in line with the storage. These processes can considerably reduce the footprint of the data being written to disk.
However, these processes are usually quite resource intensive, and often require substantial amounts of CPU power to avoid creating a performance bottleneck for those I/O operations. On a traditional shared-storage SAN that’s capable of data reduction, this workload is offloaded to the CPUs on the SAN controllers, which are (hopefully) powerful enough to perform these duties with ease. The observed SAN performance does not slow down, and the servers themselves are not drained of their compute resources.
Now, look at the hyper-converged architectures. The CPUs used for I/O handling are the same as those that your VMs use to power your applications. A substantial portion of the host’s CPU power is now needed for I/O handling, and this activity comes first in the CPU scheduling queues.
This fact, by itself, is not necessarily a problem. Most virtualization host CPUs are relatively lightly utilized, and this amount of CPU needed for I/O handling is readily absorbed without causing a performance problem.
But larger-scale SQL Servers can read and write exceptionally large amounts of data around the clock. The I/O handling at the host layer can start to drain resources from the host. The additional time that these SQL Server VMs spend waiting to be scheduled by the hypervisor could be slowing them down without you knowing it.
Now, look at the SQL Server licensing models. Today’s Enterprise licensing is usually applied on the host’s CPU cores.
Think about it.
If you are under an Enterprise licensing agreement where you license the physical server cores, you could be paying for SQL Server Enterprise licensing on cores that are not available for use by the SQL Server VMs.
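To put a rough number on that, here is a minimal sketch in Python. The function names and the example figures (a 32-core host with a quarter of its CPU spent on storage I/O) are hypothetical and only there to illustrate the arithmetic. Measure the real storage overhead on your own hosts before drawing any conclusions.

```python
def cores_lost_to_storage(physical_cores: int, storage_cpu_fraction: float) -> float:
    """Licensed cores' worth of CPU consumed by host-level I/O handling."""
    if not 0.0 <= storage_cpu_fraction < 1.0:
        raise ValueError("storage_cpu_fraction must be in [0, 1)")
    return physical_cores * storage_cpu_fraction


def cost_multiplier_per_usable_core(storage_cpu_fraction: float) -> float:
    """How much more each usable core costs when all physical cores are licensed.

    A value of 1.0 means no overhead; 1.25 means each core that the VMs can
    actually use costs 25% more than the per-core license price suggests.
    """
    if not 0.0 <= storage_cpu_fraction < 1.0:
        raise ValueError("storage_cpu_fraction must be in [0, 1)")
    return 1.0 / (1.0 - storage_cpu_fraction)


# Hypothetical example: 32 licensed cores, 25% of host CPU spent on storage I/O.
lost = cores_lost_to_storage(32, 0.25)
multiplier = cost_multiplier_per_usable_core(0.25)
print(f"Licensed cores not available to VMs: {lost}")
print(f"Effective cost multiplier per usable core: {multiplier:.2f}")
```

The multiplier is independent of the license price, so you can apply it to whatever per-core figure is in your own agreement.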
As a result, the SQL Server licensing model should be reviewed. Review the SQL Server VM density on the physical server, and the number of vCPUs allocated to those VMs. Do the math and run the numbers to see if you can reduce the licensed core count if you license the VM cores individually, versus the entire set of physical cores. You might find that the math swings in your favor if you license in this manner!
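The comparison can be sketched in a few lines of Python. This is a simplified illustration, not licensing advice. The helper names and the example host are hypothetical, and the four-core minimum per VM is my understanding of the per-core licensing terms for virtual machines, which you should confirm against your own agreement (per-VM licensing may also carry other requirements not modeled here).

```python
# Assumption: each licensed VM needs at least this many core licenses.
# Verify against the current SQL Server licensing guide for your agreement.
MIN_CORE_LICENSES_PER_VM = 4


def host_core_licenses(physical_cores: int) -> int:
    """Core licenses needed when licensing every physical core on the host."""
    return physical_cores


def vm_core_licenses(vcpus_per_vm: list[int]) -> int:
    """Core licenses needed when licensing each SQL Server VM by its vCPUs."""
    return sum(max(vcpus, MIN_CORE_LICENSES_PER_VM) for vcpus in vcpus_per_vm)


def cheaper_model(physical_cores: int, vcpus_per_vm: list[int]) -> str:
    """Return which model needs fewer core licenses: 'host', 'per-vm', or 'equal'."""
    host = host_core_licenses(physical_cores)
    per_vm = vm_core_licenses(vcpus_per_vm)
    if per_vm < host:
        return "per-vm"
    if per_vm > host:
        return "host"
    return "equal"


# Hypothetical host: 32 physical cores running three SQL Server VMs.
sql_vms = [8, 4, 2]  # vCPUs per VM; the 2-vCPU VM is rounded up to the minimum
print("Host licensing:  ", host_core_licenses(32), "core licenses")
print("Per-VM licensing:", vm_core_licenses(sql_vms), "core licenses")
print("Fewer licenses with:", cheaper_model(32, sql_vms))
```

As VM density on the host climbs, the per-VM total eventually overtakes the physical core count, so rerun the numbers whenever you add or resize SQL Server VMs.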