People often ask me what’s the number one thing to look out for when running SQL Server on Kubernetes…the answer is memory settings. In this post, we’re going to dig into why you need to configure resource limits in your SQL Server’s Pod Spec when running SQL Server workloads in Kubernetes. I’m running these demos in AKS, but these concepts apply to any SQL Server environment running in Kubernetes.
Let’s deploy SQL Server in a Pod without any resource limits. In the YAML below, we’re using a Deployment to run one SQL Server Pod, a PersistentVolumeClaim for our instance directory, and a Service fronting the Pod for access.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mssql-deployment-2017
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: mssql-2017
  template:
    metadata:
      labels:
        app: mssql-2017
    spec:
      hostname: sql3
      containers:
      - name: mssql
        image: 'mcr.microsoft.com/mssql/server:2017-CU16-ubuntu'
        ports:
        - containerPort: 1433
        env:
        - name: ACCEPT_EULA
          value: "Y"
        - name: SA_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mssql
              key: SA_PASSWORD
        volumeMounts:
        - name: mssqldb
          mountPath: /var/opt/mssql
      volumes:
      - name: mssqldb
        persistentVolumeClaim:
          claimName: pvc-sql-2017
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-sql-2017
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: managed-premium
---
apiVersion: v1
kind: Service
metadata:
  name: mssql-svc-2017
spec:
  selector:
    app: mssql-2017
  ports:
  - protocol: TCP
    port: 1433
    targetPort: 1433
  type: LoadBalancer
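If you want to follow along, the Pod Spec references a Secret named mssql that has to exist before you apply the manifest. Here’s a minimal sketch of the commands I’d run, assuming the YAML above is saved as deployment-sql-2017.yaml (the filename and password are placeholders of my choosing):

# Create the Secret holding the sa password that the Pod Spec references
kubectl create secret generic mssql --from-literal=SA_PASSWORD='<YourStrong!Passw0rd>'

# Deploy the Deployment, PersistentVolumeClaim, and Service
kubectl apply -f deployment-sql-2017.yaml

# Grab the external IP of the LoadBalancer Service once it's provisioned
kubectl get service mssql-svc-2017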
Running a Workload Against Our Pod…then BOOM!
With that Pod deployed, I loaded up a HammerDB TPC-C test with about 10GB of data and drove a workload against our SQL Server. Then, while monitoring the workload…boom, HammerDB throws connection errors and crashes. Let’s look at why.
First things first, let’s check the Pods’ status with kubectl get pods. Well, that’s interesting…I have 13 Pods. One has a Status of Running and the remainder are Evicted.
kubectl get pods
NAME                                     READY   STATUS    RESTARTS   AGE
mssql-deployment-2017-8698fb8bf5-2pw2z   0/1     Evicted   0          8m24s
mssql-deployment-2017-8698fb8bf5-4bn6c   0/1     Evicted   0          8m23s
mssql-deployment-2017-8698fb8bf5-4pw7d   0/1     Evicted   0          8m25s
mssql-deployment-2017-8698fb8bf5-54k6k   0/1     Evicted   0          8m27s
mssql-deployment-2017-8698fb8bf5-96lzf   0/1     Evicted   0          8m26s
mssql-deployment-2017-8698fb8bf5-clrbx   0/1     Evicted   0          8m27s
mssql-deployment-2017-8698fb8bf5-cp6ml   0/1     Evicted   0          8m27s
mssql-deployment-2017-8698fb8bf5-ln8zt   0/1     Evicted   0          8m27s
mssql-deployment-2017-8698fb8bf5-nmq65   0/1     Evicted   0          8m21s
mssql-deployment-2017-8698fb8bf5-p2mvm   0/1     Evicted   0          25h
mssql-deployment-2017-8698fb8bf5-stzfw   0/1     Evicted   0          8m23s
mssql-deployment-2017-8698fb8bf5-td24w   1/1     Running   0          8m20s
mssql-deployment-2017-8698fb8bf5-wpgcx   0/1     Evicted   0          8m22s
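As a side note, those Evicted Pods will hang around in the API server until you delete them. A quick sketch for cleaning them up, assuming nothing else in the namespace is in a Failed phase (Evicted Pods report their phase as Failed):

# Evicted Pods are left behind with a phase of Failed; remove them in one shot
kubectl delete pods --field-selector=status.phase=Failed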
What Just Happened?

The Pod list alone doesn’t tell us much, so let’s look at the cluster events, sorted by creation time, to see why all of those Pods were Evicted.
kubectl get events --sort-by=.metadata.creationTimestamp
LAST SEEN   TYPE      REASON                      OBJECT                                        MESSAGE
17m         Normal    Scheduled                   pod/mssql-deployment-2017-8698fb8bf5-clrbx    Successfully assigned default/mssql-deployment-2017-8698fb8bf5-clrbx to aks-agentpool-43452558-0
17m         Warning   EvictionThresholdMet        node/aks-agentpool-43452558-0                 Attempting to reclaim memory
17m         Normal    SuccessfulCreate            replicaset/mssql-deployment-2017-8698fb8bf5   Created pod: mssql-deployment-2017-8698fb8bf5-clrbx
17m         Normal    SuccessfulCreate            replicaset/mssql-deployment-2017-8698fb8bf5   Created pod: mssql-deployment-2017-8698fb8bf5-ln8zt
17m         Normal    Killing                     pod/mssql-deployment-2017-8698fb8bf5-p2mvm    Stopping container mssql
17m         Warning   Evicted                     pod/mssql-deployment-2017-8698fb8bf5-54k6k    The node had condition: [MemoryPressure].
17m         Warning   Evicted                     pod/mssql-deployment-2017-8698fb8bf5-p2mvm    The node was low on resource: memory. Container mssql was using 4461532Ki, which exceeds its request of 0.
17m         Warning   Evicted                     pod/mssql-deployment-2017-8698fb8bf5-cp6ml    The node had condition: [MemoryPressure].
17m         Normal    Scheduled                   pod/mssql-deployment-2017-8698fb8bf5-cp6ml    Successfully assigned default/mssql-deployment-2017-8698fb8bf5-cp6ml to aks-agentpool-43452558-0
17m         Normal    Scheduled                   pod/mssql-deployment-2017-8698fb8bf5-54k6k    Successfully assigned default/mssql-deployment-2017-8698fb8bf5-54k6k to aks-agentpool-43452558-0
17m         Warning   Evicted                     pod/mssql-deployment-2017-8698fb8bf5-clrbx    The node had condition: [MemoryPressure].
17m         Normal    SuccessfulCreate            replicaset/mssql-deployment-2017-8698fb8bf5   Created pod: mssql-deployment-2017-8698fb8bf5-cp6ml
17m         Normal    SuccessfulCreate            replicaset/mssql-deployment-2017-8698fb8bf5   Created pod: mssql-deployment-2017-8698fb8bf5-54k6k
17m         Normal    Scheduled                   pod/mssql-deployment-2017-8698fb8bf5-ln8zt    Successfully assigned default/mssql-deployment-2017-8698fb8bf5-ln8zt to aks-agentpool-43452558-0
17m         Normal    Scheduled                   pod/mssql-deployment-2017-8698fb8bf5-96lzf    Successfully assigned default/mssql-deployment-2017-8698fb8bf5-96lzf to aks-agentpool-43452558-0
17m         Normal    SuccessfulCreate            replicaset/mssql-deployment-2017-8698fb8bf5   Created pod: mssql-deployment-2017-8698fb8bf5-96lzf
17m         Warning   Evicted                     pod/mssql-deployment-2017-8698fb8bf5-ln8zt    The node had condition: [MemoryPressure].
17m         Warning   Evicted                     pod/mssql-deployment-2017-8698fb8bf5-96lzf    The node had condition: [MemoryPressure].
17m         Warning   Evicted                     pod/mssql-deployment-2017-8698fb8bf5-4pw7d    The node had condition: [MemoryPressure].
17m         Normal    Scheduled                   pod/mssql-deployment-2017-8698fb8bf5-4pw7d    Successfully assigned default/mssql-deployment-2017-8698fb8bf5-4pw7d to aks-agentpool-43452558-0
17m         Normal    SuccessfulCreate            replicaset/mssql-deployment-2017-8698fb8bf5   Created pod: mssql-deployment-2017-8698fb8bf5-4pw7d
17m         Warning   Evicted                     pod/mssql-deployment-2017-8698fb8bf5-2pw2z    The node had condition: [MemoryPressure].
17m         Normal    Scheduled                   pod/mssql-deployment-2017-8698fb8bf5-2pw2z    Successfully assigned default/mssql-deployment-2017-8698fb8bf5-2pw2z to aks-agentpool-43452558-0
17m         Normal    SuccessfulCreate            replicaset/mssql-deployment-2017-8698fb8bf5   Created pod: mssql-deployment-2017-8698fb8bf5-2pw2z
17m         Warning   Evicted                     pod/mssql-deployment-2017-8698fb8bf5-4bn6c    The node had condition: [MemoryPressure].
17m         Normal    SuccessfulCreate            replicaset/mssql-deployment-2017-8698fb8bf5   Created pod: mssql-deployment-2017-8698fb8bf5-4bn6c
17m         Normal    SuccessfulCreate            replicaset/mssql-deployment-2017-8698fb8bf5   Created pod: mssql-deployment-2017-8698fb8bf5-stzfw
17m         Normal    Scheduled                   pod/mssql-deployment-2017-8698fb8bf5-4bn6c    Successfully assigned default/mssql-deployment-2017-8698fb8bf5-4bn6c to aks-agentpool-43452558-0
17m         Warning   Evicted                     pod/mssql-deployment-2017-8698fb8bf5-stzfw    The node had condition: [MemoryPressure].
17m         Normal    SuccessfulCreate            replicaset/mssql-deployment-2017-8698fb8bf5   (combined from similar events): Created pod: mssql-deployment-2017-8698fb8bf5-td24w
17m         Normal    Scheduled                   pod/mssql-deployment-2017-8698fb8bf5-wpgcx    Successfully assigned default/mssql-deployment-2017-8698fb8bf5-wpgcx to aks-agentpool-43452558-0
17m         Warning   Evicted                     pod/mssql-deployment-2017-8698fb8bf5-wpgcx    The node had condition: [MemoryPressure].
17m         Normal    Scheduled                   pod/mssql-deployment-2017-8698fb8bf5-stzfw    Successfully assigned default/mssql-deployment-2017-8698fb8bf5-stzfw to aks-agentpool-43452558-3
17m         Warning   Evicted                     pod/mssql-deployment-2017-8698fb8bf5-nmq65    The node had condition: [MemoryPressure].
17m         Normal    Scheduled                   pod/mssql-deployment-2017-8698fb8bf5-nmq65    Successfully assigned default/mssql-deployment-2017-8698fb8bf5-nmq65 to aks-agentpool-43452558-0
17m         Normal    NodeHasInsufficientMemory   node/aks-agentpool-43452558-0                 Node aks-agentpool-43452558-0 status is now: NodeHasInsufficientMemory
17m         Normal    Scheduled                   pod/mssql-deployment-2017-8698fb8bf5-td24w    Successfully assigned default/mssql-deployment-2017-8698fb8bf5-td24w to aks-agentpool-43452558-3
16m         Normal    SuccessfulAttachVolume      pod/mssql-deployment-2017-8698fb8bf5-td24w    AttachVolume.Attach succeeded for volume "pvc-f35b270a-e063-11e9-9b6d-ee8baa4f9319"
15m         Normal    Pulling                     pod/mssql-deployment-2017-8698fb8bf5-td24w    Pulling image "mcr.microsoft.com/mssql/server:2017-CU16-ubuntu"
15m         Normal    Pulled                      pod/mssql-deployment-2017-8698fb8bf5-td24w    Successfully pulled image "mcr.microsoft.com/mssql/server:2017-CU16-ubuntu"
15m         Normal    Started                     pod/mssql-deployment-2017-8698fb8bf5-td24w    Started container mssql
15m         Normal    Created                     pod/mssql-deployment-2017-8698fb8bf5-td24w    Created container mssql
12m         Normal    NodeHasSufficientMemory     node/aks-agentpool-43452558-0                 Node aks-agentpool-43452558-0 status is now: NodeHasSufficientMemory
Understanding Allocatable Memory in Kubernetes

The events tell the story: the Node came under MemoryPressure, the kubelet evicted our Pod to reclaim memory, and the ReplicaSet kept creating replacement Pods that were immediately evicted until the Node recovered. To understand why we hit MemoryPressure in the first place, we need to look at how much memory Kubernetes actually makes available to Pods. A Node’s Allocatable memory is its total Capacity minus what’s reserved for the kubelet, system daemons, and the eviction threshold, and that’s the budget Pods have to live within. Let’s look at our Node with kubectl describe nodes.
kubectl describe nodes aks-agentpool-43452558-0
Name:               aks-agentpool-43452558-0
...output omitted...
Unschedulable:      false
Conditions:
  Type                 Status   LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------   -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False    Tue, 10 Sep 2019 16:20:00 -0500   Tue, 10 Sep 2019 16:20:00 -0500   RouteCreated                 RouteController created a route
  MemoryPressure       False    Sat, 28 Sep 2019 07:58:56 -0500   Sat, 28 Sep 2019 07:53:55 -0500   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False    Sat, 28 Sep 2019 07:58:56 -0500   Tue, 10 Sep 2019 16:18:27 -0500   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False    Sat, 28 Sep 2019 07:58:56 -0500   Tue, 10 Sep 2019 16:18:27 -0500   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True     Sat, 28 Sep 2019 07:58:56 -0500   Tue, 10 Sep 2019 16:18:27 -0500   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  Hostname:    aks-agentpool-43452558-0
  InternalIP:  10.240.0.6
Capacity:
  attachable-volumes-azure-disk:  8
  cpu:                            2
  ephemeral-storage:              101584140Ki
  hugepages-1Gi:                  0
  hugepages-2Mi:                  0
  memory:                         7113152Ki
  pods:                           110
Allocatable:
  attachable-volumes-azure-disk:  8
  cpu:                            1931m
  ephemeral-storage:              93619943269
  hugepages-1Gi:                  0
  hugepages-2Mi:                  0
  memory:                         4667840Ki
  pods:                           110
...output omitted...
Events:
  Type     Reason                     Age                   From                                Message
  ----     ------                     ----                  ----                                -------
  Warning  EvictionThresholdMet       10m                   kubelet, aks-agentpool-43452558-0   Attempting to reclaim memory
  Normal   NodeHasInsufficientMemory  10m                   kubelet, aks-agentpool-43452558-0   Node aks-agentpool-43452558-0 status is now: NodeHasInsufficientMemory
  Normal   NodeHasSufficientMemory    5m15s (x2 over 14d)   kubelet, aks-agentpool-43452558-0   Node aks-agentpool-43452558-0 status is now: NodeHasSufficientMemory
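If you just want the two numbers that matter here without scrolling through the full describe output, here’s a small sketch using jsonpath (the Node name is from my cluster, substitute your own):

# Compare the Node's total memory capacity with what the kubelet will actually hand to Pods
kubectl get node aks-agentpool-43452558-0 \
  -o jsonpath='capacity: {.status.capacity.memory}{"\n"}allocatable: {.status.allocatable.memory}{"\n"}'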
SQL Server’s View of Memory on Kubernetes Nodes

Here’s the catch: SQL Server knows nothing about Allocatable memory. Looking at the error log from our Pod, the instance detected 5557 MB of RAM at startup, which is 80% of the Node’s physical memory (the default amount SQL Server on Linux sees) and well above the ~4.45GB the kubelet will let Pods on this Node consume. So as the HammerDB workload ramped up, SQL Server kept allocating memory until the Node crossed its eviction threshold and the kubelet killed the Pod.
2019-09-28 14:46:16.23 Server Detected 5557 MB of RAM. This is an informational message; no user action is required.
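If you want to confirm what the running instance thinks it has to work with, you can also query the DMVs directly. A minimal sketch, assuming you connect through the Service’s external IP with the sa credentials from our Secret (both are placeholders here):

# Query SQL Server's view of physical memory through the LoadBalancer Service
# (get <EXTERNAL-IP> from: kubectl get service mssql-svc-2017)
sqlcmd -S <EXTERNAL-IP>,1433 -U sa -P '<SA_PASSWORD>' -Q "
SELECT physical_memory_kb / 1024 AS physical_memory_mb,
       committed_target_kb / 1024 AS committed_target_mb
FROM sys.dm_os_sys_info;"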
Configuring Pod Limits for SQL Server
So how do we fix all of this? We need to set a resource limit in our Pod Spec. Limits allow us to control the amount of a resource available to a Pod, and in our case we want to limit the amount of memory SQL Server can see. In our environment we know we have 4667840Ki (~4.45GB) of Allocatable memory for Pods on the Node, so let’s set a value lower than that…and to be super safe I’m going to use 3GB. In the code below you can see that the Pod Spec for our mssql container now has a resources section with a limit of memory: “3Gi”.
spec:
  hostname: sql3
  containers:
  - name: mssql
    image: 'mcr.microsoft.com/mssql/server:2017-CU16-ubuntu'
    ports:
    - containerPort: 1433
    env:
    - name: ACCEPT_EULA
      value: "Y"
    - name: MSSQL_AGENT_ENABLED
      value: "true"
    - name: SA_PASSWORD
      valueFrom:
        secretKeyRef:
          name: mssql
          key: SA_PASSWORD
    resources:
      limits:
        memory: "3Gi"
    volumeMounts:
    - name: mssqldb
      mountPath: /var/opt/mssql
  volumes:
  - name: mssqldb
    persistentVolumeClaim:
      claimName: pvc-sql-system-2017
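After applying the updated manifest, it’s worth verifying the limit actually landed on the running container. A quick sketch using the app label from our Deployment (the Pod name will differ in your cluster):

# Confirm the memory limit is set on the running mssql container
kubectl get pods -l app=mssql-2017 \
  -o jsonpath='{.items[0].spec.containers[0].resources.limits.memory}{"\n"}'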
With that limit in place, I redeployed and kicked off the workload again. The SQL Server error log now reports a much more sensible number at startup, 2458 MB, which is 80% of our 3Gi limit:

2019-09-28 14:01:46.16 Server Detected 2458 MB of RAM. This is an informational message; no user action is required.
Summary
With that, I hope you can see why I consider this the number one thing to look out for when deploying SQL Server in Kubernetes. Setting the appropriate values will make sure that your workload stays up and running and plays nicely with the other workloads in your cluster. What’s the best value to set? That depends on the amount of memory on the Node, the amount of memory your workload needs, and the reservations required by both Kubernetes and SQLPAL. Additionally, we should set max server memory as an instance-level setting inside SQL Server to limit the amount of memory SQL Server can allocate. My suggestion to you is to configure both: a resource limit at the Pod level and max server memory at the instance level.
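Here’s a minimal sketch of setting max server memory with sp_configure, again connecting through the Service’s external IP with the sa login (placeholders). The 2200 MB value is just an illustrative number sized below our 3Gi Pod limit, not a recommendation:

# Cap max server memory below the Pod's memory limit so the buffer pool
# can't push the container past the limit (2200 MB is an illustrative value)
sqlcmd -S <EXTERNAL-IP>,1433 -U sa -P '<SA_PASSWORD>' -Q "
EXEC sp_configure 'show advanced options', 1; RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 2200; RECONFIGURE;"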
If you want to read more about resource management and Pod eviction, check out these resources: