Introduction
The Horizontal Pod Autoscaler (HPA) automatically scales the number of pods in a replication controller, deployment, replica set, or stateful set based on observed CPU utilization or other application-provided metrics.
This post assumes that you already have a microservice application deployed on an AKS cluster. To set up HPA on AKS, follow the steps below.
Install metrics-server on your AKS cluster
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml
You will know that metrics-server is successfully installed if the following commands return CPU and memory usage figures instead of an error:
- kubectl top pod
- kubectl top nodes
After metrics-server is successfully set up, it is time to make sure the resource requests and limits are properly configured in the Kubernetes manifest file. The following is a sample manifest file where the request and limit are configured for CPU:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    kompose.cmd: C:\ProgramData\chocolatey\lib\kubernetes-kompose\tools\kompose.exe convert
    kompose.version: 1.21.0 (992df58d8)
  creationTimestamp: null
  labels:
    io.kompose.service: loginservicedapr
  name: loginservicedapr
spec:
  replicas: 1
  selector:
    matchLabels:
      io.kompose.service: loginservicedapr
  strategy: {}
  template:
    metadata:
      annotations:
        kompose.cmd: C:\ProgramData\chocolatey\lib\kubernetes-kompose\tools\kompose.exe convert
        kompose.version: 1.21.0 (992df58d8)
      creationTimestamp: null
      labels:
        io.kompose.service: loginservicedapr
    spec:
      containers:
      - image: loginservicedapr:latest
        imagePullPolicy: ""
        name: loginservicedapr
        resources:
          requests:
            cpu: "250m"
          limits:
            cpu: "500m"
        ports:
        - containerPort: 80
      restartPolicy: Always
      serviceAccountName: ""
      volumes: null
status: {}
Please note that the above deployment file, with its resource requests and limits, works on a Kubernetes cluster deployed on a cloud such as Azure or AWS. The resources fields themselves are standard Kubernetes, but an on-prem deployment might need additional setup or adjustments before this works.
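To make the numbers in the manifest concrete: the HPA measures CPU utilization relative to the container's request, not its limit. The following is a minimal Python sketch of that arithmetic. The helper names are my own and are not part of any Kubernetes client library, and it only handles the plain CPU forms used here (millicores such as "250m" and whole or fractional cores such as "0.5").

```python
def cpu_to_millicores(quantity):
    """Convert a Kubernetes CPU quantity such as "250m" or "0.5" to millicores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        # "250m" means 250 millicores, i.e. a quarter of one core.
        return int(quantity[:-1])
    # A bare number is a count of cores: "0.5" is 500 millicores.
    return round(float(quantity) * 1000)


def utilization_percent(usage, request):
    """CPU utilization as a percentage of the request (hypothetical helper)."""
    return 100.0 * cpu_to_millicores(usage) / cpu_to_millicores(request)


# Values taken from the manifest above: request 250m, limit 500m.
print(utilization_percent("125m", "250m"))  # usage at half the request
print(utilization_percent("500m", "250m"))  # usage at the limit
```

With a 250m request, a pod using 125m of CPU is at 50% utilization, while a pod running flat out at its 500m limit is at 200%. This is why a utilization target is only meaningful once requests are set.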
A Little Tip
The ‘kompose’ annotations and labels you see in the above deployment file are added by Kompose, a tool that converts Docker Compose files into Kubernetes manifests. I had initially developed my microservice on Docker, and the deployment file was written for docker-compose. So when I had to deploy it to AKS, I used ‘kompose convert’ to turn my docker-compose file into manifests suitable for AKS.
The following is the HPA manifest for the above service:
# The metrics field needs autoscaling/v2 (autoscaling/v2beta2 on clusters older than Kubernetes 1.23).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: loginservicedapr-hpa
spec:
  maxReplicas: 10 # define max replica count
  minReplicas: 3 # define min replica count
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: loginservicedapr
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50 # percentage of the CPU request, averaged across pods
  # CPU is covered by the Resource metric above. A metric of type Pods would
  # need a custom metric name and an AverageValue target instead of Utilization.
Please note that I have relied on the default algorithm provided by Kubernetes HPA itself, and the max and min replicas and the average utilization percentage above are only starting values. Based on your application’s metrics, you can work out numbers that suit it better.
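As a rough illustration of what the default algorithm does with these numbers, here is a small Python sketch based on the formula given in the Kubernetes documentation, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The function name and parameters are my own. The real controller also handles missing metrics, pods that are not ready, and stabilization windows, none of which are modelled here.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas, tolerance=0.1):
    """Approximate the replica count the HPA would request (simplified sketch)."""
    ratio = current_utilization / target_utilization
    # The HPA skips scaling when the ratio is close enough to 1.0;
    # the default tolerance is 0.1.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # The result is always kept within minReplicas..maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


# Values from the HPA manifest above: target 50%, min 3, max 10.
print(desired_replicas(3, 100, 50, min_replicas=3, max_replicas=10))
print(desired_replicas(3, 400, 50, min_replicas=3, max_replicas=10))
```

With 3 replicas averaging 100% utilization against a 50% target, the sketch asks for 6 replicas. At 400% it would ask for 24, which is capped at the maxReplicas value of 10.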
To deploy the HPA file, do the following:
kubectl apply -f loginservicedapr-hpa.yaml
If HPA is properly deployed and configured, inspecting it (for example with kubectl describe hpa loginservicedapr-hpa) should give an output like below:
In case you received an output like below:
The HPA is not properly deployed or configured. In this case, you have to find out the exact error the HPA is reporting internally.
Try troubleshooting based on the reason given for each condition whose status is set to ‘False’ in the above output.
To check whether the HPA is functioning as expected, you can devise a load test using one of the open-source load-testing platforms. I have used LoadImpact’s cloud-based SaaS service TestBuilder (https://k6.io/docs/cloud/creating-and-running-a-test/test-builder).
Use the load test to continuously send requests to one of the APIs in the service deployed above. The free subscription with LoadImpact allows up to 50 Virtual Users. When you start the load test, you should see an increase in CPU utilization.
Try to send more requests so that utilization reaches the threshold, and check whether HPA scales out the number of replicas.
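If you would rather generate load from your own machine, the same idea can be sketched in a few lines of Python using only the standard library. This is only a rough stand-in for a proper load-testing tool, not what I used for this post. The function names are hypothetical, and the URL you pass in is an assumption: it would be whichever API your service exposes.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def http_get(url, timeout=5.0):
    """Send one GET request and return the HTTP status code (0 on any failure)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except Exception:
        # Connection errors, timeouts and HTTP error responses all count as 0.
        return 0


def run_load(send, virtual_users, requests_per_user):
    """Call send() from `virtual_users` threads, `requests_per_user` times each.

    Returns a flat list with the result of every call.
    """
    def user_loop(_):
        return [send() for _ in range(requests_per_user)]

    with ThreadPoolExecutor(max_workers=virtual_users) as pool:
        per_user = list(pool.map(user_loop, range(virtual_users)))
    return [result for results in per_user for result in results]
```

For example, run_load(lambda: http_get(url), virtual_users=50, requests_per_user=100) mirrors the 50 Virtual Users mentioned above, where url is the address of one of your service's APIs. While it runs, kubectl top pod should show CPU usage climbing.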