Horizontal Pod Autoscaler (HPA)

The Horizontal Pod Autoscaler (HPA) is a Kubernetes resource that automatically adjusts the number of replicas of a deployment in response to the workload's resource demand.

For more information, see the Kubernetes docs.

How to use the Horizontal Pod Autoscaler resource in KEI?

After you have created a deployment in KEI, you can use the Horizontal Pod Autoscaler to automatically scale the number of replicas of the deployment.

  • Create a YAML file, such as hpa.yaml, with the following content:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 50

  • Apply it to your application:
kubectl apply -f hpa.yaml
  • See your HPA object running on Section:
kubectl get hpa.v2beta2.autoscaling

See other supported kubectl commands you can use with the HPA resource.
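
In Kubernetes, a Utilization target is evaluated as a percentage of each container's resource requests, so the deployment named in scaleTargetRef needs CPU and memory requests for the example HPA to act on. The following is a minimal sketch of such a deployment, not a KEI-specific template: the file name nginx-deployment.yaml, the app label, the image tag, and the request values are all illustrative assumptions.

```shell
# Write a hypothetical Deployment manifest whose container declares CPU and
# memory requests. All names and values below are illustrative assumptions.
cat > nginx-deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.25
          resources:
            requests:
              cpu: 250m       # a 50% target means roughly 125m average use
              memory: 256Mi   # a 50% target means roughly 128Mi average use
EOF

# Apply it before (or together with) hpa.yaml:
# kubectl apply -f nginx-deployment.yaml
```

With these requests, averageUtilization: 50 corresponds to an average of about 125m CPU or 128Mi memory per pod before the HPA adds replicas.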

What parts of the Horizontal Pod Autoscaler spec are supported by KEI?

  • At this stage, KEI supports the autoscaling/v2beta2 version of the Horizontal Pod Autoscaler API object.
  • The following fields (including subfields) of the Horizontal Pod Autoscaler spec are supported:
    • scaleTargetRef
    • minReplicas
    • maxReplicas
    • metrics
      • type: Resource
  • When using a Resource metric, scaling is only supported based on the cpu and memory resources.
  • The maxReplicas field can be set to at most 20.
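
To make the interaction between averageUtilization, minReplicas, and maxReplicas concrete, the sketch below approximates the replica calculation described in the Kubernetes docs: the desired count is the current count multiplied by the ratio of observed to target utilization, rounded up and then clamped to the configured range. The function name desired_replicas and its argument order are hypothetical. The real controller also handles details this sketch ignores, such as a tolerance band around the target and taking the largest result when several metrics are configured.

```shell
# desired_replicas CURRENT_REPLICAS CURRENT_UTIL TARGET_UTIL MIN MAX
# Integer sketch of ceil(current_replicas * current_util / target_util),
# clamped to [MIN, MAX]. Utilization values are whole percentages.
desired_replicas() {
  local current=$1 util=$2 target=$3 min=$4 max=$5
  # Ceiling division using integer arithmetic.
  local desired=$(( (current * util + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

desired_replicas 4 90 50 1 10    # 4 * 90 / 50 = 7.2, rounded up to 8
desired_replicas 8 100 50 1 10   # 16 exceeds maxReplicas, so capped at 10
desired_replicas 4 10 50 1 10    # 0.8 rounded up to 1
```

The values 1 and 10 match minReplicas and maxReplicas in the hpa.yaml example; on KEI the last argument could be no higher than 20.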

Adaptive Edge Engine (AEE) and Horizontal Pod Autoscaler (HPA)

The AEE and the HPA work together so that a container deployment scales both across the globe and within each edge location.

The AEE places the deployment in new edge locations according to the traffic requirements of a particular region, while the HPA scales the number of replicas of the deployment within a particular edge location based on resource (CPU and/or memory) demand.