Skip to content

Resource Limits & Horizontal Pod Autoscaling (HPA)

Kubernetes allows you to control how much CPU and memory a container can use and automatically scale applications based on resource usage.

This page provides a quick reference for resource limits, requests, and horizontal pod autoscaling (HPA).


Resource Requests vs Limits

Example configuration:

resources:
  requests:
    cpu: "250m"
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "256Mi"
Field Meaning
Requests Minimum resources guaranteed to the container
Limits Maximum resources the container is allowed to use

CPU Units

Value Meaning
1000m 1 CPU core
500m 0.5 CPU
250m 0.25 CPU

Explanation:

  • 250m CPU = 0.25 CPU core
  • 128Mi = 128 mebibytes of memory

Requests affect pod scheduling, while limits control runtime enforcement.


Why Resource Limits Matter

Without limits:

  • Containers may consume excessive CPU or memory
  • Other workloads may starve
  • Nodes may become unstable

Setting limits helps Kubernetes maintain cluster stability and fairness.


Horizontal Pod Autoscaler (HPA)

The Horizontal Pod Autoscaler automatically adjusts the number of pod replicas based on metrics such as CPU usage.

Example:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
This configuration means:

  • Minimum pods: 2
  • Maximum pods: 10
  • Scale when CPU usage exceeds 70%

Useful Commands

Check HPA status:

kubectl get hpa

View detailed information:

kubectl describe hpa <name>


Metrics Requirement

HPA requires the metrics server to be installed.

Check if it is running:

kubectl get deployment metrics-server -n kube-system
If it is missing, install the Kubernetes metrics server.


Common Scaling Strategy

Typical production configuration:

minReplicas: 2
maxReplicas: 10
CPU target: 60–75%

This provides:

  • Baseline redundancy
  • Ability to scale under load
  • Controlled resource consumption

  • Deployments
  • Resource Requests and Limits
  • Kubernetes Metrics Server