The Horizontal Pod Autoscaler (HPA) is another controller in the Kubernetes control plane, one that automatically adjusts the number of pods in your deployments based on observed metrics. Here's how it works and where it sits in the Kubernetes architecture:
What HPA Does:
The HPA automatically scales the number of pods in a deployment, replication controller, or replica set based on:
- CPU utilization
- Memory usage
- Custom metrics
- External metrics
Where HPA Fits in the Kubernetes Architecture:
┌────────────────────────────────┐
│          Control Plane         │
│                                │
│  ┌──────────────────────────┐  │
│  │        API Server        │  │
│  └──────────────────────────┘  │
│               ▲                │
│               │                │
│               ▼                │
│  ┌──────────────────────────┐  │
│  │    Controller Manager    │  │
│  │                          │  │
│  │  • Node Controller       │  │
│  │  • ReplicaSet Controller │  │
│  │  • ...                   │  │
│  │  • HPA Controller ───────┼──┼──► Scales Deployments
│  └──────────────────────────┘  │
│               ▲                │
│               │                │
│               ▼                │
│  ┌──────────────────────────┐  │
│  │      Metrics Server      │◄─┼──── Collects metrics from kubelet
│  └──────────────────────────┘  │
└────────────────────────────────┘
How HPA Works:
Metrics Collection:
- The metrics-server component collects resource metrics from kubelets
- For custom metrics, you might use Prometheus Adapter or similar
- These metrics are exposed through the Metrics API
HPA Controller Loop:
- The HPA controller runs as part of the controller-manager
- It periodically (default: 15 seconds) checks the metrics
- Calculates desired replica count based on target utilization
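The loop above can be sketched in a few lines of Python. This is a simplified illustration, not the real controller: all three callbacks are hypothetical stand-ins for what the HPA controller actually does through the Metrics API and the scale subresource.

```python
import time

def hpa_loop(fetch_metric, compute_desired, update_replicas,
             interval=15, iterations=None):
    """Simplified sketch of the HPA reconciliation loop.

    fetch_metric, compute_desired, and update_replicas are hypothetical
    callbacks standing in for the real controller's Metrics API reads
    and scale-subresource writes.
    """
    done = 0
    while iterations is None or done < iterations:
        current_value = fetch_metric()            # e.g. average CPU utilization
        desired = compute_desired(current_value)  # apply the HPA formula
        update_replicas(desired)                  # write back spec.replicas
        done += 1
        time.sleep(interval)
```

Passing `iterations` is only for demonstration; the real controller loops until shutdown.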
Scaling Decision:
- HPA calculates the desired number of replicas using this formula:
desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
- Example: If current CPU is 200% of target, it doubles the pods
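As a worked example, the formula translates directly into Python. The 10% dead band below mirrors the controller-manager's default tolerance (the --horizontal-pod-autoscaler-tolerance flag, 0.1 by default), which keeps HPA from thrashing on small metric fluctuations:

```python
import math

def desired_replicas(current_replicas, current_value, target_value,
                     tolerance=0.1):
    """HPA scaling formula with the default 10% tolerance band.

    If the metric ratio is within `tolerance` of 1.0, the replica
    count is left unchanged to avoid flapping.
    """
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)

# 4 replicas at 200% of target -> doubles to 8
# 4 replicas at 105% of target -> within tolerance, stays at 4
```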
Update Deployment:
- HPA updates the replica count on the deployment/replicaset
- The deployment controller notices this change
- The deployment controller creates/deletes pods accordingly
- The scheduler places new pods on nodes
Configuring HPA:
Here's a basic HPA that targets CPU utilization:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
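A quick sanity check of how this spec behaves, sketched in Python (just the arithmetic, not a real API call): the raw formula result gets clamped to the minReplicas/maxReplicas bounds from the spec.

```python
import math

def desired_with_bounds(current_replicas, current_util, target_util=80,
                        min_replicas=2, max_replicas=10):
    """Apply the HPA formula, then clamp to the spec's replica bounds."""
    raw = math.ceil(current_replicas * (current_util / target_util))
    return max(min_replicas, min(raw, max_replicas))

# 3 replicas at 240% average CPU against the 80% target:
# raw = ceil(3 * 240/80) = 9, which is within [2, 10]
```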
Advanced HPA Features:
Multiple Metrics: HPA evaluates each metric separately and scales to the largest resulting replica count
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 50
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 50
Custom Metrics: Scale based on application-specific metrics
metrics:
- type: Pods
  pods:
    metric:
      name: packets-per-second
    target:
      type: AverageValue
      averageValue: 1k
External Metrics: Scale based on metrics from outside Kubernetes
metrics:
- type: External
  external:
    metric:
      name: queue_messages_ready
      selector:
        matchLabels:
          queue: worker_tasks
    target:
      type: AverageValue
      averageValue: 30
Scaling Behavior: Control how aggressively HPA scales up or down
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Percent
      value: 10
      periodSeconds: 60
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
    - type: Percent
      value: 100
      periodSeconds: 60
    - type: Pods
      value: 4
      periodSeconds: 60
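When multiple scaleUp policies are listed, the default selectPolicy (Max) lets the more permissive one win each period. A small Python sketch of that calculation, assuming the two scaleUp policies shown above (100% growth or +4 pods per 60 seconds):

```python
import math

def max_scale_up(current_replicas, percent_policy=100, pods_policy=4):
    """Maximum replicas the scaleUp policies permit in one period.

    With selectPolicy defaulting to Max, the more permissive policy
    applies: grow by percent_policy% or add pods_policy pods.
    """
    by_percent = math.ceil(current_replicas * (1 + percent_policy / 100))
    by_pods = current_replicas + pods_policy
    return max(by_percent, by_pods)

# 3 replicas: 100% growth allows 6, +4 pods allows 7 -> cap is 7
```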
HPA is part of the control plane but relies on metrics being collected and exposed. The metrics-server is a cluster add-on that enables the resource metrics API required by HPA for basic CPU/memory scaling. For more advanced metrics-based scaling, additional components like Prometheus and custom adapters are usually deployed.
This integrates with everything we've discussed earlier - the API server exposes the HPA resources, the controller manager runs the HPA controller, and the changes ultimately result in the scheduler and kubelet creating new pods on nodes.