The Horizontal Pod Autoscaler (HPA) is another controller in the Kubernetes control plane, one that automatically adjusts the number of pods in your deployments based on observed metrics. Here's how it works and where it sits in the Kubernetes architecture:
What HPA Does:
The HPA automatically scales the number of pods in a deployment, replication controller, or replica set based on:
- CPU utilization
- Memory usage
- Custom metrics
- External metrics
Where HPA Fits in the Kubernetes Architecture:
┌────────────────────────────────┐
│          Control Plane         │
│                                │
│  ┌──────────────────────────┐  │
│  │        API Server        │  │
│  └──────────────────────────┘  │
│               ▲                │
│               │                │
│               ▼                │
│  ┌──────────────────────────┐  │
│  │    Controller Manager    │  │
│  │                          │  │
│  │  • Node Controller       │  │
│  │  • ReplicaSet Controller │  │
│  │  • ...                   │  │
│  │  • HPA Controller ───────┼──┼──► Scales Deployments
│  └──────────────────────────┘  │
│               ▲                │
│               │                │
│               ▼                │
│  ┌──────────────────────────┐  │
│  │      Metrics Server      │◄─┼──── Collects metrics from kubelet
│  └──────────────────────────┘  │
└────────────────────────────────┘
How HPA Works:
Metrics Collection:
- The metrics-server component collects resource metrics from kubelets
- For custom metrics, you might use Prometheus Adapter or similar
- These metrics are exposed through the Metrics API
HPA Controller Loop:
- The HPA controller runs as part of the controller-manager
- It periodically (default: 15 seconds) checks the metrics
- Calculates desired replica count based on target utilization
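The loop above can be sketched in a few lines of Python. This is a simplified illustration, not the real controller: all three callbacks are hypothetical stand-ins for what the HPA controller actually does through the Metrics API and the scale subresource.

```python
import time

def hpa_loop(fetch_metric, compute_desired, update_replicas,
             interval=15, iterations=None):
    """Simplified sketch of the HPA reconciliation loop.

    fetch_metric, compute_desired, and update_replicas are hypothetical
    callbacks standing in for the real controller's Metrics API reads
    and scale-subresource writes.
    """
    done = 0
    while iterations is None or done < iterations:
        current_value = fetch_metric()            # e.g. average CPU utilization
        desired = compute_desired(current_value)  # apply the HPA formula
        update_replicas(desired)                  # write back spec.replicas
        done += 1
        time.sleep(interval)
```

Passing `iterations` is only for demonstration; the real controller loops until shutdown.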
Scaling Decision:
- HPA calculates the desired number of replicas using this formula:
desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
- Example: If current CPU is 200% of target, it doubles the pods
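As a worked example, the formula translates directly into Python. The 10% dead band below mirrors the controller-manager's default tolerance (the --horizontal-pod-autoscaler-tolerance flag, 0.1 by default), which keeps HPA from thrashing on small metric fluctuations:

```python
import math

def desired_replicas(current_replicas, current_value, target_value,
                     tolerance=0.1):
    """HPA scaling formula with the default 10% tolerance band.

    If the metric ratio is within `tolerance` of 1.0, the replica
    count is left unchanged to avoid flapping.
    """
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)

# 4 replicas at 200% of target -> doubles to 8
# 4 replicas at 105% of target -> within tolerance, stays at 4
```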
Update Deployment:
- HPA updates the replica count on the deployment/replicaset
- The deployment controller notices this change
- The deployment controller creates/deletes pods accordingly
- The scheduler places new pods on nodes
Configuring HPA:
Here's a basic HPA that targets CPU utilization:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
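A quick sanity check of how this spec behaves, sketched in Python (just the arithmetic, not a real API call): the raw formula result gets clamped to the minReplicas/maxReplicas bounds from the spec.

```python
import math

def desired_with_bounds(current_replicas, current_util, target_util=80,
                        min_replicas=2, max_replicas=10):
    """Apply the HPA formula, then clamp to the spec's replica bounds."""
    raw = math.ceil(current_replicas * (current_util / target_util))
    return max(min_replicas, min(raw, max_replicas))

# 3 replicas at 240% average CPU against the 80% target:
# raw = ceil(3 * 240/80) = 9, which is within [2, 10]
```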
Advanced HPA Features:
Multiple Metrics: HPA evaluates each metric separately and scales to the largest resulting replica count
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 50
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 50
Custom Metrics: Scale based on application-specific metrics
metrics:
- type: Pods
  pods:
    metric:
      name: packets-per-second
    target:
      type: AverageValue
      averageValue: 1k
External Metrics: Scale based on metrics from outside Kubernetes
metrics:
- type: External
  external:
    metric:
      name: queue_messages_ready
      selector:
        matchLabels:
          queue: worker_tasks
    target:
      type: AverageValue
      averageValue: 30
Scaling Behavior: Control how aggressively HPA scales up or down
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Percent
      value: 10
      periodSeconds: 60
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
    - type: Percent
      value: 100
      periodSeconds: 60
    - type: Pods
      value: 4
      periodSeconds: 60
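When multiple scaleUp policies are listed, the default selectPolicy (Max) lets the more permissive one win each period. A small Python sketch of that calculation, assuming the two scaleUp policies shown above (100% growth or +4 pods per 60 seconds):

```python
import math

def max_scale_up(current_replicas, percent_policy=100, pods_policy=4):
    """Maximum replicas the scaleUp policies permit in one period.

    With selectPolicy defaulting to Max, the more permissive policy
    applies: grow by percent_policy% or add pods_policy pods.
    """
    by_percent = math.ceil(current_replicas * (1 + percent_policy / 100))
    by_pods = current_replicas + pods_policy
    return max(by_percent, by_pods)

# 3 replicas: 100% growth allows 6, +4 pods allows 7 -> cap is 7
```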
HPA is part of the control plane but relies on metrics being collected and exposed. The metrics-server is a cluster add-on that enables the resource metrics API required by HPA for basic CPU/memory scaling. For more advanced metrics-based scaling, additional components like Prometheus and custom adapters are usually deployed.
This integrates with everything we've discussed earlier - the API server exposes the HPA resources, the controller manager runs the HPA controller, and the changes ultimately result in the scheduler and kubelet creating new pods on nodes.