The Horizontal Pod Autoscaler (HPA) is another controller in the Kubernetes control plane: it automatically adjusts the number of pods in your deployments based on observed metrics. Here's how it works and where it sits in the Kubernetes architecture:

What HPA Does:

The HPA automatically scales the number of pods in a deployment, replication controller, replica set, or stateful set based on:

  • CPU utilization (the most common case)
  • Memory usage
  • Custom metrics exposed by your application
  • External metrics from systems outside the cluster
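
For a quick, imperative alternative to the declarative manifest shown later (assuming a Deployment named my-app already exists), the same CPU-based autoscaling can be set up from the command line:

  kubectl autoscale deployment my-app --min=2 --max=10 --cpu-percent=80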

Where HPA Fits in the Kubernetes Architecture:

┌─────────────────────────────┐
│     Control Plane           │
│                             │
│  ┌─────────────────────┐    │
│  │  API Server         │    │
│  └─────────────────────┘    │
│            ▲                │
│            │                │
│            ▼                │
│  ┌─────────────────────┐    │
│  │  Controller Manager │    │
│  │                     │    │
│  │  • Node Controller  │    │
│  │  • ReplicaSet Ctrl  │    │
│  │  • ...              │    │
│  │  • HPA Controller ──┼────┼──► Scales Deployments
│  └─────────────────────┘    │
│            ▲                │
│            │                │
│            ▼                │
│  ┌─────────────────────┐    │
│  │  Metrics Server     │◄───┼──── Collects metrics from kubelet
│  └─────────────────────┘    │
└─────────────────────────────┘

How HPA Works:

  1. Metrics Collection:

    • The metrics-server component collects resource metrics from kubelets
    • For custom metrics, you might use Prometheus Adapter or similar
    • These metrics are exposed through the Metrics API (a quick check is shown after this list)
  2. HPA Controller Loop:

    • The HPA controller runs as part of the controller-manager
    • It periodically (default: 15 seconds) checks the metrics
    • Calculates desired replica count based on target utilization
  3. Scaling Decision:

    • HPA calculates the desired number of replicas using this formula: desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
    • Example: if current CPU utilization is 200% of the target, the replica count is doubled (a fuller worked example follows this list)
  4. Update Deployment:

    • HPA updates the replica count on the deployment/replicaset
    • The deployment controller notices this change
    • The deployment controller creates/deletes pods accordingly
    • The scheduler places new pods on nodes
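
As a quick check that the Metrics API mentioned in step 1 is actually serving data (this assumes metrics-server is installed in the cluster), you can query it through kubectl:

  kubectl top pods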

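To make the formula from step 3 concrete, here is a worked example with hypothetical numbers: suppose the target is 60% average CPU utilization, the deployment currently runs 4 replicas, and the observed average utilization is 90%:

  desiredReplicas = ceil[4 * (90 / 60)] = ceil[6.0] = 6

The HPA would raise the replica count to 6, subject to the configured minReplicas/maxReplicas bounds.
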
Configuring HPA:

Here's a basic HPA that targets CPU utilization:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80

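Assuming this manifest is saved as my-app-hpa.yaml (a hypothetical filename) and the target Deployment specifies CPU requests (Utilization targets are calculated as a percentage of requested CPU), you can apply it and watch its status:

  kubectl apply -f my-app-hpa.yaml
  kubectl get hpa my-app-hpa --watch
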
Advanced HPA Features:

  1. Multiple Metrics: Specify several metrics; HPA computes a desired replica count for each and uses the largest

    metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 50
    
  2. Custom Metrics: Scale based on application-specific metrics

    metrics:
    - type: Pods
      pods:
        metric:
          name: packets-per-second
        target:
          type: AverageValue
          averageValue: 1k
    
  3. External Metrics: Scale based on metrics from outside Kubernetes

    metrics:
    - type: External
      external:
        metric:
          name: queue_messages_ready
          selector:
            matchLabels:
              queue: worker_tasks
        target:
          type: AverageValue
          averageValue: 30
    
  4. Scaling Behavior: Control how aggressively HPA scales up or down

    behavior:
      scaleDown:
        stabilizationWindowSeconds: 300   # consider the last 5 minutes of recommendations before scaling down
        policies:
        - type: Percent
          value: 10                       # remove at most 10% of pods per minute
          periodSeconds: 60
      scaleUp:
        stabilizationWindowSeconds: 0     # scale up immediately
        policies:
        - type: Percent
          value: 100                      # allow doubling every minute...
          periodSeconds: 60
        - type: Pods
          value: 4                        # ...or adding 4 pods, whichever allows more
          periodSeconds: 60
    

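To see how these metrics and behavior policies play out at runtime, kubectl can show an HPA's current readings, conditions, and recent scaling events (using the my-app-hpa object from the earlier example):

  kubectl describe hpa my-app-hpa
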
HPA is part of the control plane but relies on metrics being collected and exposed. The metrics-server is a cluster add-on that enables the resource metrics API required by HPA for basic CPU/memory scaling. For more advanced metrics-based scaling, additional components like Prometheus and custom adapters are usually deployed.
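
If you're not sure whether the resource metrics pipeline is in place, one way to check (assuming a standard metrics-server installation) is to verify that the Metrics API is registered and responding:

  kubectl get apiservice v1beta1.metrics.k8s.io
  kubectl top nodes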

This ties together everything we've discussed earlier: the API server exposes the HPA resources, the controller manager runs the HPA controller, and the changes ultimately result in the scheduler and kubelet creating new pods on nodes.