Chapter 9. Scaling

In Kubernetes, scaling can mean different things to different users. We distinguish between two cases:

  • Cluster scaling, sometimes called infrastructure-level scaling, refers to the (automated) process of adding or removing worker nodes based on cluster utilization.

  • Application-level scaling, sometimes called pod scaling, refers to the (automated) process of manipulating pod characteristics based on a variety of metrics, from low-level signals such as CPU utilization to higher-level ones, such as HTTP requests served per second, for a given pod. Two kinds of pod-level scalers exist:

    • Horizontal Pod Autoscalers (HPAs), which increase or decrease the number of pod replicas depending on certain metrics.

    • Vertical Pod Autoscalers (VPAs), which increase or decrease the resource requirements of containers running in a pod. Since VPAs are still under development as of January 2018, we will not discuss them here. If you’re interested in this topic, you can read about them in Michael’s blog post “Container resource consumption—too important to ignore”.

In this chapter, we first examine cluster-level scaling for AWS and GKE, and then discuss application-level scaling with HPAs.

9.1 Scaling a Deployment

Problem

You have a deployment and want to scale it horizontally.

Solution

Use the kubectl scale command to scale out a deployment.

Let’s reuse the fancyapp deployment from Recipe 4.4, with five replicas. If it’s not running yet, create it with kubectl create -f fancyapp.yaml ...
