In Kubernetes, scaling can mean different things to different users. We distinguish between two cases:
Cluster scaling, sometimes called infrastructure-level scaling, refers to the (automated) process of adding or removing worker nodes based on cluster utilization.
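As an illustration of infrastructure-level scaling, here is a sketch of enabling the cluster autoscaler on GKE at cluster-creation time. The cluster name `supersizeme` and the node bounds are assumptions for this example:

```shell
# Sketch: create a GKE cluster with the cluster autoscaler enabled,
# allowing it to grow and shrink between 1 and 5 worker nodes.
# The cluster name "supersizeme" and the bounds are example values.
gcloud container clusters create supersizeme \
    --enable-autoscaling \
    --min-nodes=1 \
    --max-nodes=5
```

With this in place, GKE adds nodes when pods cannot be scheduled due to insufficient capacity and removes nodes that are underutilized.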
Application-level scaling, sometimes called pod scaling, refers to the (automated) process of manipulating pod characteristics based on a variety of metrics, from low-level signals such as CPU utilization to higher-level ones, such as HTTP requests served per second, for a given pod. Two kinds of pod-level scalers exist:
Horizontal Pod Autoscalers (HPAs), which increase or decrease the number of pod replicas depending on certain metrics.
Vertical Pod Autoscalers (VPAs), which increase or decrease the resource requirements of containers running in a pod. Since VPAs are still under development as of January 2018, we will not discuss them here. If you’re interested in this topic, you can read about them in Michael’s blog post “Container resource consumption—too important to ignore”.
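To give a concrete preview of an HPA, the following sketch creates one imperatively for an assumed deployment named `fancyapp`; the replica bounds and CPU target are example values:

```shell
# Sketch: create an HPA for the (assumed) deployment "fancyapp".
# The HPA keeps between 2 and 10 replicas, targeting 80% CPU utilization.
kubectl autoscale deployment fancyapp --min=2 --max=10 --cpu-percent=80

# Inspect the resulting HPA object:
kubectl get hpa fancyapp
```

The HPA controller then periodically compares observed CPU utilization against the target and adjusts the replica count within the given bounds.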
In this chapter, we first examine cluster-level scaling on AWS and GKE, and then discuss application-level scaling with HPAs.
You have a deployment and want to scale it horizontally.
Use the kubectl scale command to scale out a deployment.
Let’s reuse the fancyapp deployment from Recipe 4.4, with five replicas. If it’s not running yet, create it with kubectl create -f fancyapp.yaml ...
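Once the deployment is running, scaling it down and back up can be sketched as follows; the deployment name `fancyapp` comes from the recipe above:

```shell
# Scale the fancyapp deployment down to a single replica:
kubectl scale deployment fancyapp --replicas=1

# ... and back out to five replicas:
kubectl scale deployment fancyapp --replicas=5

# Verify the current replica count:
kubectl get deployment fancyapp
```

Note that kubectl scale only sets a fixed replica count; for metric-driven scaling you would use an HPA instead.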