In Creating your first cluster with Cluster API, we demonstrated how to create a cluster using a standard declarative configuration. Now we will show how to leverage this declarative configuration to update cluster topology with ease.
Table of Contents
- What is a Cluster Topology?
- Add a new pool of worker nodes
- Remove a worker node
- Scale out control plane nodes
- More scale operations
- Next: MachineHealthChecks and Remediation
- More information
We define a cluster topology as the set of configurations that describe your cluster: for example, the number of control plane and worker nodes, the type of Machine hardware that underlies those nodes, and the regional distribution of nodes or node pools. You may also see this described as "cluster shape" elsewhere.
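Concretely, a managed topology lives under spec.topology in the Cluster resource. The sketch below shows the relevant v1beta1 fields; the names and replica counts are illustrative placeholders rather than the exact spec used in this tutorial:
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: example-cluster             # illustrative name
spec:
  topology:
    class: example-clusterclass     # the ClusterClass this Cluster is stamped from
    version: v1.24.6                # Kubernetes version applied to the whole cluster
    controlPlane:
      replicas: 1                   # number of control plane nodes
    workers:
      machineDeployments:
        - class: default-worker     # worker "recipe" defined in the ClusterClass
          name: md-0                # name of this worker pool
          replicas: 1               # number of worker nodes in this pool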
Now we can demonstrate how easily you can leverage the flexibility of Cluster API to add and remove pools of nodes. We'll re-use the existing default-worker MachineDeployment class; in other words, we'll define a discrete, new node set based on a pre-existing, common worker machine recipe.
For reference, this is the declarative block that defines the default-worker MachineDeployment class in the ClusterClass spec we originally installed:
workers:
  machineDeployments:
    - class: default-worker
      template:
        bootstrap:
          ref:
            apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
            kind: KubeadmConfigTemplate
            name: quick-start-default-worker-bootstraptemplate
        infrastructure:
          ref:
            apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
            kind: DockerMachineTemplate
            name: quick-start-default-worker-machinetemplate
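If you want to confirm which classes your management cluster knows about, you can also list the installed ClusterClass objects directly; the exact ClusterClass name depends on what you installed earlier, so the second command uses a placeholder:
kubectl get clusterclasses
kubectl describe clusterclass <your-clusterclass-name>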
We'll reference the above default-worker class in an updated Cluster configuration that declares a second worker pool.
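To make the shape of that change concrete, here is a sketch of what the workers block of the updated Cluster spec declares with both pools present (indentation abbreviated and replica counts illustrative):
workers:
  machineDeployments:
    - class: default-worker   # the original worker pool
      name: md-0
      replicas: 1
    - class: default-worker   # the new, second worker pool
      name: md-1
      replicas: 1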
The difference between this spec and the original spec used to create the docker-cluster-one Cluster:
diff yamls/clusters/1-docker-cluster-one.yaml yamls/clusters/3-docker-cluster-one-second-worker-pool.yaml
Output:
35a36,38
>       - class: default-worker # Adding a 2nd worker pool
>         name: md-1
>         replicas: 1
The above shows that we've added a new MachineDeployment called md-1 (our existing MachineDeployment is named md-0) to our new spec. By applying this modified spec to our kind management cluster we will initiate the creation of a new node from this new node pool:
kubectl apply -f yamls/clusters/3-docker-cluster-one-second-worker-pool.yaml
Output:
cluster.cluster.x-k8s.io/docker-cluster-one configured
Let's watch those new nodes come online:
kubectl --kubeconfig cluster-one.kubeconfig get nodes -w
Output:
NAME STATUS ROLES AGE VERSION
docker-cluster-one-md-0-nrh7k-75ddc4778f-wwpg9 Ready <none> 25h v1.24.6
docker-cluster-one-xbqb2-bcjvj Ready control-plane 26h v1.24.6
docker-cluster-one-md-1-ksqwt-8489684d5b-fpgjs NotReady <none> 0s v1.24.6
docker-cluster-one-md-1-ksqwt-8489684d5b-fpgjs NotReady <none> 46s v1.24.6
docker-cluster-one-md-1-ksqwt-8489684d5b-fpgjs NotReady <none> 46s v1.24.6
docker-cluster-one-md-1-ksqwt-8489684d5b-fpgjs Ready <none> 76s v1.24.6
clusterctl describe now shows two MachineDeployments:
clusterctl describe cluster docker-cluster-one
Output:
NAME READY SEVERITY REASON SINCE MESSAGE
Cluster/docker-cluster-one True 11m
├─ClusterInfrastructure - DockerCluster/docker-cluster-one-4ffkk True 12m
├─ControlPlane - KubeadmControlPlane/docker-cluster-one-hfrd8 True 11m
│ └─Machine/docker-cluster-one-hfrd8-4r5mp True 11m
└─Workers
├─MachineDeployment/docker-cluster-one-md-0-mbbfs True 10m
│ └─Machine/docker-cluster-one-md-0-mbbfs-dbb74566d-pvww6 True 10m
└─MachineDeployment/docker-cluster-one-md-1-s2bdz True 23s
└─Machine/docker-cluster-one-md-1-s2bdz-7475cfb889-7jnf9 True 28s
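If you prefer plain kubectl over clusterctl, the MachineDeployment objects can also be listed directly on the management cluster (the exact columns shown depend on your Cluster API version):
kubectl get machinedeployments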
We can now demonstrate how easy it is to "roll back" such a change, as well as show how to remove an existing node pool from your Cluster configuration. In our case it's as easy as reapplying the original Cluster spec, which only declares one md-0 node pool:
kubectl apply -f yamls/clusters/1-docker-cluster-one.yaml
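While the md-1 pool is being torn down, you can optionally watch its Machine objects disappear from the management cluster's point of view:
kubectl get machines -w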
Once again, we should observe only one running worker node, in the md-0 pool:
kubectl --kubeconfig cluster-one.kubeconfig get nodes
Output:
NAME STATUS ROLES AGE VERSION
docker-cluster-one-md-0-lq4f8-b59497b9d-xchpm Ready <none> 99m v1.24.6
docker-cluster-one-mvthd-k7dwf Ready control-plane 174m v1.24.6
Note: If you're running this tutorial on Windows, or your system is running below the minimum resource requirements, it's highly recommended that you move on to the next section: MachineHealthChecks and Remediation. If you run into trouble at any point, please consult the troubleshooting guide.
A common Kubernetes cluster maintenance activity is scaling out (or in) the number of control plane nodes in response to cluster activity. Because Cluster API configuration interfaces are themselves Kubernetes resources, there are many ways to do this. We'll demonstrate a few of them.
Because we are leveraging the Kubernetes declarative model, we can simply submit a desired configuration specification and rely upon Cluster API to reconcile the difference between that desired state and the cluster's current state.
Assuming that your docker-cluster-one Cluster still has its original configuration, you should have one control plane node:
kubectl --kubeconfig cluster-one.kubeconfig get nodes
Output:
NAME STATUS ROLES AGE VERSION
docker-cluster-one-md-0-nrh7k-75ddc4778f-vlph9 Ready <none> 24m v1.24.6
docker-cluster-one-xbqb2-tlmpb Ready control-plane 24m v1.24.6
Let's use the idempotent model to submit a modified configuration of our docker-cluster-one Cluster with 3 control plane replicas instead of 1. We've provided a reference yaml of this updated configuration in this repo.
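Optionally, kubectl itself can report what would change on the server before you apply anything; this is purely a convenience and not required for the tutorial:
kubectl diff -f yamls/clusters/3-docker-cluster-one-3-control-plane-replicas.yaml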
If you're on macOS or Linux you can diff this modified spec from the original spec used to create the docker-cluster-one Cluster:
diff yamls/clusters/1-docker-cluster-one.yaml yamls/clusters/3-docker-cluster-one-3-control-plane-replicas.yaml
Output:
17c17
<     replicas: 1
---
>     replicas: 3 # Replicas changed from 1 to 3
The above shows that the yaml specs are almost identical, with the only change being the replicas value on line 17. By applying that modified spec to our kind management cluster we can achieve a control plane node scale out operation:
kubectl apply -f yamls/clusters/3-docker-cluster-one-3-control-plane-replicas.yaml
Output:
cluster.cluster.x-k8s.io/docker-cluster-one configured
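While the scale out proceeds, the KubeadmControlPlane object on the management cluster reports rollout progress; listing it is optional, and the column names vary by Cluster API version:
kubectl get kubeadmcontrolplane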
Now we can watch the new control plane nodes come online:
kubectl --kubeconfig cluster-one.kubeconfig get nodes -w
An interesting note! As your cluster transitions from 1 to 2 control plane nodes, it will temporarily lose etcd quorum and the apiserver running on your cluster will briefly go offline, so your kubectl -w command will be interrupted. Re-run the same command to resume watching your cluster nodes:
kubectl --kubeconfig cluster-one.kubeconfig get nodes -w
Output:
NAME STATUS ROLES AGE VERSION
docker-cluster-one-md-0-nrh7k-75ddc4778f-vlph9 Ready <none> 86m v1.24.6
docker-cluster-one-xbqb2-tlmpb Ready control-plane 86m v1.24.6
docker-cluster-one-xbqb2-8wcbj NotReady <none> 21s v1.24.6
docker-cluster-one-xbqb2-8wcbj NotReady <none> 21s v1.24.6
docker-cluster-one-xbqb2-8wcbj NotReady control-plane 21s v1.24.6
docker-cluster-one-xbqb2-8wcbj NotReady control-plane 23s v1.24.6
docker-cluster-one-xbqb2-8wcbj NotReady control-plane 23s v1.24.6
docker-cluster-one-xbqb2-8wcbj NotReady control-plane 30s v1.24.6
docker-cluster-one-xbqb2-8wcbj NotReady control-plane 30s v1.24.6
docker-cluster-one-xbqb2-8wcbj NotReady control-plane 30s v1.24.6
docker-cluster-one-xbqb2-8wcbj Ready control-plane 31s v1.24.6
docker-cluster-one-xbqb2-8wcbj Ready control-plane 48s v1.24.6
docker-cluster-one-xbqb2-8wcbj Ready control-plane 48s v1.24.6
docker-cluster-one-xbqb2-8wcbj Ready control-plane 53s v1.24.6
docker-cluster-one-xbqb2-sjlwm NotReady <none> 0s v1.24.6
docker-cluster-one-xbqb2-sjlwm NotReady <none> 0s v1.24.6
docker-cluster-one-xbqb2-sjlwm NotReady <none> 0s v1.24.6
docker-cluster-one-xbqb2-sjlwm NotReady <none> 0s v1.24.6
docker-cluster-one-xbqb2-sjlwm NotReady <none> 2s v1.24.6
docker-cluster-one-xbqb2-sjlwm NotReady <none> 2s v1.24.6
docker-cluster-one-xbqb2-sjlwm NotReady <none> 7s v1.24.6
docker-cluster-one-xbqb2-sjlwm NotReady <none> 7s v1.24.6
docker-cluster-one-xbqb2-sjlwm NotReady <none> 7s v1.24.6
docker-cluster-one-xbqb2-sjlwm Ready <none> 10s v1.24.6
docker-cluster-one-xbqb2-sjlwm Ready <none> 10s v1.24.6
docker-cluster-one-xbqb2-sjlwm Ready <none> 12s v1.24.6
For such a simple cluster topology change against a single configuration, it's also possible to update our Cluster resource in place. Let's use kubectl edit to do that.
kubectl edit cluster/docker-cluster-one
The above command will open your locally configured editor (for example, most macOS and Linux environments will be configured to launch vim). Look for the yaml configuration at the path spec.topology.controlPlane.replicas. Based on our prior scale out, the value should be 3. Go ahead and change it back to 1, and then save the changes in your editor:
Output after editing, saving and exiting your editor:
cluster.cluster.x-k8s.io/docker-cluster-one edited
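If kubectl edit drops you into an editor you'd rather not use, you can point it at another one via the standard KUBE_EDITOR environment variable, for example:
KUBE_EDITOR=nano kubectl edit cluster/docker-cluster-one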
After a few minutes we should see the cluster back to reporting 1 control plane node (note that we're pointing kubectl to our workload cluster below using the previously saved cluster-one.kubeconfig kubeconfig file):
kubectl --kubeconfig cluster-one.kubeconfig get nodes
Output:
NAME STATUS ROLES AGE VERSION
docker-cluster-one-md-0-nrh7k-75ddc4778f-vlph9 Ready <none> 118m v1.24.6
docker-cluster-one-xbqb2-8wcbj Ready control-plane 31m v1.24.6
We can use the same approach to scale worker nodes as well. This time we want to edit the configuration at the path spec.topology.workers.machineDeployments. We should only have one item in that array; change its replicas value from 1 to 3:
kubectl edit cluster/docker-cluster-one
Output after editing, saving, and exiting your editor:
cluster.cluster.x-k8s.io/docker-cluster-one edited
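For reference, the edited portion of the Cluster spec ends up looking roughly like this (a sketch of the relevant v1beta1 fields rather than your exact manifest):
spec:
  topology:
    workers:
      machineDeployments:
        - class: default-worker
          name: md-0
          replicas: 3   # changed from 1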
If you updated replicas from 1 to 3, you will see those 3 worker nodes gradually come online after the change to our Cluster's spec.topology.workers.machineDeployments replicas value:
kubectl --kubeconfig cluster-one.kubeconfig get nodes
Output:
NAME STATUS ROLES AGE VERSION
docker-cluster-one-md-0-nrh7k-75ddc4778f-lzfr4 Ready <none> 7m44s v1.24.6
docker-cluster-one-md-0-nrh7k-75ddc4778f-t28hd Ready <none> 7m40s v1.24.6
docker-cluster-one-md-0-nrh7k-75ddc4778f-vlph9 Ready <none> 131m v1.24.6
docker-cluster-one-xbqb2-8wcbj Ready control-plane 44m v1.24.6
Note: Because we're executing the above topology changes using the Docker provider, your local system may not have sufficient resources to run 4 nodes. If so, you can skip the control-plane scaling part and reduce the number of worker nodes from 3 to 2.
Feel free to continue experimenting with scaling. When you're done and ready to move forward, let's go back to our original topology configuration of 1 control plane node and 1 worker node. It's easy to do that by simply reapplying our original Cluster spec, which declares that configuration:
kubectl apply -f yamls/clusters/1-docker-cluster-one.yaml
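Once reconciliation settles, the node listing we've used throughout should again show just one control plane node and one worker node:
kubectl --kubeconfig cluster-one.kubeconfig get nodes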
You are now in control of your Cluster's topology configuration! Let's next explore MachineHealthChecks and Remediation for operational self-healing.
- To learn more about managing Kubernetes clusters using a Cluster Topology, see the topology section of the CAPI book.