Description
What steps did you take and what happened?
KCP supports remediating a single CP machine up until the control plane is considered initialized.
The control plane is considered initialized when KCP can actually connect to the workload cluster and detects that the kubeadm config has been created, which serves as a proxy signal that kubeadm init has completed.
However, in an edge case where users set an aggressive nodeStartupTimeout and the infrastructure is slow for any reason, deletion of the first CP machine can be triggered, and kubeadm init can then complete in the short window between when machine deletion is triggered and when the machine actually goes away.
This leads to an inconsistent state: the cluster is initialized, no CP machine exists, and the replacement CP machine fails when it tries to join.
What did you expect to happen?
KCP should not consider the control plane initialized if the only existing CP machine is being deleted.
Cluster API version
main
Kubernetes version
No response
Anything else you would like to add?
No response
Label(s) to be applied
/kind bug
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.