@databus23 databus23 commented Nov 22, 2022

Deleting volumes in the deorbiter fails for multiple minutes because the nodes are already gone and the CSI daemonset no longer unmounts the volumes.
This leads to very long cluster deletion times in our soak test:

...
--- PASS: TestRunner/Cleanup/Cluster/IsDeleted (576.01s)
...

This change lets the launch controller wait for the deorbiter to finish its work before it starts terminating nodes.

For the volumes to actually be deleted, we need to remove pods that hold a reference to a PVC, now that the nodes stay around.
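To illustrate which pods are meant here, the sketch below filters a pod list down to those that mount at least one PVC. It deliberately uses simplified, hypothetical `Pod` and `Volume` types instead of the Kubernetes API types (where the claim name lives under `pod.Spec.Volumes[i].PersistentVolumeClaim.ClaimName`), so it runs without any dependencies; it is not the implementation in this PR.

```go
package main

// Volume is a simplified stand-in for a pod volume (hypothetical type).
// ClaimName is empty when the volume is not backed by a PVC.
type Volume struct {
	Name      string
	ClaimName string
}

// Pod is a simplified stand-in for a Kubernetes pod (hypothetical type).
type Pod struct {
	Namespace string
	Name      string
	Volumes   []Volume
}

// podsHoldingClaims returns the pods that reference at least one PVC.
// These are the pods that would have to be removed before the volumes
// can be detached and deleted.
func podsHoldingClaims(pods []Pod) []Pod {
	var holders []Pod
	for _, p := range pods {
		for _, v := range p.Volumes {
			if v.ClaimName != "" {
				holders = append(holders, p)
				break // one PVC reference is enough, avoid duplicates
			}
		}
	}
	return holders
}
```

Each pod is reported at most once, even if it mounts several claims, so the result can be fed directly into a delete loop.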

With this change the deletion time is down considerably:

...
--- PASS: TestRunner/Cleanup/Cluster/IsDeleted (126.01s)
...

(I think deletion can be even faster; it's currently limited by the 120s wait for loadbalancer deletion.)

Open question:

  • What about deployments/replicasets/statefulsets creating new pods with PVC references? (Maybe cordoning all nodes is sufficient?)
