fix: don't try to clean up pvs on nodes that are gone #480

marcusramberg · 2025-02-27T13:38:37Z

We're running local-provisioner to provide local storage for CI runners, where nodes come and go pretty frequently.
We observe that the provisioner tries to run cleanup on nodes that are already gone, which leaves helper pods
stuck in the Pending state because they cannot be scheduled.

This PR adds a check to see if the node still exists before trying to clean up the node.

marcusramberg · 2025-03-10T08:30:13Z

@derekbit Thoughts about this PR? We're running it in production from a fork now, and it has resolved our issue of stuck pvs from old nodes and stuck helper pods trying to schedule on non-existent nodes. I guess it would also address the issues you're seeing in #416 with stuck pvs from previous runs?

github-actions · 2025-04-25T02:06:42Z

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions · 2025-06-10T02:11:18Z

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions · 2025-07-26T02:11:55Z

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

derekbit · 2025-07-30T15:58:26Z

provisioner.go

+			if _, err := p.kubeClient.CoreV1().Nodes().Get(context.TODO(), node, metav1.GetOptions{}); err != nil {
+				logrus.Infof("Node %v does not exist, skipping cleanup of volume %v", node, pv.Name)
+				return nil
+			}


@marcusramberg WDYT?

Suggested change

-			if _, err := p.kubeClient.CoreV1().Nodes().Get(context.TODO(), node, metav1.GetOptions{}); err != nil {
-				logrus.Infof("Node %v does not exist, skipping cleanup of volume %v", node, pv.Name)
-				return nil
-			}
+			if _, err := p.kubeClient.CoreV1().Nodes().Get(context.TODO(), node, metav1.GetOptions{}); err != nil && apierrors.IsNotFound(err) {
+				logrus.Infof("Node %v does not exist, skipping cleanup of volume %v", node, pv.Name)
+				return nil
+			}

marcusramberg · 2025-07-31

That seems reasonable to me; I'll update the PR. Note that the package is imported in there as k8serror, though.
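To illustrate why the narrower condition matters, here is a minimal, stdlib-only sketch of the decision logic. It is not the provisioner's code: `nodeGetter`, `shouldSkipCleanup`, `errNodeNotFound`, and `errAPIUnavailable` are hypothetical names, and `errors.Is` against a sentinel stands in for the not-found check discussed above. The point is that only a "not found" result should skip cleanup; any other lookup failure (for example, a temporarily unreachable API server) should not be read as "the node is gone".

```go
package main

import (
	"errors"
	"fmt"
)

// errNodeNotFound is a hypothetical stand-in for the "not found" error the
// API returns when a Node object no longer exists.
var errNodeNotFound = errors.New("node not found")

// errAPIUnavailable is a hypothetical stand-in for any other lookup failure,
// such as a timeout or a temporarily unreachable API server.
var errAPIUnavailable = errors.New("api server unavailable")

// nodeGetter is a hypothetical abstraction over the node lookup call.
// It returns nil when the node exists.
type nodeGetter func(name string) error

// shouldSkipCleanup reports whether volume cleanup should be skipped for the
// given node. It returns true only when the lookup fails specifically with a
// not-found error; other errors do not count as proof that the node is gone.
func shouldSkipCleanup(getNode nodeGetter, node string) bool {
	err := getNode(node)
	return err != nil && errors.Is(err, errNodeNotFound)
}

func main() {
	cases := []struct {
		label string
		get   nodeGetter
	}{
		{"node exists", func(string) error { return nil }},
		{"node deleted", func(string) error { return errNodeNotFound }},
		{"lookup failed", func(string) error { return errAPIUnavailable }},
	}
	for _, c := range cases {
		fmt.Printf("%s: skip cleanup = %v\n", c.label, shouldSkipCleanup(c.get, "ci-runner-1"))
	}
}
```

With the original `err != nil` check, the "lookup failed" case would also skip cleanup and could leave a volume behind on a node that still exists.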

derekbit · 2025-07-31

LGTM @marcusramberg. Thanks for your contribution. The improvement will be in v0.0.33, which is scheduled for October.

marcusramberg changed the title ~~fix: don't try to clean up pvcs on nodes that are gone~~ fix: don't try to clean up pvs on nodes that are gone Feb 27, 2025

marcusramberg force-pushed the marcus/ephemeral_fix branch 4 times, most recently from 69c6989 to 1e1388b Compare March 6, 2025 09:07

github-actions bot added the stale label Apr 25, 2025

derekbit removed the stale label Apr 25, 2025

github-actions bot added the stale label Jun 10, 2025

derekbit removed the stale label Jun 10, 2025

github-actions bot added the stale label Jul 26, 2025

derekbit removed the stale label Jul 27, 2025

derekbit reviewed Jul 30, 2025

fix: don't try to clean up pvs on nodes that are gone

bdf05c2

marcusramberg force-pushed the marcus/ephemeral_fix branch from 1e1388b to bdf05c2 Compare July 31, 2025 07:22

marcusramberg requested a review from derekbit July 31, 2025 14:34

derekbit approved these changes Jul 31, 2025

derekbit assigned marcusramberg Jul 31, 2025

derekbit merged commit e703098 into rancher:master Jul 31, 2025
3 checks passed
