🐛 (API): Fix file descriptor leak from defer inside loop in external plugin handler #5688

Open
SebTardif wants to merge 1 commit into kubernetes-sigs:master from SebTardif:fix/defer-in-loop-fd-leak

Conversation

@SebTardif

@SebTardif SebTardif commented May 9, 2026

Problem

In pkg/plugins/external/helpers.go, the handlePluginResponse function uses defer f.Close() inside a for loop (line 203). This causes file descriptor retention: all deferred close calls stack up and only run when the enclosing function returns, not at each loop iteration.

If res.Universe contains N files, all N file descriptors remain open simultaneously until handlePluginResponse completes. For large responses or long-running operations, this can exhaust file descriptors.
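
To make the retention concrete, here is a minimal sketch of the pattern. It does not touch real files: `tracker` and `handle` are hypothetical stand-ins for the OS file table and `*os.File`, introduced only so the peak number of simultaneously open handles can be observed. It is not the code from helpers.go.

```go
package main

import "fmt"

// tracker is a hypothetical stand-in for the OS file table: it counts how
// many handles are open right now and remembers the highest count seen.
type tracker struct {
	open, peak int
}

// handle is a hypothetical stand-in for *os.File.
type handle struct{ t *tracker }

// create "opens" a handle and updates the peak.
func (t *tracker) create() *handle {
	t.open++
	if t.open > t.peak {
		t.peak = t.open
	}
	return &handle{t: t}
}

// Close "releases" the handle.
func (h *handle) Close() error {
	h.t.open--
	return nil
}

// peakOpenWithDefer mimics the pattern described above: defer inside a loop.
// It returns the peak number of handles that were open at the same time.
func peakOpenWithDefer(n int) int {
	t := &tracker{}
	func() {
		for i := 0; i < n; i++ {
			h := t.create()
			// Deferred calls run when the enclosing function returns,
			// not at the end of this iteration.
			defer func() { _ = h.Close() }()
		}
	}()
	return t.peak
}

func main() {
	fmt.Println(peakOpenWithDefer(5)) // prints 5
}
```

With N iterations the peak equals N, which matches the behavior described above: nothing is released until the function that contains the loop returns.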

This bug was introduced in #2338 (2021-08-10) and has been present for nearly 5 years.

Fix

Replace the defer block with explicit f.Close() calls: one on the write error path (before returning) and one on the success path (after the write). This ensures each file is closed within its own loop iteration, releasing the file descriptor immediately after use.

Verification

  • make lint-fix: 0 issues
  • go test ./pkg/plugins/external/...: all pass

Signed-off-by: Sebastien Tardif <sebtardif@ncf.ca>
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 9, 2026
@k8s-ci-robot
Contributor

Hi @SebTardif. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/S (Denotes a PR that changes 10-29 lines, ignoring generated files.) and cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.) labels May 9, 2026
@v47
Contributor

v47 commented May 10, 2026

Thanks for the improvement. I think the code change is directionally correct, but the PR description should be adjusted. The defer f.Close() inside the loop does keep all file descriptors open until handlePluginResponse returns, so the FD-retention issue is real and this patch addresses it.

However, I do not think this part is accurate:

"Only the last file is actually closed. The closure captures f by reference."

If we take a look at the assembly, the deferred version of the loop body registers a deferred closure instead of calling Close directly:

CALL    os.Create(SB)

LEAQ    type:noalg.struct { F uintptr; X0 *os.File }(SB), AX
CALL    runtime.newobject(SB)

LEAQ    main.deferLoop.func1(SB), CX
MOVQ    CX, (AX)

MOVQ    main.f+96(SP), CX
MOVQ    CX, 8(DX)

CALL    runtime.deferproc(SB)

CALL    os.(*File).WriteString(SB)
JMP     loop

The actual close happens in the deferred closure:

main.deferLoop.func1:
MOVQ    8(DX), AX
CALL    os.(*File).Close(SB)

And deferred calls are run from function return paths via:

CALL    runtime.deferreturn(SB)
RET

By contrast, the explicit-close version has Close directly in the loop body:

CALL    os.Create(SB)
CALL    os.(*File).WriteString(SB)
CALL    os.(*File).Close(SB)
JMP     loop

So I would suggest rewording the problem statement 😺
Besides that, great catch 🚀

@SebTardif
Author

Thank you for the detailed explanation and the assembly analysis! You're absolutely right - I incorrectly claimed "only the last file is actually closed." Each deferred closure does capture its own file correctly.

The actual problem is simpler: file descriptor retention - all N files stay open simultaneously until the function returns, not that only one gets closed.

I'll update the PR description to be accurate.
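
For reference, a small sketch of the corrected understanding: every deferred closure closes its own handle, and the closes run in last-in, first-out order when the enclosing function returns. `namedHandle` and `closeOrder` are hypothetical names used only for this illustration; no real files are involved.

```go
package main

import "fmt"

// namedHandle is a hypothetical stand-in for *os.File that records its name
// in a shared log when it is closed.
type namedHandle struct {
	name   string
	closed *[]string
}

// Close appends this handle's name to the shared close log.
func (h *namedHandle) Close() error {
	*h.closed = append(*h.closed, h.name)
	return nil
}

// closeOrder opens one handle per name with defer inside the loop and
// returns the order in which the handles were closed.
func closeOrder(names []string) []string {
	var closed []string
	func() {
		for _, name := range names {
			// h is declared inside the loop body, so each iteration has
			// its own variable and each closure captures a distinct handle.
			h := &namedHandle{name: name, closed: &closed}
			defer func() { _ = h.Close() }()
		}
	}()
	return closed
}

func main() {
	fmt.Println(closeOrder([]string{"a", "b", "c"})) // prints [c b a]
}
```

All three handles are closed, just late and in reverse order, which is why the issue is retention rather than a missed close.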

@camilamacedo86 camilamacedo86 changed the title from "🐛 Fix file descriptor leak from defer inside loop in external plugin handler" to "🐛 (API): Fix file descriptor leak from defer inside loop in external plugin handler" May 11, 2026
Member

@camilamacedo86 camilamacedo86 left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 11, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: camilamacedo86, SebTardif

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 11, 2026
@k8s-ci-robot
Contributor

@SebTardif: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
pull-kubebuilder-e2e-k8s-1-36-0 | b857e40 | link | true | /test pull-kubebuilder-e2e-k8s-1-36-0

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@SebTardif
Author

/retest

@k8s-ci-robot
Contributor

@SebTardif: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
lgtm: "Looks good to me", indicates that a PR is ready to be merged.
needs-ok-to-test: Indicates a PR that requires an org member to verify it is safe to test.
size/S: Denotes a PR that changes 10-29 lines, ignoring generated files.

Projects

None yet

4 participants