
libct/exeseal: add annotation to choose runc binary protection mechanism #5290

Open
captainmo1 wants to merge 2 commits into opencontainers:main from captainmo1:5272-exeseal-annotation


@captainmo1


Resolves #5272.

Introduce the org.opencontainers.runc.clone-self-exe annotation to let
users explicitly choose how runc protects the host runc binary against
tampering by the container. Previously, runc attempted sealed overlayfs
and silently fell back to the clone-binary path on failure, with no way
for users to express a preference.

Recognized values:

  • independent-data-copy: use the clone-binary path only (memfd, with
    an internal fallback to a classic unlinked tmpfile on older kernels).
  • ro-shared-page: use sealed overlayfs only; failure is fatal.

When the annotation is absent, the existing default behavior is
preserved unchanged (sealed overlayfs, then clone-binary fallback).
This is step 1 of the plan in #5272; changing the default order is
left to a follow-up targeting 1.6.
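A minimal sketch of how the annotation value might be validated, assuming it is read from the OCI spec's annotations map. The key and the two values come from the description above; the function `parseCloneSelfExe`, its use of "" to mean "unset", and the choice to reject unrecognized values are assumptions for illustration, not taken from the patch.

```go
package main

import "fmt"

// Annotation key and values as named in the PR description.
const (
	cloneSelfExeAnnotation  = "org.opencontainers.runc.clone-self-exe"
	modeIndependentDataCopy = "independent-data-copy"
	modeROSharedPage        = "ro-shared-page"
)

// parseCloneSelfExe (hypothetical) returns the requested mode, "" when the
// annotation is absent (keep the default behaviour), or an error for values
// that are not recognized.
func parseCloneSelfExe(annotations map[string]string) (string, error) {
	v, ok := annotations[cloneSelfExeAnnotation]
	if !ok {
		return "", nil
	}
	switch v {
	case modeIndependentDataCopy, modeROSharedPage:
		return v, nil
	default:
		return "", fmt.Errorf("invalid %s annotation value %q", cloneSelfExeAnnotation, v)
	}
}

func main() {
	for _, a := range []map[string]string{
		nil,
		{cloneSelfExeAnnotation: "ro-shared-page"},
		{cloneSelfExeAnnotation: "bogus"},
	} {
		mode, err := parseCloneSelfExe(a)
		fmt.Printf("mode=%q err=%v\n", mode, err)
	}
}
```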

@captainmo1 force-pushed the 5272-exeseal-annotation branch from 0ca5535 to f890905 on May 21, 2026 07:58

@cyphar cyphar left a comment

Member


To be honest this isn't really the approach I would've gone with, and there are a few issues here. Since I've already thought about it, it will probably be easier for me to just bang something out.

This review is mainly just some general information that will help you when writing future patches. It's up to you if you want to submit a follow-up, but I'm not really sure I'm going to merge this even if these points were fixed.

FWIW, the callback approach I mention would also make the patch to switch to memfd cloning smaller.

Comment thread on libcontainer/configs/config.go (outdated)
```go
	Labels []string `json:"labels"`

	// CloneSelfExe selects how runc protects runc binary against tampering.
	CloneSelfExe exeseal.Mode `json:"clone_self_exe,omitempty"`
```

Member


Ideally this would be a string; stuff serialised into state.json is quite fragile.

Author


OK, I see how a descriptive string is better than just an int in state.json. Thanks.

```go
case ModeIndependentDataCopy:
	return cloneSelfExeViaCloneBinary(tmpDir)

case ModeUnset:
```

Member


The duplication here isn't really nice, I had imagined doing it with a list of callback functions that we set based on the configured annotation.

Author


ok got it

```go
	exePath = "/proc/self/exe"
} else {
	var err error
	safeExe, err = exeseal.CloneSelfExe(c.stateDir)
```

Member


I think if this was explicitly set, we probably want to skip the exeseal.IsSelfExeCloned check. At the very least, someone asking for independent-data-copy doesn't want to share for any reason, even if they used memfd-bind.
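A sketch of the suggested control flow, assuming the mode is available as a string where "" means the annotation was not set. `isSelfExeCloned` and `cloneSelfExe` are stubs standing in for `exeseal.IsSelfExeCloned` and `exeseal.CloneSelfExe`; their signatures here are assumptions, and `pickExe` is a hypothetical helper.

```go
package main

import "fmt"

// Stub for exeseal.IsSelfExeCloned: pretend runc was started via memfd-bind.
func isSelfExeCloned() bool { return true }

// Stub for exeseal.CloneSelfExe: returns a label instead of a real file.
func cloneSelfExe(mode string) (string, error) { return "cloned(" + mode + ")", nil }

// pickExe decides which executable to use for the container's init.
// The "already cloned" shortcut only applies when no mode was requested;
// an explicit mode always gets its own fresh copy.
func pickExe(mode string) (string, error) {
	if mode == "" && isSelfExeCloned() {
		return "/proc/self/exe", nil
	}
	return cloneSelfExe(mode)
}

func main() {
	for _, mode := range []string{"", "independent-data-copy"} {
		exe, err := pickExe(mode)
		fmt.Printf("mode=%q exe=%s err=%v\n", mode, exe, err)
	}
}
```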

Comment on lines +243 to +249
```go
overlayFile, err := sealedOverlayfs("/proc/self/exe", tmpDir)
if err == nil {
	logrus.Debug("runc exeseal: using overlayfs for sealed /proc/self/exe") // used for tests
	return overlayFile, nil
}
logrus.WithError(err).Debugf("could not use overlayfs for /proc/self/exe sealing -- falling back to making a temporary copy")
return cloneSelfExeViaCloneBinary(tmpDir)
```

Member


You're missing the switch of the defaults to memfd cloning (as a second patch).

Author


Yeah, step 2 (trying memfd first when the annotation is unset) is going into a follow-up PR after this lands, so that 1.4 can backport this one cleanly. But I can include it now.

@captainmo1 force-pushed the 5272-exeseal-annotation branch from f890905 to 13072fe on May 21, 2026 22:37
@captainmo1

Author

Hey, I appreciate the review regardless of the outcome 👍 This was a great learning opportunity, and I will submit a follow-up.

@captainmo1 force-pushed the 5272-exeseal-annotation branch 3 times, most recently from 9f1d50f to 4de8e51 on May 23, 2026 09:51
@captainmo1 force-pushed the 5272-exeseal-annotation branch from 4de8e51 to c689774 on May 31, 2026 19:33
Introduce the org.opencontainers.runc.clone-self-exe annotation to let
users explicitly choose how runc protects the host runc binary against
tampering by the container. Previously, runc attempted sealed overlayfs
and silently fell back to the clone-binary path on failure, with no way
for users to express a preference.

Recognized values:
  - independent-data-copy: use the clone-binary path only (memfd, with
                           an internal fallback to a classic unlinked
                           tmpfile on older kernels).
  - ro-shared-page:        use sealed overlayfs only.

When the annotation is absent, runc's existing default behavior is
preserved unchanged (sealed overlayfs, then clone-binary fallback).

The annotation is registered in PotentiallyUnsafeConfigAnnotations
because it configures runc's own execution path.

Signed-off-by: Mohammed Aminu Futa <mohammedfuta2000@gmail.com>
- Drop Mode int enum in favor of plain strings; state.json now stores
  the annotation value.
- Refactor CloneSelfExe to dispatch via a per-mode list of strategy
  callbacks, eliminating duplication between explicit modes and the
  unset fallback path.
- Skip the IsSelfExeCloned shortcut when an explicit mode is set.

Signed-off-by: Mohammed Aminu Futa <mohammedfuta2000@gmail.com>
@captainmo1 force-pushed the 5272-exeseal-annotation branch from c689774 to 82e6529 on June 5, 2026 09:10
Development

Successfully merging this pull request may close these issues.

/proc/self/exe and page-cache poisoning
