fix: return success on DEL when CNI cache is missing #1498
kvaps wants to merge 1 commit into `k8snetworkplumbingwg:master` from
Conversation
When a pod sandbox fails during creation (e.g. due to network issues), the CNI ADD never completes and no cache file is written. On subsequent cleanup attempts, `CmdDel` tries to reconstruct delegate configs from the Kubernetes API and call DEL on each delegate plugin. However, delegate plugins (e.g. kube-ovn) require runtime state that only exists after a successful ADD (socket paths, cached results), causing DEL to fail with errors like "server_socket is required in cni.conf".

This creates a deadlock: containerd cannot release the sandbox name reservation because CNI DEL fails, and CNI DEL fails because ADD never completed. The constant DEL retry storm also overloads the multus daemon, causing new ADD requests to time out.

Fix this by returning success immediately when the cache file does not exist (`os.IsNotExist`), since there is nothing to clean up if ADD never completed. This matches the existing behavior for the case when both cache and pod are missing (lines 979-982 before this change).

Related issues:
- k8snetworkplumbingwg#1454
- k8snetworkplumbingwg#1446
- containerd/containerd#11504

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
When CNI ADD never completes, multus DEL fails because delegate plugins require runtime state that only exists after a successful ADD. This creates a deadlock where containerd cannot release sandbox name reservations, permanently blocking pod creation on the affected node.

Build a custom multus-cni image with a patch that returns success on DEL when no cache file exists (ADD was never completed).

Upstream PR: k8snetworkplumbingwg/multus-cni#1498
Tracking issue: #2310

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Code Review
This pull request simplifies the `CmdDel` logic in `pkg/multus/multus.go` by skipping the deletion process when a cache file is missing, which indicates that the ADD operation was never completed. This change prevents failures during sandbox cleanup by avoiding attempts to reconstruct configurations from the API without the necessary runtime state. Feedback was provided to enhance the observability of error logs by including the container ID and to clarify log messages when cache parsing fails. Additionally, a refactor was suggested to combine redundant return paths for better readability.
```go
if os.IsNotExist(err) {
	logging.Verbosef("Multus: no cache found for container %s (ADD was never completed), skip DEL", args.ContainerID)
	return nil
}
// The options to continue with a delete have been exhausted
// We cannot exit with an error as this may cause a sandbox to never get deleted.
logging.Errorf("Multus: failed to get the cached delegates file: %v, cannot properly delete", err)
return nil
```
The error logging and logic in this block can be improved for better observability and clarity:

- Context: The log message at line 968 loses the pod/container context. Including `args.ContainerID` (as done in the `Verbosef` call at line 963) would significantly help in identifying which container failed cleanup in the logs.
- Clarity: If `err` is `nil` (which occurs if the cache file was read successfully but failed to unmarshal at line 941), the log will show `<nil>`, which is misleading. It should ideally distinguish between a filesystem error and a parsing failure.
- Simplification: Since both branches return `nil`, the logic can be combined to reduce redundancy and improve readability.
Suggested change. Current code:

```go
if os.IsNotExist(err) {
	logging.Verbosef("Multus: no cache found for container %s (ADD was never completed), skip DEL", args.ContainerID)
	return nil
}
// The options to continue with a delete have been exhausted
// We cannot exit with an error as this may cause a sandbox to never get deleted.
logging.Errorf("Multus: failed to get the cached delegates file: %v, cannot properly delete", err)
return nil
```

Suggested replacement:

```go
if os.IsNotExist(err) {
	logging.Verbosef("Multus: no cache found for container %s (ADD was never completed), skip DEL", args.ContainerID)
} else {
	// The options to continue with a delete have been exhausted
	// We cannot exit with an error as this may cause a sandbox to never get deleted.
	logging.Errorf("Multus: failed to get or parse the cached delegates file for container %s: %v, cannot properly delete", args.ContainerID, err)
}
return nil
```
## Summary

- Build custom multus-cni image with a patch that fixes the sandbox cleanup deadlock
- When CNI ADD never completes, DEL now returns success instead of failing
- Prevents stale sandbox name reservations from permanently blocking pod creation

## Problem

After a node disruption (e.g. DRBD/kube-ovn issues during upgrade), containerd accumulates stale sandbox name reservations. Cleanup fails because multus calls delegate plugins for DEL without cached state, and they reject the incomplete config. This blocks all new pods on the affected node indefinitely.

## Changes

- Add `images/multus-cni/` with Dockerfile and patch for custom image build
- Add `image` target to multus Makefile
- Add multus to root Makefile build targets
- Update DaemonSet template to use the custom image

## Upstream

- PR: k8snetworkplumbingwg/multus-cni#1498
- Tracking: #2310

## Test plan

- [x] Built and pushed image to ghcr.io/cozystack/cozystack/multus-cni
- [x] Deployed to cluster, verified fix in multus logs ("no cache found... skip DEL")
- [x] Force-deleted stuck pods, all 24 pods started successfully within 70s

## Summary by CodeRabbit

* **New Features**
  * Added Multus CNI container image build process to the system.
* **Bug Fixes**
  * Fixed delete operation behavior when cache is unavailable, improving reliability of network cleanup operations.
* **Chores**
  * Updated Multus CNI container image reference in deployment configuration to latest version with build improvements.
When CNI ADD never completes, multus DEL fails because delegate plugins require runtime state that only exists after a successful ADD. This creates a deadlock where containerd cannot release sandbox name reservations, permanently blocking pod creation on the affected node.

Build a custom multus-cni image with a patch that returns success on DEL when no cache file exists (ADD was never completed).

Upstream PR: k8snetworkplumbingwg/multus-cni#1498
Tracking issue: #2310

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
(cherry picked from commit 094b80a)
We had the same issue specified here.
Summary
Problem
After a node disruption, containerd accumulates stale sandbox name reservations. When it tries
to clean them up, CNI DEL is called through multus. Without a cache file, multus reconstructs
the config from the Kubernetes API, but delegate plugins (e.g. kube-ovn) reject the config
because it lacks runtime state (socket paths, cached results) that only exists after ADD.
This causes:
Fix
If the cache file does not exist (`os.IsNotExist`), return `nil` immediately since there is nothing to clean up. This matches the existing graceful fallback when both cache and pod are missing (the `else` branch at the former line 978).

Related issues
Test plan