fix(provisioner): consolidate containerd v1/v2 templates into one #800
ArangoGutierrez wants to merge 7 commits into NVIDIA:main
Coverage Report for CI Build 25261481429: coverage decreased (-0.03%) to 47.742%.
Coverage regressions: 1 previously-covered line in 1 file lost coverage.
💛 - Coveralls
ArangoGutierrez left a comment
Self-review: the 4-cell E2E matrix is captured in the PR body (Cells 1 and 2 verified the containerd.io apt path on Ubuntu; Cells 3 and 4 verified the dnf path on Amazon Linux 2023). go test, go vet, and golangci-lint are clean. DCO sign-offs and GPG signatures are present on all 7 commits. Awaiting QA promotion to ready-for-review.
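For readers skimming the matrix, here is a minimal Go sketch of how the four cells arise as the cross product of two OS images and two containerd versions, and which package manager each cell is expected to exercise. The `cell` type and the `matrix` and `packageManager` helpers are hypothetical illustrations, not holodeck code; the apt/dnf mapping is taken from the self-review comment.

```go
package main

import "fmt"

// cell is one E2E configuration: an OS image paired with a requested
// containerd version. These names are illustrative, not holodeck types.
type cell struct {
	OS      string
	Version string
}

// packageManager maps an OS image to the package manager the unified
// containerd.io template is expected to use on it (apt on Ubuntu, dnf on
// Amazon Linux 2023, as stated in the self-review).
func packageManager(osName string) string {
	switch osName {
	case "Ubuntu":
		return "apt"
	case "Amazon Linux 2023":
		return "dnf"
	default:
		return "unsupported"
	}
}

// matrix returns the cross product of OS images and versions in OS-major
// order, so Cells 1-2 are Ubuntu and Cells 3-4 are Amazon Linux 2023.
func matrix(osNames, versions []string) []cell {
	cells := make([]cell, 0, len(osNames)*len(versions))
	for _, osName := range osNames {
		for _, v := range versions {
			cells = append(cells, cell{OS: osName, Version: v})
		}
	}
	return cells
}

func main() {
	cells := matrix(
		[]string{"Ubuntu", "Amazon Linux 2023"},
		[]string{"1.7.27", "2.2.3"},
	)
	for i, c := range cells {
		fmt.Printf("Cell %d: %s + containerd %s via %s\n",
			i+1, c.OS, c.Version, packageManager(c.OS))
	}
}
```

Ordering the loops OS-first keeps the cell numbering aligned with the PR body, where Cells 1 and 2 are Ubuntu and Cells 3 and 4 are Amazon Linux 2023.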
…io template

Adds TestContainerd_Execute_Version2x. Fails today because Execute() dispatches v2.x to the binary-download containerdV2Template. Will pass once the dispatch is collapsed into a single template path.

Refs: NVIDIA/gpu-operator#2396
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>

Drops TestContainerd_Execute_Version2 and the v2-specific branches in TestContainerd_Execute_CommonElements. Those assertions describe the binary-download path (RUNC_VERSION pin, github.com tarball, explicit modprobe/sysctl), which is being removed in the next commit.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>

The v2 binary-download template was Debian-only and is broken in practice (NVIDIA/gpu-operator#2396). The containerd.io apt/dnf package now ships 2.x and works on the debian, amazon, and rhel families. Both v1.x and v2.x now render through the unified package template.

Fixes: NVIDIA/gpu-operator#2396
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>

The MajorVersion struct field is being removed. These assertions are the last consumers.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>

…rsion

Removes containerdV2Template, the containerdV2Tmpl pre-compiled var, and the MajorVersion field on the Containerd struct. The bare-major "2" -> "2.0.0" coercion is preserved for backward compatibility. Git and latest source paths are unchanged.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>

Adds assertions for /etc/apt/keyrings/docker.gpg (debian), docker-ce.repo (rhel), and "Unsupported OS family" (default arm), satisfying the design doc's mutation invariant: deleting any branch of the unified template must cause a unit test to fail.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Cells: (Ubuntu, Amazon Linux 2023) x (containerd 1.7.27, 2.2.3). Cell 4 (Amazon Linux + 2.2.3) was previously untested; the v2 binary-download path was Debian-only.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
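The commits keep the bare-major `"2"` -> `"2.0.0"` coercion, and the PR description explains why: apt rejects `containerd.io=2-1`. Below is a minimal sketch of that normalization, assuming the template pins apt packages as `containerd.io=<version>-1`. `normalizeVersion` and `aptPin` are hypothetical names, and unlike the commits (which only mention `"2"`), this sketch coerces any bare major number.

```go
package main

import (
	"fmt"
	"regexp"
)

// bareMajor matches a version that is only a major number, such as "2".
var bareMajor = regexp.MustCompile(`^[0-9]+$`)

// normalizeVersion (hypothetical name) coerces a bare major like "2" to
// "2.0.0" and leaves full versions such as "1.7.27" or "2.2.3" untouched.
// Assumption: any bare major is coerced, not only "2".
func normalizeVersion(v string) string {
	if bareMajor.MatchString(v) {
		return v + ".0.0"
	}
	return v
}

// aptPin (hypothetical) builds an apt version pin. The "-1" revision suffix
// is an assumption inferred from the rejected `containerd.io=2-1` example.
func aptPin(v string) string {
	return fmt.Sprintf("containerd.io=%s-1", normalizeVersion(v))
}

func main() {
	for _, v := range []string{"2", "1.7.27", "2.2.3"} {
		fmt.Printf("%-7s -> %s\n", v, aptPin(v))
	}
}
```

Keeping the coercion as a pure string function, separate from template rendering, makes the backward-compatibility rule easy to unit test on its own.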
Force-pushed from 7f276a3 to 82b7e0d.
Please review @tariq1890 / @cdesiniotis
tariq1890 left a comment
Thanks very much @ArangoGutierrez!
Problem
The v2 binary-download template (`containerdV2Template`) was Debian-only and is broken in practice; see NVIDIA/gpu-operator#2396, where Tariq reported that holodeck cannot provision an environment with `containerRuntime.version: 2.2.3`. The original December 2025 rationale for v2 ("v1 didn't work for containerd 2.x") is obsolete: the `containerd.io` apt/dnf package now ships 2.x and works on the debian, amazon, and rhel families.

Approach
Single template path. Both v1.x and v2.x render through the OS-aware `containerd.io` package template. This drops `containerdV2Template`, `containerdV2Tmpl`, the `MajorVersion` field on the `Containerd` struct, and the major-version dispatch branch in `Execute()`. The bare-major `"2"` → `"2.0.0"` coercion is preserved for backward compatibility (apt rejects `containerd.io=2-1`). The Git and Latest source paths are not touched.

Commits
- `094884e1` … `containerd.io` template (RED)
- `f6cea29f` …
- `697a785a` … `containerd.io` package (GREEN)
- `1841754f` … `MajorVersion` assertions
- `5e7afc10` … `MajorVersion` field
- `7727dcb7` …
- `7f276a39` …

Testing
Unit tests, vet, and golangci-lint are clean.
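E2E matrix on `g4dn.xlarge` in `us-west-1`:

| Cell | OS | Requested | Installed | `ctr version` |
|---|---|---|---|---|
| 1 | Ubuntu | 1.7.27 | containerd.io 1.7.27 | Client/Server 1.7.27 |
| 2 | Ubuntu | 2.2.3 | containerd.io v2.2.3 | Client/Server v2.2.3 |
| 3 | Amazon Linux 2023 | 1.7.27 | 2.2.3+unknown (fallback) | Client/Server 2.2.3+unknown |
| 4 | Amazon Linux 2023 | 2.2.3 | 2.2.3+unknown | Client/Server 2.2.3+unknown |

Verbatim verification output captured at the time of the run: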
Cell 1 — Ubuntu + 1.7.27
Cell 2 — Ubuntu + 2.2.3 (the fix)
Cell 3 — Amazon Linux 2023 + 1.7.27 (fell back to latest)
Cell 4 — Amazon Linux 2023 + 2.2.3 (previously untested combo)
Notable
- … `1.7.27` on AL2023, they currently get 2.x silently; making pinning strict is out of scope and tracked separately.
- … `auth.username: ec2-user` in the Environment manifest; included in the e2e fixtures here.
- … `bin_dir`/`bin_dirs` deprecation warning on every `ctr` invocation. Unrelated to this PR; tracked separately.

Refs
- docs/plans/2026-04-29-containerd-template-consolidation-design.md
- docs/plans/2026-04-29-containerd-template-consolidation-plan.md