
feat: DRA resource.k8s.io/v1 integration via draclient #1492

Open
rollandf wants to merge 3 commits into k8snetworkplumbingwg:master from rollandf:dra-resourcename

Conversation

@rollandf
Contributor

Migrate Dynamic Resource Allocation from kubelet PodResources / v1alpha2-style usage to the stable resource.k8s.io/v1 API (Kubernetes 1.34+).

  • Add pkg/draclient: fetch ResourceClaims and ResourceSlices, build pod resource map from device attributes (k8s.cni.cncf.io/deviceID, k8s.cni.cncf.io/resourceName) and ExtendedResourceClaimStatus
  • Wire GetPodResourceMap into k8sclient; remove DRA path from kubeletclient
  • RBAC: resourceclaims, resourceclaims/status, resourceslices (get, list) on multus ClusterRole
  • Docs: DRA / NAD usage; tests for draclient and k8sclient
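
The attribute-based mapping in the first bullet can be sketched roughly as below. This is only an illustration under stated assumptions: `allocatedDevice` and `buildResourceMap` are hypothetical names, attributes are flattened to plain strings, and only the two attribute keys are taken from the description above. It is not the code in pkg/draclient.

```go
package main

// Attribute keys quoted in the PR description.
const (
	deviceIDAttr     = "k8s.cni.cncf.io/deviceID"
	resourceNameAttr = "k8s.cni.cncf.io/resourceName"
)

// allocatedDevice is a hypothetical, simplified stand-in for a device
// published in a ResourceSlice; attributes are reduced to strings.
type allocatedDevice struct {
	Name       string
	Attributes map[string]string
}

// buildResourceMap groups device IDs by resource name. Devices that lack
// either attribute (or carry an empty value) are skipped, not treated as errors.
func buildResourceMap(devices []allocatedDevice) map[string][]string {
	out := make(map[string][]string)
	for _, d := range devices {
		id := d.Attributes[deviceIDAttr]
		name := d.Attributes[resourceNameAttr]
		if id == "" || name == "" {
			continue
		}
		out[name] = append(out[name], id)
	}
	return out
}
```

Keying the result by resource name mirrors how a NetworkAttachmentDefinition refers to a resource, so a device without a resourceName attribute has nothing to attach to and is left out.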

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly upgrades Multus CNI's integration with Kubernetes Dynamic Resource Allocation by adopting the stable resource.k8s.io/v1 API. This change future-proofs Multus's ability to handle dynamic resource allocation for network devices, moving away from an alpha API. It introduces a dedicated client for DRA, streamlines resource mapping within the Kubernetes client, and updates necessary permissions and documentation to support the new standard.

Highlights

  • Dynamic Resource Allocation (DRA) API Migration: Migrated the Dynamic Resource Allocation integration from the deprecated kubelet PodResources / v1alpha2 style usage to the stable resource.k8s.io/v1 API, targeting Kubernetes 1.34+.
  • New DRA Client Implementation: Introduced a new pkg/draclient package responsible for fetching ResourceClaims and ResourceSlices and constructing a pod resource map based on device attributes like k8s.cni.cncf.io/deviceID and k8s.cni.cncf.io/resourceName.
  • Kubernetes Client Integration: Integrated the new GetPodResourceMap function from the DRA client into the k8sclient and removed the old DRA processing path from the kubeletclient.
  • RBAC Updates: Updated the Multus ClusterRole RBAC rules to include get and list permissions for resourceclaims, resourceclaims/status, and resourceslices under the resource.k8s.io API group.
  • Documentation and Testing: Updated the how-to-use.md documentation to reflect the new DRA / NAD usage and added comprehensive unit tests for the new draclient and updated k8sclient.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

The pull request introduces Dynamic Resource Allocation (DRA) support to Multus CNI. It adds the draclient package for interacting with DRA resources, updates the documentation to reflect the new DRA integration, and modifies the k8sclient package to include DRA resources when retrieving network delegates. The code includes nil checks on ResourceClaimName and StringValue to prevent potential panics.

Comment thread pkg/draclient/draclient.go
Comment thread pkg/draclient/draclient.go Outdated
@rollandf
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request migrates Dynamic Resource Allocation (DRA) support in Multus from the alpha v1alpha2 API to the stable v1 API. The changes include adding a new draclient package to interact with ResourceClaim and ResourceSlice objects, updating RBAC permissions, removing the old DRA implementation from kubeletclient, and significantly improving the documentation with up-to-date examples.

The overall implementation is well-structured and the tests are comprehensive. However, I've identified a critical correctness issue in the new draclient where it incorrectly assumes only one ResourceSlice can exist per driver/pool, which contradicts the DRA specification. I've provided detailed comments and suggestions to fix this. There is also a related test that will need to be updated once the implementation is corrected. Other than that, the changes look solid.
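
To illustrate the multi-slice point: a lookup that tolerates several ResourceSlices per driver/pool has to search all matching slices rather than stop at the first one. The sketch below uses hypothetical, simplified types (`resourceSlice`, `sliceDevice`, `findDevice`), not the real resource.k8s.io/v1 structs, and is not the fix that was applied in the PR.

```go
package main

// sliceDevice and resourceSlice are hypothetical, simplified stand-ins
// for the device list and spec fields of a ResourceSlice.
type sliceDevice struct {
	Name string
}

type resourceSlice struct {
	Driver  string
	Pool    string
	Devices []sliceDevice
}

// findDevice searches every slice that belongs to the driver/pool pair,
// because a pool's devices may be spread over more than one slice.
func findDevice(slices []resourceSlice, driver, pool, name string) (sliceDevice, bool) {
	for _, s := range slices {
		if s.Driver != driver || s.Pool != pool {
			continue
		}
		for _, d := range s.Devices {
			if d.Name == name {
				return d, true
			}
		}
	}
	return sliceDevice{}, false
}
```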

Comment thread pkg/draclient/draclient.go Outdated
Comment thread pkg/draclient/draclient.go Outdated
Comment thread pkg/draclient/draclient.go Outdated
Comment thread pkg/draclient/draclient_test.go
@rollandf
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request migrates Dynamic Resource Allocation (DRA) from the alpha v1alpha2 API to the stable resource.k8s.io/v1 API. This is a significant and well-executed change, involving the creation of a new draclient to handle interactions with ResourceClaim and ResourceSlice objects, updating RBAC permissions, and removing the old DRA logic from the kubeletclient. The documentation has also been thoroughly updated with new examples.

My review has identified a high-severity bug in the caching logic of the new draclient that could lead to incorrect resource allocation in multi-namespace scenarios. I've also noted a potential performance issue related to fetching ResourceSlice objects. Please see my detailed comments for suggestions on how to address these points.
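
The review thread itself is collapsed here, so the exact bug is not visible. One common way such a problem arises is a cache keyed by claim name alone: ResourceClaims are namespaced, so two namespaces can hold claims with the same name. A minimal sketch of a namespace-aware key follows; `claimKey` and `claimCache` are hypothetical names and the cached payload is an assumption.

```go
package main

// claimKey identifies a ResourceClaim. The namespace is part of the key
// because claim names are only unique within a namespace.
type claimKey struct {
	Namespace string
	Name      string
}

// claimCache is a hypothetical cache; the payload (allocated device names)
// is an assumption made for illustration.
type claimCache struct {
	entries map[claimKey][]string
}

func newClaimCache() *claimCache {
	return &claimCache{entries: make(map[claimKey][]string)}
}

func (c *claimCache) put(namespace, name string, devices []string) {
	c.entries[claimKey{Namespace: namespace, Name: name}] = devices
}

func (c *claimCache) get(namespace, name string) ([]string, bool) {
	devices, ok := c.entries[claimKey{Namespace: namespace, Name: name}]
	return devices, ok
}
```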

Comment thread pkg/draclient/draclient.go Outdated
Comment thread pkg/draclient/draclient.go Outdated
@rollandf
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request migrates Dynamic Resource Allocation (DRA) integration from the alpha v1alpha2 API to the stable resource.k8s.io/v1 API. The changes include updating RBAC permissions, creating a new draclient to interact with the v1 API, removing the old DRA logic from the kubeletclient, and updating documentation and tests accordingly. The implementation of the new draclient is robust and well-tested. The overall change is a significant improvement, moving to a stable Kubernetes API. I have one suggestion regarding a potential performance improvement in the draclient.

Comment thread pkg/draclient/draclient.go Outdated
@rollandf
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request migrates Dynamic Resource Allocation (DRA) from the alpha v1alpha2 API to the stable v1 API. This is a significant and well-executed change, involving the introduction of a new draclient package to interact with ResourceClaim and ResourceSlice objects, updating RBAC rules, and integrating the new client into k8sclient. The old DRA logic has been correctly removed from kubeletclient, and the documentation has been updated with clear examples for the stable API. The new implementation is accompanied by a comprehensive test suite, ensuring its correctness and robustness. My review includes one minor suggestion to improve code style and clarity in the new draclient package.

Comment thread pkg/draclient/draclient.go Outdated
@rollandf
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request successfully migrates Dynamic Resource Allocation (DRA) from the alpha v1alpha2 API to the stable v1 API, a significant and valuable improvement. The introduction of a dedicated draclient package provides a clean abstraction for interacting with ResourceClaim and ResourceSlice objects. The accompanying changes to RBAC, documentation, and the extensive test suite for the new client are thorough and well-executed. I have a couple of suggestions for the new draclient to enhance consistency and robustness, but overall, this is a high-quality contribution.

Comment thread pkg/draclient/draclient.go
Comment thread pkg/draclient/draclient.go
@rollandf
Contributor Author

/gemini review

@gemini-code-assist

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

@rollandf
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request is a well-executed migration of Dynamic Resource Allocation (DRA) from the alpha kubelet PodResources API to the stable resource.k8s.io/v1 API. The introduction of a new draclient package with comprehensive unit tests is a solid approach. The integration into k8sclient is clean, and the corresponding updates to RBAC permissions and documentation are thorough. My review identified a minor issue in one of the new integration tests, where a failure scenario was not being correctly triggered.

Comment thread pkg/k8sclient/k8sclient_test.go
@rollandf
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces integration with the stable resource.k8s.io/v1 API for Dynamic Resource Allocation (DRA), migrating from the older v1alpha2 style. The changes include a new draclient package to fetch ResourceClaims and ResourceSlices, update RBAC rules, and comprehensive documentation updates. The kubeletclient has been refactored to remove its direct DRA handling, delegating it to the new draclient. The new client includes caching mechanisms for ResourceClaims and ResourceSlices to optimize API calls. The accompanying test suite for the draclient and its integration with k8sclient is thorough, covering various success and failure scenarios, including extended resource claims and caching behavior. Overall, the implementation is robust and well-tested.

Comment thread pkg/draclient/draclient.go Outdated
@rollandf
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request effectively migrates Multus from the alpha DRA API to the stable resource.k8s.io/v1 API. The changes are well-structured, with the introduction of a dedicated draclient package to handle DRA-specific logic, which is a great separation of concerns. The documentation has been thoroughly updated to reflect the new API and usage patterns, and the new unit tests provide good coverage for the new functionality.

My main feedback is a suggestion to improve the performance of ResourceSlice discovery by using label selectors, which would avoid listing all ResourceSlice objects in the cluster. This is detailed in a specific comment. Overall, this is a solid and important feature update.

Comment thread pkg/draclient/draclient.go Outdated
Comment on lines +224 to +237
listOptions := metav1.ListOptions{}
allResourceSlices, err := d.client.ResourceSlices().List(ctx, listOptions)
if err != nil {
	logging.Errorf("getDeviceInfo: failed to list resource slices: %v", err)
	return nil, err
}

var matchingSlices []*resourcev1api.ResourceSlice
for i := range allResourceSlices.Items {
	slice := &allResourceSlices.Items[i]
	if slice.Spec.Driver == result.Driver && slice.Spec.Pool.Name == result.Pool {
		matchingSlices = append(matchingSlices, slice)
	}
}

medium

The current implementation lists all ResourceSlice objects in the cluster and then filters them on the client side. As noted in the TODO comment, this can cause performance issues in large clusters.

While waiting for server-side field selector support for spec.driver, consider using a label selector as an optimization. This would require establishing a convention for DRA drivers to add a label with their driver name (e.g., resource.k8s.io/driver-name: my.dra.driver) to the ResourceSlice objects they create. This would allow for a much more efficient query.

Here's how the implementation could look with a label selector:

listOptions := metav1.ListOptions{
	LabelSelector: fmt.Sprintf("resource.k8s.io/driver-name=%s", result.Driver),
}
allResourceSlices, err := d.client.ResourceSlices().List(ctx, listOptions)
if err != nil {
	logging.Errorf("getDeviceInfo: failed to list resource slices: %v", err)
	return nil, err
}

var matchingSlices []*resourcev1api.ResourceSlice
for i := range allResourceSlices.Items {
	slice := &allResourceSlices.Items[i]
	if slice.Spec.Pool.Name == result.Pool {
		matchingSlices = append(matchingSlices, slice)
	}
}

rollandf and others added 3 commits April 26, 2026 17:38
Migrate Dynamic Resource Allocation from kubelet PodResources / v1alpha2-style usage to the stable resource.k8s.io/v1 API (Kubernetes 1.34+).

- Add pkg/draclient: fetch ResourceClaims and ResourceSlices, build pod resource map from device attributes (k8s.cni.cncf.io/deviceID, k8s.cni.cncf.io/resourceName) and ExtendedResourceClaimStatus
- Wire GetPodResourceMap into k8sclient; remove DRA path from kubeletclient
- RBAC: resourceclaims, resourceclaims/status, resourceslices (get, list) on multus ClusterRole
- Docs: DRA / NAD usage; tests for draclient and k8sclient

Co-authored-by: Sebastian Sch <sebassch@gmail.com>
Signed-off-by: Fred Rolland <frolland@nvidia.com>
- If every allocation result for a claim is skipped (no slice match, missing
  Multus deviceID/resourceName, etc.), log a warning and continue to the next
  claim instead of returning an error, so kubelet/device-plugin entries stay
  usable (hybrid legacy VF + broken or irrelevant DRA claims).
- getDeviceInfo: treat missing multus deviceID on a matched device as the same
  skippable sentinel as “not in slice”, avoiding a misleading final error.
- Tests: expect nil error for unmapped claims; add case preserving pre-filled
  resource map entries; keep VF+GPU success with empty GPU slice.
- k8sclient: stub kubelet ResourceClient in DRA failure test; logging: Warningf.

Signed-off-by: Fred Rolland <frolland@nvidia.com>
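
The skip-and-continue behaviour described in this commit message can be sketched with a sentinel error checked via `errors.Is`. All names here (`errSkipDevice`, `allocResult`, `lookupFunc`, `addClaimDevices`) are hypothetical and the lookup is injected as a function, so this shows the pattern rather than the draclient code.

```go
package main

import (
	"errors"
	"fmt"
)

// errSkipDevice is a hypothetical sentinel meaning "this allocation result
// has no Multus mapping" (no slice match, missing deviceID, and so on).
var errSkipDevice = errors.New("device skipped: no Multus mapping")

// allocResult is a simplified stand-in for one allocation result of a claim.
type allocResult struct {
	Driver, Pool, Device string
}

// lookupFunc resolves an allocation result to a resource name and device ID.
type lookupFunc func(allocResult) (resourceName, deviceID string, err error)

// addClaimDevices adds every mappable device to resourceMap and returns how
// many were added. Skippable results are ignored; any other error aborts.
func addClaimDevices(resourceMap map[string][]string, results []allocResult, lookup lookupFunc) (int, error) {
	mapped := 0
	for _, r := range results {
		name, id, err := lookup(r)
		if errors.Is(err, errSkipDevice) {
			continue
		}
		if err != nil {
			return mapped, fmt.Errorf("device %s/%s/%s: %w", r.Driver, r.Pool, r.Device, err)
		}
		resourceMap[name] = append(resourceMap[name], id)
		mapped++
	}
	return mapped, nil
}
```

A caller that gets back zero mapped devices and a nil error can log a warning and move on to the next claim, which leaves entries already placed in the map by the kubelet/device-plugin path untouched.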
…rceSlice cache

Replace full ResourceSlice object cache with a flat deviceInfoCacheKey→deviceInfo
index. Populate lazily per driver using combined spec.nodeName+spec.driver field
selectors derived from pod.Spec.NodeName, reducing API response size to a single
node's slices for one driver. Device lookup becomes O(1). Add ctx propagation to
GetPodResourceMap interface.

Signed-off-by: Fred Rolland <frolland@nvidia.com>
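
The flat index and combined field selector from this commit message might look roughly like the sketch below. The type names `deviceInfoCacheKey` and `deviceInfo` and the selector fields `spec.nodeName` and `spec.driver` come from the commit message; the struct fields, `deviceIndex`, `listFunc`, and `sliceFieldSelector` are assumptions, and the API call is replaced by an injected function.

```go
package main

import "fmt"

// Field layouts are assumptions; only the type names appear in the commit message.
type deviceInfoCacheKey struct {
	Driver, Pool, Device string
}

type deviceInfo struct {
	DeviceID, ResourceName string
}

// sliceFieldSelector builds the combined selector that narrows a
// ResourceSlice list to one node and one driver.
func sliceFieldSelector(nodeName, driver string) string {
	return fmt.Sprintf("spec.nodeName=%s,spec.driver=%s", nodeName, driver)
}

// listFunc is a hypothetical stand-in for the API call: it returns the
// devices found in the slices matching the field selector.
type listFunc func(fieldSelector string) (map[deviceInfoCacheKey]deviceInfo, error)

// deviceIndex is a flat key-to-info map that is filled lazily, once per driver.
type deviceIndex struct {
	loaded  map[string]bool
	devices map[deviceInfoCacheKey]deviceInfo
}

func newDeviceIndex() *deviceIndex {
	return &deviceIndex{
		loaded:  make(map[string]bool),
		devices: make(map[deviceInfoCacheKey]deviceInfo),
	}
}

// get lists the driver's slices on first use, then answers from the map.
func (idx *deviceIndex) get(nodeName string, key deviceInfoCacheKey, list listFunc) (deviceInfo, bool, error) {
	if !idx.loaded[key.Driver] {
		found, err := list(sliceFieldSelector(nodeName, key.Driver))
		if err != nil {
			return deviceInfo{}, false, err
		}
		for k, v := range found {
			idx.devices[k] = v
		}
		idx.loaded[key.Driver] = true
	}
	info, ok := idx.devices[key]
	return info, ok, nil
}
```

After the first lookup for a driver, later lookups are plain map reads, which is where the constant-time device lookup mentioned above would come from.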
