feat: DRA resource.k8s.io/v1 integration via draclient#1492
feat: DRA resource.k8s.io/v1 integration via draclient#1492rollandf wants to merge 3 commits intok8snetworkplumbingwg:masterfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly upgrades Multus CNI's integration with Kubernetes Dynamic Resource Allocation by adopting the stable Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
The pull request introduces Dynamic Resource Allocation (DRA) support to Multus CNI. It adds the draclient package for interacting with DRA resources, updates the documentation to reflect the new DRA integration, and modifies the k8sclient package to include DRA resources when retrieving network delegates. The code includes nil checks on ResourceClaimName and StringValue to prevent potential panics.
27ce0fe to
dad6f02
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request migrates Dynamic Resource Allocation (DRA) support in Multus from the alpha v1alpha2 API to the stable v1 API. The changes include adding a new draclient package to interact with ResourceClaim and ResourceSlice objects, updating RBAC permissions, removing the old DRA implementation from kubeletclient, and significantly improving the documentation with up-to-date examples.
The overall implementation is well-structured and the tests are comprehensive. However, I've identified a critical correctness issue in the new draclient where it incorrectly assumes only one ResourceSlice can exist per driver/pool, which contradicts the DRA specification. I've provided detailed comments and suggestions to fix this. There is also a related test that will need to be updated once the implementation is corrected. Other than that, the changes look solid.
dad6f02 to
0d8ae35
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request migrates Dynamic Resource Allocation (DRA) from the alpha v1alpha2 API to the stable resource.k8s.io/v1 API. This is a significant and well-executed change, involving the creation of a new draclient to handle interactions with ResourceClaim and ResourceSlice objects, updating RBAC permissions, and removing the old DRA logic from the kubeletclient. The documentation has also been thoroughly updated with new examples.
My review has identified a high-severity bug in the caching logic of the new draclient that could lead to incorrect resource allocation in multi-namespace scenarios. I've also noted a potential performance issue related to fetching ResourceSlice objects. Please see my detailed comments for suggestions on how to address these points.
0d8ae35 to
fbe0e06
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request migrates Dynamic Resource Allocation (DRA) integration from the alpha v1alpha2 API to the stable resource.k8s.io/v1 API. The changes include updating RBAC permissions, creating a new draclient to interact with the v1 API, removing the old DRA logic from the kubeletclient, and updating documentation and tests accordingly. The implementation of the new draclient is robust and well-tested. The overall change is a significant improvement, moving to a stable Kubernetes API. I have one suggestion regarding a potential performance improvement in the draclient.
fbe0e06 to
4f083b4
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request migrates Dynamic Resource Allocation (DRA) from the alpha v1alpha2 API to the stable v1 API. This is a significant and well-executed change, involving the introduction of a new draclient package to interact with ResourceClaim and ResourceSlice objects, updating RBAC rules, and integrating the new client into k8sclient. The old DRA logic has been correctly removed from kubeletclient, and the documentation has been updated with clear examples for the stable API. The new implementation is accompanied by a comprehensive test suite, ensuring its correctness and robustness. My review includes one minor suggestion to improve code style and clarity in the new draclient package.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request successfully migrates Dynamic Resource Allocation (DRA) from the alpha v1alpha2 API to the stable v1 API, a significant and valuable improvement. The introduction of a dedicated draclient package provides a clean abstraction for interacting with ResourceClaim and ResourceSlice objects. The accompanying changes to RBAC, documentation, and the extensive test suite for the new client are thorough and well-executed. I have a couple of suggestions for the new draclient to enhance consistency and robustness, but overall, this is a high-quality contribution.
4f083b4 to
e3a789c
Compare
|
/gemini review |
|
Warning Gemini encountered an error creating the review. You can try again by commenting |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request is a well-executed migration of Dynamic Resource Allocation (DRA) from the alpha kubelet PodResources API to the stable resource.k8s.io/v1 API. The introduction of a new draclient package with comprehensive unit tests is a solid approach. The integration into k8sclient is clean, and the corresponding updates to RBAC permissions and documentation are thorough. My review identified a minor issue in one of the new integration tests, where a failure scenario was not being correctly triggered.
e3a789c to
25483d2
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces integration with the stable resource.k8s.io/v1 API for Dynamic Resource Allocation (DRA), migrating from the older v1alpha2 style. The changes include a new draclient package to fetch ResourceClaims and ResourceSlices, update RBAC rules, and comprehensive documentation updates. The kubeletclient has been refactored to remove its direct DRA handling, delegating it to the new draclient. The new client includes caching mechanisms for ResourceClaims and ResourceSlices to optimize API calls. The accompanying test suite for the draclient and its integration with k8sclient is thorough, covering various success and failure scenarios, including extended resource claims and caching behavior. Overall, the implementation is robust and well-tested.
25483d2 to
8760bee
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request effectively migrates Multus from the alpha DRA API to the stable resource.k8s.io/v1 API. The changes are well-structured, with the introduction of a dedicated draclient package to handle DRA-specific logic, which is a great separation of concerns. The documentation has been thoroughly updated to reflect the new API and usage patterns, and the new unit tests provide good coverage for the new functionality.
My main feedback is a suggestion to improve the performance of ResourceSlice discovery by using label selectors, which would avoid listing all ResourceSlice objects in the cluster. This is detailed in a specific comment. Overall, this is a solid and important feature update.
| listOptions := metav1.ListOptions{} | ||
| allResourceSlices, err := d.client.ResourceSlices().List(ctx, listOptions) | ||
| if err != nil { | ||
| logging.Errorf("getDeviceInfo: failed to list resource slices: %v", err) | ||
| return nil, err | ||
| } | ||
|
|
||
| var matchingSlices []*resourcev1api.ResourceSlice | ||
| for i := range allResourceSlices.Items { | ||
| slice := &allResourceSlices.Items[i] | ||
| if slice.Spec.Driver == result.Driver && slice.Spec.Pool.Name == result.Pool { | ||
| matchingSlices = append(matchingSlices, slice) | ||
| } | ||
| } |
There was a problem hiding this comment.
The current implementation lists all ResourceSlice objects in the cluster and then filters them on the client side. As noted in the TODO comment, this can cause performance issues in large clusters.
While waiting for server-side field selector support for spec.driver, consider using a label selector as an optimization. This would require establishing a convention for DRA drivers to add a label with their driver name (e.g., resource.k8s.io/driver-name: my.dra.driver) to the ResourceSlice objects they create. This would allow for a much more efficient query.
Here's how the implementation could look with a label selector:
listOptions := metav1.ListOptions{
LabelSelector: fmt.Sprintf("resource.k8s.io/driver-name=%s", result.Driver),
}
allResourceSlices, err := d.client.ResourceSlices().List(ctx, listOptions)
if err != nil {
logging.Errorf("getDeviceInfo: failed to list resource slices: %v", err)
return nil, err
}
var matchingSlices []*resourcev1api.ResourceSlice
for i := range allResourceSlices.Items {
slice := &allResourceSlices.Items[i]
if slice.Spec.Pool.Name == result.Pool {
matchingSlices = append(matchingSlices, slice)
}
}2cf483c to
9952c7b
Compare
Migrate Dynamic Resource Allocation from kubelet PodResources / v1alpha2-style usage to the stable resource.k8s.io/v1 API (Kubernetes 1.34+). - Add pkg/draclient: fetch ResourceClaims and ResourceSlices, build pod resource map from device attributes (k8s.cni.cncf.io/deviceID, k8s.cni.cncf.io/resourceName) and ExtendedResourceClaimStatus - Wire GetPodResourceMap into k8sclient; remove DRA path from kubeletclient - RBAC: resourceclaims, resourceclaims/status, resourceslices (get, list) on multus ClusterRole - Docs: DRA / NAD usage; tests for draclient and k8sclient Co-authored-by: Sebastian Sch <sebassch@gmail.com> Signed-off-by: Fred Rolland <frolland@nvidia.com>
- If every allocation result for a claim is skipped (no slice match, missing Multus deviceID/resourceName, etc.), log a warning and continue to the next claim instead of returning an error, so kubelet/device-plugin entries stay usable (hybrid legacy VF + broken or irrelevant DRA claims). - getDeviceInfo: treat missing multus deviceID on a matched device as the same skippable sentinel as “not in slice”, avoiding a misleading final error. - Tests: expect nil error for unmapped claims; add case preserving pre-filled resource map entries; keep VF+GPU success with empty GPU slice. - k8sclient: stub kubelet ResourceClient in DRA failure test; logging: Warningf. Signed-off-by: Fred Rolland <frolland@nvidia.com>
…rceSlice cache Replace full ResourceSlice object cache with a flat deviceInfoCacheKey→deviceInfo index. Populate lazily per driver using combined spec.nodeName+spec.driver field selectors derived from pod.Spec.NodeName, reducing API response size to a single node's slices for one driver. Device lookup becomes O(1). Add ctx propagation to GetPodResourceMap interface. Signed-off-by: Fred Rolland <frolland@nvidia.com>
2e1e2b6 to
90ad2f2
Compare
Migrate Dynamic Resource Allocation from kubelet PodResources / v1alpha2-style usage to the stable resource.k8s.io/v1 API (Kubernetes 1.34+).