@@ -177,7 +177,7 @@ KEP, the hard limit on the number of pods will be removed.
177177Training workloads that uses TPUs can be very large, requiring over 9,000
178178TPUs for a single training job. The number of TPUs for each node is usually 4, meaning
179179that the job will run across more than 2,000 nodes. Due to topology constraints, TPU slices
180- are usually modelled in DRA as multi-host devices, meaning that a single DRA device
180+ are usually modeled in DRA as multi-host devices, meaning that a single DRA device
181181can represent thousands of TPUs. As a result, all pods running the workload will
182182therefore share a single ResourceClaim. The current limit of 256 pods sharing a
183183ResourceClaim is therefore too low.
@@ -237,7 +237,7 @@ provide guidance as to what is a safe number.
237237The ` ReservedFor ` field on the ` ResourceClaimStatus ` is currently used for two purposes:
238238
239239#### Deallocation
240- Devices are allocated to ` ResourceClaim ` s when the first pod referencing the claim is
240+ Devices are allocated to a ` ResourceClaim ` when the first pod referencing the claim is
241241scheduled. Other pods can also share the ` ResourceClaim ` in which case they share the
242242devices. Once no pods are consuming the claim, the devices should be deallocated to they
243243can be allocted to other claims. The ` ReservedFor ` list is used to keep track of pods
@@ -256,7 +256,7 @@ controller to find pods that are using the ResourceClaim:
2562561 . The DRA scheduler plugin uses the list to find claims that have zero or only
257257 a single pod using it, and is therefore a candidate for deallocation in the ` PostFilter ` function.
258258
259- 1 . The device_taint_eviction controller uses the ` ReservedFor ` list to find the pods that needs to be evicted
259+ 1 . The device_taint_eviction controller uses the ` ReservedFor ` list to find the pods that need to be evicted
260260 when one or more of the devices allocated to a ResourceClaim is tainted (and the ResourceClaim
261261 does not have a toleration).
262262
@@ -284,6 +284,9 @@ type ResourceClaimSpec struct {
284284 // ResourceClaim to remove the reference from the status.ReservedFor list when there
285285 // are no longer any pods consuming the claim.
286286 //
287+ // Most user-created ResourceClaims should not set this field. It is more typically
288+ // used by ResourceClaims created and managed by controllers.
289+ //
287290 // +featureGate=DRAReservedForWorkloads
288291 // +optional
289292 ReservedFor *ResourceClaimConsumerReference
@@ -303,7 +306,7 @@ type AllocationResult struct {
303306}
304307```
305308
306- The `ResourceClaimConsumerReference type already exists:
309+ The ` ResourceClaimConsumerReference ` type already exists:
307310
308311``` go
309312// ResourceClaimConsumerReference contains enough information to let you
@@ -345,21 +348,24 @@ list, it will not add a reference to the pod.
345348
346349##### Deallocation
347350The resourceclaim controller will remove Pod references from the ` ReservedFor ` list just
348- like it does now, but it will never remove references to non-Pod resources. Instead, it
349- will be the responsibility of the controller/user that created the ` ResourceClaim ` to
350- remove the reference to the non-Pod resource from the ` ReservedFor ` list when no pods
351+ like it does now using the same logic. For non-Pod references, the controller will recognize
352+ a small number of built-in types, starting with Deployment, StatefulSet and Job, and will
353+ remove the reference from the list when those resources are removed. For other types,
354+ it will be the responsibility of the workload controller/user that created the ` ResourceClaim `
355+ to remove the reference to the non-Pod resource from the ` ReservedFor ` list when no pods
351356are consuming the ` ResourceClaim ` and no new pods will be created that references
352357the ` ResourceClaim ` .
353358
354359The resourceclaim controller will then discover that the ` ReservedFor ` list is empty
355360and therefore know that it is safe to deallocate the ` ResourceClaim ` .
356361
357- This requires that the controller/user has permissions to update the status
358- subresource of the ` ResourceClaim ` . The resourceclaim controller will also try to detect if
359- the resource referenced in the ` ReservedFor ` list has been deleted from the cluster, but
360- that requires that the controller has permissions to get or list resources of the type. If the
361- resourceclaim controller is not able to check, it will just wait until the reference in
362- the ` ReservedFor ` list is removed.
362+ This requires that the resourceclaim controller watches the workload types that will
363+ be supported. For other types of workloads, there will be a requirement that the workload
364+ controller has permissions to update the status subresource of the ` ResourceClaim ` . The
365+ resourceclaim controller will also try to detect if an unknown resource referenced in the
366+ ` ReservedFor ` list has been deleted from the cluster, but that requires that the controller
367+ has permissions to get or list resources of the type. If the resourceclaim controller is
368+ not able to check, it will just wait until the reference in the ` ReservedFor ` list is removed.
363369
364370##### Finding pods using a ResourceClaim
365371If the reference in the ` ReservedFor ` list is to a non-Pod resource, controllers can no longer
0 commit comments