keps/sig-apps/3850-backoff-limits-per-index-for-indexed-jobs/README.md
+14-9
@@ -406,20 +406,18 @@ type JobSpec struct {
406
406
// failures per index is kept in the pod's
407
407
// batch.kubernetes.io/job-index-failure-count annotation. It can only
408
408
// be set when Job's completionMode=Indexed, and the Pod's restart
409
-
// policy is Never.
410
-
// When specified, then the backoffLimit field is defaulted to the max int32
411
-
// value. Also, once specified it cannot be set to nil.
409
+
// policy is Never. The field is immutable.
412
410
// +optional
413
411
BackoffLimitPerIndex *int32
414
412
415
413
// Specifies the maximal number of failed indexes before marking the Job as
416
414
// failed, when backoffLimitPerIndex is set. Once the number of failed
417
415
// indexes exceeds this number the entire Job is marked as Failed and its
418
-
// execution is terminated. When left as nil the job continues execution of
416
+
// execution is terminated. When left as null the job continues execution of
419
417
// all of its indexes and is marked with the `Complete` Job condition.
420
418
// It can only be specified when backoffLimitPerIndex is set.
421
-
// The value is might be up to 10^5 when completionsis <= 10^5, or 10^4 when
422
-
// completions > 10^5.
419
+
// It can be null or up to completions. It is required and must be
420
+
// less than or equal to 10^4 when completions is greater than 10^5.
423
421
// +optional
424
422
MaxFailedIndexes *int32
425
423
...
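The constraints stated in the field comments above can be read as a set of validation rules. The following is a minimal sketch of those rules only, not the actual Kubernetes validation code: the `jobSpec` type, the `validatePerIndexFields` function, the `ptr` helper, and the error messages are all hypothetical names introduced for illustration, and `Completions` is simplified to a plain `int32`.

```go
package main

import (
	"errors"
	"fmt"
)

// jobSpec is a hypothetical, trimmed-down stand-in for JobSpec.
// It carries only the fields this sketch needs.
type jobSpec struct {
	Completions          int32  // simplified: the real field is a pointer
	CompletionMode       string // e.g. "Indexed"
	RestartPolicy        string // restart policy of the pod template
	BackoffLimitPerIndex *int32
	MaxFailedIndexes     *int32
}

const (
	largeCompletions    = 100_000 // 10^5, threshold from the field comment
	maxFailedIndexesCap = 10_000  // 10^4, cap that applies above the threshold
)

// validatePerIndexFields checks the rules described in the comments for
// backoffLimitPerIndex and maxFailedIndexes (assumed reading of the text).
func validatePerIndexFields(s jobSpec) error {
	if s.BackoffLimitPerIndex == nil {
		if s.MaxFailedIndexes != nil {
			return errors.New("maxFailedIndexes requires backoffLimitPerIndex")
		}
		return nil
	}
	if s.CompletionMode != "Indexed" {
		return errors.New("backoffLimitPerIndex requires completionMode=Indexed")
	}
	if s.RestartPolicy != "Never" {
		return errors.New("backoffLimitPerIndex requires restartPolicy=Never")
	}
	if s.MaxFailedIndexes == nil {
		if s.Completions > largeCompletions {
			return errors.New("maxFailedIndexes is required when completions > 10^5")
		}
		return nil
	}
	if *s.MaxFailedIndexes > s.Completions {
		return errors.New("maxFailedIndexes must not exceed completions")
	}
	if s.Completions > largeCompletions && *s.MaxFailedIndexes > maxFailedIndexesCap {
		return errors.New("maxFailedIndexes must be <= 10^4 when completions > 10^5")
	}
	return nil
}

// ptr is a small helper that returns a pointer to an int32 value.
func ptr(v int32) *int32 { return &v }

func main() {
	ok := jobSpec{
		Completions:          10,
		CompletionMode:       "Indexed",
		RestartPolicy:        "Never",
		BackoffLimitPerIndex: ptr(1),
		MaxFailedIndexes:     ptr(5),
	}
	fmt.Println("valid spec:", validatePerIndexFields(ok))

	bad := ok
	bad.MaxFailedIndexes = ptr(11) // larger than completions
	fmt.Println("invalid spec:", validatePerIndexFields(bad))
}
```

Immutability of `backoffLimitPerIndex` is an update-time check and is deliberately left out of this create-time sketch.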
@@ -515,9 +513,6 @@ all indexes are succeeded. The Job is marked as failed (the `Failed` Job conditi
515
513
when at least one index is failed. The `Failed` condition is added once
516
514
all indexes completed their execution (either failed or succeeded), or when
517
515
the number of failed indexes exceeds the specified `.spec.maxFailedIndexes`.
518
-
The `reason` field for the failed Job is `FailedIndexes` and the `message` field
519
-
will list up to 3 of the failed indexes (in case there are more indexes the
520
-
message will indicate that, for example with `...`).
521
516
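The rule in the paragraph above for when the `Failed` condition is added can be sketched as a small decision function. This is an illustrative reading of the text, not the Job controller's code: `jobOutcome`, `evaluateJob`, and the per-index counters passed in are hypothetical names assumed for the example.

```go
package main

import "fmt"

// jobOutcome is a hypothetical type naming the possible results.
type jobOutcome string

const (
	outcomeRunning  jobOutcome = "Running"
	outcomeComplete jobOutcome = "Complete"
	outcomeFailed   jobOutcome = "Failed"
)

// evaluateJob applies the rule described in the text:
//   - Failed as soon as the number of failed indexes exceeds maxFailedIndexes;
//   - otherwise wait until every index has finished;
//   - then Failed if at least one index failed, Complete if all succeeded.
func evaluateJob(completions, succeeded, failed int32, maxFailedIndexes *int32) jobOutcome {
	if maxFailedIndexes != nil && failed > *maxFailedIndexes {
		return outcomeFailed
	}
	if succeeded+failed < completions {
		return outcomeRunning
	}
	if failed > 0 {
		return outcomeFailed
	}
	return outcomeComplete
}

func main() {
	limit := int32(1)
	fmt.Println(evaluateJob(5, 5, 0, nil))    // all indexes succeeded
	fmt.Println(evaluateJob(5, 3, 1, nil))    // one index still executing
	fmt.Println(evaluateJob(5, 4, 1, nil))    // finished with one failed index
	fmt.Println(evaluateJob(5, 1, 2, &limit)) // failed indexes exceed the limit
}
```

With `maxFailedIndexes` unset, the sketch never terminates the Job early; the outcome is decided only after every index has finished.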
522
517
### FailIndex action
523
518
@@ -730,6 +725,12 @@ in back-to-back releases.
730
725
- Address reviews and bug reports from Alpha users
731
726
- Propose and implement metrics
732
727
- E2e tests are in Testgrid and linked in KEP
728
+
- Evaluate performance of Job controller for jobs using backoff limit per index
729
+
with benchmarks at the integration or e2e level (discussion pointers from Alpha
730
+
review: [thread1](https://github.com/kubernetes/kubernetes/pull/118009#discussion_r1261694406) and [thread2](https://github.com/kubernetes/kubernetes/pull/118009#discussion_r1263862076))
731
+
- Reevaluate ideas of not using `.status.uncountedTerminatedPods` for keeping track
732
+
in the `.status.Failed` field. The idea is to prevent `backoffLimit` for setting.
@@ -1214,6 +1215,10 @@ Major milestones might include:
1214
1215
- 2023-01-23: Initial version of the KEP PR [Backoff Limit Per Job #3774](https://github.com/kubernetes/enhancements/pull/3774)
1215
1216
- 2023-04-26: The KEP PR [Backoff limit per Job Index #3967](https://github.com/kubernetes/enhancements/pull/3967) takes over from [#3774](https://github.com/kubernetes/enhancements/pull/3774)
1216
1217
- 2023-05-08: The KEP PR ready for review
1218
+
- 2023-06-07: The KEP PR merged
1219
+
- 2023-07-13: The implementation PR [Support BackoffLimitPerIndex in Jobs #118009](https://github.com/kubernetes/kubernetes/pull/118009) under review
1220
+
- 2023-07-18: Merge the API PR [Extend the Job API for BackoffLimitPerIndex](https://github.com/kubernetes/kubernetes/pull/119294)
1221
+
- 2023-07-18: Merge the Job Controller PR [Support BackoffLimitPerIndex in Jobs](https://github.com/kubernetes/kubernetes/pull/118009)