Commit fcc6bbd

Merge pull request kubernetes#4123 from mimowo/backoff-limit-per-index-update
Update for KEP3850 "Backoff Limit Per Index"
2 parents 45483ff + a6b7810 commit fcc6bbd

1 file changed, +14 -9 lines changed
keps/sig-apps/3850-backoff-limits-per-index-for-indexed-jobs/README.md

@@ -406,20 +406,18 @@ type JobSpec struct {
 // failures per index is kept in the pod's
 // batch.kubernetes.io/job-index-failure-count annotation. It can only
 // be set when Job's completionMode=Indexed, and the Pod's restart
-// policy is Never.
-// When specified, then the backoffLimit field is defaulted to the max int32
-// value. Also, once specified it cannot be set to nil.
+// policy is Never. The field is immutable.
 // +optional
 BackoffLimitPerIndex *int32

 // Specifies the maximal number of failed indexes before marking the Job as
 // failed, when backoffLimitPerIndex is set. Once the number of failed
 // indexes exceeds this number the entire Job is marked as Failed and its
-// execution is terminated. When left as nil the job continues execution of
+// execution is terminated. When left as null the job continues execution of
 // all of its indexes and is marked with the `Complete` Job condition.
 // It can only be specified when backoffLimitPerIndex is set.
-// The value is might be up to 10^5 when completions is <= 10^5, or 10^4 when
-// completions > 10^5.
+// It can be null or up to completions. It is required and must be
+// less than or equal to 10^4 when completions is greater than 10^5.
 // +optional
 MaxFailedIndexes *int32
 ...
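
The constraints stated in the updated field comments can be read as a small set of validation rules. The following is a minimal sketch of those rules only, not the Kubernetes validation code: `jobSpecSketch`, `validateSketch`, `int32Ptr`, and the two constants are hypothetical names introduced for illustration, and the immutability of `backoffLimitPerIndex` (an update-time check) is not modeled.

```go
package main

import (
	"errors"
	"fmt"
)

// jobSpecSketch is a hypothetical, trimmed-down stand-in for JobSpec.
type jobSpecSketch struct {
	CompletionMode       string // "Indexed" or "NonIndexed"
	RestartPolicy        string // restart policy of the pod template
	Completions          *int32
	BackoffLimitPerIndex *int32
	MaxFailedIndexes     *int32
}

const (
	largeJobCompletions      = 100_000 // 10^5, threshold named in the field comment
	maxFailedIndexesLargeJob = 10_000  // 10^4, cap for jobs above the threshold
)

// validateSketch checks only the constraints spelled out in the field comments.
func validateSketch(s jobSpecSketch) error {
	if s.BackoffLimitPerIndex == nil {
		if s.MaxFailedIndexes != nil {
			return errors.New("maxFailedIndexes can only be specified when backoffLimitPerIndex is set")
		}
		return nil
	}
	if s.CompletionMode != "Indexed" {
		return errors.New("backoffLimitPerIndex requires completionMode=Indexed")
	}
	if s.RestartPolicy != "Never" {
		return errors.New("backoffLimitPerIndex requires restart policy Never")
	}
	if s.Completions == nil {
		// Assumption: an Indexed Job always carries a completions count.
		return errors.New("completions must be set for an Indexed Job")
	}
	large := *s.Completions > largeJobCompletions
	if s.MaxFailedIndexes == nil {
		if large {
			return errors.New("maxFailedIndexes is required when completions > 10^5")
		}
		return nil
	}
	if *s.MaxFailedIndexes > *s.Completions {
		return errors.New("maxFailedIndexes must not exceed completions")
	}
	if large && *s.MaxFailedIndexes > maxFailedIndexesLargeJob {
		return errors.New("maxFailedIndexes must be <= 10^4 when completions > 10^5")
	}
	return nil
}

func int32Ptr(v int32) *int32 { return &v }

func main() {
	spec := jobSpecSketch{
		CompletionMode:       "Indexed",
		RestartPolicy:        "Never",
		Completions:          int32Ptr(200_000),
		BackoffLimitPerIndex: int32Ptr(1),
		// MaxFailedIndexes left nil: rejected because completions > 10^5.
	}
	fmt.Println(validateSketch(spec))
}
```

Leaving `MaxFailedIndexes` nil is accepted for jobs with at most 10^5 completions, which matches the "can be null or up to completions" wording above.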
@@ -515,9 +513,6 @@ all indexes are succeeded. The Job is marked as failed (the `Failed` Job conditi
 when at least one index is failed. The `Failed` condition is added once
 all indexes completed their execution (either failed or succeeded), or when
 the number of failed indexes exceeds the specified `.spec.maxFailedIndexes`.
-The `reason` field for the failed Job is `FailedIndexes` and the `message` field
-will list up to 3 of the failed indexes (in case there are more indexes the
-message will indicate that, for example with `...`).

 ### FailIndex action

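
The rule in the hunk above for when the `Failed` Job condition is added can be expressed as a single predicate. This is an illustrative sketch under stated assumptions, not the Job controller's code: `shouldAddFailedCondition` and its parameters are hypothetical names, and the index counts are assumed to be already known.

```go
package main

import "fmt"

// shouldAddFailedCondition is a hypothetical helper that reports whether the
// Failed condition would be added under the rule described in the KEP text.
func shouldAddFailedCondition(completions, succeededIndexes, failedIndexes int32, maxFailedIndexes *int32) bool {
	// Early termination: the number of failed indexes exceeds maxFailedIndexes.
	if maxFailedIndexes != nil && failedIndexes > *maxFailedIndexes {
		return true
	}
	// Otherwise wait until every index has finished, then fail the Job
	// if at least one index failed.
	allFinished := succeededIndexes+failedIndexes == completions
	return allFinished && failedIndexes > 0
}

func main() {
	limit := int32(1)
	fmt.Println(shouldAddFailedCondition(5, 1, 2, &limit)) // true: 2 failed indexes exceed the limit of 1
	fmt.Println(shouldAddFailedCondition(5, 3, 1, nil))    // false: one index is still running
}
```

With `maxFailedIndexes` left nil, the predicate only becomes true after all indexes have finished, so the remaining indexes keep running even after the first index fails.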
@@ -730,6 +725,12 @@ in back-to-back releases.
 - Address reviews and bug reports from Alpha users
 - Propose and implement metrics
 - E2e tests are in Testgrid and linked in KEP
+- Evaluate performance of Job controller for jobs using backoff limit per index
+with benchmarks at the integration or e2e level (discussion pointers from Alpha
+review: [thread1](https://github.com/kubernetes/kubernetes/pull/118009#discussion_r1261694406) and [thread2](https://github.com/kubernetes/kubernetes/pull/118009#discussion_r1263862076))
+- Reevaluate ideas of not using `.status.uncountedTerminatedPods` for keeping track
+in the `.status.Failed` field. The idea is to prevent `backoffLimit` for setting.
+Discussion [link](https://github.com/kubernetes/kubernetes/pull/118009#discussion_r1263879848).
 - The feature flag enabled by default

 #### GA
@@ -1214,6 +1215,10 @@ Major milestones might include:
 - 2023-01-23: Initial version of the KEP PR [Backoff Limit Per Job #3774](https://github.com/kubernetes/enhancements/pull/3774)
 - 2023-04-26: The KEP PR [Backoff limit per Job Index #3967](https://github.com/kubernetes/enhancements/pull/3967) takes over from [#3774](https://github.com/kubernetes/enhancements/pull/3774)
 - 2023-05-08: The KEP PR ready for review
+- 2023-06-07: The KEP PR merged
+- 2023-07-13: The implementation PR [Support BackoffLimitPerIndex in Jobs #118009](https://github.com/kubernetes/kubernetes/pull/118009) under review
+- 2023-07-18: Merge the API PR [Extend the Job API for BackoffLimitPerIndex](https://github.com/kubernetes/kubernetes/pull/119294)
+- 2023-07-18: Merge the Job Controller PR [Support BackoffLimitPerIndex in Jobs](https://github.com/kubernetes/kubernetes/pull/118009)

 ## Drawbacks
