Commit 5895f7d

KEP-3939: update metrics and e2e test parts to reflect latest implementation details (kubernetes#4316)
* KEP-3939: update metrics and e2e test parts to reflect latest implementation details
* KEP-3939: fix typo in e2e test section
1 parent e09ddc2 commit 5895f7d

1 file changed: +19 -24 lines changed

keps/sig-apps/3939-allow-replacement-when-fully-terminated/README.md

@@ -258,7 +258,7 @@ See [Jobs create replacement Pods as soon as a Pod is marked for deletion](https
 #### Story 2
 
 As a cloud user, users would want to guarantee that the number of pods that are running is exactly the amount that they specify.
-Terminating pods do not relinguish resources so scarce compute resource are still scheduled to those pods.
+Terminating pods do not relinquish resources so scarce compute resource are still scheduled to those pods.
 Replacement pods do not produce unnecessary scale ups.
 
 #### Story 3
@@ -520,30 +520,24 @@ Tests will verify counting of terminating fields regardless of `PodDisruptionCon
 
 ##### e2e tests
 
-Generally the only tests that are useful for this feature are when `PodReplacementPolicy: Failed`.
+Generally the only tests that are useful for this feature are when `PodReplacementPolicy: Failed`.
+Test should to create a Job which can catch a SIGTERM signal and allow for graceful termination, so when we delete the test
+we can first assert that pods aren't created while the Pod is terminating and finally when it terminates that a new Pod is created.
 
-An example job spec that can reproduce this issue is below:
+We can use the default `busybox` image which is generally used in e2e tests and override the command field with something like:
 
-```yaml
-apiVersion: batch/v1
-kind: Job
-metadata:
-  name: job-slow-cleanup-with-pod-recreate-feature
-spec:
-  completions: 1
-  parallelism: 1
-  backoffLimit: 2
-  podReplacementPolicy: Failed
-  template:
-    spec:
-      restartPolicy: Never
-      containers:
-      - name: sleep
-        image: gcr.io/k8s-staging-perf-tests/sleep
-        args: ["-termination-grace-period", "1m", "60s"]
-```
+```shell
+_term(){
+    sleep 5
+    exit 143
+}
+trap _term SIGTERM
+while true; do
+    sleep 1
+done
+```
 
-A e2e test can verify that deletion will not trigger a new pod creation until the exiting pod is fully deleted.
+An e2e test can verify that deletion will not trigger a new pod creation until the exiting pod is fully deleted.
 
 If `podReplacementPolicy: TerminatingOrFailed` is specified we would test that pod creation happens closely after deletion.

@@ -905,8 +899,9 @@ During pod terminations, an operator can see that the terminating field is being
 
 We will use a new metric:
 
-- `job_pods_creation_total` (new) the `action` label will mention what triggers creation (`new`, `recreateTerminatingOrFailed`, `recreateTerminated`))
-This can be used to get the number of pods that are being recreated due to `recreateTerminated`. Otherwise we would expect to see `new` or `recreateTerminatingOrFailed` as the normal values.
+- `job_pods_creation_total` (new) the `reason` label will mention what triggers creation (`new`, `recreate_terminating_or_failed`, `recreate_failed`))
+and the `status` label will mention the status of the pod creation (`succeeded`, `failed`).
+This can be used to get the number of pods that are being recreated due to `recreateTerminated`. Otherwise, we would expect to see `new` or `recreateTerminatingOrFailed` as the normal values.
 
 <!--
 Ideally, this should be a metric. Operations against the Kubernetes API (e.g.,
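
The SIGTERM-trapping loop added in the e2e tests hunk can be tried locally, without a cluster, to see the delayed exit that the test relies on. The following is only a sketch under assumptions that are not part of the commit: `GRACE_SECONDS` is a hypothetical knob standing in for the hard-coded `sleep 5`, the loop is written to a temporary file, and the signal is spelled `TERM` rather than `SIGTERM` so that stricter POSIX shells accept it.

```shell
#!/bin/sh
# Hypothetical local check of the SIGTERM-trapping loop; no cluster involved.
# GRACE_SECONDS is an assumed knob (the KEP text hard-codes "sleep 5").
GRACE_SECONDS="${GRACE_SECONDS:-1}"

# Write the loop to a temporary script so it runs as its own process.
script="$(mktemp)"
cat > "$script" <<'EOF'
_term(){
  # Simulate slow cleanup before exiting with 128 + 15 (SIGTERM).
  sleep "$GRACE_SECONDS"
  exit 143
}
trap _term TERM
while true; do
  sleep 1
done
EOF

GRACE_SECONDS="$GRACE_SECONDS" sh "$script" &
pid=$!

sleep 2            # give the loop time to install its trap
kill -TERM "$pid"  # stands in for the SIGTERM sent when the Pod is deleted

# Collect the exit status without tripping "set -e" style runners.
status=0
wait "$pid" || status=$?

rm -f "$script"
echo "exit status: $status"
```

The process keeps running for roughly `GRACE_SECONDS` after the signal and then exits with status 143. That window corresponds to the period in which a Job with `podReplacementPolicy: Failed` would be expected not to create a replacement Pod.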
