Skip to content

Conversation

ibihim
Copy link
Contributor

@ibihim ibihim commented Oct 9, 2025

  • One-line PR description: Refining PRR questionnaire based on alpha implementation learnings
  • Other comments:
    • Updates Production Readiness Review sections with corrections from alpha implementation

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 9, 2025
@k8s-ci-robot k8s-ci-robot requested review from liggitt and ritazh October 9, 2025 16:49
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/auth Categorizes an issue or PR as relevant to SIG Auth. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 9, 2025
@ibihim ibihim force-pushed the enhancements-ibihim-3926-prr-1.35 branch from 6dec232 to a893ad0 Compare October 9, 2025 17:06
@enj
Copy link
Member

enj commented Oct 9, 2025

/approve
(as general update from SIG Auth)
/assign deads2k
(for PRR)

@enj enj added this to SIG Auth Oct 10, 2025
@enj enj moved this to Needs Triage in SIG Auth Oct 10, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: enj, ibihim
Once this PR has been reviewed and has the lgtm label, please ask for approval from deads2k. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@aramase
Copy link
Member

aramase commented Oct 14, 2025

@deads2k could you take a look at this for PRR?

Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some missing answers for beta requirement, but the biggest one is links to integration tests to ensure this new and risky functionality is working as expected.


#### Alpha

- Error type is implemented
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above, missing links to integration tests, since you're not planning e2e at all, that's a major blocker for promotion.

https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05ab52e3f5f02429e94b68ce6bce0dc534d1be636154fded3R246-R282
-->
The implementation, including tests, is waiting for an approval of this enhancement.
All tests verify feature enablement / disablement to ensure backwards
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, links for the tests, or make sure they are included in the earlier sections.

If the average time of `apiserver_request_duration_seconds{verb="delete"}` of the kube-apiserver
increases greatly, this feature might have caused a performance regression.
If the average time of `apiserver_request_duration_seconds{verb="delete"}` or
`apiserver_request_duration_seconds{verb="list"}` the amount of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, can you make sure these metrics are mentioned at the end of kep.yaml in metrics section, please?

Longer term, we may want to require automated upgrade/rollback tests, but we
are missing a bunch of machinery and tooling and can't do that now.
-->
No testing of upgrade->downgrade->upgrade necessary.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why no such tests are necessary?

- Audit logs: Search for annotation: `apiserver.k8s.io/unsafe-delete-ignore-read-error`.
- RBAC: Check RoleBindings/ClusterRoleBindings granting `unsafe-delete-ignore-read-errors` verb.

###### How can someone using this feature know that it is working for their instance?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This question is missing answers.

-->

All corrupt object DELETEs complete, when feature is enabled, option is set and
the user is authorized.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question is about SLO, iow. what is the excpected time for delete completion? Check https://github.com/kubernetes/community/blob/master/sig-scalability/slos/slos.md for suggestions.

Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
implementation difficulties, etc.).
-->
- No metric tracking when unsafe deletions actually occur. Not assumed to happen
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
- No metric tracking when unsafe deletions actually occur. Not assumed to happen
- No new metric for tracking unsafe deletions is required. The available metrics (`apiserver_request_duration_seconds`) are sufficient to track the functionality of this feature.

- Impact of its outage on the feature:
- Impact of its degraded performance or high-error rates on the feature:
-->
- kube-apiserver
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a question about external services to kubernetes. So in your case No is sufficient answer.

approver: "@deads2k"
approver: "@deads2k"
beta:
approver: "@deads2k"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can put my name here, since I'm looking at this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/auth Categorizes an issue or PR as relevant to SIG Auth. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

Status: Needs Triage

Development

Successfully merging this pull request may close these issues.

6 participants