CNV-61721: Add ValidatingAdmissionPolicy to validate the HyperConverged namespace #3912
base: main
Conversation
The current implementation, which prevents creating the HyperConverged CR in a non-allowed namespace, does not work in OpenShift: because of a race condition, the namespace selector that the webhook setup removes is restored by OLM. This commit adds a new controller that creates and reconciles a ValidatingAdmissionPolicy and the related ValidatingAdmissionPolicyBinding to perform the same validation. The reason we're doing it in a new controller is that the ValidatingAdmissionPolicy must be in place even if the HyperConverged CR is not deployed, while our main controller only reconciles resources when the HyperConverged CR is deployed. Signed-off-by: Nahshon Unna Tsameret <[email protected]>
Remove the existing validation of the HyperConverged CR namespace from the validation webhook, as it is now done by the policy created by the admission policy controller. Signed-off-by: Nahshon Unna Tsameret <[email protected]>
OLM adds a namespace selector to the validation webhook configuration, making the namespace validation irrelevant. The webhook setup logic removes this selector, but the selector is reconciled back by OLM, so eventually a user can still create the HyperConverged CR in any namespace. The issue is now handled by a ValidatingAdmissionPolicy, so this logic is no longer needed, and this commit removes it. Signed-off-by: Nahshon Unna Tsameret <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Hey there - I've reviewed your changes and found some issues that need to be addressed.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `reconcilePolicy`/`reconcileBinding` you detect label changes with `CompareLabels`, but then call `MergeLabels` on the desired object (`policy`/`binding`) instead of the live object (`foundPolicy`/`foundBinding`), so the updated labels are never persisted; swap the arguments so you actually mutate the object being updated.
- The logger acquisition in `Reconcile` uses `logger, err := logr.FromContext(ctx)`, but `logr.FromContext` does not return an error; this code as written will not compile and should instead use `logr.FromContextOrDiscard(ctx)` or a single-value `FromContext` call.
- The startup reconciliation event is sent on `startupEvent` before `add(mgr, r)` wires the channel into the controller and the channel is then closed via `defer`, so the startup request is likely dropped; consider moving the send after the controller is watching the channel and avoiding closing it prematurely so the initial reconcile is guaranteed to run.
## Individual Comments
### Comment 1
<location> `controllers/admissionpolicy/resources.go:41-42` </location>
<code_context>
+func getRequiredPolicy() *admissionv1.ValidatingAdmissionPolicy {
+ policyOnce.Do(func() {
+ namespace := hcoutil.GetOperatorNamespaceFromEnv()
+ requiredPolicy = &admissionv1.ValidatingAdmissionPolicy{
+ ObjectMeta: metav1.ObjectMeta{
+ Name: policyName,
+ Labels: hcoutil.GetLabels(hcov1beta1.HyperConvergedName, hcoutil.AppComponentDeployment),
</code_context>
<issue_to_address>
**issue (bug_risk):** Using a namespaced Deployment as ownerReference for cluster-scoped admission resources is invalid
`ValidatingAdmissionPolicy` and `ValidatingAdmissionPolicyBinding` are cluster-scoped, but their `ownerReferences` point to `ownresources.GetDeploymentRef()`, which is namespaced. Kubernetes forbids namespaced→cluster-scoped owner references, so these resources will fail validation and may not be created/updated. To avoid this, either use a cluster-scoped owner, switch to a different cleanup mechanism (e.g., finalizers), or remove the `ownerReference` entirely.
</issue_to_address>
### Comment 2
<location> `controllers/admissionpolicy/resources.go:67` </location>
<code_context>
+ },
+ },
+ },
+ Validations: []admissionv1.Validation{
+ {
+ Expression: fmt.Sprintf(`request.namespace == '%s'`, namespace),
</code_context>
<issue_to_address>
**issue (bug_risk):** CEL expression uses single-quoted string literals, which are invalid in CEL and will break the policy
The expression is currently generated as `request.namespace == '%s'`, but CEL for Kubernetes only accepts double-quoted string literals (e.g., `"ns"`); single quotes are invalid and will cause the policy to fail compilation, so the namespace check will never be enforced. Please change the format string to use double quotes, e.g. ``fmt.Sprintf(`request.namespace == "%s"`, namespace)``.
</issue_to_address>
### Comment 3
<location> `pkg/webhooks/validator/validator_test.go:175-184` </location>
<code_context>
Expect(res.Result.Message).To(Equal("unknown operation request \"MALFORMED\""))
})
- It("should correctly handle operation errors", func() {
- cr.Namespace = ResourceInvalidNamespace
- req := newRequest(admissionv1.Create, cr, v1beta1Codec, false)
-
- res := wh.Handle(ctx, req)
- Expect(res.Allowed).To(BeFalse())
- Expect(res.Result.Code).To(Equal(int32(403)))
- Expect(res.Result.Reason).To(Equal(metav1.StatusReasonForbidden))
- Expect(res.Result.Message).To(Equal("invalid namespace for v1beta1.HyperConverged - please use the kubevirt-hyperconverged namespace"))
- })
-
It("should accept creation of a resource with a valid namespace", func() {
Expect(wh.ValidateCreate(ctx, dryRun, cr)).To(Succeed())
})
- It("should reject creation of a resource with an arbitrary namespace", func() {
- cr.Namespace = ResourceInvalidNamespace
- Expect(wh.ValidateCreate(ctx, dryRun, cr)).ToNot(Succeed())
</code_context>
<issue_to_address>
**suggestion (testing):** Namespace validation behavior is no longer covered by the webhook tests
With these tests removed, this suite no longer verifies that creating a HyperConverged CR in an invalid namespace is rejected. Even though the logic moved from the webhook to the ValidatingAdmissionPolicy, we should still test that the new policy enforces the namespace constraint. Please either add a focused unit test that checks the policy’s validation expression and error message, or an integration/e2e test that creates a CR in the wrong namespace and confirms the API server rejects it via the policy, so the regression this PR fixes remains covered by automated tests.
Suggested implementation:
```golang
It("should accept creation of a resource with a valid namespace", func() {
Expect(wh.ValidateCreate(ctx, dryRun, cr)).To(Succeed())
})
It("should reject creation of a resource with an arbitrary namespace", func() {
cr.Namespace = ResourceInvalidNamespace
err := wh.ValidateCreate(ctx, dryRun, cr)
Expect(err).ToNot(Succeed())
Expect(err.Error()).To(ContainSubstring("invalid namespace"))
Expect(err.Error()).To(ContainSubstring("kubevirt-hyperconverged"))
})
DescribeTable("Validate annotations", func(annotations map[string]string, assertion types.GomegaMatcher) {
```
1. If `wh.ValidateCreate` already returns an `error`, the `Expect(err).ToNot(Succeed())` line assumes the use of Gomega's `Succeed()` matcher on an `error`. If your tests typically use `Expect(err).To(HaveOccurred())` instead, replace that line with `Expect(err).To(HaveOccurred())` to match existing style.
2. Ensure that the error message produced by the ValidatingAdmissionPolicy for an invalid namespace actually contains both `"invalid namespace"` and `"kubevirt-hyperconverged"`. If the message differs, adjust the `ContainSubstring(...)` expectations to match the exact message emitted by the policy.
3. If `ResourceInvalidNamespace` is not already set to a clearly wrong namespace in the shared test setup, update its value so that this test meaningfully exercises the policy's namespace constraint.
</issue_to_address>
### Comment 4
<location> `controllers/admissionpolicy/resources.go:26` </location>
<code_context>
+ requiredPolicy *admissionv1.ValidatingAdmissionPolicy
+ requiredBinding *admissionv1.ValidatingAdmissionPolicyBinding
+
+ policyOnce = &sync.Once{}
+ bindingOnce = &sync.Once{}
+
</code_context>
<issue_to_address>
**issue (complexity):** Consider replacing the singleton-style globals and sync.Once with simple per-call constructors that directly build and return the policy and binding objects.
You can drop the singleton pattern and globals entirely and keep the same behavior with simpler, easier-to-test code.
### 1. Remove `sync.Once` + global instances
Instead of:
```go
var (
requiredPolicy *admissionv1.ValidatingAdmissionPolicy
requiredBinding *admissionv1.ValidatingAdmissionPolicyBinding
policyOnce = &sync.Once{}
bindingOnce = &sync.Once{}
)
func getRequiredPolicy() *admissionv1.ValidatingAdmissionPolicy {
policyOnce.Do(func() {
namespace := hcoutil.GetOperatorNamespaceFromEnv()
requiredPolicy = &admissionv1.ValidatingAdmissionPolicy{
// ...
}
})
return requiredPolicy.DeepCopy()
}
func getRequiredBinding() *admissionv1.ValidatingAdmissionPolicyBinding {
bindingOnce.Do(func() {
requiredBinding = &admissionv1.ValidatingAdmissionPolicyBinding{
// ...
}
})
return requiredBinding.DeepCopy()
}
```
Build a fresh struct on each call and return it directly:
```go
func getRequiredPolicy() *admissionv1.ValidatingAdmissionPolicy {
namespace := hcoutil.GetOperatorNamespaceFromEnv()
return &admissionv1.ValidatingAdmissionPolicy{
ObjectMeta: metav1.ObjectMeta{
Name: policyName,
Labels: hcoutil.GetLabels(hcov1beta1.HyperConvergedName, hcoutil.AppComponentDeployment),
OwnerReferences: []metav1.OwnerReference{ownresources.GetDeploymentRef()},
},
Spec: admissionv1.ValidatingAdmissionPolicySpec{
FailurePolicy: ptr.To(admissionv1.Fail),
MatchConstraints: &admissionv1.MatchResources{
MatchPolicy: ptr.To(admissionv1.Equivalent),
NamespaceSelector: &metav1.LabelSelector{},
ObjectSelector: &metav1.LabelSelector{},
ResourceRules: []admissionv1.NamedRuleWithOperations{{
RuleWithOperations: admissionv1.RuleWithOperations{
Rule: admissionv1.Rule{
APIGroups: []string{hcov1beta1.APIVersionGroup},
APIVersions: []string{hcov1beta1.APIVersionBeta},
Resources: []string{"hyperconvergeds"},
Scope: ptr.To(admissionv1.NamespacedScope),
},
Operations: []admissionv1.OperationType{admissionv1.Create},
},
}},
},
Validations: []admissionv1.Validation{{
Expression: fmt.Sprintf(`request.namespace == '%s'`, namespace),
Message: fmt.Sprintf(`HyperConverged CR can only be created in the '%s' namespace.`, namespace),
}},
},
}
}
func getRequiredBinding() *admissionv1.ValidatingAdmissionPolicyBinding {
return &admissionv1.ValidatingAdmissionPolicyBinding{
ObjectMeta: metav1.ObjectMeta{
Name: policyBindingName,
Labels: hcoutil.GetLabels(hcov1beta1.HyperConvergedName, hcoutil.AppComponentDeployment),
OwnerReferences: []metav1.OwnerReference{ownresources.GetDeploymentRef()},
},
Spec: admissionv1.ValidatingAdmissionPolicyBindingSpec{
PolicyName: policyName,
ValidationActions: []admissionv1.ValidationAction{admissionv1.Deny},
},
}
}
```
Benefits:
- No global mutable state, no `sync.Once`, no `DeepCopy` calls.
- Construction is local and obvious; env and owner refs are evaluated per call, which is usually fine for such small objects.
- Concurrency is simpler: each caller gets its own instance.
### 2. Keep predicates as simple, stateless helpers
You can keep the predicates as plain globals without coupling them to the singleton logic:
```go
var (
policyPredicate = predicate.NewTypedPredicateFuncs[*admissionv1.ValidatingAdmissionPolicy](func(policy *admissionv1.ValidatingAdmissionPolicy) bool {
return policy.Name == policyName && policy.DeletionTimestamp == nil
})
bindingPredicate = predicate.NewTypedPredicateFuncs[*admissionv1.ValidatingAdmissionPolicyBinding](func(binding *admissionv1.ValidatingAdmissionPolicyBinding) bool {
return binding.Name == policyBindingName && binding.DeletionTimestamp == nil
})
)
```
If you want to go one step further in decoupling, you can also thread in the namespace/owner ref via parameters instead of reading env/globals inside the helpers, but the change above already removes the main indirection and global mutability without affecting functionality.
</issue_to_address>
Pull Request Test Coverage Report for Build 20021419104
This is just wrong. The signature of `MergeLabels` is
This is just wrong.
This is not true. Since the channel is a buffered channel, we can read from it even after it is closed. This code is working. Just for example (runnable version here: https://go.dev/play/p/x0GaTpwpx3o):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	ch := make(chan string, 1)
	ch <- "hello"
	close(ch)
	go func() {
		fmt.Println("Starting listening to the channel")
		str := <-ch
		if str == "" {
		} else {
			fmt.Printf("Got %q\n", str)
			<-ch
			fmt.Println("Now the channel is finally closed")
		}
	}()
	time.Sleep(time.Second)
	fmt.Println("Done; exiting...")
}
```

The output is:
hco-e2e-operator-sdk-sno-aws lane succeeded.
@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-operator-sdk-sno-azure In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
hco-e2e-upgrade-prev-operator-sdk-sno-aws lane succeeded.
@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-operator-sdk-azure, ci/prow/hco-e2e-operator-sdk-gcp, ci/prow/hco-e2e-upgrade-operator-sdk-sno-azure, ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-azure In response to this:
hco-e2e-consecutive-operator-sdk-upgrades-aws lane succeeded.
@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-consecutive-operator-sdk-upgrades-azure, ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure In response to this:
/retest
```go
// add adds a new Controller to mgr with r as the reconcile.Reconciler
func add(mgr manager.Manager, r *ReconcileAdmissionPolicy) error {
```
Suggested change:

```go
func addController(mgr manager.Manager, r *ReconcileAdmissionPolicy) error {
```
We use this name in all existing controllers. Let's keep the name for consistency.
hco-e2e-kv-smoke-gcp lane succeeded.
@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-kv-smoke-azure In response to this:
hco-e2e-upgrade-operator-sdk-aws lane succeeded.
@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-upgrade-operator-sdk-azure In response to this:



What this PR does / why we need it:
The HCO webhook forces creating the HyperConverged CR only in the installation namespace, and rejects any request to create it in another namespace. However, when installing with OLM in the `allNamespaces` install mode, OLM adds a namespace selector so the webhook is only applied to the installation namespace; creation requests in any other namespace never reach it. To solve this, the webhook setup removes the namespace selector from the ValidatingWebhookConfiguration, making the webhook work for all namespaces. Apparently, there is some kind of race condition with OLM: eventually, OLM sets the namespace selector back after the webhook removes it.
ValidatingAdmissionPolicy became stable in Kubernetes v1.30 and offers a better way to validate the HyperConverged namespace.
This PR adds the new admission policy controller to reconcile new ValidatingAdmissionPolicy and ValidatingAdmissionPolicyBinding resources that verify the HyperConverged namespace.
Note: we need a new controller, and can't use the main (hyperconverged) controller, because the hyperconverged controller only reconciles resources if the HyperConverged CR exists. We need the validation for HyperConverged creation, so it must be in place before that.
Jira Ticket:
Release note: