-
Notifications
You must be signed in to change notification settings - Fork 2
feat(webhook): add mutating and validating admission webhooks #114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
fernando-villalba
wants to merge
9
commits into
main
Choose a base branch
from
webhook
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This commit introduces the `pkg/webhook` module, implementing the Admission Control layer for the Multigres operator. This ensures that `MultigresCluster` resources are validated for referential integrity and populated with "visible defaults" before persistence. Key Features: - **Mutating Webhook (Defaulter):** - Injects the System Catalog (`postgres` database) if missing. - Implements "Visible Defaults": Resolves implicit templates (Level 2) and hydrates the Parent `Spec` so users see the fully realized configuration in Etcd. - Respects API Mutual Exclusion: Explicit `templateRef` values are preserved (not expanded) to strictly adhere to validation rules forbidding both `spec` and `ref` on the same component. - Uses a request-scoped `Resolver` to respect Cluster-level `TemplateDefaults`, mirroring Controller behavior. - **Validating Webhook:** - **Referential Integrity:** Blocks the creation/update of Clusters referencing non-existent Core, Cell, or Shard templates. - **Deletion Protection:** Prevents the deletion of any Template (Core/Cell/Shard) that is currently referenced by an active MultigresCluster, preventing orphaned references. - **Child Resource Protection:** Blocks direct modification of managed child resources (Cells, Shards) by non-operator users. - **Infrastructure & Testing:** - Updated `cmd/multigres-operator/main.go` to wire up the Webhook Server, including CLI flags for cert management and default templates. - Updated `Makefile` to generate `MutatingWebhookConfiguration` and `ValidatingWebhookConfiguration` manifests. - Added a high-performance Integration Test suite (`pkg/webhook/integration_test.go`) using a shared `envtest` environment. This suite covers happy paths, edge cases (override precedence), and concurrent cache consistency (deletion races).
🔬 Go Test Coverage ReportSummary
StatusDetailShow New Coverage |
…itable cert volume This commit resolves multiple critical issues preventing the Operator's webhook server from starting up successfully in a secure, production-like environment (e.g., Kind). 1. Fix RBAC Permissions: The self-signed certificate strategy requires the operator to manage Secrets (for the CA), Services (for the endpoint), and to patch WebhookConfigurations (to inject the CA bundle). - Added `// +kubebuilder:rbac` markers to `pkg/webhook/cert/manager.go`. - Regenerated `config/rbac/role.yaml` to include permissions for `secrets`, `services`, `mutatingwebhookconfigurations`, and `validatingwebhookconfigurations`. 2. Fix Namespace Mismatch (CrashLoop): The operator was crashing because it defaulted to looking for resources in `multigres-system` while running in `multigres-operator`. - Updated `config/manager/manager.yaml` to inject the `POD_NAMESPACE` environment variable via the Kubernetes Downward API. - Updated `cmd/multigres-operator/main.go` to use `POD_NAMESPACE` as the default for the `--webhook-service-namespace` flag, ensuring the operator effectively "knows where it lives." 3. Fix Read-Only Filesystem Error: The deployment enables `readOnlyRootFilesystem: true`, causing the self-signed certificate generator to fail when writing `tls.crt` and `tls.key` to disk. - Added an `emptyDir` volume named `cert-dir` in `config/manager/manager.yaml`. - Mounted this volume to `/var/run/secrets/webhook` to provide a secure, writable scratch space for ephemeral certificates. 4. Build System: - Updated `Makefile` to include `./pkg/webhook/...` in the `controller-gen` object generation paths to ensuring deepcopy methods are generated for webhook-related types.
… resolver scoping
This commit finalizes the "Hybrid Admission Control" implementation by polishing the Webhook Certificate Manager, enforcing "Deep Defaults" (resources/limits) across all components, and fixing critical scoping bugs in the Template Resolver.
Webhook & Certificate Management:
* Auto-Discovery: Removed the manual `--webhook-service-name` flag. The Certificate Manager now automatically discovers the Service targeting the operator's pods (via `control-plane=controller-manager` label) to generate the correct SANs.
* Service Account: Injected `POD_SERVICE_ACCOUNT` env var to allow the webhook to identify the operator's own identity for child-resource validation exemptions.
* Setup: Refactored `webhook.Setup` to use a direct API client for certificate bootstrapping, avoiding cache start-up race conditions.
Resolver & Defaulting Logic:
* Namespace Scoping Bug: Fixed a critical bug in `handlers/defaulter.go` where the Resolver was using the Operator's namespace to look up templates instead of the Request's namespace.
* Deep Defaults: Implemented `DefaultResources*` functions in `resolver/defaults.go` and applied them in `cell.go` and `shard.go`. The Resolver now fully hydrates ResourceRequirements (Requests/Limits) and Storage if missing.
* Smart Defaulting: Updated `resolver/cluster.go` to inject a default `Pool` ("default") when automatically creating the system "0" shard.
* Template Fallbacks: Updated `handlers/validator.go` to skip "NotFound" errors for templates named "default" (Fallback templates), as the Resolver handles these via hardcoded values if they don't exist in the cluster.
Testing & Validation:
* Unit Tests: Updated `resolver/cell_test.go`, `shard_test.go`, and `cluster_test.go` to expect fully hydrated objects (with default resources and replicas) instead of empty structs.
* Integration Tests: Updated `integration_test.go`, `integration_lifecycle_test.go`, and `integration_resolution_enforcement_test.go` to match the new behavior where the controller enforces defaults immediately.
* Cert Manager Tests: Fixed `cert/manager_test.go` to mock the Service and WebhookConfigurations correctly for the new auto-discovery logic.
Dev Infrastructure:
* Samples: Added `config/samples/default-templates/` containing standard Core, Cell, and Shard templates.
* Makefile: Added `kind-deploy-no-webhook` target and a `config/no-webhook` Kustomize profile to allow deploying the operator in a simplified mode for debugging.
* Kustomize: Enabled the webhook component by default in `config/default`.
83a8eab to
bf70445
Compare
Refactors the webhook setup to use the controller-runtime builder pattern,
modernizes handlers to implement CustomValidator/Defaulter interfaces, and
overhauls internal certificate generation to be a managed background process.
Detailed Changes:
1. Webhook Registration & Handlers (Refactor)
- Switched pkg/webhook/setup.go to use ctrl.NewWebhookManagedBy(mgr).
This replaces manual server.Register calls and eliminates path drift
between Go code and Kubebuilder markers.
- Refactored handlers/defaulter.go and handlers/validator.go to
implement webhook.CustomDefaulter and webhook.CustomValidator interfaces,
removing manual JSON decoding logic.
2. Internal Certificate Management (Overhaul)
- Converted pkg/webhook/cert from a static manager to a CertRotator that
implements manager.Runnable.
- Added a Bootstrap() phase in main.go to ensure certificates exist
on disk before the webhook server binds to the port.
- Implemented auto-detection: checks if certificates exist (e.g., from
Cert Manager) and skips internal rotation if they do.
- Added logic to automatically patch Mutating/ValidatingWebhookConfigurations
with the generated CA bundle on every rotation.
- Secrets now attempt to set the Operator's Deployment as an owner for
garbage collection.
3. Configuration & RBAC
- Updated api/v1alpha1/multigrescluster_types.go with RBAC permissions
to manage Secrets and patch WebhookConfigurations.
- Updated main.go to disable caching for Secret and Deployment objects
to allow bootstrapping before the cache is fully started.
- Deprecated explicit cert-strategy flags in favor of auto-detection.
This change was smoke tested, the tests need to be fixed, will be done on next commit.
Refactors webhook handlers to use typed interfaces, hardens integration tests against race conditions, and tunes certificate rotation settings. Detailed changes: - tests(tablegroup): wrap updates in RetryOnConflict to fix flaky TestTableGroup_Lifecycle caused by controller/test race conditions. - refactor(webhook): migrate Defaulter and Validator handlers to implement controller-runtime's CustomDefaulter and CustomValidator interfaces, simplifying logic and fixing build errors. - fix(webhook/cert): increase internal certificate rotation threshold to 30 days (from 24h) to ensure ample recovery time during failures. - test(webhook/cert): update manager tests to validate DNS names and correctly assert rotation behavior with the new threshold. - style: fix staticcheck ST1005 error string formatting in validator.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
feat(webhook): add mutating and validating admission webhooks
This commit introduces the
pkg/webhookmodule, implementing the Admission Control layer for the Multigres operator. This ensures thatMultigresClusterresources are validated for referential integrity and populated with "visible defaults" before persistence.Key Features:
Mutating Webhook (Defaulter): - Injects the System Catalog (
postgresdatabase) if missing.Specso users see the fully realized configuration in Etcd.templateRefvalues are preserved (not expanded) to strictly adhere to validation rules forbidding bothspecandrefon the same component.Resolverto respect Cluster-levelTemplateDefaults, mirroring Controller behavior.Validating Webhook:
Infrastructure & Testing:
cmd/multigres-operator/main.goto wire up the Webhook Server, including CLI flags for cert management and default templates.Makefileto generateMutatingWebhookConfigurationandValidatingWebhookConfigurationmanifests.pkg/webhook/integration_test.go) using a sharedenvtestenvironment. This suite covers happy paths, edge cases (override precedence), and concurrent cache consistency (deletion races).