Amadeus patches: Advanced Statefulset scaling + Support for updating VolumeClaimTemplates #1644
base: develop
Conversation
Pull request overview
This PR introduces significant enhancements to StatefulSet scaling behavior in the Splunk Operator, focusing on CPU-aware scaling, parallel pod updates, and configurable scale-up timeout mechanisms. However, the PR description is incomplete ("To be updated...."), making it difficult to understand the full context and motivation.
Key Changes
- CPU-preserving scaling that maintains total CPU allocation when pod CPU requests change
- Parallel pod update support to speed up large cluster updates (configurable via annotation)
- Configurable scale-up timeout to handle scenarios where pods don't become ready
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pkg/splunk/test/controller.go | Added PodList support to mock client for testing |
| pkg/splunk/splkcontroller/statefulset.go | Core implementation of CPU-aware scaling, parallel updates, and timeout mechanisms |
| pkg/splunk/splkcontroller/statefulset_test.go | Comprehensive tests for scale-up timeout and CPU-aware scaling |
| pkg/splunk/splkcontroller/statefulset_parallel_test.go | New test suite for parallel pod update functionality |
| pkg/splunk/splkcontroller/statefulset_cpu_test.go | New test suite for CPU-aware scaling features |
| pkg/splunk/splkcontroller/statefulset_cpu_scaledown_test.go | Tests for CPU-aware scale-down operations |
| pkg/splunk/enterprise/indexercluster.go | Enhanced PrepareScaleDown with fallback cleanup mechanism |
| pkg/splunk/enterprise/indexercluster_test.go | Updated tests for new scale-down behavior and zombie peer prevention |
| pkg/splunk/enterprise/searchheadcluster_test.go | Test updates to match new scale-down call patterns |
| docs/CustomResources.md | Documentation for new scaling behavior annotations |
```go
if metrics.OriginalCPUPerPod > 0 {
	oldPodsPerNewPod = metrics.TargetCPUPerPod / metrics.OriginalCPUPerPod
	if oldPodsPerNewPod > 1 {
		oldPodsPerNewPod-- // Safety margin
	}
	if oldPodsPerNewPod < 1 {
		oldPodsPerNewPod = 1
	}
}
```
Copilot (AI) commented on Dec 15, 2025:
The variable 'oldPodsPerNewPod' is calculated but may produce incorrect results when OriginalCPUPerPod is zero or when integer division truncates important information. For example, if TargetCPUPerPod is 3000 and OriginalCPUPerPod is 2000, the result would be 1 (after safety margin), but the actual ratio should allow for more deletions. Consider using floating-point arithmetic or validating that OriginalCPUPerPod is non-zero before performing the division.
We do that on purpose: metrics.OriginalCPUPerPod is never zero, and we want to know how many old pods can be safely deleted once the new pods exist (without losing cluster capacity).
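As a worked illustration of the arithmetic under discussion (a standalone sketch with hypothetical millicore values, not the operator's code):

```go
package main

import "fmt"

func main() {
	// Hypothetical values: new pods request 4x the CPU of old pods.
	originalCPUPerPod := int64(500)  // old pod requests 500m
	targetCPUPerPod := int64(2000)   // new pod requests 2000m

	// Integer division: one new pod covers the capacity of 4 old pods.
	oldPodsPerNewPod := targetCPUPerPod / originalCPUPerPod // 4
	if oldPodsPerNewPod > 1 {
		oldPodsPerNewPod-- // safety margin: delete only 3 old pods per new pod
	}
	if oldPodsPerNewPod < 1 {
		oldPodsPerNewPod = 1
	}
	fmt.Println(oldPodsPerNewPod) // 3

	// The case from the review comment: 3000/2000 truncates to 1, so the
	// safety-margin branch never fires and the ratio stays at 1 (conservative).
	fmt.Println(int64(3000) / int64(2000)) // 1
}
```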
```go
minTimeout = 30 * time.Second
maxTimeout = 24 * time.Hour
```
Copilot (AI) commented on Dec 15, 2025:
The hardcoded minimum timeout of 30 seconds and maximum of 24 hours may not be appropriate for all cluster sizes and operational requirements. The minimum value of 30 seconds could be too long for small test clusters, while 24 hours may be too long for production environments where faster feedback is desired. Consider making these values configurable or documenting the rationale for these specific limits.
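If the bounds are applied as a clamp, the logic might look like this (a minimal sketch; `clampTimeout` is a hypothetical helper, and the PR may instead reject out-of-range values):

```go
package main

import (
	"fmt"
	"time"
)

const (
	minTimeout = 30 * time.Second
	maxTimeout = 24 * time.Hour
)

// clampTimeout bounds a user-supplied timeout to [minTimeout, maxTimeout].
func clampTimeout(d time.Duration) time.Duration {
	if d < minTimeout {
		return minTimeout
	}
	if d > maxTimeout {
		return maxTimeout
	}
	return d
}

func main() {
	fmt.Println(clampTimeout(5 * time.Second)) // 30s
	fmt.Println(clampTimeout(48 * time.Hour))  // 24h0m0s
}
```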
In docs/CustomResources.md (outdated):
- Duration strings like `"5m"`, `"10m"`, `"1h"`, `"30s"`, `"5m30s"` to set a specific timeout
- `"0s"` or `"0"` to immediately proceed with scale-up without waiting
- Empty or missing annotation waits indefinitely (no timeout)
- Invalid or negative values wait indefinitely with a warning logged
Copilot (AI) commented on Dec 15, 2025:
The documentation states that invalid or negative values result in waiting indefinitely with "a warning logged", but the implementation in getScaleUpReadyWaitTimeout (lines 1140-1190) does not actually log any warning when invalid values are encountered. The function silently returns the default timeout. Either update the documentation to match the implementation or add appropriate warning logs when invalid values are detected.
Suggested change:

```diff
- - Invalid or negative values wait indefinitely with a warning logged
+ - Invalid or negative values wait indefinitely (no timeout)
```
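A minimal sketch of annotation parsing that matches the documented semantics above, including the warning the review asks for (`parseScaleUpTimeout` is a hypothetical name; the PR's actual `getScaleUpReadyWaitTimeout` may be structured differently):

```go
package main

import (
	"fmt"
	"time"
)

// parseScaleUpTimeout mirrors the documented semantics: empty means wait
// indefinitely, "0s"/"0" means proceed immediately, and invalid or negative
// values fall back to waiting indefinitely with a warning.
func parseScaleUpTimeout(value string) (timeout time.Duration, hasTimeout bool) {
	if value == "" {
		return 0, false // no annotation: wait indefinitely
	}
	d, err := time.ParseDuration(value)
	if err != nil || d < 0 {
		// This is the warning the review comment says is missing.
		fmt.Printf("warning: invalid scale-up-ready-wait-timeout %q, waiting indefinitely\n", value)
		return 0, false
	}
	return d, true // d == 0 means scale up immediately
}

func main() {
	fmt.Println(parseScaleUpTimeout("5m30s")) // 5m30s true
	fmt.Println(parseScaleUpTimeout("0s"))    // 0s true
	fmt.Println(parseScaleUpTimeout("bogus")) // logs warning, then 0s false
}
```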
This PR contains several distinct changes, in dedicated commits. Because they overlap in places, it is difficult to split them into separate PRs.
1.0. Flexible scaling
Description
Makes StatefulSet scaling more flexible by fixing two annoying behaviors:
Scale-down no longer blocks on unhealthy pods - Previously the operator waited for all pods to be ready before scaling down, which doesn't make sense, since removing pods doesn't require them to be healthy. Now it just checks whether there are more replicas than desired and proceeds.
Scale-up has a configurable timeout - Instead of waiting forever for pods to be ready, users can now set a timeout via annotation. After the timeout, scale-up proceeds anyway (with a warning logged). The default is still "wait forever" to maintain backward compatibility.
Also includes some defensive improvements to scale-down:
- Hardened `PrepareScaleDown()` to handle edge cases like manually deleted pods

Key Changes
New annotations:
- `operator.splunk.com/scale-up-ready-wait-timeout`: how long to wait for all pods to be ready before scaling up
- `operator.splunk.com/scale-up-wait-started`: internal annotation tracking the start time of the scale-up wait (see the sketch below)
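How the two annotations plausibly interact (a sketch under assumptions: the helper name and the RFC3339 timestamp format are hypothetical, not confirmed by the PR):

```go
package main

import (
	"fmt"
	"time"
)

// shouldProceedAnyway reports whether the scale-up wait, whose start time is
// recorded in the internal scale-up-wait-started annotation, has exceeded the
// configured timeout. Hypothetical helper; the PR's handleScaleUp may differ.
func shouldProceedAnyway(annotations map[string]string, timeout time.Duration, now time.Time) bool {
	started, err := time.Parse(time.RFC3339, annotations["operator.splunk.com/scale-up-wait-started"])
	if err != nil {
		return false // no valid start time recorded yet
	}
	return now.Sub(started) >= timeout
}

func main() {
	ann := map[string]string{
		"operator.splunk.com/scale-up-wait-started": time.Now().Add(-10 * time.Minute).Format(time.RFC3339),
	}
	fmt.Println(shouldProceedAnyway(ann, 5*time.Minute, time.Now())) // true
}
```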
`pkg/splunk/splkcontroller/statefulset.go`:
- New `ScaleUpReadyWaitTimeoutAnnotation` (`operator.splunk.com/scale-up-ready-wait-timeout`) and `ScaleUpWaitStartedAnnotation` constants
- `UpdateStatefulSetPods()` now checks scale-down first, before the ready-wait logic
- New `handleScaleDown()` and `handleScaleUp()` helpers
`pkg/splunk/enterprise/indexercluster.go`:
- `PrepareScaleDown()`: added bounds checking and a fallback to query the CM directly when status is stale
- `decommission()`: added bounds check; empty status now returns true (decommission complete)
- `cleanupPeerFromClusterManager()`: fallback mechanism for stale CR status scenarios

Testing and Verification
Unit tests added:
Please check the code
Manual E2E testing:
"0s"value - immediate scale-up without waitingRelated Issues
#1646
PR Checklist
2.0. CPU-preserving scaling and parallel pod updates in StatefulSet management
Description
Add CPU-preserving scaling and parallel pod updates to StatefulSet management. This lets the operator automatically adjust replica counts when CPU requests change, keeping total cluster CPU capacity stable. Also adds ability to update multiple pods in parallel during rolling updates.
This change is fully backward compatible.
Key Changes
New Annotations:
- `operator.splunk.com/preserve-total-cpu`: enables CPU-preserving replica adjustment
- `operator.splunk.com/parallel-pod-updates`: controls concurrent pod updates (< 1.0 = percentage, >= 1.0 = absolute count; see the sketch after this list)
- `operator.splunk.com/cpu-aware-target-replicas`: internal tracking for gradual scale-down
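The fractional-vs-absolute rule for `parallel-pod-updates` could be interpreted like this (a hedged sketch with a hypothetical helper name, not the PR's exact code):

```go
package main

import (
	"fmt"
	"math"
)

// maxParallelUpdates interprets the parallel-pod-updates value: < 1.0 is a
// fraction of total replicas, >= 1.0 an absolute pod count. The result is
// clamped to [1, replicas].
func maxParallelUpdates(value float64, replicas int32) int32 {
	if value <= 0 || replicas < 1 {
		return 1 // default: one pod at a time
	}
	var n int32
	if value < 1.0 {
		n = int32(math.Floor(value * float64(replicas)))
	} else {
		n = int32(value)
	}
	if n < 1 {
		n = 1
	}
	if n > replicas {
		n = replicas
	}
	return n
}

func main() {
	fmt.Println(maxParallelUpdates(0.25, 20)) // 5 (25% of 20 pods)
	fmt.Println(maxParallelUpdates(3, 20))    // 3 (absolute count)
}
```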
Files changed:
- New `ScalingCPUMetrics` struct for tracking CPU state during transitions
- `ApplyStatefulSet` now detects CPU changes and adjusts replicas
- `UpdateStatefulSetPods` extended with `handleCPUPreservingTransition` for CPU-aware transitions with bounds checking
- `CheckStatefulSetPodsForUpdates` for revision-based pod recycling with parallel update support

Behavior:
- With `preserve-total-cpu` enabled, replica counts are adjusted when per-pod CPU requests change, so that total cluster CPU capacity stays stable (see the sketch below)
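The capacity-preserving replica adjustment presumably follows total-CPU arithmetic along these lines (illustrative only; the helper name is hypothetical and the PR's `ScalingCPUMetrics`-based logic may differ):

```go
package main

import "fmt"

// cpuPreservingReplicas keeps total CPU roughly constant when per-pod CPU
// requests change: total = replicas * milliCPUPerPod. It rounds up so
// capacity never drops below the original total.
func cpuPreservingReplicas(currentReplicas int32, originalMilliCPU, targetMilliCPU int64) int32 {
	if originalMilliCPU <= 0 || targetMilliCPU <= 0 {
		return currentReplicas
	}
	total := int64(currentReplicas) * originalMilliCPU
	newReplicas := (total + targetMilliCPU - 1) / targetMilliCPU // ceiling division
	if newReplicas < 1 {
		newReplicas = 1
	}
	return int32(newReplicas)
}

func main() {
	// 12 pods at 500m = 6000m total; pods resized to 2000m need 3 replicas.
	fmt.Println(cpuPreservingReplicas(12, 500, 2000)) // 3
}
```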
Testing and Verification
...
Related Issues
#1645
PR Checklist
3.0. Support for updating VolumeClaimTemplates
Description
Adds support for updating VolumeClaimTemplates (VCT) in StatefulSets managed by the Splunk Operator. Previously, any VCT change was silently ignored because K8s doesn't allow modifying VCTs on existing StatefulSets.
Now the operator can handle:
- Storage size increases, by expanding the existing PVCs in place
- Other VCT changes, by recreating the StatefulSet while keeping its pods (see the sketch below)

This means you can finally migrate storage classes or expand disk space without manual intervention.
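For non-expandable VCT changes, the usual pattern (which the `DeleteResource()` change below appears to enable) is deleting the StatefulSet with orphan propagation so its pods and PVCs survive recreation. A sketch using controller-runtime; the helper name is hypothetical:

```go
package sketch

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// recreateWithNewVCTs deletes the StatefulSet with orphan propagation so its
// pods (and their PVCs) keep running, letting the next reconcile recreate the
// StatefulSet with the updated VolumeClaimTemplates. Hypothetical helper; the
// PR routes the delete through its updated DeleteResource().
func recreateWithNewVCTs(ctx context.Context, c client.Client, sts *appsv1.StatefulSet) error {
	return c.Delete(ctx, sts, client.PropagationPolicy(metav1.DeletePropagationOrphan))
}
```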
Key Changes
`pkg/splunk/splkcontroller/statefulset.go`:
- New `VCTStorageChange` and `VCTCompareResult` structs to track VCT differences
- `CompareVolumeClaimTemplates()`: compares current vs desired VCTs, detects storage increases vs other changes
- `ExpandPVCStorage()`: patches PVCs with new storage requests, handles in-progress resizes (sketched below)
- `ApplyStatefulSet()` updated to call `CompareVolumeClaimTemplates()` before checking other updates
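PVC storage expansion is typically a patch of `spec.resources.requests.storage` on each PVC, with the storage driver doing the actual resize. A hedged sketch of what `ExpandPVCStorage()` plausibly does (the function shown is illustrative, not the PR's code):

```go
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// expandPVC raises a PVC's requested storage; the storage driver performs the
// actual resize. The API server rejects shrinking, so callers should only
// pass increases. Sketch only: the PR's ExpandPVCStorage() also handles
// in-progress resizes, which is omitted here.
func expandPVC(ctx context.Context, c client.Client, pvc *corev1.PersistentVolumeClaim, newSize resource.Quantity) error {
	patch := client.MergeFrom(pvc.DeepCopy())
	pvc.Spec.Resources.Requests[corev1.ResourceStorage] = newSize
	return c.Patch(ctx, pvc, patch)
}
```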
`pkg/splunk/util/util.go`:
- `DeleteResource()` updated to accept `client.DeleteOption` for orphan cascade support

Testing and Verification
Unit tests added:
Some new unit tests are added to cover different scenarios
Manual testing:
Related Issues
#1647
PR Checklist
4.0. Support for improving startup performance on large persistent volumes by configuring `fsGroupChangePolicy`

Description
This PR adds support for configuring `fsGroupChangePolicy` on Splunk pod security contexts to improve startup performance on large persistent volumes.

When pods with large PVs start up, Kubernetes can spend a long time recursively changing ownership/permissions on all files. By setting the policy to `OnRootMismatch`, we skip this expensive operation if the root dir already has the correct fsGroup, which is the case for most restarts.

Key Changes
- `api/v4/common_types.go`: Added `FSGroupChangePolicy` field to the common Spec struct. Users can now set this in their CR spec, with validation for `Always` or `OnRootMismatch` values.
- `pkg/splunk/enterprise/configuration.go`: Added `getFSGroupChangePolicy()` helper that implements the precedence logic (annotation > spec > default; see the sketch after this list). The default is now `OnRootMismatch` for better out-of-the-box performance.
- `pkg/splunk/splkcontroller/statefulset.go`: Added the `FSGroupChangePolicyAnnotation` constant for the `operator.splunk.com/fs-group-change-policy` annotation.
- `docs/CustomResources.md`: Added docs explaining the feature, configuration options (spec field vs annotation), and the precedence rules.
- CRD yaml files: Updated all CRD definitions to include the new field (generated with `make generate` and `make manifests`).
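The precedence logic might look like the following sketch; the function name and constants mirror the PR's description, but the body is illustrative, not the PR's exact implementation:

```go
package sketch

import corev1 "k8s.io/api/core/v1"

// getFSGroupChangePolicy resolves the effective policy with the precedence
// described above: annotation > spec > default (OnRootMismatch). Sketch of
// the described behavior only.
func getFSGroupChangePolicy(annotations map[string]string, specValue string) corev1.PodFSGroupChangePolicy {
	value := annotations["operator.splunk.com/fs-group-change-policy"]
	if value == "" {
		value = specValue
	}
	if value == string(corev1.FSGroupChangeAlways) {
		return corev1.FSGroupChangeAlways
	}
	return corev1.FSGroupChangeOnRootMismatch
}
```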
Testing and Verification
- Unit tests for `getFSGroupChangePolicy()` covering all precedence scenarios

Related Issues
#1648
PR Checklist