
Conversation


@ductrung-nguyen ductrung-nguyen commented Dec 15, 2025

This PR contains several distinct changes, each in dedicated commits.
Because they share overlapping code, it is difficult to split them into separate PRs.


1.0. Flexible scaling

Description

Makes StatefulSet scaling more flexible by fixing two annoying behaviors:

  1. Scale-down no longer blocks on unhealthy pods - Previously the operator waited for all pods to be ready before scaling down, which is unnecessary because removing pods does not require the remaining pods to be healthy. Now it simply checks whether there are more replicas than desired and proceeds.

  2. Scale-up has a configurable timeout - Instead of waiting forever for pods to become ready, users can now set a timeout via an annotation. After the timeout, scale-up proceeds anyway (with a warning logged). The default is still "wait forever" to maintain backward compatibility.

Also includes some defensive improvements to scale-down:

  • Better bounds checking in PrepareScaleDown() to handle edge cases like manually deleted pods
  • Fallback mechanism to query Cluster Manager directly when CR status is stale
  • Empty peer status now treated as "decommission complete" instead of blocking

Key Changes

New annotations:

  • operator.splunk.com/scale-up-ready-wait-timeout: how long to wait for all pods to be ready before scaling up
  • operator.splunk.com/scale-up-wait-started: internal annotation recording when the scale-up wait started

pkg/splunk/splkcontroller/statefulset.go

  • Added ScaleUpReadyWaitTimeoutAnnotation (operator.splunk.com/scale-up-ready-wait-timeout) and ScaleUpWaitStartedAnnotation constants
  • Refactored UpdateStatefulSetPods() - now checks scale-down first, before the ready-wait logic
  • Extracted the scale-down and scale-up paths into handleScaleDown() and handleScaleUp()
  • Added supporting helper functions
  • Timeout validation: accepts values in the 30s-24h range, "0s" for an immediate bypass; a missing annotation or an invalid value defaults to "wait forever" (the current behavior). See the validation sketch after this list.
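
A minimal sketch of how such a timeout annotation could be validated, based on the rules listed above. The function name, return shape, and clamping behavior for out-of-range values are assumptions, not the actual implementation:

```go
package splkcontroller // package name assumed for illustration

import "time"

const (
	// Values come from the documented 30s-24h validation range.
	scaleUpMinTimeout = 30 * time.Second
	scaleUpMaxTimeout = 24 * time.Hour
)

// parseScaleUpReadyWaitTimeout returns (timeout, true) when a bounded wait
// applies and (0, false) for "wait forever".
func parseScaleUpReadyWaitTimeout(raw string) (time.Duration, bool) {
	if raw == "" {
		return 0, false // annotation missing: keep current wait-forever behavior
	}
	d, err := time.ParseDuration(raw)
	if err != nil || d < 0 {
		return 0, false // invalid or negative: fall back to waiting indefinitely
	}
	if d == 0 {
		return 0, true // "0s" / "0": proceed with scale-up immediately
	}
	// Whether out-of-range values are clamped or rejected is an assumption;
	// here they are clamped into the accepted 30s-24h window.
	if d < scaleUpMinTimeout {
		d = scaleUpMinTimeout
	}
	if d > scaleUpMaxTimeout {
		d = scaleUpMaxTimeout
	}
	return d, true
}
```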

pkg/splunk/enterprise/indexercluster.go

  • PrepareScaleDown(): Added bounds checking and fallback to query CM directly when status is stale
  • decommission(): Added bounds check, empty status now returns true (decommission complete)
  • New cleanupPeerFromClusterManager(): Fallback mechanism for stale CR status scenarios

Testing and Verification

Unit tests added:
See the new and updated test files in this PR.

Manual E2E testing:

  • Deployed IndexerCluster, killed one pod, verified scale-down proceeds despite unhealthy pod
  • Tested timeout annotation with 2m value - confirmed scale-up proceeds after timeout
  • Tested "0s" value - immediate scale-up without waiting
  • Tested missing/invalid annotation - falls back to wait-forever behavior

Related Issues

#1646

PR Checklist

  • Code changes adhere to the project's coding standards.
  • Relevant unit and integration tests are included.
  • Documentation has been updated accordingly.
  • All tests pass locally.
  • The PR description follows the project's guidelines.

2.0. CPU-preserving scaling and parallel pod updates in StatefulSet management

Description

Adds CPU-preserving scaling and parallel pod updates to StatefulSet management. This lets the operator automatically adjust replica counts when per-pod CPU requests change, keeping total cluster CPU capacity stable. It also adds the ability to update multiple pods in parallel during rolling updates.

This change is fully backward compatible.

Key Changes

New Annotations:

  • operator.splunk.com/preserve-total-cpu - enables CPU-preserving replica adjustment
  • operator.splunk.com/parallel-pod-updates - controls how many pods are updated concurrently (values below 1.0 are a fraction of the replicas, values of 1.0 or greater are an absolute pod count); see the parsing sketch after this list
  • operator.splunk.com/cpu-aware-target-replicas - internal tracking for gradual scale-down
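
A sketch of how the parallel-pod-updates value could be interpreted under the fraction-vs-count rule above; the function name and rounding are assumptions:

```go
package splkcontroller // package name assumed for illustration

import "strconv"

// parseParallelPodUpdates interprets the annotation value: below 1.0 it is a
// fraction of total replicas, at 1.0 or above it is an absolute pod count.
func parseParallelPodUpdates(value string, replicas int32) int32 {
	f, err := strconv.ParseFloat(value, 64)
	if err != nil || f <= 0 {
		return 1 // default: one pod at a time
	}
	var n int32
	if f < 1.0 {
		n = int32(f * float64(replicas)) // fraction of the StatefulSet size
	} else {
		n = int32(f) // absolute number of pods
	}
	if n < 1 {
		n = 1
	}
	if n > replicas {
		n = replicas
	}
	return n
}
```

For example, "0.25" on a 12-replica StatefulSet would allow 3 concurrent pod updates, while "3" allows 3 regardless of cluster size.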

Files changed:

  • pkg/splunk/splkcontroller/statefulset.go - main implementation
    • Added helper functions
    • Added ScalingCPUMetrics struct for tracking CPU state during transitions
    • Modified ApplyStatefulSet to detect CPU changes and adjust replicas
    • Modified UpdateStatefulSetPods with handleCPUPreservingTransition for CPU-aware transitions with bounds checking
    • Modified CheckStatefulSetPodsForUpdates for revision-based pod recycling with parallel update support

Behavior:

  • Scale-up: happens immediately when the per-pod CPU request decreases (more pods are needed to keep total CPU constant); see the replica sketch after this list
  • Scale-down: happens gradually with safety checks (waits for new-spec pods to be ready, respects CPU bounds)
  • Parallel updates: defaults to 1 pod at a time, configurable via annotation
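
A minimal sketch of the replica arithmetic implied above: keep total CPU (replicas x per-pod request) roughly constant when the per-pod request changes. The function name and rounding policy are illustrative:

```go
package splkcontroller // package name assumed for illustration

import (
	"math"

	"k8s.io/apimachinery/pkg/api/resource"
)

// cpuPreservingReplicas keeps replicas * cpuPerPod approximately constant.
// Rounding up guarantees the new total CPU is never below the original.
func cpuPreservingReplicas(currentReplicas int32, oldCPU, newCPU resource.Quantity) int32 {
	oldMilli := oldCPU.MilliValue()
	newMilli := newCPU.MilliValue()
	if oldMilli <= 0 || newMilli <= 0 {
		return currentReplicas // nothing sensible to preserve
	}
	totalMilli := int64(currentReplicas) * oldMilli
	return int32(math.Ceil(float64(totalMilli) / float64(newMilli)))
}
```

For example, 6 replicas at 2000m (12000m total) with a new request of 3000m yields ceil(12000/3000) = 4 replicas.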

Testing and Verification

  • Added unit tests for all new helper functions
  • Added integration-style tests for CPU-aware scaling scenarios
  • Added tests for percentage and absolute parallel update modes
  • Existing tests updated to reflect new scale-down prioritization behavior
    ...

Related Issues

#1645

PR Checklist

  • Code changes adhere to the project's coding standards.
  • Relevant unit and integration tests are included.
  • Documentation has been updated accordingly.
  • All tests pass locally.
  • The PR description follows the project's guidelines.

3.0. Support for updating VolumeClaimTemplates

Description

Adds support for updating VolumeClaimTemplates (VCT) in StatefulSets managed by the Splunk Operator. Previously, any VCT change was silently ignored because K8s doesn't allow modifying VCTs on existing StatefulSets.

Now the operator can handle:

  • Immutable VCT changes (storage class, access modes) - replaces the StatefulSet using an orphan-cascade delete
  • Storage expansion - patches the existing PVCs directly

This means you can finally migrate storage classes or expand disk space without manual intervention.

Key Changes

pkg/splunk/splkcontroller/statefulset.go

  • Added VCTStorageChange and VCTCompareResult structs to track VCT differences
  • Added CompareVolumeClaimTemplates() - compares current vs. desired VCTs and distinguishes storage increases from other (immutable) changes; see the comparison sketch after this list
  • Added ExpandPVCStorage() - patches PVCs with new storage requests, handles in-progress resizes
  • Updated ApplyStatefulSet() to:
    • Call CompareVolumeClaimTemplates() before checking other updates
    • Trigger PVC expansion when only storage size changed
    • Delete/recreate StatefulSet (orphan cascade) for immutable VCT changes
    • Skip replacement if StatefulSet already being deleted (prevents loops)
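
A simplified sketch of the storage comparison idea; the real CompareVolumeClaimTemplates() also has to flag immutable differences (storage class, access modes), which force the orphan-cascade replacement. Names here are assumptions:

```go
package splkcontroller // package name assumed for illustration

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// storageOnlyIncrease reports true when every desired volume claim template
// only grows its storage request relative to the current one, which makes it
// a candidate for in-place PVC expansion rather than StatefulSet replacement.
func storageOnlyIncrease(current, desired *appsv1.StatefulSet) bool {
	if len(current.Spec.VolumeClaimTemplates) != len(desired.Spec.VolumeClaimTemplates) {
		return false
	}
	grew := false
	for i := range desired.Spec.VolumeClaimTemplates {
		curSize := current.Spec.VolumeClaimTemplates[i].Spec.Resources.Requests[corev1.ResourceStorage]
		desSize := desired.Spec.VolumeClaimTemplates[i].Spec.Resources.Requests[corev1.ResourceStorage]
		switch desSize.Cmp(curSize) {
		case 1:
			grew = true // larger request: patch the PVCs directly
		case -1:
			return false // shrinking is not supported and is ignored upstream
		}
	}
	return grew
}
```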

pkg/splunk/util/util.go

  • Updated DeleteResource() to accept client.DeleteOption values for orphan-cascade support (see the delete sketch below)
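
A sketch of the orphan-cascade delete this enables, using controller-runtime's DeleteOption; the function name is an assumption, and the real code goes through the updated DeleteResource() helper:

```go
package splkcontroller // package name assumed for illustration

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// replaceStatefulSetOrphan deletes the StatefulSet while orphaning its pods
// and PVCs, so the recreated StatefulSet with the new VolumeClaimTemplates
// can adopt them instead of recreating storage from scratch.
func replaceStatefulSetOrphan(ctx context.Context, c client.Client, sts *appsv1.StatefulSet) error {
	return c.Delete(ctx, sts, client.PropagationPolicy(metav1.DeletePropagationOrphan))
}
```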

Testing and Verification

Unit tests added:
New unit tests cover the different VCT change scenarios.

Manual testing:

  • Deployed IndexerCluster, changed storage class from "ssdv1" to "ssdv2" -> StatefulSet replaced, pods adopted, PVCs retained
  • Deployed PVC with 1Ti, bumped to 2Ti -> PVCs patched, resize completed
  • Verified storage decrease is ignored (no changes applied)
  • Verified no replacement loop when StatefulSet has DeletionTimestamp

Related Issues

#1647

PR Checklist

  • Code changes adhere to the project's coding standards.
  • Relevant unit and integration tests are included.
  • Documentation has been updated accordingly.
  • All tests pass locally.
  • The PR description follows the project's guidelines.

4.0. Improving startup performance on large persistent volumes by configuring fsGroupChangePolicy

Description

This PR adds support for configuring fsGroupChangePolicy on Splunk pod security contexts to improve startup performance on large persistent volumes.

When pods with large PVs start up, Kubernetes can spend a long time recursively changing ownership/permissions on all files. By setting the policy to OnRootMismatch, we skip this expensive operation if the root dir already has the correct fsGroup - which is the case for most restarts.

Key Changes

  • api/v4/common_types.go: Added FSGroupChangePolicy field to the common Spec struct. Users can now set this in their CR spec with validation for Always or OnRootMismatch values.

  • pkg/splunk/enterprise/configuration.go: Added a getFSGroupChangePolicy() helper that implements the precedence logic (annotation > spec > default); see the precedence sketch after this list. The default is now OnRootMismatch for better out-of-the-box performance.

  • pkg/splunk/splkcontroller/statefulset.go: Added the FSGroupChangePolicyAnnotation constant for the operator.splunk.com/fs-group-change-policy annotation.

  • docs/CustomResources.md: Added docs explaining the feature, configuration options (spec field vs annotation), and the precedence rules.

  • CRD yaml files: Updated all CRD definitions to include the new field (generated via make generate and make manifest).
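
A sketch of the precedence logic described above (annotation over spec field over default). The signature is an assumption and warning logging is omitted:

```go
package enterprise // package name assumed for illustration

import corev1 "k8s.io/api/core/v1"

const fsGroupChangePolicyAnnotation = "operator.splunk.com/fs-group-change-policy"

// getFSGroupChangePolicy resolves the policy: annotation wins over the spec
// field, and anything unrecognized falls through to the default.
func getFSGroupChangePolicy(annotations map[string]string, specValue string) *corev1.PodFSGroupChangePolicy {
	candidates := []string{annotations[fsGroupChangePolicyAnnotation], specValue}
	for _, v := range candidates {
		switch v {
		case string(corev1.FSGroupChangeAlways):
			p := corev1.FSGroupChangeAlways
			return &p
		case string(corev1.FSGroupChangeOnRootMismatch):
			p := corev1.FSGroupChangeOnRootMismatch
			return &p
		}
	}
	// Default favours fast startup on large persistent volumes.
	p := corev1.FSGroupChangeOnRootMismatch
	return &p
}
```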

Testing and Verification

  • Added unit tests for getFSGroupChangePolicy() covering all precedence scenarios
  • Tested with both spec field and annotation configurations
  • Verified invalid annotation values fall back gracefully with warning log
  • Manual testing on cluster with large PVs showed significant startup time improvement

Related Issues

#1648

PR Checklist

  • Code changes adhere to the project's coding standards.
  • Relevant unit and integration tests are included.
  • Documentation has been updated accordingly.
  • All tests pass locally.
  • The PR description follows the project's guidelines.

Copilot AI review requested due to automatic review settings December 15, 2025 13:32
@ductrung-nguyen ductrung-nguyen changed the title Amadeus patches Amadeus patches: Advanced Statefulset scaling + Support for updating VolumeClaimTemplates Dec 15, 2025

Copilot AI left a comment


Pull request overview

This PR introduces significant enhancements to StatefulSet scaling behavior in the Splunk Operator, focusing on CPU-aware scaling, parallel pod updates, and configurable scale-up timeout mechanisms. However, the PR description is incomplete ("To be updated...."), making it difficult to understand the full context and motivation.

Key Changes

  • CPU-preserving scaling that maintains total CPU allocation when pod CPU requests change
  • Parallel pod update support to speed up large cluster updates (configurable via annotation)
  • Configurable scale-up timeout to handle scenarios where pods don't become ready

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Summary per file:

  • pkg/splunk/test/controller.go: Added PodList support to the mock client for testing
  • pkg/splunk/splkcontroller/statefulset.go: Core implementation of CPU-aware scaling, parallel updates, and timeout mechanisms
  • pkg/splunk/splkcontroller/statefulset_test.go: Comprehensive tests for scale-up timeout and CPU-aware scaling
  • pkg/splunk/splkcontroller/statefulset_parallel_test.go: New test suite for parallel pod update functionality
  • pkg/splunk/splkcontroller/statefulset_cpu_test.go: New test suite for CPU-aware scaling features
  • pkg/splunk/splkcontroller/statefulset_cpu_scaledown_test.go: Tests for CPU-aware scale-down operations
  • pkg/splunk/enterprise/indexercluster.go: Enhanced PrepareScaleDown with a fallback cleanup mechanism
  • pkg/splunk/enterprise/indexercluster_test.go: Updated tests for the new scale-down behavior and zombie-peer prevention
  • pkg/splunk/enterprise/searchheadcluster_test.go: Test updates to match the new scale-down call patterns
  • docs/CustomResources.md: Documentation for the new scaling behavior annotations


Comment on lines +708 to +947
if metrics.OriginalCPUPerPod > 0 {
    oldPodsPerNewPod = metrics.TargetCPUPerPod / metrics.OriginalCPUPerPod
    if oldPodsPerNewPod > 1 {
        oldPodsPerNewPod-- // Safety margin
    }
    if oldPodsPerNewPod < 1 {
        oldPodsPerNewPod = 1
    }
}

Copilot AI Dec 15, 2025


The variable 'oldPodsPerNewPod' is calculated but may produce incorrect results when OriginalCPUPerPod is zero or when integer division truncates important information. For example, if TargetCPUPerPod is 3000 and OriginalCPUPerPod is 2000, the result would be 1 (after safety margin), but the actual ratio should allow for more deletions. Consider using floating-point arithmetic or validating that OriginalCPUPerPod is non-zero before performing the division.

Author (ductrung-nguyen) replied:


We do that on purpose: metrics.OriginalCPUPerPod is never zero, and we want to know how many old pods can safely be deleted once the new pods are running, without losing cluster capacity.

Comment on lines +1145 to +1377
minTimeout = 30 * time.Second
maxTimeout = 24 * time.Hour

Copilot AI Dec 15, 2025


The hardcoded minimum timeout of 30 seconds and maximum of 24 hours may not be appropriate for all cluster sizes and operational requirements. The minimum value of 30 seconds could be too long for small test clusters, while 24 hours may be too long for production environments where faster feedback is desired. Consider making these values configurable or documenting the rationale for these specific limits.

- Duration strings like `"5m"`, `"10m"`, `"1h"`, `"30s"`, `"5m30s"` to set a specific timeout
- `"0s"` or `"0"` to immediately proceed with scale-up without waiting
- Empty or missing annotation waits indefinitely (no timeout)
- Invalid or negative values wait indefinitely with a warning logged

Copilot AI Dec 15, 2025


The documentation states that invalid or negative values result in waiting indefinitely with "a warning logged", but the implementation in getScaleUpReadyWaitTimeout (lines 1140-1190) does not actually log any warning when invalid values are encountered. The function silently returns the default timeout. Either update the documentation to match the implementation or add appropriate warning logs when invalid values are detected.

Suggested change
-  - Invalid or negative values wait indefinitely with a warning logged
+  - Invalid or negative values wait indefinitely (no timeout)

@ductrung-nguyen ductrung-nguyen marked this pull request as draft December 19, 2025 22:12