Commit 59988f4

Add AEP for restoring selector support to VPA
Signed-off-by: Manoj Sudheendra <[email protected]>
1 parent 9db3d25 commit 59988f4

1 file changed

+105
-0
lines changed

@@ -0,0 +1,105 @@
# AEP-XXXX: Restore Label Selector Support to VPA

## Summary

This proposal extends `VerticalPodAutoscalerSpec` so that `TargetRef` and `Selector` can coexist. (`Selector` was present in `v1beta1` but removed in `v1`.) When both are set, the `Selector` acts as a filter on the Pods managed by the `TargetRef`. This allows multiple VPA objects to manage disjoint subsets of a single controller's Pods (e.g., partitioning a StatefulSet or Deployment into Leaders and Followers).

## Motivation

Currently, VPA relies exclusively on `targetRef` to identify Pods. This enforces a 1:1 relationship between the VPA and a workload controller (Deployment, StatefulSet, etc.).

While this is sufficient for stateless workloads, it fails for **Heterogeneous Stateful Workloads**, where Pods in the same controller perform different roles with different resource footprints.

**The Problem: Leader vs. Follower**
VPA aggregates metrics from all Pods in the target controller into a single histogram. This blends the usage of "high-utilization" (Leader) and "low-utilization" (Follower) Pods into one recommendation, which can over-provision Followers, under-provision Leaders, or both.

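
To make the blending effect concrete, here is a minimal sketch that compares a 90th percentile over pooled samples with the same percentile computed per role. The sample values, the `percentile` helper, and the nearest-rank method are illustrative assumptions; the actual Recommender uses a decaying histogram with its own percentile settings, so the numbers only indicate the direction of the error.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the nearest-rank p-th percentile (0 < p <= 100) of samples.
// It is a simplified stand-in for the Recommender's histogram, not its real logic.
func percentile(samples []float64, p float64) float64 {
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

func main() {
	// Hypothetical CPU usage samples in millicores.
	leader := []float64{900, 950, 1000}
	followers := []float64{100, 110, 120, 100, 110, 120, 100, 110, 120}

	// Today all Pods behind one targetRef feed a single pool.
	pooled := append(append([]float64(nil), leader...), followers...)

	fmt.Printf("pooled=%v leader=%v follower=%v\n",
		percentile(pooled, 90), percentile(leader, 90), percentile(followers, 90))
	// prints "pooled=950 leader=1000 follower=120"
}
```

With these numbers the pooled value lands near the Leader's needs and far above what a Follower uses. With a larger share of Follower samples it would instead fall into the Follower range and under-provision the Leader.
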
**The Solution:**
By restoring the `selector` field, users can partition a single workload into multiple VPA profiles based on each Pod's current state:
1. **VPA-Leader:** Selects `role=leader`
2. **VPA-Follower:** Selects `role=follower`

When a Pod is promoted from Follower to Leader, its label changes, and it effectively migrates from the Follower VPA to the Leader VPA as soon as the label change is observed.

## Proposal

Modify the `VerticalPodAutoscalerSpec` to re-introduce `Selector` as an optional field. This field was present in `v1beta1` but removed in `v1`.

### API Spec

```go
type VerticalPodAutoscalerSpec struct {
	// TargetRef points to the controller managing the set of pods.
	// +required
	TargetRef *autoscaling.CrossVersionObjectReference `json:"targetRef"`

	// [PROPOSED]
	// A label query that further restricts the set of pods controlled by the Autoscaler.
	//
	// If provided, the VPA manages only the subset of pods that:
	// 1. Are owned by the TargetRef Controller, AND
	// 2. Match this Label Selector.
	//
	// +optional
	Selector *metav1.LabelSelector `json:"selector,omitempty"`
}
```

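
As a usage illustration, the following sketch builds the Leader and Follower specs from the Motivation section and prints their JSON form. The types are simplified local stand-ins for `autoscaling.CrossVersionObjectReference`, `metav1.LabelSelector`, and the proposed spec; the `roleSpec` helper and the `raft-db` StatefulSet name are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified, hypothetical stand-ins for the real API types; only the
// fields needed for this illustration are included.
type CrossVersionObjectReference struct {
	APIVersion string `json:"apiVersion,omitempty"`
	Kind       string `json:"kind"`
	Name       string `json:"name"`
}

type LabelSelector struct {
	MatchLabels map[string]string `json:"matchLabels,omitempty"`
}

type VerticalPodAutoscalerSpec struct {
	TargetRef *CrossVersionObjectReference `json:"targetRef"`
	Selector  *LabelSelector               `json:"selector,omitempty"`
}

// roleSpec builds a spec that targets one StatefulSet and narrows it to
// the Pods currently labeled with the given role.
func roleSpec(statefulSet, role string) VerticalPodAutoscalerSpec {
	return VerticalPodAutoscalerSpec{
		TargetRef: &CrossVersionObjectReference{
			APIVersion: "apps/v1",
			Kind:       "StatefulSet",
			Name:       statefulSet,
		},
		Selector: &LabelSelector{MatchLabels: map[string]string{"role": role}},
	}
}

func main() {
	// Both specs share the same targetRef and differ only in the selector.
	for _, role := range []string{"leader", "follower"} {
		out, err := json.MarshalIndent(roleSpec("raft-db", role), "", "  ")
		if err != nil {
			panic(err)
		}
		fmt.Println(string(out))
	}
}
```
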
### Behavior
Refinement Mode (TargetRef + Selector):

1. The VPA Updater/Recommender identifies the set of Pods managed by the TargetRef (current behavior).

2. It applies the Selector as a filter to this list.

Result: The VPA generates recommendations for, and acts upon, only the Pods that pass the filter. When no Selector is set, the VPA manages all Pods of the TargetRef, as it does today.

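
The two steps above can be sketched as a simple filter. This is a minimal illustration, not the Updater or Recommender code: the `Pod` type, its `Owner` field, and the `matches` and `managedPods` helpers are hypothetical, ownership is reduced to a name comparison, and only `matchLabels`-style selectors are handled.

```go
package main

import "fmt"

// Pod is a hypothetical, trimmed-down stand-in for a Kubernetes Pod.
type Pod struct {
	Name   string
	Owner  string // name of the owning controller (simplified)
	Labels map[string]string
}

// matches reports whether labels satisfy every key/value pair in matchLabels.
// A nil or empty matchLabels matches everything (no selector = all Pods).
func matches(matchLabels, labels map[string]string) bool {
	for k, want := range matchLabels {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// managedPods keeps the Pods owned by targetRef (step 1) that also pass
// the selector filter (step 2).
func managedPods(pods []Pod, targetRef string, matchLabels map[string]string) []Pod {
	var out []Pod
	for _, p := range pods {
		if p.Owner != targetRef {
			continue
		}
		if !matches(matchLabels, p.Labels) {
			continue
		}
		out = append(out, p)
	}
	return out
}

func main() {
	pods := []Pod{
		{Name: "db-0", Owner: "db", Labels: map[string]string{"role": "leader"}},
		{Name: "db-1", Owner: "db", Labels: map[string]string{"role": "follower"}},
		{Name: "web-0", Owner: "web", Labels: map[string]string{"role": "leader"}},
	}
	for _, p := range managedPods(pods, "db", map[string]string{"role": "leader"}) {
		fmt.Println(p.Name) // prints "db-0"
	}
}
```
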
### Risks and Mitigations

#### Spec Divergence
**Risk**: A Pod's role may not be known at creation time. Every new Pod therefore starts with the resources from the controller's Pod template and is promoted to "Leader" (and the corresponding VPA profile) only after the application starts and wins an election.

**Consequence**: There is an unavoidable window between election and VPA actuation during which the new Leader runs with the Pod template's default resources.

##### Mitigation

**Safe Baseline Requests**: Users must set the container requests in the Pod template (`spec.template.spec.containers[*].resources.requests`) to a "Safe Floor" that is sufficient to handle the application's boot sequence and the initial election workload without crashing.

**Reactive Resizing**: The VPA Updater will detect the `role=leader` label transition and perform an In-Place Update to increase capacity to the "Leader Profile."

#### Conflicting Targets
**Risk**: A user might configure two overlapping VPAs for the same TargetRef.

**VPA A**: `targetRef: app` (No selector = All pods)

**VPA B**: `targetRef: app, selector: role=leader`

**Result**: The Leader pod is managed by both.

**Mitigation**: The Admission Webhook must be updated.

**Current Rule**: "One VPA per TargetRef."

**New Rule**: "Multiple VPAs per TargetRef are allowed only if their Selectors are non-overlapping."

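
One way the new rule could be checked is sketched below, under the assumption that selectors use only `matchLabels`. The `selectorsMayOverlap` function is hypothetical and not part of the webhook; selectors with `matchExpressions` would need a more careful analysis. Two `matchLabels` selectors are provably disjoint only when they require different values for the same key; otherwise a Pod carrying the union of both label sets would match both.

```go
package main

import "fmt"

// selectorsMayOverlap reports whether some Pod label set could satisfy both
// matchLabels-style selectors. A nil selector means "all Pods", so it
// overlaps with every other selector.
func selectorsMayOverlap(a, b map[string]string) bool {
	for k, va := range a {
		if vb, ok := b[k]; ok && va != vb {
			return false // same key, different required values: disjoint
		}
	}
	return true
}

func main() {
	all := map[string]string(nil) // VPA A: no selector
	leader := map[string]string{"role": "leader"}
	follower := map[string]string{"role": "follower"}

	fmt.Println(selectorsMayOverlap(all, leader))      // prints "true": reject
	fmt.Println(selectorsMayOverlap(leader, follower)) // prints "false": allow
}
```

Note that selectors on unrelated keys (for example `role=leader` and `zone=a`) are treated as overlapping, because a single Pod can carry both labels.
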
#### Recommender Data Sparsity
**Risk**: Partitioning a set of pods reduces the sample size for the histogram (e.g., a single Leader pod).

**Mitigation**: The Recommender must aggregate historical samples at the VPA object level rather than per Pod.

This ensures that when a Pod is promoted to Leader, it inherits the "Leader Profile" history immediately, rather than starting with a cold cache.

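
The idea of keying history by VPA object can be illustrated with a small sketch. It is not the Recommender's data model: the `vpaHistory` type, the `vpaForLabels` lookup, and the `vpa-leader`/`vpa-follower` names are hypothetical, and raw sample slices stand in for histograms.

```go
package main

import "fmt"

// vpaHistory stores usage samples keyed by VPA object name rather than by
// Pod name, so the history outlives any individual Pod's role.
type vpaHistory map[string][]float64

// vpaForLabels is a hypothetical lookup from a Pod's current role label to
// the VPA object that selects it.
func vpaForLabels(labels map[string]string) string {
	return "vpa-" + labels["role"]
}

// record files a usage sample under whichever VPA currently selects the Pod.
func (h vpaHistory) record(podLabels map[string]string, usage float64) {
	key := vpaForLabels(podLabels)
	h[key] = append(h[key], usage)
}

func main() {
	h := vpaHistory{}
	pod0 := map[string]string{"role": "leader"}
	pod1 := map[string]string{"role": "follower"}

	h.record(pod0, 900)
	h.record(pod0, 950)
	h.record(pod1, 100)

	// Failover: pod1 wins the election and is relabeled. Its next sample
	// joins the existing Leader history instead of starting from nothing.
	pod1["role"] = "leader"
	h.record(pod1, 1000)

	fmt.Println(len(h["vpa-leader"]), len(h["vpa-follower"])) // prints "3 1"
}
```
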
#### Unmanaged Pods
**Risk**: A pod might match the TargetRef but fail to match any VPA Selector.

**Result**: The pod receives no recommendations.

**Mitigation**: This is acceptable behavior. It allows users to intentionally exclude specific pods from autoscaling.

## Future Possibilities: Direct Lease Integration

We discussed the possibility of VPA directly watching `coordination.k8s.io/Lease` objects to detect Leader/Follower transitions.

This proposal (or something similar) would be a prerequisite for such a feature in VPA.

1. **The Foundation:** To support Leader/Follower scaling, the VPA core must first be able to maintain separate metric histories and generate distinct recommendations for subsets of a single controller. The `Selector` field provides this mechanism.
2. **The Additive Layer:** Once `Selector` support is added, future enhancements could allow VPA to infer these selectors automatically by watching a Lease, or users could use a "Lease-to-Label" controller to bridge the gap without further changes to VPA core.
