Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
105 changes: 105 additions & 0 deletions vertical-pod-autoscaler/enhancements/XXXX-restore-selector-support.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,105 @@
# AEP-XXXX: Restore Label Selector Support to VPA
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the AEP title should change to reflect that this proposal aims to improve/solve vertical autoscaling for heterogeneous workloads when Deployments and StatefulSets controllers are used. Restoring label selector support is only one part of the solution.

so please make the intention of the AEP in its title more explicit, thanks!


## Summary

This proposal enhances the VerticalPodAutoscalerSpec to allow the co-existence of TargetRef and Selector (present in v1beta1 but removed in v1). When both are present, the Selector acts as a filter applied to the Pods managed by the TargetRef. This allows multiple VPA objects to manage disjoint subsets of a single Controller's pods (e.g., partitioning a StatefulSet/Deployment into Leaders and Followers).

## Motivation

Currently, VPA relies exclusively on `targetRef` to identify Pods. This enforces a 1:1 relationship between the VPA and a Workload Controller (Deployment, StatefulSet, etc.).

While this is sufficient for stateless workloads, it fails for **Heterogeneous Stateful Workloads** where pods in the same controller perform different roles with different resource footprints.

**The Problem: Leader vs. Follower**
VPA aggregates metrics from all Pods in the target controller into a single histogram. This averages the usage of "high-utilization" (Leader) and "low-utilization" (Follower) pods.

**The Solution:**
By restoring the `selector` field, users can partition a single workload into multiple VPA profiles based on the Pod's current state:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry a little about restoring a field that was once there, I don't know if there are API considerations that need to be made

1. **VPA-Leader:** Selects `role=leader`
2. **VPA-Follower:** Selects `role=follower`

When a Pod promotes from Follower to Leader, its label changes, and it effectively migrates from the Follower VPA to the Leader VPA instantly.

## Proposal

Modify the `VerticalPodAutoscalerSpec` to re-introduce `Selector` as an optional field. This field was present in `v1beta1` but removed in `v1`.

### API Spec

```go
type VerticalPodAutoscalerSpec struct {
// TargetRef points to the controller managing the set of pods.
// + required
TargetRef *autoscaling.CrossVersionObjectReference `json:"targetRef"`

// [PROPOSED]
// A label query that further restricts the set of pods controlled by the Autoscaler.
//
// If provided, the VPA manages only the subset of pods that:
// 1. Are owned by the TargetRef Controller, AND
// 2. Match this Label Selector.
//
// +optional
Selector *metav1.LabelSelector `json:"selector,omitempty"`
}
```

### Behavior
Refinement Mode (TargetRef + Selector):

1. The VPA Updater/Recommender identifies the set of pods managed by the TargetRef (Current behavior).

2. It applies the Selector as a filter to this list.

Result: The VPA only generates recommendations and acts upon pods that pass the filter.

### Risks and Mitigations

#### Spec Divergence
**Risk**: A Pod's role may not be known at creation time. Therefore, every new Pod inherits the Deployment pod spec and only promotes to "Leader" (and the corresponding VPA profile) after the application starts and wins an election.

**Consequence**: There is an unavoidable window between Election and VPA Actuation where the new Leader is running with default pod spec resources.

##### Mitigation

**Safe Baseline Requests**: Users must configure the Deployment.spec.template.resources.requests to be a "Safe Floor"—sufficient to handle the application's boot sequence and the initial election workload without crashing.

**Reactive Resizing**: The VPA Updater will detect the role=leader label transition and perform an In-Place Update to increase capacity to the "Leader Profile."

#### Conflicting Targets
**Risk**: A user might configure two VPAs for the same TargetRef that overlap.

**VPA A**: `targetRef: app` (No selector = All pods)

**VPA B**: `targetRef: app, selector: role=leader`

**Result**: The Leader pod is managed by both.

**Mitigation**: The Admission Webhook must be updated.

**Current Rule**: "One VPA per TargetRef."

**New Rule**: "Multiple VPAs per TargetRef are allowed only if their Selectors are non-overlapping."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if we have 2 VPAs, one with selector of role=primary and another with purpose=web, and a pod were created with both of these labels, that means that both VPAs match that pod.


#### Recommender Data Sparsity
**Risk**: Partitioning a set of pods reduces the sample size for the histogram (e.g., a single Leader pod).

**Mitigation**: The Recommender must aggregate historical samples at the VPA Object Level.

This ensures that when a Pod promotes to Leader, it inherits the "Leader Profile" history immediately, rather than starting with a cold cache.

#### Unmanaged Pods
**Risk**: A pod might match the TargetRef but fail to match any VPA Selector.

**Result**: The pod receives no recommendations.

**Mitigation**: This is acceptable behavior. It allows users to intentionally exclude specific pods from autoscaling.

## Future Possibilities: Direct Lease Integration

We discussed the possibility of VPA directly watching `coordination.k8s.io/Lease` objects to detect Leader/Follower transitions.

This proposal (or something similar) will be required to support such a feature in VPA.

1. **The Foundation:** To support Leader/Follower scaling, the VPA core must first possess the ability to maintain separate metric histories and generate distinct recommendations for subsets of a single Controller. The `Selector` field provides this mechanism.
2. **The Additive Layer:** Once `Selector` support is added, future enhancements could allow VPA to automatically infer these selectors by watching a Lease, or users can simply use a "Lease-to-Label" controller to bridge the gap without modifying VPA core further.