hubble-relay: relax podAffinity to preferred so Karpenter can drain the last node #459

Open
luca-rui wants to merge 2 commits into main from cabbage/hubble-relay-preferred-affinity

@luca-rui luca-rui commented May 19, 2026

Summary

Switches hubble-relay's podAffinity from requiredDuringSchedulingIgnoredDuringExecution to preferredDuringSchedulingIgnoredDuringExecution. The upstream chart hard-requires hubble-relay to co-locate with a cilium-agent pod, which blocks Karpenter from draining the last cilium-agent-bearing node during cluster upgrades or consolidation.

Why

Since the Karpenter migration, cluster upgrades on several workload clusters have been hanging on the final node — SREs have had to manually drain it (or kill the hubble-relay pod) to let the upgrade complete.

Discussed internally with team-cabbage; no apparent reason for the affinity to be hard-required. preferred keeps the co-location intent under normal operation but lets the scheduler place relay elsewhere when the only remaining cilium-agent node is being drained.

Change

In sync/patches/values/values.yaml.tmpl (and the synced helm/cilium/values.yaml.tmpl / rendered helm/cilium/values.yaml):

hubble:
  relay:
    affinity:
      podAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  k8s-app: cilium

diffs/helm__cilium__values.yaml.tmpl.patch regenerated against the existing vendored upstream (verified by reverse-applying the old patch to reconstruct the upstream, then re-diffing).
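
For comparison, the hard-required form being replaced would look roughly like the sketch below. It is reconstructed from the new block above on the assumption that the upstream chart uses the same topologyKey and labelSelector; it is not copied from the vendored upstream values. Note the structural difference: a required term is listed directly, whereas a preferred term is wrapped in podAffinityTerm and given a weight.

```yaml
# Sketch of the previous (hard) rule. The selector and topology key are
# assumed to match the new block; they were not checked against the
# vendored chart.
hubble:
  relay:
    affinity:
      podAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: kubernetes.io/hostname
            labelSelector:
              matchLabels:
                k8s-app: cilium
```

With the required form, the scheduler leaves the pod Pending when no node satisfies the term. With the preferred form, matching nodes are only scored higher (weight accepts values from 1 to 100), so the pod can still be placed elsewhere.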

Test plan

  • CI validate-sync-diffs passes (regenerated diffs match committed)
  • Deploy to a test workload cluster, drain a non-cilium-bearing node, confirm hubble-relay schedules elsewhere when needed
  • Confirm on an affected cluster that upgrade/consolidation no longer hangs on the final node
  • Verify hubble-relay still lands next to a cilium agent in steady state (preferred should keep this behaviour)

Karpenter cannot drain the last cilium-agent-bearing node during cluster
upgrades / consolidation because the upstream chart requires hubble-relay
to be co-located with a cilium agent. With Karpenter trying to remove the
final node, the drain hangs.

Switching to preferredDuringSchedulingIgnoredDuringExecution keeps the
co-location preference under normal operation but lets the scheduler
place hubble-relay elsewhere when the only cilium-agent node is being
drained.

Observed on multiple workload clusters since the Karpenter migration.
@luca-rui luca-rui force-pushed the cabbage/hubble-relay-preferred-affinity branch from ad88023 to 14675b4 Compare May 19, 2026 12:10
Applied straight from validate-sync-diffs CI output; equivalent to
running `cd ./helm && make` locally.
@luca-rui luca-rui marked this pull request as ready for review May 19, 2026 13:12
@luca-rui luca-rui requested a review from a team as a code owner May 19, 2026 13:12
@luca-rui

Author

/run app-test-suites

@pipo02mix


From a human review (no agent involved), the change is fine, and I don't see any problem with changing to preferred.

mcharriere commented May 20, 2026

Contributor

I'm not sure I understand what the problem is. It would be nice to have more context, including the discussions that concluded this is the right solution.

> which blocks Karpenter from draining the last cilium-agent-bearing node during cluster upgrades or consolidation.

This describes a scenario that would never happen in our clusters (unless it's a cluster deletion). There is no last cilium-agent-bearing node: every node runs the cilium-agent, regardless of the version. So if hubble-relay has to be scheduled somewhere else, the upgraded nodes should be suitable under these conditions.

> Since the Karpenter migration, cluster upgrades on several workload clusters have been hanging on the final node — SREs have had to manually drain it (or kill the hubble-relay pod) to let the upgrade complete.

Does this mean that the E2E tests are not covering this scenario (Upgrade+Karpenter)? I think we would have caught this in such a test.


Now, regarding the problem, please consider:

  • IIRC, affinity doesn't interfere with pod eviction. They are two separate processes, and affinity only applies to scheduling.
  • Hubble-relay has no PDB because we run only one pod of it.
  • We already have the annotation cluster-autoscaler.kubernetes.io/safe-to-evict: "true". Is Karpenter honoring that, or does it have its own?
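
To make the annotation question concrete, here is a sketch of the pod metadata involved. The first annotation is the one mentioned above. The second is Karpenter's own pod-level annotation as described in its documentation (assuming a Karpenter version that uses the karpenter.sh/do-not-disrupt key); it has the opposite sense, blocking voluntary disruption rather than permitting eviction. Whether Karpenter reads the cluster-autoscaler key, and whether hubble-relay carries the Karpenter one in the affected clusters, is not established in this thread.

```yaml
# Illustrative pod metadata only; not taken from the rendered chart.
metadata:
  annotations:
    # Already set, per the point above; this key belongs to cluster-autoscaler.
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
    # Karpenter's own pod-level annotation (assumed key, see lead-in).
    # Left commented out: setting it would block voluntary disruption,
    # which is the opposite of what this PR is trying to achieve.
    # karpenter.sh/do-not-disrupt: "true"
```

Checking the live pod for either key would show which disruption controls are actually in play before the affinity is blamed.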
