We wanted to use https://github.com/kubernetes/autoscaler
I'm building [omni-infra-provider-truenas](https://github.com/bearbinary/omni-infra-provider-truenas), an infrastructure provider that provisions Talos Linux VMs on TrueNAS SCALE via its JSON-RPC 2.0 API. Because TrueNAS virtualizes hardware on demand (zvols + VMs), it behaves more like a cloud provider than a static bare-metal pool, so we can create new machines dynamically in response to cluster pressure.
This puts us in an interesting position: MachineClasses + autoprovision already handles the fulfillment side beautifully, but there's a gap on the trigger side for pressure-based autoscaling (i.e. unschedulable pods → new node).
The Problem
Omni's model is intentionally declarative. You set MachineSet replica counts and Omni fulfills them.
That's great for GitOps, but it doesn't automatically close the loop on pod-scheduling pressure. The classic Cluster Autoscaler solves this for CAPI via `MachineDeployment` annotations, but that integration doesn't exist for Omni's infrastructure provider model.
Since our provider runs outside the cluster (alongside Omni, not inside K8s), we can't simply deploy an in-cluster controller to watch for `FailedScheduling` events and bump replica counts. The options I have looked at:
Option A: Embed a K8s watcher goroutine in the provider binary
The provider uses the Omni-vended kubeconfig to watch `v1.Event` for `FailedScheduling`, then calls the Omni API to scale the MachineSet replica count. No in-cluster dependency, single binary. However, this feels like bolting two concerns onto the provider, which gave me pause.
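To make the Option A trigger concrete, here is a minimal sketch of only the decision step: given recent scheduling events, decide what replica count to request. It is written in Python for brevity rather than as the provider's Go code. `SchedulingEvent`, `desired_replicas`, and the `window`, `cooldown`, and `max_replicas` parameters are all hypothetical names and defaults chosen for illustration; the actual Kubernetes watch and the Omni API call are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchedulingEvent:
    """Hypothetical, simplified stand-in for a Kubernetes v1.Event."""
    reason: str       # e.g. "FailedScheduling"
    pod: str          # name of the pod the event refers to
    timestamp: float  # seconds since epoch


def desired_replicas(current, events, now, *, window=120.0, cooldown=300.0,
                     max_replicas=10, last_scale_at=None):
    """Return the replica count to request for the MachineSet.

    Assumed policy (not taken from Omni or Cluster Autoscaler):
    add one node if any pod failed scheduling within `window` seconds,
    never exceed `max_replicas`, and do nothing during the cooldown
    that follows a previous scale-up.
    """
    # Give the previously requested machine time to boot and join.
    if last_scale_at is not None and now - last_scale_at < cooldown:
        return current

    # Distinct pods with a recent FailedScheduling event.
    pending = {
        e.pod
        for e in events
        if e.reason == "FailedScheduling" and 0 <= now - e.timestamp <= window
    }
    if not pending:
        return current

    # Scale conservatively, one node at a time, capped at the maximum.
    return min(current + 1, max_replicas)
```

In a real provider, the result would be compared with the current count, and the Omni API would be called only when the two differ. The cooldown matters here because a VM on TrueNAS takes time to boot and join the cluster, during which the same pods keep emitting `FailedScheduling` events.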
Option B: Karpenter custom cloud provider
Implement Karpenter's `CloudProvider` interface, where `Create()` calls the Omni API → MachineRequest → provider → TrueNAS. Karpenter runs inside the cluster and handles bin-packing and scheduling pressure natively. The build cost is high, and it introduces an in-cluster dependency.
Option C: Something in Omni itself?
Does Omni have, or are there plans for, a native scheduling-pressure signal that could trigger MachineSet scaling via the infrastructure provider API? Something like a `ScalingPolicy` resource that watches for unschedulable pods and adjusts replica counts automatically?
Questions for the Team
We're happy to prototype whichever direction the team thinks is most aligned with Omni's architecture. Linking the provider repo in case it's useful context for the discussion.
Reference:
[bearbinary/omni-infra-provider-truenas](https://github.com/bearbinary/omni-infra-provider-truenas): TrueNAS SCALE infrastructure provider (JSON-RPC 2.0, ZFS-native, SideroLink)