
Add autoscaling module #250

Open
PizieDust wants to merge 13 commits into main from scale_up


PizieDust (Collaborator) commented May 29, 2026

This module implements the logic for monitoring the CPU usage of the currently running unikernels and deciding when to scale them up or down.

Configuration

  • poll_interval: how frequently the system expects to evaluate the unikernel stats.
  • scale_up_threshold_percent: the CPU load (e.g., 90.0%) above which a scale up is triggered.
  • scale_up_trigger_ticks: how many consecutive stats reports must exceed the scale up threshold before a clone is created, to avoid false positives.
  • scale_down_threshold_percent: the CPU load (e.g., 40.0%) below which a scale down is triggered, provided the group already has clones.
  • scale_down_trigger_ticks: how many consecutive stats reports must fall below the scale down threshold before a clone is destroyed.
  • cooldown_period: the grace period after any scaling action during which the system pauses scaling, to prevent rapid creation and destruction of clones.
  • death_timeout: the maximum time to wait for stats from a VM. If a VM stays silent longer than this, it is considered dead and pruned from the system.
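As a rough illustration, the options above could be grouped into a single record. This is a minimal sketch assuming all durations are float seconds; the config type, the validate function, and every example value except the two thresholds (90.0 and 40.0) are hypothetical and not taken from the PR.

```ocaml
(* Hypothetical configuration record mirroring the options above.
   Durations are float seconds here; the real module may use Ptime.Span.t. *)
type config = {
  poll_interval : float;                (* seconds between stats evaluations *)
  scale_up_threshold_percent : float;   (* e.g. 90.0 *)
  scale_up_trigger_ticks : int;         (* consecutive reports above threshold *)
  scale_down_threshold_percent : float; (* e.g. 40.0 *)
  scale_down_trigger_ticks : int;       (* consecutive reports below threshold *)
  cooldown_period : float;              (* seconds to pause after a scaling action *)
  death_timeout : float;                (* seconds of silence before a VM is dead *)
}

(* Example values; only the two thresholds come from the description. *)
let default_config = {
  poll_interval = 10.0;
  scale_up_threshold_percent = 90.0;
  scale_up_trigger_ticks = 3;
  scale_down_threshold_percent = 40.0;
  scale_down_trigger_ticks = 6;
  cooldown_period = 60.0;
  death_timeout = 30.0;
}

(* Sanity checks: the thresholds must leave a gap so one load level cannot
   trigger both directions, tick counts must be positive, and a VM should
   only be declared dead after more than one poll interval of silence. *)
let validate c =
  if c.scale_down_threshold_percent >= c.scale_up_threshold_percent then
    Error "scale_down_threshold_percent must be below scale_up_threshold_percent"
  else if c.scale_up_trigger_ticks < 1 || c.scale_down_trigger_ticks < 1 then
    Error "trigger ticks must be at least 1"
  else if c.death_timeout <= c.poll_interval then
    Error "death_timeout must exceed poll_interval"
  else Ok c
```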

Modules

Cpu_monitor

This module converts the raw stats from Albatross into a CPU load percentage (a float) that can be compared against the thresholds to decide whether to scale.
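A minimal sketch of that conversion, assuming the rusage CPU times are (seconds, microseconds) pairs and timestamps are float seconds. The timeval and rusage types below are simplified stand-ins, not the real Vmm_core.Stats types, and cpu_percent is a hypothetical name; the arithmetic mirrors the excerpts quoted in the review threads below.

```ocaml
(* Simplified stand-ins for the rusage fields used by the PR (assumption:
   the real Vmm_core.Stats.rusage type may differ). *)
type timeval = { sec : int64; usec : int }
type rusage = { utime : timeval; stime : timeval }

let timeval_to_float { sec; usec } =
  Int64.to_float sec +. (float_of_int usec /. 1_000_000.0)

(* Total CPU time consumed so far (user + system), in seconds. *)
let total_cpu_time r = timeval_to_float r.utime +. timeval_to_float r.stime

(* CPU load between two samples, as a percentage of one CPU.
   [prev_cpu]/[cur_cpu] are cumulative CPU seconds, [prev_t]/[cur_t] are
   wall-clock timestamps in seconds. Returns 0.0 for a (near) zero interval
   or a negative CPU delta (a defensive guard), and caps at 100.0, which
   assumes a single-CPU VM. *)
let cpu_percent ~prev_cpu ~prev_t ~cur_cpu ~cur_t =
  let elapsed = cur_t -. prev_t in
  let delta = cur_cpu -. prev_cpu in
  if elapsed <= 1e-6 || delta < 0.0 then 0.0
  else Float.min 100.0 (delta /. elapsed *. 100.0)
```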

Cluster_manager

This module tracks "groups" (a primary unikernel and all of its clones) and evaluates their combined load.

  • get_or_create: finds an existing group for a primary unikernel or creates an empty one.
  • find_group_by_name: finds an existing group using the primary unikernel's name.
  • extract_name_and_clone_id: parses a VM name to separate the primary name from the clone ID.
  • find_or_create_group: the main entry point when stats arrive. It parses the incoming VM's name to route the stats to the correct group.
  • next_clone_name: generates a unique name for a new clone based on the group's next_id.
  • register_clone: adds a newly created clone to the group and starts the cooldown period.
  • remove_clone: removes a dead or intentionally destroyed clone from the group and starts the cooldown period.
  • prune_dead_clusters: prunes individual dead clones, or destroys the entire group if the primary dies.
  • in_cooldown: checks whether the group recently had a scaling operation and should therefore ignore load spikes and drops.
  • check_group_average: calculates the average CPU load across all active instances (the primary and all of its active clones) to determine the true load of the cluster.
  • check_group_status: compares the average load against the thresholds to decide whether the cluster should scale.
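To make the naming helpers concrete, here is a sketch of extract_name_and_clone_id and next_clone_name under an assumed naming scheme, "<primary>-clone-<n>". The PR's actual clone name format is not shown in this thread, so the separator and the exact signatures are hypothetical.

```ocaml
(* Hypothetical naming scheme: a clone of [primary] is "<primary>-clone-<n>". *)
let clone_sep = "-clone-"

(* True if [s] is a non-empty string of ASCII digits. *)
let is_digits s =
  let rec go i = i >= String.length s || (s.[i] >= '0' && s.[i] <= '9' && go (i + 1)) in
  String.length s > 0 && go 0

(* Split a VM name into (primary name, clone id); primaries have no id.
   The last occurrence of the separator is used. *)
let extract_name_and_clone_id name =
  let sep_len = String.length clone_sep in
  let rec find i =
    if i < 0 then None
    else if String.sub name i sep_len = clone_sep then Some i
    else find (i - 1)
  in
  match find (String.length name - sep_len) with
  | None -> (name, None)
  | Some i ->
    let start = i + sep_len in
    let suffix = String.sub name start (String.length name - start) in
    if is_digits suffix then
      (match int_of_string_opt suffix with
       | Some id -> (String.sub name 0 i, Some id)
       | None -> (name, None))
    else (name, None)

(* Build the next clone name and return the incremented counter, so ids
   are never reused within a group. *)
let next_clone_name ~primary ~next_id =
  (Printf.sprintf "%s%s%d" primary clone_sep next_id, next_id + 1)
```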

PizieDust requested a review from reynir May 29, 2026 06:34
Comment thread autoscaler.ml
let pct = cpu_delta /. elasped_time_in_seconds *. 100.0 in
(* TODO: use numcpus to cap it at 100.0% if the vm has more than 1 cpu. Now most
vms use 1 cpu, so capping at 100% is fine. *)
Float.min 100.0 pct

Contributor

I don't quite understand why floating point values are used everywhere here, and what would be wrong with using microseconds as an int instead. But I guess we have other things to do than argue about that.
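For comparison, a sketch of the integer alternative raised above: keep CPU time and wall-clock time as int microseconds and compute the load as a whole-number percentage. The function names are hypothetical. OCaml's native int is 63 bits on 64-bit platforms, so microsecond values of this size do not overflow.

```ocaml
(* Convert a (seconds, microseconds) pair to int microseconds. *)
let timeval_to_us ~sec ~usec = (sec * 1_000_000) + usec

(* Load in whole percent of one CPU, capped at 100. A non-positive interval
   or a negative CPU delta yields 0, mirroring the float version's guards.
   Integer division truncates toward zero. *)
let cpu_percent_int ~cpu_delta_us ~elapsed_us =
  if elapsed_us <= 0 || cpu_delta_us < 0 then 0
  else min 100 (cpu_delta_us * 100 / elapsed_us)
```

Thresholds would then be plain ints as well (e.g. 90 and 40), and comparisons become exact.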

Comment thread autoscaler.ml
Ptime.Span.to_float_s elasped_time_difference
in
if elasped_time_in_seconds <= 0.000001 then 0.0
else if cpu_delta < 0.0 then 0.0

Contributor

How can this happen?

Comment thread autoscaler.ml
Comment on lines +54 to +58
type t = {
mutable monitor : Cpu_monitor.t;
mutable last_cpu_usage : float;
mutable last_stats_received : Ptime.t;
}

Contributor

A type with only mutable fields smells a bit. Could this instead be pure immutable values, where you pass a t in and out (i.e. always construct a fresh one when you want to modify a field)?
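A sketch of what the immutable version could look like. It assumes the Cpu_monitor state can be reduced to the last cumulative CPU time and uses float seconds instead of Ptime.t to stay dependency-free; both simplifications, and the update function, are hypothetical.

```ocaml
(* Immutable counterpart of the record above (simplified, hypothetical). *)
type t = {
  last_cpu_time : float;       (* cumulative CPU seconds at the last report *)
  last_cpu_usage : float;      (* load in percent computed at the last report *)
  last_stats_received : float; (* wall-clock time of the last report *)
}

(* Instead of mutating fields, build and return a fresh record; the caller
   stores the result wherever it keeps per-VM state. *)
let update t ~cpu_time ~now =
  let elapsed = now -. t.last_stats_received in
  let delta = cpu_time -. t.last_cpu_time in
  let usage =
    if elapsed <= 1e-6 || delta < 0.0 then 0.0
    else Float.min 100.0 (delta /. elapsed *. 100.0)
  in
  { last_cpu_time = cpu_time; last_cpu_usage = usage; last_stats_received = now }
```

Because the old value is untouched, a caller can compare the previous and new state, and no other code can observe a half-updated record.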

Comment thread autoscaler.ml
m "[Cluster Manager] Invalid clone name '%s'." (fst clone));
Error "Invalid clone name"

let check_group_average group key now rusage =

Contributor

Would you mind elaborating on what this function is supposed to do, and what the input arguments are?

From the name, I had the impression that it should compute the average CPU usage!?

But then, what does the key argument do? And why is there an if String.equal ...? In which case would an element of your group.primary :: group.clones list not satisfy name = key?

Contributor

Thanks for your comment. I'm still wondering what the actions should be and when they should be triggered.

So, for each new measurement that arrives, we certainly want to compute the CPU load.

Now, iterating over all clones and the primary should be done once, when all measurements have arrived, shouldn't it? So shouldn't step 1 and steps 2-4 be separate? Maybe the time to compute the group average is when the primary's measurement is received?
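One way to structure the flow suggested above, as a sketch with hypothetical types: each incoming measurement only records that instance's latest load, and the group average is computed once, when the primary's measurement arrives. Loads are assumed to be percentages already produced by Cpu_monitor.

```ocaml
module SMap = Map.Make (String)

(* Hypothetical group state: latest load per instance, keyed by VM name. *)
type group = {
  primary : string;
  loads : float SMap.t;
}

let average g =
  let sum, n = SMap.fold (fun _ l (s, n) -> (s +. l, n + 1)) g.loads (0.0, 0) in
  if n = 0 then 0.0 else sum /. float_of_int n

(* Always record the new load; only when the measurement belongs to the
   primary, also return the group average for the scaling decision. *)
let on_measurement g ~name ~load =
  let g = { g with loads = SMap.add name load g.loads } in
  if String.equal name g.primary then (g, Some (average g)) else (g, None)
```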

Comment thread autoscaler.ml
in
Ok (average_usage, state)

let check_group_status group key now rusage =

Contributor

I have no clue what check_group_status is supposed to do with the input arguments beyond the group. Why are there key, now and rusage arguments?
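As one possible answer to that question, here is a sketch in which the status check takes only the thresholds, the group's own counters, the group average and the current time. All type and field names are hypothetical; times are float seconds. During the cooldown the tick counters are reset, so spikes and drops are ignored, as described for in_cooldown.

```ocaml
type thresholds = {
  up_pct : float; up_ticks : int;     (* scale up above up_pct for up_ticks reports *)
  down_pct : float; down_ticks : int; (* scale down below down_pct for down_ticks reports *)
  cooldown : float;                   (* seconds to wait after a scaling action *)
}

type counters = {
  above : int;         (* consecutive reports above up_pct *)
  below : int;         (* consecutive reports below down_pct *)
  last_action : float; (* time of the last scaling action *)
  clones : int;        (* number of active clones *)
}

type decision = Scale_up | Scale_down | Stay

let check_group_status th c ~avg ~now =
  if now -. c.last_action < th.cooldown then
    ({ c with above = 0; below = 0 }, Stay)
  else
    let above = if avg > th.up_pct then c.above + 1 else 0 in
    let below = if avg < th.down_pct then c.below + 1 else 0 in
    if above >= th.up_ticks then
      ({ above = 0; below = 0; last_action = now; clones = c.clones + 1 }, Scale_up)
    else if below >= th.down_ticks && c.clones > 0 then
      ({ above = 0; below = 0; last_action = now; clones = c.clones - 1 }, Scale_down)
    else ({ c with above = above; below = below }, Stay)
```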

Comment thread autoscaler.ml Outdated
Logs.debug ~src:a_logs (fun m ->
m "[Cluster Manager] Pruning dead cluster: %s" key);
Hashtbl.remove clusters key)
dead_keys

Contributor

As discussed on Matrix, I'm not a big fan of Hashtbl and background tasks. Can we design something that is a bit more robust and doesn't rely on prune_dead_clusters?

I don't quite understand the semantics. What should happen if a unikernel (the primary) that has been scaled up disappears? It looks like it is then removed from the clusters hash table, but what happens to all the clones? Won't they be re-inserted? How does the code deal with a group that doesn't have a primary?
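One possible direction, sketched with hypothetical types: liveness is checked when a group is evaluated instead of by a background pruning task, and the group is an immutable value rather than a Hashtbl entry. If the primary is stale, the group is dissolved and its clones are returned explicitly so the caller can destroy them, which makes the "group without a primary" case impossible to represent.

```ocaml
module SMap = Map.Make (String)

type group = {
  primary : string;
  last_seen : float SMap.t; (* last stats timestamp per VM name, in seconds *)
}

type outcome =
  | Alive of group              (* group with stale clones dropped *)
  | Primary_gone of string list (* clones left behind, to be destroyed *)

let evaluate ~death_timeout ~now g =
  let fresh t = now -. t <= death_timeout in
  let primary_fresh =
    match SMap.find_opt g.primary g.last_seen with
    | Some t -> fresh t
    | None -> false
  in
  if primary_fresh then
    Alive { g with last_seen = SMap.filter (fun _ t -> fresh t) g.last_seen }
  else
    Primary_gone
      (SMap.fold
         (fun name _ acc -> if String.equal name g.primary then acc else name :: acc)
         g.last_seen [])
```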

Comment thread autoscaler.ml
let get_total_cpu_time (r : Vmm_core.Stats.rusage) =
let user_t = timeval_to_float r.utime in
let sys_t = timeval_to_float r.stime in
user_t +. sys_t

Contributor

There's also a runtime field in kinfo_mem. Any reason why rusage is used here? (I'm curious; there's no need to change any code.)

Collaborator Author

The stats type is type t = rusage * kinfo_mem option * vmm option * ifdata list. Since kinfo_mem is optional, I decided to use rusage, which I was sure is always present.

hannesm (Contributor) commented Jun 17, 2026

I think this is fine. Still, I am missing a higher-level view of what should happen, what is computed when, and what is kept in memory.


2 participants