let pct = cpu_delta /. elasped_time_in_seconds *. 100.0 in
(* TODO: use numcpus to cap it at 100.0% if the vm has more than 1 cpu. Now most
vms use 1 cpu, so capping at 100% is fine. *)
Float.min 100.0 pct
I don't quite understand why floating-point values are used all over here, and what would be wrong with using microseconds as int instead. But I guess we have other things to do than to argue about that.
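For illustration, an integer-only variant of the computation could look like the sketch below. It is not taken from the PR: the function name `cpu_usage_bp`, its labelled arguments, and the choice of hundredths of a percent as the unit are all assumptions, and it presumes a 64-bit platform where OCaml's `int` has 63 bits.

```ocaml
(* Hypothetical sketch: names and units are illustrative, not from the PR. *)

(* CPU usage in hundredths of a percent, computed from integer microsecond
   counters. Returns 0 when the interval is empty or the CPU counter went
   backwards, and caps the result at 100.00% (10_000). *)
let cpu_usage_bp ~cpu_delta_us ~elapsed_us =
  if elapsed_us <= 0 || cpu_delta_us < 0 then 0
  else min 10_000 (cpu_delta_us * 10_000 / elapsed_us)
```

With integers the guards become exact comparisons against zero, so no epsilon such as 0.000001 is needed.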
Ptime.Span.to_float_s elasped_time_difference
in
if elasped_time_in_seconds <= 0.000001 then 0.0
else if cpu_delta < 0.0 then 0.0
type t = {
mutable monitor : Cpu_monitor.t;
mutable last_cpu_usage : float;
mutable last_stats_received : Ptime.t;
}
A type with all-mutable fields... this smells a bit... Could this instead be pure immutable values, where you pass a t in and out (i.e. always construct a fresh one when you want to modify a field)?
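A minimal sketch of the immutable alternative, assuming stand-in types so that it runs without the PR's dependencies: `monitor` replaces `Cpu_monitor.t`, an integer microsecond timestamp replaces `Ptime.t`, and `record_usage` is a hypothetical helper name.

```ocaml
(* Hypothetical stand-in for Cpu_monitor.t. *)
type monitor = { last_cpu_time_us : int }

(* Same fields as the record in the diff, but none of them mutable. *)
type t = {
  monitor : monitor;
  last_cpu_usage : float;
  last_stats_received : int; (* stand-in for Ptime.t, in microseconds *)
}

(* Returns a fresh record; the value passed in is left untouched. *)
let record_usage t ~now ~usage =
  { t with last_cpu_usage = usage; last_stats_received = now }
```

The caller then threads the returned value through, for example by storing it back into whatever structure holds the per-unikernel state.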
m "[Cluster Manager] Invalid clone name '%s'." (fst clone));
Error "Invalid clone name"

let check_group_average group key now rusage =
Would you mind elaborating on what this function is supposed to do, and what the input arguments are?
From the name, I had the impression that it should compute the average CPU usage!?
But then, what does the key argument do? And why is there an if String.equal ...? In which case does group.primary :: group.clones contain an element for which name = key does not hold?
Thanks for your comment. I'm still wondering what the actions should be and when they should be triggered.
So, for a new measurement that arrives, we certainly want to compute the cpu load.
Now, the iteration over all clones and the primary: this should be done once, when all measurements have arrived, right? So, shouldn't 1 and 2-4 be separate? Maybe the moment the primary's measurement is received is the time to compute the group average?
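One way to picture the separation suggested here is the sketch below. It is only an assumption about a possible design: `record_usage`, `group_average`, and the map keyed by unikernel name are hypothetical and do not appear in the PR.

```ocaml
module String_map = Map.Make (String)

(* Step 1, run for every incoming measurement: remember the latest
   usage reported by one member. *)
let record_usage usages ~name ~usage = String_map.add name usage usages

(* Step 2, run only when an average is wanted: average over the members
   that have reported so far. Returns None if none of them has. *)
let group_average usages ~members =
  match List.filter_map (fun n -> String_map.find_opt n usages) members with
  | [] -> None
  | reported ->
    let sum = List.fold_left ( +. ) 0.0 reported in
    Some (sum /. float_of_int (List.length reported))
```

Whether the second step runs when the primary's measurement arrives, or on some other trigger, remains the open question raised in the comment.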
in
Ok (average_usage, state)

let check_group_status group key now rusage =
I have no clue what check_group_status is supposed to do with the input arguments that go beyond the group. Why is there a key, a now, and a rusage?
Logs.debug ~src:a_logs (fun m ->
m "[Cluster Manager] Pruning dead cluster: %s" key);
Hashtbl.remove clusters key)
dead_keys
As discussed on Matrix, I'm not a big fan of Hashtbl and background tasks. Can we design something that is a bit more robust and doesn't rely on "prune_dead_clusters"?
I don't quite understand the semantics. What should happen if a unikernel (the primary) that has been scaled up disappears? It looks like it is then removed from the clusters hash table, but what happens to all the clones? Won't they be re-inserted? How does the code deal with a "group" that doesn't have a "primary"?
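As one possible direction, and purely as an assumption about what "more robust" could mean here, liveness can be derived from the stored timestamps at the moment a group is evaluated, over an immutable map instead of a Hashtbl. The names `member`, `last_seen_us`, and `live_members` are hypothetical.

```ocaml
module String_map = Map.Make (String)

(* Hypothetical per-member state: when stats were last received,
   in microseconds. *)
type member = { last_seen_us : int }

(* Keep only the members heard from within death_timeout_us of now_us.
   The input map is not modified; a filtered copy is returned. *)
let live_members ~now_us ~death_timeout_us members =
  String_map.filter
    (fun _name m -> now_us - m.last_seen_us <= death_timeout_us)
    members
```

This sketch does not answer what a group without a primary should mean; it only removes the need for a separate pruning pass.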
let get_total_cpu_time (r : Vmm_core.Stats.rusage) =
let user_t = timeval_to_float r.utime in
let sys_t = timeval_to_float r.stime in
user_t +. sys_t
There's also a runtime field in kinfo_mem. Any reason why rusage is used here? (I'm curious, there's no need to change any code.)
So stats is type t = rusage * kinfo_mem option * vmm option * ifdata list. Since kinfo_mem is optional, I decided to use rusage, because I was sure it's always present.
I think this is fine; still, I am missing a more high-level view of what should happen, what is being computed when, and what is being kept in memory...
This module file implements the logic for monitoring cpu usage of the currently running unikernels and decides when to scale up or scale down.
Configuration
poll_interval: how frequently the system expects to evaluate the unikernel stats.
scale_up_threshold_percent: cpu load (e.g., 90.0%) that triggers a scale up.
scale_up_trigger_ticks: how many consecutive stats reports must exceed the scale up threshold before a clone is created, so we don't have false positives.
scale_down_threshold_percent: cpu load (e.g., 40.0%) that triggers a scale down event if we already had clones.
scale_down_trigger_ticks: how many consecutive stats reports must fall below the scale down threshold before a clone is destroyed.
cooldown_period: the grace period after any scaling action where the system pauses scaling to prevent rapid creation/destruction of clones.
death_timeout: the maximum time to wait for stats from a vm. If a vm stays silent longer than this, it is considered dead and pruned from the system.
Modules
Cpu_monitor
This module takes care of converting the raw stats from albatross into a float we can compare against the thresholds to decide whether to scale.
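The conversion described above can be sketched as follows, pieced together from the fragments visible in the diff. The `timeval` representation as a (seconds, microseconds) pair and the labelled arguments of `cpu_percent` are assumptions made to keep the example self-contained; the actual module may differ.

```ocaml
(* Assumed representation of a timeval: (seconds, microseconds). *)
type timeval = int64 * int

let timeval_to_float ((s, us) : timeval) =
  Int64.to_float s +. (float_of_int us /. 1_000_000.0)

(* CPU usage in percent between two samples of total CPU time.
   Returns 0.0 for a near-empty interval or a negative delta, and
   caps the result at 100.0 (single-cpu assumption, as in the TODO). *)
let cpu_percent ~prev_cpu ~cur_cpu ~elapsed_s =
  let cpu_delta = cur_cpu -. prev_cpu in
  if elapsed_s <= 0.000001 then 0.0
  else if cpu_delta < 0.0 then 0.0
  else Float.min 100.0 (cpu_delta /. elapsed_s *. 100.0)
```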
Cluster_manager
This module tracks "groups" (a primary unikernel and all of its clones) and evaluates their combined load.
get_or_create: finds an existing group for a primary unikernel or creates an empty one.
find_group_by_name: finds an existing group using the primary unikernel's name.
extract_name_and_clone_id: parses a vm name to separate the primary name from the clone ID.
find_or_create_group: the main entry point when stats arrive. It parses the incoming vm's name to route the stats to the correct group.
next_clone_name: generates a unique name for a new clone based on the group's next_id.
register_clone: adds a newly created clone to the group and triggers the cooldown period.
remove_clone: removes a dead or intentionally destroyed clone from the group and triggers the cooldown period.
prune_dead_clusters: prunes individual dead clones or destroys the entire group if the primary dies.
in_cooldown: checks if the group recently had a scale operation and should ignore load spikes/drops.
check_group_average: calculates the average cpu load across all the active instances (the primary and all its active clones) to determine the true load of the cluster.
check_group_status: takes the average load and compares it against the thresholds to decide if the cluster should scale or not.
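To make the interplay of thresholds, trigger ticks, and cooldown concrete, here is a rough sketch of the decision step. The configuration field names follow the description above; the `decision` and `counters` types, the `step` function, and the choice to reset both counters during cooldown are assumptions rather than the PR's actual behaviour.

```ocaml
type config = {
  scale_up_threshold_percent : float;
  scale_up_trigger_ticks : int;
  scale_down_threshold_percent : float;
  scale_down_trigger_ticks : int;
}

type decision = Scale_up | Scale_down | Hold

(* Consecutive reports above / below the respective threshold. *)
type counters = { above : int; below : int }

let reset = { above = 0; below = 0 }

(* One evaluation of a group's average load. Returns the updated
   counters together with the decision for this tick. *)
let step cfg ~in_cooldown ~has_clones counters average =
  if in_cooldown then (reset, Hold)
  else if average > cfg.scale_up_threshold_percent then
    let above = counters.above + 1 in
    if above >= cfg.scale_up_trigger_ticks then (reset, Scale_up)
    else ({ above; below = 0 }, Hold)
  else if has_clones && average < cfg.scale_down_threshold_percent then
    let below = counters.below + 1 in
    if below >= cfg.scale_down_trigger_ticks then (reset, Scale_down)
    else ({ above = 0; below }, Hold)
  else (reset, Hold)
```

Because the counters are passed in and returned, the function stays pure, and a single report between the two thresholds resets both streaks.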