[ARCHIVED] Registry + leaderboard design v1 #139
-
Some (raw) notes from my brainstorm session with @andyk on 9/10/2021:
-
In the last couple of days I cloned and explored the aos_web repo w/ the prototype web app, looked at the current prototype port of the runtime to work with it, and started thinking about what changes we might need to make to update the registry + leaderboard architecture to support the new component system (#148). I started drafting some ideas for a v2 design of the registry in this Google Doc.
-
@nickjalbert I renamed this design discussion to include "v1". I figure that we can version our design discussions as we iterate on core project elements to keep the conversation threads easier to manage.
-
This thread will track ideas related to the registry and leaderboard web services.
Executive Summary
We built a prototype web service that allows us to publish the performance of an agent and then download and reproduce those results, all from the command line. The web service integrates the AgentOS registry and allows us to associate agent components with agent performance in particular environments.
Top of Mind and Next Steps
Desired functionality
From Andy's slides, the vision of the AgentOS web service is to combine ideas from PyPI, MLflow, Hugging Face, Kaggle, and Papers with Code into a website that will build a community around the AgentOS platform. Specific high-level features of these services that we find compelling:
Some lower-level features that we wanted to investigate in this first prototype:
AgentOS Web Service Prototype
We built out a prototype web service to get a better sense of the design challenges we face in this line of work. Some artifacts:
Mentions of the AgentOS CLI in the following sections refer to the CLI developed in tandem with the AgentOS web service (on the `nj_leaderboard` branch in Nick's AgentOS fork) unless otherwise noted.
Registry
The AgentOS registry stores information about the various components that can compose an AgentOS agent. This information allows the CLI to download these components and incorporate them into agents.
In the prototype, the registry of components (Agents, Environments, Policies, Trainers, Datasets, etc.) was extracted from `registry.yaml` in the AgentOS master branch and placed into the web service's database across two tables: `Component` and `ComponentRelease`.
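As a rough sketch (the field names below are assumptions, not the actual prototype schema), the two tables might look something like this:

```python
from dataclasses import dataclass
from datetime import datetime

# Illustrative sketch only: these field names are assumptions, not the
# prototype's actual database schema.

@dataclass
class Component:
    id: int
    name: str              # the component's registry name
    component_type: str    # Agent, Environment, Policy, Trainer, Dataset, ...
    description: str

@dataclass
class ComponentRelease:
    id: int
    component_id: int      # points back to the Component this release belongs to
    version: str           # version/tag of this release
    source_url: str        # where the CLI can download this release from
    created_at: datetime
```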
When `agentos install [component]` is called from the command line, the registry is pulled down from the web service, the information about the requested component is extracted from the registry, and the install proceeds as in AgentOS master.
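Conceptually, that install flow looks something like the following sketch; the endpoint URL and response format are assumptions, not the prototype's actual API:

```python
# Conceptual sketch of the `agentos install [component]` flow against the web
# service. The endpoint URL and response format are assumptions.
import requests
import yaml

REGISTRY_URL = "https://example-agentos-service.test/registry.yaml"  # hypothetical

def fetch_component_entry(component_name: str) -> dict:
    """Pull the registry down from the web service and return one component's entry."""
    registry = yaml.safe_load(requests.get(REGISTRY_URL, timeout=10).text)
    if component_name not in registry:
        raise KeyError(f"{component_name!r} not found in registry")
    return registry[component_name]

# The CLI would hand this entry to the existing install logic from AgentOS
# master, which downloads the component and wires it into the agent.
```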
Registry future work
Benchmark Runs
AgentOS web service is designed to make it easy to share benchmarks and the agents that run against those benchmarks. The idea is to combine the model hosting and sharing enabled by Hugging Face with the competitive aspects of Kaggle. To that end, the prototype introduces the concept of a benchmark run which captures the performance of a particular trained agent against a particular environment.
The AgentOS CLI now distinguishes between training runs (i.e. those runs initiated by executing `agentos learn ...`) and benchmark runs (i.e. those runs initiated by executing `agentos run ...`). A training run is one in which the agent's experience is recorded and the Trainer is called on to improve the agent's policy. A benchmark run, on the other hand, does not involve any policy improvement; it is simply a way to assess the agent's current competence in its environment, so experience is not recorded and the agent's policy is not updated, but metrics related to the agent's performance are recorded.
The envisioned use case is that an agent developer will alternate between training sessions and benchmarking sessions to track the improvement in their agent. After a while, the history of a given agent will look similar to the following:
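For illustration (all values and field names below are made up), that alternating history might look like:

```python
# Hypothetical example of an agent's run history alternating between training
# runs (`agentos learn`) and benchmark runs (`agentos run`). All values are
# illustrative.
history = [
    {"run_type": "train",     "episodes": 100, "mean_reward": None},
    {"run_type": "benchmark", "episodes": 10,  "mean_reward": 21.5},
    {"run_type": "train",     "episodes": 100, "mean_reward": None},
    {"run_type": "benchmark", "episodes": 10,  "mean_reward": 57.0},
    {"run_type": "train",     "episodes": 100, "mean_reward": None},
    {"run_type": "benchmark", "episodes": 10,  "mean_reward": 112.3},
]
```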
The AgentOS CLI offers a new command, `agentos publish`, that shares a benchmark run and makes it reproducible. Among other things, it collects the agent's `agentos.ini` file; all of this gets packaged up (including artifacts used by the agent during the benchmark run) and shipped to the AgentOS web service via two API calls. The first API call creates a record of the benchmark run by inserting a row into the `Run` table.
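As a rough sketch (field names below are assumptions, not the actual schema), a `Run` row might carry something like:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

# Illustrative sketch only: field names are assumptions, not the prototype's
# actual Run table schema. The key idea is that a Run ties benchmark metrics
# to the specific components that constituted the agent.

@dataclass
class Run:
    id: int
    component_release_ids: List[int]        # the components that made up the agent
    environment: str                        # the environment benchmarked against
    metrics: dict                           # e.g. {"mean_reward": ..., "episodes": ...}
    backing_data_url: Optional[str] = None  # filled in by the second API call
    created_at: datetime = field(default_factory=datetime.utcnow)
```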
The `Run` row associates the benchmark results with each component that constituted the agent and thus allows us to associate benchmark runs with particular components.
The second API call uploads a zipped tarball of all the backing data (e.g. trained neural nets) required to reproduce the benchmark run and associates that data with the previously created `Run` object.
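A minimal sketch of that second call, assuming a hypothetical endpoint path and field name:

```python
# Sketch of the second API call: upload the tarball of backing data and
# associate it with the Run created by the first call. The endpoint path and
# field names are assumptions.
import requests

def upload_run_artifacts(run_id: int, tarball_path: str) -> None:
    with open(tarball_path, "rb") as f:
        resp = requests.post(
            f"https://example-agentos-service.test/api/runs/{run_id}/data",
            files={"tarball": f},
            timeout=120,
        )
    resp.raise_for_status()
```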
Making runs reproducible
Our prototype also implements an `agentos get [benchmark run number] [local directory]` command that pulls down the data for the chosen benchmark run from the AgentOS web service and recreates, in the local directory, the agent that generated that benchmark run.
This command downloads the zipped tarball that is associated with the chosen benchmark run. The tarball contains the following files:
- an `agentos.ini` file specifying the components constituting the agent
- a `parameters.yaml` file that captures the hyperparameters that were passed to the agent
- a `run_data.yaml` file that contains info on the performance of the agent
- a `data/` directory that contains all the backing artifacts required to run the agent (e.g. a trained neural net)

The tarball is untarred and then:
- the `agentos.ini` file is copied into the local directory
- the components specified in the `agentos.ini` file are installed
- the hyperparameters are updated from the `parameters.yaml` file (however, currently we don't respect these updated hyperparameters; see future work)
- the `data/` directory is copied into the new agent's data directory

The local agent is now in an identical state to the agent that generated the benchmark run results and can be run with `agentos run`.
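A minimal sketch of that local reconstitution step, with assumed paths and helper names (the real logic lives on the `nj_leaderboard` branch):

```python
# Sketch of what `agentos get` does locally after downloading the run's
# tarball. Paths and helper names are assumptions.
import shutil
import tarfile
from pathlib import Path

def reconstitute(tarball_path: str, target_dir: str) -> None:
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    staging = target / "_run_download"
    with tarfile.open(tarball_path) as tar:
        tar.extractall(staging)  # agentos.ini, parameters.yaml, run_data.yaml, data/
    shutil.copy(staging / "agentos.ini", target / "agentos.ini")
    shutil.copytree(staging / "data", target / "data", dirs_exist_ok=True)
    # ...then install the components listed in agentos.ini and apply the
    # hyperparameters from parameters.yaml (not yet respected; see future work).
```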
Benchmark runs future work
- If you `get` an agent, train it for one episode, and then re-upload it, it seems like the system should somehow track that you didn't train your agent from scratch. Alternatively, you may want private and public agents.
- Runs should be associated with a `ComponentRelease` and not a `Component`.
- Rename `Run` to `BenchmarkRun` once we've settled on terminology.
- `parameters.yaml` should override all hyperparameters in a run.
Command-line interface
We've discussed the changes to the AgentOS CLI in the previous sections, but in summary:
- `agentos publish` publishes a benchmark run to the AgentOS web service.
- `agentos get ...` reconstitutes an agent used to generate a benchmark run listed on the AgentOS web service.
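Tying these commands together, a typical session might look like the following sketch (the run number, directory name, and any omitted arguments are placeholders):

```python
# End-to-end sketch of the envisioned workflow using the CLI commands described
# above. The run number and directory are placeholders.
import subprocess

subprocess.run(["agentos", "learn"], check=True)    # training run: improve the policy
subprocess.run(["agentos", "run"], check=True)      # benchmark run: measure performance
subprocess.run(["agentos", "publish"], check=True)  # share the benchmark run

# Later, anyone can reconstitute and re-run the published agent:
subprocess.run(["agentos", "get", "42", "reproduced_agent"], check=True)
subprocess.run(["agentos", "run"], cwd="reproduced_agent", check=True)
```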
MLflow
The current prototype only partially commits to using MLflow Tracking and does not use MLflow Projects.
We only half use MLflow Tracking because we rolled our own system for managing agent backing data (e.g. neural nets) when we first prototyped the componentized version of AgentOS, and it seemed low priority to port everything over to MLflow Tracking for this new web service prototyping exercise. Current MLflow Tracking usage is concentrated in the `run_agent()` function. However, I think a full port to MLflow Tracking will be helpful for both correctness and consistency in the runtime.
The usage of MLflow Projects requires further investigation, but it may help us handle dependencies in a cleaner way.
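As a sketch of what a fuller port of `run_agent()` to MLflow Tracking could look like (the agent API, paths, and metric names here are placeholders, not the current implementation):

```python
# Sketch of fuller MLflow Tracking usage inside run_agent(); the agent API and
# metric names are placeholders.
import mlflow

def run_agent(agent, num_episodes: int = 10):
    with mlflow.start_run():
        mlflow.log_param("num_episodes", num_episodes)
        total_reward = 0.0
        for _ in range(num_episodes):
            total_reward += agent.run_episode()  # hypothetical agent API
        mlflow.log_metric("mean_reward", total_reward / num_episodes)
        mlflow.log_artifacts("data")             # backing data (e.g. trained nets)
```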
MLflow future work
High-level Thoughts