-
Notifications
You must be signed in to change notification settings - Fork 2
Add esrunner sample project with how-to guide. #169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
data/esrunner_sample/README.md
Outdated
|
||
## Setting up your environment | ||
|
||
- Install `esrunner` (earth-system-run) in your development environment (Or clone the repository and add to your `PYTHONPATH`. If you go this route, ensure you install the packages listed in `earth-system-run/requirements.txt`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think anybody should clone the repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you provide the exact command to install esrunner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could it not be part of the project-specific requirements.txt, pointing to a git repo at a specific tag? that way when we're installing deps for the docker image we can reliably just say pip install rslp-projects/thingy/requirements.txt
(or requirements.frozen.txt).
They should probably be running a separate venv per project, so this doesn't seem like a hard sell?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per our conversation toady, we intend to continue to have one set of requirements for all of rslearn_projects
. We use one environment and build one Docker container for all projects. Projects (as in different applications for which we want to fine tune models) share the vast majority of requirements and those that are not shared are typically specific to individual model architectures like TerraMind vs OLMo-Earth so even then they would not be project-specific, unless "experiments to compare OLMo-Earth against TerraMind / DINOv2" is considered a project which is not really how we think about it. There are some requirements like prometheus-client that are only used for specific projects like vessel detection but I think the intention is to make things more consistent across projects e.g. using the same system for observability.
- `partition_strategies.yaml`: | ||
- `postprocessing_strategies.yaml`: This file defines how the esrunner will post-process the predictions. | ||
- `requirements.txt`: This file contains the additional Python packages required for the pipeline. It should include any dependencies that are not part of the base environment. | ||
- `prediction/test-request1.geojson`: This directory contains the prediction requests in GeoJSON format. Each file represents a set of prediction requests for a specific region or time period. Many different prediction requests can be defined within a single file as separate features in the feature collection. The esrunner will partition these requests into smaller tasks based on the partition strategies defined in `partition_strategies.yaml`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this a directory of geojson files vs just a single feature collection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This came from a conversation with Henry where he mentioned wanting to be able to work with different input geometries. I figured providing a pattern for managing these different inputs would work better than saying "you must only have one input file".
## Setting up your environment | ||
|
||
- Install `esrunner` (earth-system-run) in your development environment (Or clone the repository and add to your `PYTHONPATH`. If you go this route, ensure you install the packages listed in `earth-system-run/requirements.txt`) | ||
- Following the project structure below, create a directory in the `rslearn-projects/data/` directory. This directory will contain all the necessary files for your prediction or fine-tuning pipeline. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we're telling folks to have the rslp/data/{my_project}/
directory match this structure; why do we need to have each filename passed in? vs just passing in rslp/data/{my_project}/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one of those things I want to validate with the ML folks. There are a lot of cases where they store a variety of model configs for different experiments in the same directory. I am assuming they will want to continue doing that to some degree. Perhaps a happy medium would be to read the prescribed names but also allow them to be overridden in the EsPredictionRunner.__init__()
for flexibility.
rslp/espredict_runner.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this appears to be the same as in esrun. why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh i need to delete this. I had it here first and then copied to es run and forgot to remove this.
file = ["requirements.txt"] | ||
|
||
[tool.setuptools.dynamic.optional-dependencies] | ||
ai2 = { file = ["ai2_requirements.txt"] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see ai2_requirements.txt here.
Can we chat with @StephenWithPH about the path forward here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it already existed. I'm just wiring it up to be accessible via pip install rslearn_projects[ai2]
. Totally agree on the Stephen thing.
34eab54
to
ae936d3
Compare
.github/workflows/build-esrunner.yml
Outdated
uses: actions/checkout@v4 | ||
with: | ||
repository: allenai/helios | ||
ref: josh/split-evals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HEY
ae936d3
to
691d32a
Compare
I think we should merge this but I'm not sure about mixing it with the requirements change. If we need to change the requirements, can we remove the part that prevents it from installing with Python 3.12+ and make it so rslearn[extra,dev] can be installed? Currently only extra appears in pyproject.toml and it is called all instead of extra, but may be more clear to match the name of the file. Also if this format is desired then ai2_requirements.txt should be renamed requirements-ai2.txt. |
@favyen2 I am probably going to close this one and reimplement from master so that its all clean. A lot has changed since I opened this so its probably best just to start fresh and build back up. I will put that on my list for today. |
Sounds good maybe can have separate PRs for adding the example versus building the Docker container for esrun (which may involve updates to how dependencies are split up). |
No description provided.