joshhvulcan
Contributor

No description provided.


## Setting up your environment

- Install `esrunner` (earth-system-run) in your development environment (Or clone the repository and add to your `PYTHONPATH`. If you go this route, ensure you install the packages listed in `earth-system-run/requirements.txt`)
Contributor

I do not think anybody should clone the repo.

Contributor

Can you provide the exact command to install esrunner?

@cmwilhelm Jul 25, 2025

Could it not be part of the project-specific requirements.txt, pointing to a git repo at a specific tag? That way, when we're installing deps for the Docker image, we can reliably just say pip install -r rslp-projects/thingy/requirements.txt (or requirements.frozen.txt).

They should probably be running a separate venv per project, so this doesn't seem like a hard sell?
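
For illustration, a requirement pinned to a git tag is usually written as a direct reference of the form `name @ git+<url>@<tag>`, which pip accepts in a requirements file. The sketch below only builds such a line. The distribution name, repository URL, and tag are placeholders chosen for the example, not the real location or version of esrunner.

```python
def pinned_git_requirement(name: str, repo_url: str, tag: str) -> str:
    """Build a requirements.txt line that pins `name` to a git tag."""
    # pip accepts "<name> @ git+<url>@<ref>" for requirements installed from git.
    return f"{name} @ git+{repo_url}@{tag}"


# Hypothetical name, URL, and tag, for illustration only.
line = pinned_git_requirement(
    "esrunner", "https://example.com/org/earth-system-run.git", "v0.0.0"
)
print(line)
```

A line like this could sit in whichever requirements file the team settles on, whether project-specific or shared.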

Collaborator

Per our conversation today, we intend to keep one set of requirements for all of rslearn_projects. We use one environment and build one Docker container for all projects. Projects (as in different applications for which we want to fine-tune models) share the vast majority of requirements. Those that are not shared are typically specific to individual model architectures, such as TerraMind vs. OLMo-Earth, so even then they would not be project-specific, unless "experiments to compare OLMo-Earth against TerraMind / DINOv2" is considered a project, which is not really how we think about it. Some requirements, like prometheus-client, are only used for specific projects such as vessel detection, but I think the intention is to make things more consistent across projects, e.g. by using the same system for observability.

- `partition_strategies.yaml`:
- `postprocessing_strategies.yaml`: This file defines how the esrunner will post-process the predictions.
- `requirements.txt`: This file contains the additional Python packages required for the pipeline. It should include any dependencies that are not part of the base environment.
- `prediction/test-request1.geojson`: This directory contains the prediction requests in GeoJSON format. Each file represents a set of prediction requests for a specific region or time period. Many different prediction requests can be defined within a single file as separate features in the feature collection. The esrunner will partition these requests into smaller tasks based on the partition strategies defined in `partition_strategies.yaml`.
Contributor

Why is this a directory of GeoJSON files vs. just a single feature collection?

Contributor Author

This came from a conversation with Henry where he mentioned wanting to be able to work with different input geometries. I figured providing a pattern for managing these different inputs would work better than saying "you must only have one input file".
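
A minimal sketch of the pattern described above, assuming each file in the directory is a GeoJSON FeatureCollection whose features are individual prediction requests. The function name and the directory layout are hypothetical, and this is not how esrunner itself reads requests.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_prediction_requests(directory: Path) -> Iterator[tuple[str, dict]]:
    """Yield (file name, feature) for every feature in every GeoJSON file."""
    # Sort so that files are visited in a stable order.
    for path in sorted(directory.glob("*.geojson")):
        with path.open() as f:
            collection = json.load(f)
        # Assumption: each file holds a FeatureCollection with a "features" list.
        for feature in collection.get("features", []):
            yield path.name, feature
```

Keeping one file per input geometry set means a run can target a single file, while a loop like this still covers every request when needed.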

## Setting up your environment

- Install `esrunner` (earth-system-run) in your development environment (Or clone the repository and add to your `PYTHONPATH`. If you go this route, ensure you install the packages listed in `earth-system-run/requirements.txt`)
- Following the project structure below, create a directory in the `rslearn-projects/data/` directory. This directory will contain all the necessary files for your prediction or fine-tuning pipeline.
Contributor

@hunterp Jul 25, 2025

If we're telling folks to have the rslp/data/{my_project}/ directory match this structure, why do we need to have each filename passed in, vs. just passing in rslp/data/{my_project}/?

Contributor Author

This is one of those things I want to validate with the ML folks. There are a lot of cases where they store a variety of model configs for different experiments in the same directory. I am assuming they will want to continue doing that to some degree. Perhaps a happy medium would be to read the prescribed names but also allow them to be overridden in the EsPredictionRunner.__init__() for flexibility.
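
The "happy medium" could look roughly like the sketch below: conventional file names are resolved inside the project directory, and any of them can be replaced by an explicit path. The function, the dictionary keys, and the example paths are hypothetical and are not part of the actual EsPredictionRunner interface. Only the two YAML file names come from the project structure in this PR.

```python
from pathlib import Path
from typing import Optional

# Conventional file names from the project structure described in this PR.
DEFAULT_FILES = {
    "partition_strategies": "partition_strategies.yaml",
    "postprocessing_strategies": "postprocessing_strategies.yaml",
}


def resolve_project_files(
    project_dir: Path, overrides: Optional[dict[str, Path]] = None
) -> dict[str, Path]:
    """Return a path for each config, preferring explicit overrides."""
    overrides = overrides or {}
    # Reject keys that do not correspond to a known config file.
    unknown = set(overrides) - set(DEFAULT_FILES)
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    return {
        key: Path(overrides[key]) if key in overrides else project_dir / filename
        for key, filename in DEFAULT_FILES.items()
    }


# Default layout, then the same project with one experiment-specific override.
print(resolve_project_files(Path("data/my_project")))
print(
    resolve_project_files(
        Path("data/my_project"),
        {"partition_strategies": Path("experiments/alt_partition.yaml")},
    )
)
```

With this shape, callers who follow the prescribed layout pass only the directory, and experiments that keep several configs side by side can still point at a specific file.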

Contributor

This appears to be the same as in esrun. Why?

Contributor Author

Oh, I need to delete this. I had it here first, then copied it to esrun and forgot to remove it here.

file = ["requirements.txt"]

[tool.setuptools.dynamic.optional-dependencies]
ai2 = { file = ["ai2_requirements.txt"] }
Contributor

I don't see ai2_requirements.txt here.

Can we chat with @StephenWithPH about the path forward here as well?

Contributor Author

It already existed. I'm just wiring it up to be accessible via pip install rslearn_projects[ai2]. Totally agree on the Stephen thing.

@joshhvulcan joshhvulcan force-pushed the josh/sample-esrunner branch from 34eab54 to ae936d3 Compare August 8, 2025 00:20
uses: actions/checkout@v4
with:
  repository: allenai/helios
  ref: josh/split-evals
Contributor

HEY

@joshhvulcan joshhvulcan force-pushed the josh/sample-esrunner branch from ae936d3 to 691d32a Compare August 13, 2025 18:33
@favyen2
Collaborator

favyen2 commented Aug 21, 2025

I think we should merge this, but I'm not sure about mixing it with the requirements change. If we need to change the requirements, can we remove the part that prevents it from installing with Python 3.12+ and make it so rslearn[extra,dev] can be installed? Currently only extra appears in pyproject.toml, and it is called all instead of extra; it may be clearer to match the name of the file. Also, if this format is desired, then ai2_requirements.txt should be renamed requirements-ai2.txt.

@joshhvulcan
Contributor Author

@favyen2 I am probably going to close this one and reimplement from master so that it's all clean. A lot has changed since I opened this, so it's probably best to start fresh and build back up. I will put that on my list for today.

@favyen2
Collaborator

favyen2 commented Aug 21, 2025

Sounds good. Maybe we can have separate PRs for adding the example versus building the Docker container for esrun (which may involve updates to how dependencies are split up).

@favyen2 favyen2 closed this pull request by merging all changes into master in 2e8f28c Aug 26, 2025
@favyen2 favyen2 deleted the josh/sample-esrunner branch August 26, 2025 16:45
5 participants