
Commit 34eab54

Update to working state
1 parent 2b73f1f commit 34eab54

File tree

9 files changed: +511 −409 lines changed


data/esrunner_sample/.gitignore

Lines changed: 5 additions & 0 deletions
@@ -1 +1,6 @@
+.python-version
+checkpoint.ckpt
+
 scratch/
+earth-system-run/
+lightning_logs

data/esrunner_sample/README.md

Lines changed: 120 additions & 70 deletions
@@ -8,18 +8,25 @@ ESRunner provides the [EsPredictRunner](https://github.com/allenai/earth-system-
 
 - Install `esrunner` (earth-system-run) in your development environment.
 ```
-pip install earth-system-run @ git+https://github.com/allenai/earth-system-run.git
+pip install earth-system-run[runner] @ git+https://github.com/allenai/earth-system-run.git@v1-develop
 ```
+
+- or create a virtual environment using `pyenv` and `pyenv-virtualenv`:
+```
+pyenv virtualenv 3.11 esrunner-project
+pyenv local esrunner-project
+git clone git@github.com:allenai/earth-system-run.git earth-system-run
+pip3 install ./earth-system-run/[runner]
+```
+
 - Following the project structure below, create a directory in the `rslearn-projects/data/` directory. This directory will contain all the necessary files for your prediction or fine-tuning pipeline.
 
 ## Project Structure
 - `checkpoint.ckpt`: (Optional)
 - `dataset.json`: This is the rslearn dataset definition file.
 - `model.yaml`: This is the rslearn (pytorch) model definition file.
-- `partition_strategies.yaml`:
-- `postprocessing_strategies.yaml`: This file defines how the esrunner will post-process the predictions.
-- `requirements.txt`: This file contains the additional Python packages required for the pipeline. It should include any dependencies that are not part of the base environment.
-- `prediction/test-request1.geojson`: This directory contains the prediction requests in GeoJSON format. Each file represents a set of prediction requests for a specific region or time period. Many different prediction requests can be defined within a single file as separate features in the feature collection. The esrunner will partition these requests into smaller tasks based on the partition strategies defined in `partition_strategies.yaml`.
+- `esrun.yaml`: This file defines the esrun model configuration, including the partitioning and post-processing strategies.
+- `prediction_request_geometry.geojson`: The prediction request GeoJSON feature collection.
 - `run_pipeline.py`: This script is used to run the prediction pipeline. It will read the configuration files and execute the necessary steps to perform predictions or fine-tuning. You can customize this script to suit your specific needs, such as adding additional logging or error handling.
 
 ## Partitioning Strategies
## Partitioning Strategies
@@ -33,18 +40,28 @@ Partitioning strategies can be mixed and matched for flexible development.
 Available partitioners:
 - `FixedWindowPartitioner` - Given a fixed window size, this partitioner will create partitions of that size for each lat/lon or polygon centroid in the prediction request.
 - `GridPartitioner` - Given a grid size, this partitioner will create partitions based on the grid cells that intersect with the prediction request.
-- NoopPartitioner - Does not partition the prediction request. This is useful for testing or when you want to run the entire prediction request as a single task.
+- `NoopPartitioner` - Does not partition the prediction request. This is useful for testing or when you want to run the entire prediction request as a single task.
+
+Example `esrun.yaml` snippet. This will leave the original input as a single partition, but will create individual windows of size 128x128 pixels for each feature.
 
-Example `partition_strategies.yaml`. This will leave the original input as a single partition, but will create individual windows of size 128x128 pixels for each feature.
 ```yaml
-partition_request_geometry:
-  class_path: esrun.tools.partitioners.noop_partitioner.NoopPartitioner
-  init_args:
-
-prepare_window_geometries:
-  class_path: esrun.tools.partitioners.fixed_window_partitioner.FixedWindowPartitioner
-  init_args:
-    window_size: 128 # intended to be a pixel value
+partition_strategies:
+  partition_request_geometry:
+    class_path: esrun.runner.tools.partitioners.noop_partitioner.NoopPartitioner
+
+  prepare_window_geometries:
+    class_path: esrun.runner.tools.partitioners.fixed_window_partitioner.FixedWindowPartitioner
+    init_args:
+      window_size: 128
+      output_projection:
+        class_path: rslearn.utils.Projection
+        init_args:
+          crs:
+            _module_: rasterio.crs
+            _callable_: CRS.from_epsg
+            code: 3857
+          x_resolution: 9.554628535647032 # PIXEL_SIZE in rslp/forest_loss_driver/extract_dataset/extract_alerts.py
+          y_resolution: -9.554628535647032
 ```
 
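The resolution constants in the snippet above can be sanity-checked with a few lines of arithmetic. This sketch is illustrative only; the zoom-14 Web Mercator reading of 9.5546 m/px is an observation, not something the config states.

```python
# Sketch: relate window_size and the x/y_resolution values from the snippet.
# The standard Web Mercator ground resolution at zoom z (256px tiles) is
# 2 * pi * 6378137 / 256 / 2**z; at z=14 this matches 9.554628535647032 m/px.
import math

base = 2 * math.pi * 6378137 / 256   # ~156543.034 m/px at zoom 0
res = base / 2**14                   # ~9.5546 m/px, as in the config
window_px = 128
window_m = window_px * res           # ground extent of one window side

print(res)       # ~9.554628535647032
print(window_m)  # ~1223 m, so each 128x128 window covers ~1.2 km per side
```

So in this configuration, each `FixedWindowPartitioner` window spans roughly 1.2 km on a side in EPSG:3857.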
 ## Post-Processing Strategies
@@ -53,101 +70,134 @@ There are 3 different stages to postprocessing:
 - `postprocess_partition()` - This is the stage where the outputs from the window postprocessors are combined into a single per-partition artifact.
 - `postprocess_dataset()` - This is the final stage of postprocessing where the partition level outputs are combined into a single artifact.
 
+Example: This uses the `CombineGeojson` postprocessor to combine the outputs from the window postprocessors into a single GeoJSON file for each partition and for the entire dataset.
+
+```yaml
+postprocessing_strategies:
+  process_window:
+    class_path: esrun.runner.tools.postprocessors.combine_geojson.CombineGeojson
+
+  process_partition:
+    class_path: esrun.runner.tools.postprocessors.combine_geojson.CombineGeojson
+
+  process_dataset:
+    class_path: esrun.runner.tools.postprocessors.combine_geojson.CombineGeojson
+```
+
 ## Samples
 
+### Running the sample in this project
+
+Set up your environment as described above, and copy a forest loss checkpoint into the project directory.
+
+```
+gsutil cp gs://earth-system-run-dev/models/1b82c096-9ba2-424f-a1c9-86d2a8986b7d/stage_0/checkpoint.ckpt checkpoint.ckpt
+```
+
+Run the full pipeline.
+
+```
+python3 run_pipeline.py
+```
+
+The `scratch/results/result.geojson` file should contain the results of the prediction request. The `scratch/dataset` directory should contain the materialized dataset for the prediction request.
+
 ### Run a pipeline end-to-end
 
 ```python file=run_pipeline.py
-from rslp.espredict_runner import EsPredictRunner
-
-runner = EsPredictRunner(
-    'model.yaml',
-    'dataset.json',
-    'partition_strategies.yaml',
-    'postprocessing_strategies.yaml',
-    'prediction/test-request1.geojson',
-    scratch_path='scratch/'
-)
+from pathlib import Path
+
+from esrun.runner.local.predict_runner import EsPredictRunner
+
+CONFIG_PATH = Path(__file__).parent
+
+runner = EsPredictRunner(project_path=CONFIG_PATH,
+                         scratch_path=CONFIG_PATH / 'scratch')
+
 partitions = runner.partition()
 for partition_id in partitions:
+    print(f"Processing partition: {partition_id}")
     runner.build_dataset(partition_id)
     runner.run_inference(partition_id)
     runner.postprocess(partition_id)
-
 runner.combine(partitions)
 ```
 
 ### Run dataset building for the entire prediction request.
 ```python file=run_dataset_building.py
-from rslp.espredict_runner import EsPredictRunner
-
-runner = EsPredictRunner(
-    'model.yaml',
-    'dataset.json',
-    'partition_strategies.yaml',
-    'postprocessing_strategies.yaml',
-    'prediction/test-request1.geojson',
-    scratch_path='scratch/'
-)
-
-for partition_id in runner.partition():
+from pathlib import Path
+from esrun.runner.local.predict_runner import EsPredictRunner
+
+CONFIG_PATH = Path(__file__).parent
+
+runner = EsPredictRunner(project_path=CONFIG_PATH,
+                         scratch_path=CONFIG_PATH / 'scratch')
+
+partitions = runner.partition()
+for partition_id in partitions:
+    print(f"Processing partition: {partition_id}")
     runner.build_dataset(partition_id)
 ```
 
 ### Run inference for a single partition.
 (Assumes you have an existing materialized dataset for the partition.)
 ```python file=run_inference_single_partition.py
-from rslp.espredict_runner import EsPredictRunner
-
-runner = EsPredictRunner(
-    'model.yaml',
-    'dataset.json',
-    'partition_strategies.yaml',
-    'postprocessing_strategies.yaml',
-    'prediction/test-request1.geojson',
-    scratch_path='scratch/'
-)
+from pathlib import Path
+from esrun.runner.local.predict_runner import EsPredictRunner
+
+CONFIG_PATH = Path(__file__).parent
+
+runner = EsPredictRunner(project_path=CONFIG_PATH,
+                         scratch_path=CONFIG_PATH / 'scratch')
+
 partition_id = 'my-existing-partition-id' # Replace with the actual partition ID you want to run
 runner.run_inference(partition_id)
 ```
 
 ### Run inference for a single window.
 Since we don't expose window-level inference via the runner API, you can configure your partitioners to produce limited sets of partitions and windows.
 
-```yaml file=partition_strategies.yaml
-strategy_large:
-  class_path: esrun.tools.partitioners.noop_partitioner.NoopPartitioner
-  init_args:
-
-strategy_small:
-  class_path: esrun.tools.partitioners.fixed_window_partitioner.FixedWindowPartitioner
-  init_args:
-    window_size: 128 # intended to be a pixel value
-    limit: 1 # This will limit window generation to a single window per large partition, effectively allowing you to run inference on a single window.
+```yaml file=esrun.yaml
+partition_strategies:
+  partition_request_geometry:
+    class_path: esrun.runner.tools.partitioners.noop_partitioner.NoopPartitioner
+
+  prepare_window_geometries:
+    class_path: esrun.runner.tools.partitioners.fixed_window_partitioner.FixedWindowPartitioner
+    init_args:
+      window_size: 128
+      limit: 1 # Limit to a single window for testing
+      output_projection:
+        class_path: rslearn.utils.Projection
+        init_args:
+          crs:
+            _module_: rasterio.crs
+            _callable_: CRS.from_epsg
+            code: 3857
+          x_resolution: 9.554628535647032 # PIXEL_SIZE in rslp/forest_loss_driver/extract_dataset/extract_alerts.py
+          y_resolution: -9.554628535647032
 ```
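The `class_path` / `init_args` layout above follows the common jsonargparse-style config pattern. As a rough illustration of how such an entry can be resolved (this is not esrun's actual loader, and it ignores the `_module_`/`_callable_` convention), a minimal sketch using stdlib classes as stand-ins:

```python
# Sketch: generic resolution of a class_path/init_args mapping. Illustrative
# only; esrun's real configuration loader may behave differently.
from importlib import import_module
from typing import Any


def instantiate(spec: dict[str, Any]) -> Any:
    """Import spec['class_path'] and call it with spec.get('init_args', {})."""
    module_name, _, attr = spec["class_path"].rpartition(".")
    cls = getattr(import_module(module_name), attr)
    return cls(**spec.get("init_args", {}))


# Stdlib class standing in for a partitioner/postprocessor:
obj = instantiate({
    "class_path": "datetime.date",
    "init_args": {"year": 2024, "month": 1, "day": 2},
})
print(obj.isoformat())  # 2024-01-02
```

The same shape would resolve a partitioner entry such as the `NoopPartitioner` above, provided the class is importable.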
 
 ```python file=run_inference_single_window.py
-from rslp.espredict_runner import EsPredictRunner
-
-runner = EsPredictRunner(
-    'model.yaml',
-    'dataset.json',
-    'partition_strategies.yaml',
-    'postprocessing_strategies.yaml',
-    'prediction/test-request1.geojson',
-    scratch_path='scratch/'
-)
+from pathlib import Path
+from esrun.runner.local.predict_runner import EsPredictRunner
+
+CONFIG_PATH = Path(__file__).parent
+
+runner = EsPredictRunner(project_path=CONFIG_PATH,
+                         scratch_path=CONFIG_PATH / 'scratch')
+
 partition_id = 'my-existing-partition-id' # Replace with the actual partition ID you want to run
 partitions = runner.partition()
 for partition_id in partitions:
     runner.run_inference(partition_id)
 ```
 
 ## Writing Your Own Partitioners
-You may supply your own partitioners by creating a new class that implements the ` PartitionInterface` class in the `esrun.tools.partitioners.partition_interface` module. You can then specify your custom partitioner in the `partition_strategies.yaml` file. This class must exist on your PYTHONPATH and be importable by the esrunner. As such we recommend you place your custom partitioner in the `rslp/common/partitioners` directory of this repository to ensure it gets installed into the final Dockerimage artifact.
+You may supply your own partitioners by creating a new class that implements the `PartitionInterface` class in the `esrun.runner.tools.partitioners.partition_interface` module. You can then specify your custom partitioner in the `esrun.yaml` file. This class must exist on your `PYTHONPATH` and be importable by the esrunner. As such, we recommend you place your custom partitioner in the `rslp/common/partitioners` directory of this repository to ensure it gets installed into the final Docker image artifact.
 
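For orientation, a custom partitioner might be shaped roughly like the sketch below. The `partition` method name, the plain-dict GeoJSON types, and the `ChunkPartitioner` class are illustrative assumptions; the actual contract is defined by `PartitionInterface` in the earth-system-run repository.

```python
# Hypothetical sketch of a custom partitioner. The real PartitionInterface in
# esrun.runner.tools.partitioners.partition_interface defines the actual
# contract; the method name and types below are assumptions for illustration.
from typing import Any


class ChunkPartitioner:
    """Splits a GeoJSON FeatureCollection into fixed-size chunks of features."""

    def __init__(self, chunk_size: int = 2) -> None:
        self.chunk_size = chunk_size

    def partition(self, request: dict[str, Any]) -> list[dict[str, Any]]:
        features = request.get("features", [])
        return [
            {"type": "FeatureCollection", "features": features[i:i + self.chunk_size]}
            for i in range(0, len(features), self.chunk_size)
        ]


request = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": {"type": "Point", "coordinates": [lon, 0.0]}, "properties": {}}
        for lon in (-74.0, -73.9, -73.8)
    ],
}
parts = ChunkPartitioner(chunk_size=2).partition(request)
print(len(parts))  # 2 partitions: one with 2 features, one with 1
```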
 ## Writing your own post-processing strategies
-You may supply your own post-processing strategies by creating a new class that implements the `PostprocessInterface` class in the `esrun.tools.postprocessors.postprocess_inferface` module. You can then specify your custom post-processing strategy in the `postprocessing_strategies.yaml` file. This class must exist on your `PYTHONPATH` and be importable by the esrunner. As such we recommend you place your custom post-processing strategy in the `rslp/common/postprocessing` directory of this repository to ensure it gets installed into the final Docker image artifact.
+You may supply your own post-processing strategies by creating a new class that implements the `PostprocessInterface` class in the `esrun.runner.tools.postprocessors.postprocess_inferface` module. You can then specify your custom post-processing strategy in the `esrun.yaml` file. This class must exist on your `PYTHONPATH` and be importable by the esrunner. As such, we recommend you place your custom post-processing strategy in the `rslp/common/postprocessing` directory of this repository to ensure it gets installed into the final Docker image artifact.
 
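A post-processing strategy in the spirit of `CombineGeojson` might look like this sketch. The `process` method name and the plain-dict GeoJSON types are illustrative assumptions; the actual contract is defined by `PostprocessInterface` in the earth-system-run repository.

```python
# Hypothetical sketch of a post-processing strategy. The real
# PostprocessInterface defines the actual contract; the method name and types
# below are assumptions for illustration only.
from typing import Any


class MergeFeatures:
    """Combines several GeoJSON FeatureCollections into one."""

    def process(self, collections: list[dict[str, Any]]) -> dict[str, Any]:
        merged: list[dict[str, Any]] = []
        for collection in collections:
            merged.extend(collection.get("features", []))
        return {"type": "FeatureCollection", "features": merged}


# E.g. combining per-window outputs into a per-partition artifact:
window_outputs = [
    {"type": "FeatureCollection",
     "features": [{"type": "Feature", "geometry": None, "properties": {"id": i}}]}
    for i in range(3)
]
combined = MergeFeatures().process(window_outputs)
print(len(combined["features"]))  # 3
```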
 ### Testing Partitioner & Post-Processing Implementations
 See the [earth-system-run](https://github.com/allenai/earth-system-run) repository for tests covering existing [partitioner](https://github.com/allenai/earth-system-run/tree/v1-develop/tests/unit/tools/partitioners) and [post-processor](https://github.com/allenai/earth-system-run/tree/v1-develop/tests/unit/tools/postprocessors) implementations.
