
Commit 34eab54

Update to working state
1 parent 2b73f1f commit 34eab54

File tree

9 files changed: +511 −409 lines changed


data/esrunner_sample/.gitignore

Lines changed: 5 additions & 0 deletions
@@ -1 +1,6 @@
+.python-version
+checkpoint.ckpt
+
 scratch/
+earth-system-run/
+lightning_logs

data/esrunner_sample/README.md

Lines changed: 120 additions & 70 deletions
@@ -8,18 +8,25 @@ ESRunner provides the [EsPredictRunner](https://github.com/allenai/earth-system-
 
 - Install `esrunner` (earth-system-run) in your development environment.
 ```
-pip install earth-system-run @ git+https://github.com/allenai/earth-system-run.git
+pip install earth-system-run[runner] @ git+https://github.com/allenai/earth-system-run.git@v1-develop
 ```
+
+- or create a virtual environment using `pyenv` and `pyenv-virtualenv`:
+```
+pyenv virtualenv 3.11 esrunner-project
+pyenv local esrunner-project
+git clone git@github.com:allenai/earth-system-run.git earth-system-run
+pip3 install ./earth-system-run/[runner]
+```
+
 - Following the project structure below, create a directory in the `rslearn-projects/data/` directory. This directory will contain all the necessary files for your prediction or fine-tuning pipeline.
 
 ## Project Structure
 - `checkpoint.ckpt`: (Optional)
 - `dataset.json`: This is the rslearn dataset definition file.
 - `model.yaml`: This is the rslearn (pytorch) model definition file.
-- `partition_strategies.yaml`:
-- `postprocessing_strategies.yaml`: This file defines how the esrunner will post-process the predictions.
-- `requirements.txt`: This file contains the additional Python packages required for the pipeline. It should include any dependencies that are not part of the base environment.
-- `prediction/test-request1.geojson`: This directory contains the prediction requests in GeoJSON format. Each file represents a set of prediction requests for a specific region or time period. Many different prediction requests can be defined within a single file as separate features in the feature collection. The esrunner will partition these requests into smaller tasks based on the partition strategies defined in `partition_strategies.yaml`.
+- `esrun.yaml`: This file defines the esrun model configuration, including the partitioning and post-processing strategies.
+- `prediction_request_geometry.geojson`: The prediction request GeoJSON feature collection.
 - `run_pipeline.py`: This script is used to run the prediction pipeline. It will read the configuration files and execute the necessary steps to perform predictions or fine-tuning. You can customize this script to suit your specific needs, such as adding additional logging or error handling.
 
 ## Partitioning Strategies
## Partitioning Strategies
@@ -33,18 +40,28 @@ Partitioning strategies can be mixed and matched for flexible development.
 Available partitioners:
 - `FixedWindowPartitioner` - Given a fixed window size, this partitioner will create partitions of that size for each lat/lon or polygon centroid in the prediction request.
 - `GridPartitioner` - Given a grid size, this partitioner will create partitions based on the grid cells that intersect with the prediction request.
-- NoopPartitioner - Does not partition the prediction request. This is useful for testing or when you want to run the entire prediction request as a single task.
+- `NoopPartitioner` - Does not partition the prediction request. This is useful for testing or when you want to run the entire prediction request as a single task.
+
+Example `esrun.yaml` snippet. This will leave the original input as a single partition, but will create individual windows of size 128x128 pixels for each feature.
 
-Example `partition_strategies.yaml`. This will leave the original input as a single partition, but will create individual windows of size 128x128 pixels for each feature.
 ```yaml
-partition_request_geometry:
-  class_path: esrun.tools.partitioners.noop_partitioner.NoopPartitioner
-  init_args:
-
-prepare_window_geometries:
-  class_path: esrun.tools.partitioners.fixed_window_partitioner.FixedWindowPartitioner
-  init_args:
-    window_size: 128 # intended to be a pixel value
+partition_strategies:
+  partition_request_geometry:
+    class_path: esrun.runner.tools.partitioners.noop_partitioner.NoopPartitioner
+
+  prepare_window_geometries:
+    class_path: esrun.runner.tools.partitioners.fixed_window_partitioner.FixedWindowPartitioner
+    init_args:
+      window_size: 128
+      output_projection:
+        class_path: rslearn.utils.Projection
+        init_args:
+          crs:
+            _module_: rasterio.crs
+            _callable_: CRS.from_epsg
+            code: 3857
+          x_resolution: 9.554628535647032 # PIXEL_SIZE in rslp/forest_loss_driver/extract_dataset/extract_alerts.py
+          y_resolution: -9.554628535647032
 ```
 
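The resolution constants in the snippet above can be sanity-checked with a few lines of arithmetic. This sketch is illustrative only; the zoom-14 Web Mercator reading of 9.5546 m/px is an observation, not something the config states.

```python
# Sketch: relate window_size and the x/y_resolution values from the snippet.
# The standard Web Mercator ground resolution at zoom z (256px tiles) is
# 2 * pi * 6378137 / 256 / 2**z; at z=14 this matches 9.554628535647032 m/px.
import math

base = 2 * math.pi * 6378137 / 256   # ~156543.034 m/px at zoom 0
res = base / 2**14                   # ~9.5546 m/px, as in the config
window_px = 128
window_m = window_px * res           # ground extent of one window side

print(res)       # ~9.554628535647032
print(window_m)  # ~1223 m, so each 128x128 window covers ~1.2 km per side
```

So in this configuration, each `FixedWindowPartitioner` window spans roughly 1.2 km on a side in EPSG:3857.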
 ## Post-Processing Strategies
@@ -53,101 +70,134 @@ There are 3 different stages to postprocessing:
 - `postprocess_partition()` - This is the stage where the outputs from the window postprocessors are combined into a single per-partition artifact.
 - `postprocess_dataset()` - This is the final stage of postprocessing where the partition level outputs are combined into a single artifact.
 
+Example: This uses the `CombineGeojson` postprocessor to combine the outputs from the window postprocessors into a single GeoJSON file for each partition and for the entire dataset.
+
+```yaml
+postprocessing_strategies:
+  process_window:
+    class_path: esrun.runner.tools.postprocessors.combine_geojson.CombineGeojson
+
+  process_partition:
+    class_path: esrun.runner.tools.postprocessors.combine_geojson.CombineGeojson
+
+  process_dataset:
+    class_path: esrun.runner.tools.postprocessors.combine_geojson.CombineGeojson
+```
+
 ## Samples
 
+### Running the sample in this project
+
+Set up your environment as described above, and copy a forest loss checkpoint into the project directory.
+
+```
+gsutil cp gs://earth-system-run-dev/models/1b82c096-9ba2-424f-a1c9-86d2a8986b7d/stage_0/checkpoint.ckpt checkpoint.ckpt
+```
+
+Run the full pipeline.
+
+```
+python3 run_pipeline.py
+```
+
+The `scratch/results/result.geojson` file should contain the results of the prediction request. The `scratch/dataset` directory should contain the materialized dataset for the prediction request.
+
 ### Run a pipeline end-to-end
 
 ```python file=run_pipeline.py
-from rslp.espredict_runner import EsPredictRunner
-
-runner = EsPredictRunner(
-    'model.yaml',
-    'dataset.json',
-    'partition_strategies.yaml',
-    'postprocessing_strategies.yaml',
-    'prediction/test-request1.geojson',
-    scratch_path='scratch/'
-)
+from pathlib import Path
+
+from esrun.runner.local.predict_runner import EsPredictRunner
+
+CONFIG_PATH = Path(__file__).parent
+
+runner = EsPredictRunner(project_path=CONFIG_PATH,
+                         scratch_path=CONFIG_PATH / 'scratch')
+
 partitions = runner.partition()
 for partition_id in partitions:
+    print(f"Processing partition: {partition_id}")
     runner.build_dataset(partition_id)
     runner.run_inference(partition_id)
     runner.postprocess(partition_id)
-
 runner.combine(partitions)
 ```
 
 ### Run dataset building for the entire prediction request.
 ```python file=run_dataset_building.py
-from rslp.espredict_runner import EsPredictRunner
-
-runner = EsPredictRunner(
-    'model.yaml',
-    'dataset.json',
-    'partition_strategies.yaml',
-    'postprocessing_strategies.yaml',
-    'prediction/test-request1.geojson',
-    scratch_path='scratch/'
-)
-
-for partition_id in runner.partition():
+from pathlib import Path
+from esrun.runner.local.predict_runner import EsPredictRunner
+
+CONFIG_PATH = Path(__file__).parent
+
+runner = EsPredictRunner(project_path=CONFIG_PATH,
+                         scratch_path=CONFIG_PATH / 'scratch')
+
+partitions = runner.partition()
+for partition_id in partitions:
+    print(f"Processing partition: {partition_id}")
     runner.build_dataset(partition_id)
 ```
 
 ### Run inference for a single partition.
 (Assumes you have an existing materialized dataset for the partition.)
 ```python file=run_inference_single_partition.py
-from rslp.espredict_runner import EsPredictRunner
-
-runner = EsPredictRunner(
-    'model.yaml',
-    'dataset.json',
-    'partition_strategies.yaml',
-    'postprocessing_strategies.yaml',
-    'prediction/test-request1.geojson',
-    scratch_path='scratch/'
-)
+from pathlib import Path
+from esrun.runner.local.predict_runner import EsPredictRunner
+
+CONFIG_PATH = Path(__file__).parent
+
+runner = EsPredictRunner(project_path=CONFIG_PATH,
+                         scratch_path=CONFIG_PATH / 'scratch')
+
 partition_id = 'my-existing-partition-id' # Replace with the actual partition ID you want to run
 runner.run_inference(partition_id)
 ```
 
 ### Run inference for a single window.
 Since we don't expose window-level inference via the runner API, you can configure your partitioners to produce limited sets of partitions and windows.
 
-```yaml file=partition_strategies.yaml
-strategy_large:
-  class_path: esrun.tools.partitioners.noop_partitioner.NoopPartitioner
-  init_args:
-
-strategy_small:
-  class_path: esrun.tools.partitioners.fixed_window_partitioner.FixedWindowPartitioner
-  init_args:
-    window_size: 128 # intended to be a pixel value
-    limit: 1 # This will limit window generation to a single window per large partition, effectively allowing you to run inference on a single window.
+```yaml file=esrun.yaml
+partition_strategies:
+  partition_request_geometry:
+    class_path: esrun.runner.tools.partitioners.noop_partitioner.NoopPartitioner
+
+  prepare_window_geometries:
+    class_path: esrun.runner.tools.partitioners.fixed_window_partitioner.FixedWindowPartitioner
+    init_args:
+      window_size: 128
+      limit: 1 # Limit to a single window for testing
+      output_projection:
+        class_path: rslearn.utils.Projection
+        init_args:
+          crs:
+            _module_: rasterio.crs
+            _callable_: CRS.from_epsg
+            code: 3857
+          x_resolution: 9.554628535647032 # PIXEL_SIZE in rslp/forest_loss_driver/extract_dataset/extract_alerts.py
+          y_resolution: -9.554628535647032
 ```
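The `class_path` / `init_args` layout above follows the common jsonargparse-style config pattern. As a rough illustration of how such an entry can be resolved (this is not esrun's actual loader, and it ignores the `_module_`/`_callable_` convention), a minimal sketch using stdlib classes as stand-ins:

```python
# Sketch: generic resolution of a class_path/init_args mapping. Illustrative
# only; esrun's real configuration loader may behave differently.
from importlib import import_module
from typing import Any


def instantiate(spec: dict[str, Any]) -> Any:
    """Import spec['class_path'] and call it with spec.get('init_args', {})."""
    module_name, _, attr = spec["class_path"].rpartition(".")
    cls = getattr(import_module(module_name), attr)
    return cls(**spec.get("init_args", {}))


# Stdlib class standing in for a partitioner/postprocessor:
obj = instantiate({
    "class_path": "datetime.date",
    "init_args": {"year": 2024, "month": 1, "day": 2},
})
print(obj.isoformat())  # 2024-01-02
```

The same shape would resolve a partitioner entry such as the `NoopPartitioner` above, provided the class is importable.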
 
 ```python file=run_inference_single_window.py
-from rslp.espredict_runner import EsPredictRunner
-
-runner = EsPredictRunner(
-    'model.yaml',
-    'dataset.json',
-    'partition_strategies.yaml',
-    'postprocessing_strategies.yaml',
-    'prediction/test-request1.geojson',
-    scratch_path='scratch/'
-)
+from pathlib import Path
+from esrun.runner.local.predict_runner import EsPredictRunner
+
+CONFIG_PATH = Path(__file__).parent
+
+runner = EsPredictRunner(project_path=CONFIG_PATH,
+                         scratch_path=CONFIG_PATH / 'scratch')
+
 partition_id = 'my-existing-partition-id' # Replace with the actual partition ID you want to run
 partitions = runner.partition()
 for partition_id in partitions:
     runner.run_inference(partition_id)
 ```
 
 ## Writing Your Own Partitioners
-You may supply your own partitioners by creating a new class that implements the ` PartitionInterface` class in the `esrun.tools.partitioners.partition_interface` module. You can then specify your custom partitioner in the `partition_strategies.yaml` file. This class must exist on your PYTHONPATH and be importable by the esrunner. As such we recommend you place your custom partitioner in the `rslp/common/partitioners` directory of this repository to ensure it gets installed into the final Dockerimage artifact.
+You may supply your own partitioners by creating a new class that implements the `PartitionInterface` class in the `esrun.runner.tools.partitioners.partition_interface` module. You can then specify your custom partitioner in the `esrun.yaml` file. This class must exist on your `PYTHONPATH` and be importable by the esrunner. As such, we recommend you place your custom partitioner in the `rslp/common/partitioners` directory of this repository to ensure it gets installed into the final Docker image artifact.
 
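For orientation, a custom partitioner might be shaped roughly like the sketch below. The `partition` method name, the plain-dict GeoJSON types, and the `ChunkPartitioner` class are illustrative assumptions; the actual contract is defined by `PartitionInterface` in the earth-system-run repository.

```python
# Hypothetical sketch of a custom partitioner. The real PartitionInterface in
# esrun.runner.tools.partitioners.partition_interface defines the actual
# contract; the method name and types below are assumptions for illustration.
from typing import Any


class ChunkPartitioner:
    """Splits a GeoJSON FeatureCollection into fixed-size chunks of features."""

    def __init__(self, chunk_size: int = 2) -> None:
        self.chunk_size = chunk_size

    def partition(self, request: dict[str, Any]) -> list[dict[str, Any]]:
        features = request.get("features", [])
        return [
            {"type": "FeatureCollection", "features": features[i:i + self.chunk_size]}
            for i in range(0, len(features), self.chunk_size)
        ]


request = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": {"type": "Point", "coordinates": [lon, 0.0]}, "properties": {}}
        for lon in (-74.0, -73.9, -73.8)
    ],
}
parts = ChunkPartitioner(chunk_size=2).partition(request)
print(len(parts))  # 2 partitions: one with 2 features, one with 1
```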
 ## Writing your own post-processing strategies
-You may supply your own post-processing strategies by creating a new class that implements the `PostprocessInterface` class in the `esrun.tools.postprocessors.postprocess_inferface` module. You can then specify your custom post-processing strategy in the `postprocessing_strategies.yaml` file. This class must exist on your `PYTHONPATH` and be importable by the esrunner. As such we recommend you place your custom post-processing strategy in the `rslp/common/postprocessing` directory of this repository to ensure it gets installed into the final Docker image artifact.
+You may supply your own post-processing strategies by creating a new class that implements the `PostprocessInterface` class in the `esrun.runner.tools.postprocessors.postprocess_inferface` module. You can then specify your custom post-processing strategy in the `esrun.yaml` file. This class must exist on your `PYTHONPATH` and be importable by the esrunner. As such, we recommend you place your custom post-processing strategy in the `rslp/common/postprocessing` directory of this repository to ensure it gets installed into the final Docker image artifact.
 
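A post-processing strategy in the spirit of `CombineGeojson` might look like this sketch. The `process` method name and the plain-dict GeoJSON types are illustrative assumptions; the actual contract is defined by `PostprocessInterface` in the earth-system-run repository.

```python
# Hypothetical sketch of a post-processing strategy. The real
# PostprocessInterface defines the actual contract; the method name and types
# below are assumptions for illustration only.
from typing import Any


class MergeFeatures:
    """Combines several GeoJSON FeatureCollections into one."""

    def process(self, collections: list[dict[str, Any]]) -> dict[str, Any]:
        merged: list[dict[str, Any]] = []
        for collection in collections:
            merged.extend(collection.get("features", []))
        return {"type": "FeatureCollection", "features": merged}


# E.g. combining per-window outputs into a per-partition artifact:
window_outputs = [
    {"type": "FeatureCollection",
     "features": [{"type": "Feature", "geometry": None, "properties": {"id": i}}]}
    for i in range(3)
]
combined = MergeFeatures().process(window_outputs)
print(len(combined["features"]))  # 3
```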
 ### Testing Partitioner & Post-Processing Implementations
 See the [earth-system-run](https://github.com/allenai/earth-system-run) repository for tests covering existing [partitioner](https://github.com/allenai/earth-system-run/tree/v1-develop/tests/unit/tools/partitioners) and [post-processor](https://github.com/allenai/earth-system-run/tree/v1-develop/tests/unit/tools/postprocessors) implementations.
