Tags: climate-data, etl, geoprocessing, python, geoserver, chirps, copernicus, agera5
Python package for processing spatial historical climate data with a complete ETL pipeline that includes:
- Data download from CHIRPS and Copernicus sources
- Spatial clipping by country boundaries
- Monthly aggregation and climatology calculations
- Climate indicators calculation (TR20, TXx, etc.)
- GeoServer integration for data publishing
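The monthly aggregation and climatology steps listed above can be illustrated with a minimal, standard-library sketch. It assumes daily values keyed by `datetime.date` for a single pixel. It sums precipitation and averages temperature per month (a common convention, assumed here rather than taken from the package), then averages each calendar month across years to form a climatology. The function names `monthly_aggregate` and `climatology` are hypothetical and not part of the package API. The real pipeline applies the same idea to every raster cell.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_aggregate(daily, how="mean"):
    """Aggregate {date: value} daily data into {(year, month): value}.

    how="sum" suits precipitation totals; how="mean" suits temperature
    (assumed conventions for this sketch).
    """
    groups = defaultdict(list)
    for day, value in daily.items():
        groups[(day.year, day.month)].append(value)
    reducer = sum if how == "sum" else mean
    return {key: reducer(values) for key, values in sorted(groups.items())}


def climatology(monthly):
    """Average each calendar month across all years: {month: value}."""
    by_month = defaultdict(list)
    for (_year, month), value in monthly.items():
        by_month[month].append(value)
    return {month: mean(values) for month, values in sorted(by_month.items())}


# Toy daily precipitation (mm) for January of two years
daily_precip = {date(2020, 1, d): 2.0 for d in range(1, 32)}
daily_precip.update({date(2021, 1, d): 4.0 for d in range(1, 32)})

monthly = monthly_aggregate(daily_precip, how="sum")
print(monthly)               # {(2020, 1): 62.0, (2021, 1): 124.0}
print(climatology(monthly))  # {1: 93.0}
```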
Key Features:
- Automated processing of temperature, precipitation, and solar radiation data
- Parallel processing for downloads and indicator calculations
- Flexible indicator calculation with custom year ranges
- Flexible configuration for multiple countries and variables
- End-to-end pipeline from raw data to published layers
- Database-backed configuration management
Prerequisites:
- Python > 3.10
- Copernicus Climate Data Store (CDS) API key (registration on the CDS website is required)
- GeoServer
- PostgreSQL database for configuration storage
pip install git+https://github.com/CIAT-DAPA/aclimate_v3_cut_spatial_data.git
pip install git+https://github.com/CIAT-DAPA/aclimate_v3_spatial_importer.git
pip install git+https://github.com/CIAT-DAPA/aclimate_v3_orm
pip install git+https://github.com/CIAT-DAPA/aclimate_v3_historical_spatial_etl
To install a specific version:
pip install git+https://github.com/CIAT-DAPA/aclimate_v3_historical_spatial_etl.git@<version>
python -m aclimate_v3_historical_spatial_etl.aclimate_run_etl \
--country HONDURAS \
--start_date 2020-01 \
--end_date 2020-12 \
--data_path /path/to/data \
--climatology
python -m aclimate_v3_historical_spatial_etl.aclimate_run_etl \
--country HONDURAS \
--start_date 2025-01 \
--end_date 2025-01 \
--data_path /path/to/data \
--indicators \
--indicator_years 2015-2020
python -m aclimate_v3_historical_spatial_etl.aclimate_run_etl \
--country COLOMBIA \
--start_date 2025-01 \
--end_date 2025-01 \
--data_path /path/to/data \
--skip_download \
--skip_processing \
--indicators \
--indicator_years 2010-2020
Note
New Options:
- --skip_download: Skip the data download step
- --skip_processing: Skip data processing (clipping, monthly aggregation)
- --climatology: Calculate monthly averages (climatology)
- --indicators: Calculate climate indicators
- --indicator_years YYYY-YYYY: Specify the year range for indicator calculation
- --no_cleanup: Keep intermediate files after processing
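To make the flag semantics concrete, here is a hedged sketch that mirrors the documented options with the standard-library `argparse` module. It is not the package's actual parser. The helpers `parse_year_range`, `month_range` and `planned_stages` are hypothetical, and the stage order they imply is an assumption.

```python
import argparse
from datetime import date


def parse_year_range(text):
    """Parse 'YYYY-YYYY' into an inclusive list of years."""
    start, end = (int(part) for part in text.split("-"))
    if start > end:
        raise argparse.ArgumentTypeError(f"start year {start} is after end year {end}")
    return list(range(start, end + 1))


def month_range(start, end):
    """Expand 'YYYY-MM' bounds into the first day of every month, inclusive."""
    year, month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    months = []
    while (year, month) <= (end_year, end_month):
        months.append(date(year, month, 1))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


def build_parser():
    """Hypothetical parser mirroring the documented CLI flags."""
    parser = argparse.ArgumentParser(description="Sketch of the documented ETL options")
    parser.add_argument("--country", required=True)
    parser.add_argument("--start_date", required=True, help="YYYY-MM")
    parser.add_argument("--end_date", required=True, help="YYYY-MM")
    parser.add_argument("--data_path", required=True)
    parser.add_argument("--skip_download", action="store_true")
    parser.add_argument("--skip_processing", action="store_true")
    parser.add_argument("--climatology", action="store_true")
    parser.add_argument("--indicators", action="store_true")
    parser.add_argument("--indicator_years", type=parse_year_range)
    parser.add_argument("--no_cleanup", action="store_true")
    return parser


def planned_stages(args):
    """List the stages the flags would enable (assumed order)."""
    stages = []
    if not args.skip_download:
        stages.append("download")
    if not args.skip_processing:
        stages.append("processing")
    if args.climatology:
        stages.append("climatology")
    if args.indicators:
        stages.append("indicators")
    return stages


# Same flags as the COLOMBIA example above
args = build_parser().parse_args([
    "--country", "COLOMBIA", "--start_date", "2025-01", "--end_date", "2025-01",
    "--data_path", "/path/to/data", "--skip_download", "--skip_processing",
    "--indicators", "--indicator_years", "2010-2020",
])
print(planned_stages(args))                                # ['indicators']
print(args.indicator_years[0], args.indicator_years[-1])  # 2010 2020
```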
from aclimate_v3_historical_spatial_etl.aclimate_run_etl import run_etl_pipeline
run_etl_pipeline(
country="HONDURAS",
start_date="2020-01",
end_date="2020-12",
data_path="/path/to/data",
climatology=True,
indicators=True,
indicator_years="2015-2020"
)
data/
├── config/              # Must contain required JSON config files
├── raw_data/            # Downloaded raw datasets
├── process_data/        # Intermediate raster data
├── calc_data/
│   ├── climatology_data/   # Climatology outputs
│   ├── monthly_data/       # Monthly processed rasters
│   └── indicators_data/    # Climate indicators (TR20, TXx, etc.)
└── upload_geoserver/    # Output prepared for GeoServer publishing

| Indicator | Name | Description | Unit |
|---|---|---|---|
| TR20 | Tropical Days | Annual count of days with Tmax > 20°C | days |
| TXx | Maximum Temperature Maximum | Annual maximum of daily maximum temperature | °C |
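A minimal NumPy sketch of the two indicators as the table defines them. It assumes one year of daily maximum temperature stacked into a `(days, rows, cols)` array in °C. Reading rasters, clipping, and writing GeoTIFFs are omitted, and the function names `tr20` and `txx` are illustrative rather than the package's API.

```python
import numpy as np


def tr20(tmax_daily, threshold=20.0):
    """Count, per pixel, the days with Tmax strictly above `threshold` (°C).

    Pixels that are NaN on every day (e.g. outside the country mask)
    are returned as NaN instead of 0.
    """
    counts = np.sum(tmax_daily > threshold, axis=0).astype(float)
    counts[np.isnan(tmax_daily).all(axis=0)] = np.nan
    return counts


def txx(tmax_daily):
    """Annual maximum of daily Tmax per pixel, ignoring NaN days."""
    return np.nanmax(tmax_daily, axis=0)


# 3 days on a 2x2 grid
tmax = np.array([
    [[18.0, 25.0], [21.0, 19.5]],
    [[22.0, 26.0], [20.0, 19.0]],
    [[19.0, 31.0], [23.5, 18.0]],
])
print(tr20(tmax).tolist())  # [[1.0, 3.0], [2.0, 0.0]]
print(txx(tmax).tolist())   # [[22.0, 31.0], [23.5, 19.5]]
```

Note that 20.0 °C itself is not counted, because the table specifies a strict "> 20°C".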
- Parallel Processing: Multiple years calculated simultaneously
- Flexible Year Ranges: Calculate indicators for any historical period
- Raster Output: Results saved as GeoTIFF files with proper georeferencing
- Multi-year Statistics: Automatic calculation of multi-year averages
- Memory Efficient: Processes data in chunks to handle large datasets
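The parallel, multi-year behaviour described above can be sketched with `concurrent.futures`. This is an assumption about the general pattern, not the package's implementation. `load_tmax` is a hypothetical loader that returns synthetic data where a real pipeline would read that year's rasters. A thread pool keeps the example self-contained. `ProcessPoolExecutor` is a drop-in alternative for CPU-heavy work, provided the per-year function is defined at module top level so it can be pickled.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def load_tmax(year):
    """Hypothetical loader: synthetic daily Tmax shaped (days, rows, cols).

    A real pipeline would read the clipped rasters for `year` instead.
    """
    return np.full((10, 2, 2), 15.0 + (year - 2015))


def txx_for_year(year):
    """Compute one year's TXx grid; return (year, grid) so results stay labeled."""
    return year, np.nanmax(load_tmax(year), axis=0)


def run_years(years, per_year, max_workers=4):
    """Run `per_year` for every year concurrently, then average across years."""
    years = list(years)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = dict(pool.map(per_year, years))
    multi_year_mean = np.nanmean(np.stack([results[y] for y in years]), axis=0)
    return results, multi_year_mean


per_year, multi_year = run_years(range(2015, 2021), txx_for_year)
print(sorted(per_year))   # [2015, 2016, 2017, 2018, 2019, 2020]
print(multi_year[0, 0])   # 17.5
```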
Detailed Usage Guide: See INDICATORS_USAGE.md for comprehensive examples and configuration options.
All configurations are stored in the database. Ensure your database contains the required configuration entries for:
- chirps_config: CHIRPS download settings
- copernicus_config: Copernicus/ERA5 settings
- clipping_config: Country boundaries and ISO codes
- geoserver_config: GeoServer workspace and store names
- naming_config: Output file naming conventions
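A small pre-flight check can catch a missing entry before a long run. The sketch below only compares names. How you list the entries present in the database (for example through the ORM package) is left out, since that API is not documented here. `REQUIRED_CONFIGS` and `missing_configs` are hypothetical names.

```python
# Configuration entry names listed above
REQUIRED_CONFIGS = (
    "chirps_config",
    "copernicus_config",
    "clipping_config",
    "geoserver_config",
    "naming_config",
)


def missing_configs(available_names):
    """Return required configuration names absent from `available_names`, in order."""
    available = set(available_names)
    return [name for name in REQUIRED_CONFIGS if name not in available]


def check_configs(available_names):
    """Raise before running the pipeline if any required entry is missing."""
    missing = missing_configs(available_names)
    if missing:
        raise RuntimeError("Missing configuration entries: " + ", ".join(missing))


print(missing_configs(["chirps_config", "naming_config"]))
# ['copernicus_config', 'clipping_config', 'geoserver_config']
```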
Environment variables:
- Windows:
# GeoServer credentials
set GEOSERVER_URL=http://localhost:8086/geoserver/rest/
set GEOSERVER_USER=admin
set GEOSERVER_PASSWORD=password
set OTLP_ENDPOINT=localhost:4317
set ENABLE_SIGNOZ=false
set LOG_FILE_PATH=path/application.log
set DATABASE_URL=postgresql://postgres:admin@localhost:5432/acimate_v3
- Linux/Ubuntu:
# GeoServer credentials
export GEOSERVER_URL=http://localhost:8086/geoserver/rest/
export GEOSERVER_USER=admin
export GEOSERVER_PASSWORD=password
export OTLP_ENDPOINT=localhost:4317
export ENABLE_SIGNOZ=false
export LOG_FILE_PATH=path/application.log
export DATABASE_URL=postgresql://postgres:admin@localhost:5432/acimate_v3
Note
Options:
- GEOSERVER_URL: GeoServer REST URL
- GEOSERVER_USER: GeoServer user
- GEOSERVER_PASSWORD: GeoServer password
- OTLP_ENDPOINT: SigNoz endpoint to which logs are sent
- ENABLE_SIGNOZ: Flag that enables sending logs to SigNoz
- LOG_FILE_PATH: Path where logs are saved
- DATABASE_URL: Database connection string
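One way to read these variables in Python is a small settings loader that fails fast when something is missing. This is a minimal sketch using only the standard library. Which variables are treated as mandatory, and the accepted truthy spellings for `ENABLE_SIGNOZ`, are assumptions. `Settings` and `load_settings` are hypothetical names, not the package's own.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional

# Assumption: these four are mandatory; OTLP_ENDPOINT, ENABLE_SIGNOZ and
# LOG_FILE_PATH are treated as optional.
REQUIRED_VARS = ("GEOSERVER_URL", "GEOSERVER_USER", "GEOSERVER_PASSWORD", "DATABASE_URL")


@dataclass(frozen=True)
class Settings:
    geoserver_url: str
    geoserver_user: str
    geoserver_password: str
    database_url: str
    otlp_endpoint: Optional[str]
    enable_signoz: bool
    log_file_path: Optional[str]


def as_bool(value: str) -> bool:
    """Interpret 'true', '1', 'yes' or 'on' (any case) as True."""
    return value.strip().lower() in {"true", "1", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the documented variables from `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        geoserver_url=env["GEOSERVER_URL"],
        geoserver_user=env["GEOSERVER_USER"],
        geoserver_password=env["GEOSERVER_PASSWORD"],
        database_url=env["DATABASE_URL"],
        otlp_endpoint=env.get("OTLP_ENDPOINT"),
        enable_signoz=as_bool(env.get("ENABLE_SIGNOZ", "false")),
        log_file_path=env.get("LOG_FILE_PATH"),
    )


# Example values taken from the snippets above
settings = load_settings({
    "GEOSERVER_URL": "http://localhost:8086/geoserver/rest/",
    "GEOSERVER_USER": "admin",
    "GEOSERVER_PASSWORD": "password",
    "DATABASE_URL": "postgresql://postgres:admin@localhost:5432/acimate_v3",
    "ENABLE_SIGNOZ": "false",
})
print(settings.enable_signoz)  # False
```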
# Install test requirements
pip install pytest pytest-mock
# Run tests
pytest tests/
Our GitHub Actions pipeline implements a three-stage deployment process:
Code Push → Test Stage → Merge Stage → Release Stage
aclimate_v3_historical_spatial_etl/
│
├── .github/
│   └── workflows/              # CI/CD pipeline configurations
├── src/
│   └── aclimate_v3_historical_spatial_etl/
│       ├── connectors/         # Downloaders: CHIRPS, Copernicus
│       ├── tools/              # Clipping and GeoServer tools
│       ├── climate_processing/ # Monthly and climatology processors
│       ├── config/             # Example config files
│       └── aclimate_run_etl.py # Main ETL entry script
├── tests/                      # Unit and integration tests
├── requirements.txt            # Dependencies
└── pyproject.toml              # Packaging