A toolkit for processing CANOE databases to apply representative periods using clustering algorithms. This tool supports multiple versions of the Temoa schema (v2, v3, v3.1).
- Time Series Clustering: Generates representative periods from raw time series data (`clustering.py`).
  - Supports Principal Component Analysis (PCA) for dimensionality reduction (`pca.py`).
  - Allows custom feature selection strategies (`feature_selection.py`).
- Database Processing: Applies generated representative periods to SQLite databases for:
  - Temoa Legacy Schema (`database_processing.py`)
  - Temoa Schema v3 (`database_processing_v3.py`)
  - Temoa Schema v3.1 (`database_processing_v3_1.py`, uses `canoe_schema_v3_1.sql`)
- Automated Workflow: `process_all.py` orchestrates the entire flow.
- Configurable: Highly customizable via `config.yaml`.
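As a rough illustration of the features listed above, the sketch below reduces daily profiles with PCA and then clusters them to pick representative days. It is not the repository's implementation (which relies on `tsam`); the function name `representative_days`, the NumPy-only k-means, and the synthetic series are assumptions made for this example.

```python
import numpy as np


def representative_days(hourly, n_clusters=2, n_components=2, n_iter=50, seed=0):
    """Pick representative days from an hourly series (hypothetical helper).

    Returns (day_indices, weights): the index of one real day per cluster and
    the number of days that cluster stands for.
    """
    days = np.asarray(hourly, dtype=float).reshape(-1, 24)  # one row per day
    centered = days - days.mean(axis=0)

    # PCA via SVD: project each day onto the leading principal components.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T

    # Plain k-means (Lloyd's algorithm) on the reduced features.
    rng = np.random.default_rng(seed)
    centers = scores[rng.choice(len(scores), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(scores[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):  # keep the old center if a cluster empties
                centers[k] = scores[labels == k].mean(axis=0)

    # Represent each cluster by the real day closest to the cluster center.
    day_indices, weights = [], []
    for k in range(n_clusters):
        members = np.flatnonzero(labels == k)
        if members.size == 0:
            continue
        offsets = np.linalg.norm(scores[members] - centers[k], axis=1)
        day_indices.append(int(members[offsets.argmin()]))
        weights.append(int(members.size))
    return day_indices, weights


# Synthetic example: 6 "low" days followed by 4 "high" days.
hours = np.arange(24)
low = 1.0 + 0.1 * np.sin(hours / 24 * 2 * np.pi)
high = 5.0 + 0.1 * np.sin(hours / 24 * 2 * np.pi)
series = np.concatenate([np.tile(low, 6), np.tile(high, 4)])
day_indices, weights = representative_days(series)
print(day_indices, weights)
```

Each returned weight is the number of original days that the chosen day stands in for, which is the kind of information a downstream step needs when scaling results back to a full year.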
- Clone the repository (if you haven't already).
- Set up the environment: It is recommended to use Conda/Mamba.

      conda env create -f environment.yml
      conda activate canoe-backend

  Alternatively, you can install dependencies via pip:

      pip install -r requirements.txt

- CRITICAL: Patch the `tsam` library. The `tsam` library requires a custom modification for this tool to work correctly. You must replace the `timeseriesaggregation.py` file in your Python environment's `tsam` package with the one provided in this repository.

  Source file: `./timeseriesaggregation.py` (in this root directory)

  Destination: Find where `tsam` is installed. You can find the exact path by running:

      python -c "import tsam, os; print(os.path.dirname(tsam.__file__))"

  It is typically located at:
  - Windows: `C:\Users\<user>\miniconda3\envs\canoe-backend\Lib\site-packages\tsam\`
  - macOS / Linux: `~/miniconda3/envs/canoe-backend/lib/python3.12/site-packages/tsam/` (Note: On Apple Silicon Macs, this might be under `miniforge3` instead of `miniconda3`)

  Replace the existing `timeseriesaggregation.py` file in that directory with the one from this repo.

  macOS / Linux Shortcut: If you have your conda environment activated, you can run this command from the `representative_periods` directory to automatically patch the file:

      cp ./timeseriesaggregation.py $(python -c "import tsam, os; print(os.path.dirname(tsam.__file__))")/

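The shell shortcut above only covers macOS and Linux. A platform-independent variant of the same patch step could look like the sketch below. The helper names `find_package_dir` and `patch_package_file` are hypothetical and not part of this repository, and keeping a `.bak` copy of the original file is an addition made for this example.

```python
import importlib.util
import shutil
from pathlib import Path


def find_package_dir(name):
    """Return the installed package's directory, or None if it is not importable."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent


def patch_package_file(package_dir, patched_file):
    """Copy patched_file into package_dir, keeping a .bak copy of the original."""
    package_dir, patched_file = Path(package_dir), Path(patched_file)
    target = package_dir / patched_file.name
    backup = target.with_name(target.name + ".bak")
    if target.exists() and not backup.exists():
        # Back up only once, so rerunning the patch keeps the pristine file.
        shutil.copy2(target, backup)
    shutil.copy2(patched_file, target)
    return target


# Example (run from the repository root with the environment activated):
#     tsam_dir = find_package_dir("tsam")
#     if tsam_dir is None:
#         raise SystemExit("tsam is not installed in the active environment")
#     print("Patched:", patch_package_file(tsam_dir, "timeseriesaggregation.py"))
```

Because the package directory is resolved from the active interpreter, the script patches whichever environment it is run in, so it should be run only after activating `canoe-backend`.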
- Prepare Input Data:
  - Place your source SQLite databases (`.sqlite`) in the `input_sqlite/` directory.
  - Ensure your time series data is correctly structured in `timeseries/` as referenced in `config.yaml`.
- Configuration:
  - Edit `config.yaml` to adjust clustering parameters, select time series columns, and define output settings.
- Run the Processor: To run the full workflow (clustering + database updating):

      python process_all.py

  Or run individual steps:

      # Step 1: Generate representative periods
      python clustering.py
      # Step 2: Update databases (choose the script matching your schema version)
      python database_processing_v3_1.py

- Outputs:
  - Processed Databases: Found in `output_sqlite/`.
  - Clustering Data: Debugging and visualization data in `clustering_output_data/`.
  - Periods File: `periods.csv` (the raw representative periods).
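The step-by-step run described above can also be scripted. The sketch below picks the database script that matches a schema version and runs both steps in order. The script names come from this README; the keys `"legacy"`, `"v3"`, and `"v3.1"` and the functions `build_commands` and `run_workflow` are assumptions for this example and say nothing about how `process_all.py` is actually written.

```python
import subprocess
import sys

# Script names are taken from this README; the version keys are labels chosen
# for this example.
DATABASE_SCRIPTS = {
    "legacy": "database_processing.py",
    "v3": "database_processing_v3.py",
    "v3.1": "database_processing_v3_1.py",
}


def build_commands(schema_version):
    """Return the two commands of the step-by-step workflow, in order."""
    try:
        db_script = DATABASE_SCRIPTS[schema_version]
    except KeyError:
        raise ValueError(f"unknown schema version: {schema_version!r}") from None
    return [
        [sys.executable, "clustering.py"],  # Step 1: generate representative periods
        [sys.executable, db_script],        # Step 2: update the databases
    ]


def run_workflow(schema_version):
    """Run both steps, stopping at the first failure."""
    for cmd in build_commands(schema_version):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, check=True)


# Example (run from the repository root):
#     run_workflow("v3.1")
```

Using `sys.executable` keeps both steps in the same interpreter, and therefore the same patched environment, as the script that launches them.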
- `process_all.py`: Main entry point.
- `clustering.py`: Logic for time series clustering.
- `pca.py`: Utilities for performing PCA on time series groups before clustering.
- `feature_selection.py`: Contains custom strategies for selecting specific periods (e.g. max mean).
- `database_processing*.py`: Scripts to update SQLite databases with new periods.
- `canoe_schema_v3_1.sql`: SQL schema definition for Temoa v3.1 databases.
- `config.yaml`: Main configuration file.
- `timeseries/`: Directory containing raw input data for clustering.
- `input_sqlite/`: Drop your raw databases here.
- `output_sqlite/`: Pick up your processed databases from here.
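
Before a run, it can be useful to confirm that the files dropped into `input_sqlite/` open as SQLite databases and to see which tables they contain. The sketch below does that with the standard library only. The helper names `list_tables` and `summarize_inputs` are hypothetical, and the check makes no assumption about which tables a given Temoa schema version requires.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the sorted table names in a SQLite database file."""
    # Open read-only so the check can never modify a source database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


def summarize_inputs(input_dir="input_sqlite"):
    """Map each *.sqlite file in input_dir to its table names."""
    return {p.name: list_tables(p) for p in sorted(Path(input_dir).glob("*.sqlite"))}


# Example (run from the repository root):
#     for name, tables in summarize_inputs().items():
#         print(name, len(tables), "tables")
```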