CANOE Representative Periods Processor

A toolkit for processing CANOE databases to apply representative periods using clustering algorithms. This tool supports multiple versions of the Temoa schema (v2, v3, v3.1).

Features

- Time Series Clustering: Generates representative periods from raw time series data (clustering.py).
  - Supports Principal Component Analysis (PCA) for dimensionality reduction (pca.py).
  - Allows custom feature selection strategies (feature_selection.py).
- Database Processing: Applies the generated representative periods to SQLite databases for:
  - Temoa Legacy Schema (database_processing.py)
  - Temoa Schema v3 (database_processing_v3.py)
  - Temoa Schema v3.1 (database_processing_v3_1.py, which uses canoe_schema_v3_1.sql)
- Automated Workflow: process_all.py orchestrates the entire flow.
- Configurable: Clustering parameters, time series column selection, and output settings are all set in config.yaml.
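
To make the idea of a representative period concrete, here is a minimal sketch. It is not the code in clustering.py, which relies on the patched tsam library. It assumes an hourly series, splits it into daily profiles, runs a plain k-means, and picks the real day closest to each cluster centre. The function name `representative_days` and all of its parameters are hypothetical.

```python
import math
import random


def representative_days(series, hours_per_day=24, n_clusters=2, n_iter=20, seed=0):
    """Return (day_indices, weights) for an hourly series.

    Illustrative only: plain k-means on daily profiles, not the tsam-based
    logic used by clustering.py. Each weight is the number of days that the
    matching representative day stands for.
    """
    n_days = len(series) // hours_per_day
    days = [series[d * hours_per_day:(d + 1) * hours_per_day] for d in range(n_days)]

    def dist(a, b):
        # Euclidean distance between two daily profiles.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Start from randomly chosen real days (seeded for reproducibility).
    rng = random.Random(seed)
    centres = [list(days[i]) for i in rng.sample(range(n_days), n_clusters)]

    labels = [0] * n_days
    for _ in range(n_iter):
        # Assignment step: attach each day to its nearest centre.
        labels = [min(range(n_clusters), key=lambda k: dist(day, centres[k])) for day in days]
        # Update step: move each centre to the mean of its members.
        for k in range(n_clusters):
            members = [days[d] for d in range(n_days) if labels[d] == k]
            if members:
                centres[k] = [sum(col) / len(members) for col in zip(*members)]

    chosen, weights = [], []
    for k in range(n_clusters):
        member_ids = [d for d in range(n_days) if labels[d] == k]
        if not member_ids:
            continue  # an empty cluster contributes no representative day
        # Representative = the actual day closest to the cluster centre.
        chosen.append(min(member_ids, key=lambda d: dist(days[d], centres[k])))
        weights.append(len(member_ids))
    return chosen, weights
```

The weights are what allow a model to scale results from a handful of representative days back up to the full horizon.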

Installation

Clone the repository (if you haven't already).
Set up the environment. Conda or Mamba is recommended:
```
conda env create -f environment.yml
conda activate canoe-backend
```
Alternatively, you can install dependencies via pip:
```
pip install -r requirements.txt
```
CRITICAL: Patch the tsam library. This tool only works correctly with a modified version of tsam, so you must replace the timeseriesaggregation.py file in your Python environment's tsam package with the one provided in this repository.

Source file: ./timeseriesaggregation.py (in this root directory)

Destination: Find where tsam is installed. You can find the exact path by running:
```
python -c "import tsam, os; print(os.path.dirname(tsam.__file__))"
```
It is typically located at:
- Windows: C:\Users\<user>\miniconda3\envs\canoe-backend\Lib\site-packages\tsam\
- macOS / Linux: ~/miniconda3/envs/canoe-backend/lib/python3.12/site-packages/tsam/ (Note: On Apple Silicon Macs, this might be under miniforge3 instead of miniconda3)
Replace the existing timeseriesaggregation.py file in that directory with the one from this repo.

macOS / Linux Shortcut: If you have your conda environment activated, you can run this command from the representative_periods directory to automatically patch the file:
```
cp ./timeseriesaggregation.py "$(python -c "import tsam, os; print(os.path.dirname(tsam.__file__))")/"
```
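
For Windows, or for anyone who wants a backup of the file being replaced, the same copy can be sketched in Python. This is a hedged example and not a script shipped with the repository: the helper names `patch_package_file` and `find_package_dir` and the `.orig` backup suffix are assumptions made for illustration.

```python
import importlib.util
import shutil
from pathlib import Path


def patch_package_file(source: Path, package_dir: Path) -> Path:
    """Copy `source` into `package_dir`, backing up any file it replaces."""
    target = package_dir / source.name
    if target.exists():
        backup = target.with_name(target.name + ".orig")
        if not backup.exists():  # keep only the first, pristine backup
            shutil.copy2(target, backup)
    shutil.copy2(source, target)
    return target


def find_package_dir(name: str) -> Path:
    """Locate an installed top-level package's directory without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"{name} is not installed in this environment")
    return Path(spec.origin).parent
```

With the conda environment activated and the repository directory as the working directory, calling `patch_package_file(Path("timeseriesaggregation.py"), find_package_dir("tsam"))` performs the replacement and returns the patched path.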

Usage

Prepare Input Data:
- Place your source SQLite databases (.sqlite) in the input_sqlite/ directory.
- Ensure your time series data is correctly structured in timeseries/ as referenced in config.yaml.
Configuration:
- Edit config.yaml to adjust clustering parameters, select time series columns, and define output settings.
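
Before running anything, it can help to confirm that the inputs described above are in place. The following is a small sketch of such a check, not part of the repository. The function name `preflight` is hypothetical, and it only looks for the paths named in this README (config.yaml, timeseries/, and .sqlite files in input_sqlite/).

```python
from pathlib import Path


def preflight(root: Path) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if not (root / "config.yaml").is_file():
        problems.append("config.yaml is missing")
    if not (root / "timeseries").is_dir():
        problems.append("timeseries/ directory is missing")
    input_dir = root / "input_sqlite"
    # Check the directory first so a missing folder is reported, not raised.
    if not input_dir.is_dir() or not any(input_dir.glob("*.sqlite")):
        problems.append("no .sqlite files found in input_sqlite/")
    return problems
```

Calling `preflight(Path("."))` from the repository directory returns an empty list when all three inputs are present.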

Run the Processor: To run the full workflow (clustering + database updating):

```
python process_all.py
```

Or run individual steps:

```
# Step 1: Generate representative periods
python clustering.py

# Step 2: Update databases (choose the script matching your schema version)
python database_processing_v3_1.py
```
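
If you run the individual steps often, the choice of database script can be captured in a small wrapper. The sketch below is illustrative and does not describe how process_all.py works internally. The script names come from this README, while the version labels used as dictionary keys and the function names `build_commands` and `run_steps` are assumptions.

```python
import subprocess
import sys

# Script names are taken from this README; the keys are illustrative labels.
DATABASE_SCRIPTS = {
    "legacy": "database_processing.py",
    "v3": "database_processing_v3.py",
    "v3.1": "database_processing_v3_1.py",
}


def build_commands(schema_version: str) -> list[list[str]]:
    """Return the two commands to run, in order, for a schema version."""
    try:
        db_script = DATABASE_SCRIPTS[schema_version]
    except KeyError:
        raise ValueError(f"unknown schema version: {schema_version!r}") from None
    # sys.executable keeps both steps inside the currently active environment.
    return [[sys.executable, "clustering.py"], [sys.executable, db_script]]


def run_steps(schema_version: str) -> None:
    """Run clustering, then the database update; stop at the first failure."""
    for command in build_commands(schema_version):
        subprocess.run(command, check=True)
```

Because `check=True` raises on a non-zero exit code, the database update never runs against periods from a failed clustering step.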

Outputs:
- Processed Databases: Found in output_sqlite/.
- Clustering Data: Debugging and visualization data in clustering_output_data/.
- Periods File: periods.csv (the raw representative periods).
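
A quick way to sanity-check a processed database is to list its tables with Python's built-in sqlite3 module. This is a sketch rather than a repository script: the function name `list_tables` is hypothetical, and it makes no assumption about which tables a given schema version contains.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path: Path) -> list[str]:
    """Return the table names in a SQLite database, sorted alphabetically."""
    # Open read-only so an inspection can never modify the processed output.
    uri = f"{db_path.resolve().as_uri()}?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

Comparing the result for a file in input_sqlite/ with its counterpart in output_sqlite/ shows at a glance whether processing added or dropped tables.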

Project Structure

process_all.py: Main entry point.
clustering.py: Logic for time series clustering.
pca.py: Utilities for performing PCA on time series groups before clustering.
feature_selection.py: Contains custom strategies for selecting specific periods (e.g. max mean).
database_processing*.py: Scripts to update SQLite databases with new periods.
canoe_schema_v3_1.sql: SQL schema definition for Temoa v3.1 databases.
config.yaml: Main configuration file.
timeseries/: Directory containing raw input data for clustering.
input_sqlite/: Drop your raw databases here.
output_sqlite/: Pick up your processed databases from here.
