Extend Dataset Preparation for AI-Ready Format in OCELOT #16

@azadeh-gh

Description

Summary:

We need to enhance the current dataset preparation pipeline to support additional observation types and ensure all features are structured in a model-ready format for the OCELOT GNN model.

Current Functionality:

The current script:

  • Filters satellite observations by satellite ID and time range
  • Groups data into 12-hour bins
  • Extracts sensor and brightness temperature (BT) features
  • Normalizes features using MinMaxScaler
  • Adds lat/lon metadata and converts everything to PyTorch tensors
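For reference, the current pipeline described above can be sketched roughly as follows (column names such as `sat_id`, `time`, `lat`, `lon`, and the feature list are assumptions, not the script's actual names; conversion to PyTorch tensors is noted in a comment so the sketch stays dependency-light):

```python
import numpy as np
import pandas as pd

FEATURES = ["sensor_zenith", "bt_channel_1"]  # assumed feature columns

def prepare_dataset(df, sat_id, start, end, bin_hours=12):
    """Filter, time-bin, extract, and min-max normalize observations."""
    # 1. Filter by satellite ID and time range
    df = df[(df["sat_id"] == sat_id) & (df["time"] >= start) & (df["time"] < end)]
    # 2. Assign each row to a fixed-size time bin (12 h by default)
    bin_idx = ((df["time"] - start) // pd.Timedelta(hours=bin_hours)).astype(int)
    out = {}
    for b, group in df.groupby(bin_idx):
        # 3. Extract sensor / BT features
        x = group[FEATURES].to_numpy(dtype=np.float32)
        # 4. Min-max normalize each feature column (guard against zero range)
        lo, hi = x.min(axis=0), x.max(axis=0)
        x = (x - lo) / np.where(hi > lo, hi - lo, 1.0)
        # 5. Attach lat/lon metadata unnormalized; torch.from_numpy(...)
        #    would convert these arrays to tensors in the real script
        out[b] = {"x": x, "latlon": group[["lat", "lon"]].to_numpy(dtype=np.float32)}
    return out
```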

Requested Enhancements:

1. Add More Observation Types:

  • Extend extract_features() to include additional variables from the Zarr dataset (similar to GraphDOP), preprocessing continuous and categorical variables accordingly.
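One way the extended extract_features() could look (a sketch; the variable groups below are placeholder names, and the real lists would come from the Zarr dataset's variables):

```python
import numpy as np
import pandas as pd

# Assumed variable groups; the actual lists should be read from the Zarr dataset
CONTINUOUS = ["bt_channel_1", "sensor_zenith", "solar_zenith"]
CATEGORICAL = ["sensor_id", "channel_id"]

def extract_features(df):
    """Continuous variables pass through as float32; categorical ones are one-hot encoded."""
    cont = df[CONTINUOUS].to_numpy(dtype=np.float32)
    cat = pd.get_dummies(df[CATEGORICAL].astype(str)).to_numpy(dtype=np.float32)
    return np.concatenate([cont, cat], axis=1)
```

One-hot encoding is just one reasonable choice here; learned embeddings for categorical IDs would also fit a GNN input pipeline.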

2. Flexible Normalization:

  • Make the normalization method configurable (e.g., MinMaxScaler, StandardScaler, or none). Apply scaling only to continuous variables and exclude categorical or geolocation fields (lat/lon) from normalization.
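A minimal sketch of the configurable normalization (the method names mirror scikit-learn's MinMaxScaler/StandardScaler but are implemented inline so only the selected continuous columns are scaled):

```python
import numpy as np

def normalize(x, method="minmax", continuous_cols=None):
    """Scale only the given continuous columns; 'none' leaves x untouched."""
    x = x.astype(np.float32, copy=True)
    if method == "none":
        return x
    cols = np.arange(x.shape[1]) if continuous_cols is None else np.asarray(continuous_cols)
    sub = x[:, cols]
    if method == "minmax":
        lo, hi = sub.min(axis=0), sub.max(axis=0)
        sub = (sub - lo) / np.where(hi > lo, hi - lo, 1.0)
    elif method == "standard":
        mu, sd = sub.mean(axis=0), sub.std(axis=0)
        sub = (sub - mu) / np.where(sd > 0.0, sd, 1.0)
    else:
        raise ValueError(f"unknown normalization method: {method}")
    x[:, cols] = sub  # categorical and lat/lon columns outside `cols` are untouched
    return x
```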

3. Generalize Time Binning Logic:

  • Allow configurable time bin sizes (e.g., 6h, 12h, 24h) rather than hardcoding 12-hour intervals.
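The generalization could be as small as computing an integer bin index from a configurable width (a sketch, assuming a pandas-parseable datetime column):

```python
import pandas as pd

def assign_time_bins(times, start, bin_hours=12):
    """Map each timestamp to an integer bin index of width `bin_hours` hours."""
    return ((pd.to_datetime(times) - pd.Timestamp(start))
            // pd.Timedelta(hours=bin_hours)).astype(int)
```

The same call then covers 6 h, 12 h, or 24 h binning by changing `bin_hours`.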

4. Add Validation and Logging:

  • Add logging for major steps (e.g., time bin creation, feature extraction). Include data validation (e.g., NaN checks, missing value handling) for robustness.
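The validation step might look like this (a sketch; imputing NaNs with per-column means is one of several reasonable policies, and dropping affected rows would be equally valid):

```python
import logging
import numpy as np

logger = logging.getLogger("ocelot.prep")

def validate_features(x, name="features"):
    """Log NaN counts and impute NaNs with per-column means."""
    n_nan = int(np.isnan(x).sum())
    if n_nan:
        logger.warning("%s: imputing %d NaN values with column means", name, n_nan)
        col_means = np.nanmean(x, axis=0)
        rows, cols = np.where(np.isnan(x))
        x = x.copy()  # leave the caller's array untouched
        x[rows, cols] = col_means[cols]
    else:
        logger.info("%s: no NaNs found", name)
    return x
```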

Deliverables:

  • Updated and modular Python code
  • Preprocessing pipeline that supports additional variables and configurable normalization
  • Clear documentation and comments in the code

Metadata

Labels

documentation (Improvements or additions to documentation), enhancement (New feature or request)
