EEG Data Processing Pipeline

Caution

You probably SHOULD NOT use this repository directly for the EEG2025 Challenge. All available datasets have already been processed using this pipeline. This repository is provided for transparency and reproducibility. If you want to process your own BIDS datasets, feel free to use it.

A comprehensive pipeline for processing, resampling, and converting EEG-BIDS datasets. Designed for the HBN (Healthy Brain Network) EEG datasets and the EEG2025 NeurIPS Challenge.

📁 Project Structure

resample_dataset/
│
├── resampling/              # EEG resampling and filtering tools
│   ├── process_eeg_data.m      # MATLAB script for filtering & resampling
│   └── process_metadata.py     # Update BIDS metadata after resampling
│
├── format_conversion/       # Format conversion tools
│   ├── convert_set_to_bdf.py   # Convert EEGLAB SET → BDF format
│   ├── convert_set_to_edf.py   # Convert EEGLAB SET → EDF format
│   └── complete_bids_datasets.py # Maintain BIDS structure after conversion
│
├── utils/                   # Utility and validation tools
│   ├── compare_signal_formats.py    # Compare signals between formats
│   ├── fix_events_duration.py       # Fix event duration issues
│   ├── signal_comparison_results/   # Comparison visualizations
│   └── SIGNAL_COMPARISON_SUMMARY.md # Detailed comparison report
│
├── run_eeg_processing.sh    # Main script for resampling pipeline
├── run_bdf_conversion.sh    # Main script for BDF conversion
├── LICENSE
└── README.md               # This file

🚀 Quick Start

Resample EEG Data (500 Hz → 100 Hz)

./run_eeg_processing.sh /path/to/input/bids /path/to/output/resampled

Convert to BDF Format

./run_bdf_conversion.sh /path/to/input/bids /path/to/output/bdf

🔧 Features

1. EEG Resampling & Filtering

  • Downsamples EEG data from 500 Hz to 100 Hz
  • Applies a 0.5-50 Hz bandpass filter to attenuate slow drift and high-frequency noise
  • Validates and cleans event markers
  • Updates BIDS metadata accordingly

2. Format Conversion

  • Converts EEGLAB SET files to:
    • BDF (BioSemi Data Format)
    • EDF (European Data Format)
  • Preserves all signal information and metadata
  • Maintains complete BIDS structure

3. Signal Validation

  • Compare signals between different formats
  • Generate correlation plots and difference visualizations
  • Produce detailed comparison reports

📋 Requirements

For Resampling Pipeline

  • MATLAB with EEGLAB toolbox installed
  • Python 3.8+
  • Bash shell (macOS/Linux)

For Format Conversion

  • Python 3.8+

🔧 Installation

1. Clone the repository

git clone https://github.com/eeg2025/downsample-datasets.git
cd downsample-datasets

2. Install Python dependencies

pip install -r requirements.txt

This will install:

  • pandas - Data manipulation
  • numpy - Numerical computing
  • emgio - EEG format conversion (from GitHub)
  • matplotlib - Visualization
  • scipy - Signal processing

📖 Detailed Usage

Resampling Pipeline

The resampling pipeline filters an EEG-BIDS dataset and reduces its sampling rate:

./run_eeg_processing.sh <INPUT_DIR> <OUTPUT_DIR>

What it does:

  1. EEG Data (.set files):

    • Applies bandpass filter: 0.5-50 Hz
    • Resamples: 500 Hz → 100 Hz
    • Validates and cleans events
  2. JSON Metadata:

    • Updates SamplingFrequency from 500 to 100
  3. Events Files:

    • Removes sample column (tied to original 500 Hz sampling)
  4. Other Files:

    • Copies unchanged to maintain BIDS structure
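
The metadata and events steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `process_metadata.py`, which may work differently; the function names `update_sidecar` and `drop_sample_column` are hypothetical. It also shows why the `sample` column is dropped: an event at 1.5 s sits near sample 750 at 500 Hz but near sample 150 at 100 Hz, so the old index would be misleading while `onset` (in seconds) stays valid.

```python
import csv
import io
import json

NEW_SFREQ = 100  # target sampling rate in Hz, as stated above


def update_sidecar(sidecar_text: str, new_sfreq: float = NEW_SFREQ) -> str:
    """Return *_eeg.json content with SamplingFrequency replaced (hypothetical helper)."""
    meta = json.loads(sidecar_text)
    meta["SamplingFrequency"] = new_sfreq
    return json.dumps(meta, indent=2)


def drop_sample_column(events_text: str) -> str:
    """Return *_events.tsv content without the `sample` column (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(events_text), delimiter="\t")
    kept = [name for name in (reader.fieldnames or []) if name != "sample"]
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=kept,
        delimiter="\t",
        extrasaction="ignore",  # silently skip the dropped column in each row
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()
```

Both helpers take and return text, so they can be tried on small strings before being pointed at real files.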

Format Conversion Pipeline

Convert EEGLAB SET files to BDF or EDF format:

# For BDF conversion
./run_bdf_conversion.sh <INPUT_DIR> <OUTPUT_DIR>

# For EDF conversion (no wrapper script is provided; call convert_set_to_edf.py directly)
python3 format_conversion/convert_set_to_edf.py <INPUT_DIR> <OUTPUT_DIR>

What it does:

  1. Converts all SET files to the specified format
  2. Maintains complete BIDS directory structure
  3. Updates metadata to reflect format change
  4. Generates conversion reports
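
One way to keep the BIDS directory structure intact during conversion is to derive each output path from the input file's path relative to the dataset root. The sketch below assumes this approach; `converted_path` and the example file name are hypothetical, and the repository's conversion scripts may do this differently.

```python
from pathlib import Path


def converted_path(set_file: Path, input_root: Path, output_root: Path,
                   extension: str = ".bdf") -> Path:
    """Map an input .set file to the same relative location in the output tree."""
    relative = set_file.relative_to(input_root)  # e.g. sub-01/eeg/<name>.set
    return (output_root / relative).with_suffix(extension)


# Hypothetical file name, used for illustration only.
example = converted_path(
    Path("/in/sub-01/eeg/sub-01_task-rest_eeg.set"),
    Path("/in"),
    Path("/out"),
)
print(example)
```

Passing `extension=".edf"` gives the corresponding EDF path.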

Signal Comparison

Validate conversions by comparing signals:

python3 utils/compare_signal_formats.py <ORIGINAL_DIR> <CONVERTED_DIR>

This generates:

  • Correlation plots for each file
  • Signal difference visualizations
  • Summary statistics report
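
For intuition, the kind of statistics such a comparison might report can be computed as follows. This is a simplified, standard-library sketch for a single channel. It is not the logic of `compare_signal_formats.py`, and the function name `compare_signals` is an assumption.

```python
import math
from typing import Sequence, Tuple


def compare_signals(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (Pearson correlation, maximum absolute difference) for two signals."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("signals must have the same length (at least 2 samples)")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant signal")
    r = cov / math.sqrt(var_a * var_b)
    max_diff = max(abs(x - y) for x, y in zip(a, b))
    return r, max_diff
```

A correlation close to 1 combined with a small maximum difference suggests the converted file reproduces the original waveform. A high correlation alone would not reveal a constant offset or a scaling error.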

📊 Processing Details

Input Structure

/path/to/input/
├── dataset_description.json
├── participants.tsv
├── sub-*/
│   └── eeg/
│       ├── *.set (EEG data files)
│       ├── *.fdt (EEGLAB data files)
│       ├── *_eeg.json (metadata)
│       ├── *_events.tsv (event files)
│       └── other BIDS files

Output Structure

/path/to/output/
├── dataset_description.json (updated)
├── participants.tsv (copied)
├── sub-*/
│   └── eeg/
│       ├── *.set/.bdf/.edf (processed/converted)
│       ├── *.fdt (if SET format)
│       ├── *_eeg.json (updated metadata)
│       ├── *_events.tsv (processed)
│       └── other BIDS files (copied)

⚡ Performance

  • Resampling: ~1-2 minutes per subject
  • Format Conversion: ~30 seconds per file
  • Resume Capability: Skips already processed files
  • Error Handling: Individual file errors don't stop the pipeline
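
The resume and error-handling behaviour can be pictured as a loop that skips targets that already exist and catches failures per file. The sketch below is an assumption about the general pattern, not the repository's implementation; `run_batch` and its `process` callback are hypothetical names.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Tuple


def run_batch(jobs: Iterable[Tuple[Path, Path]],
              process: Callable[[Path, Path], None]) -> Dict[str, List[Path]]:
    """Process (source, target) pairs, skipping finished ones and isolating failures."""
    report: Dict[str, List[Path]] = {"done": [], "skipped": [], "failed": []}
    for source, target in jobs:
        if target.exists():
            # Resume: the output is already there, so do not redo the work.
            report["skipped"].append(source)
            continue
        try:
            target.parent.mkdir(parents=True, exist_ok=True)
            process(source, target)
            report["done"].append(source)
        except Exception as exc:
            # One bad file is logged and recorded; the loop carries on.
            print(f"failed: {source}: {exc}")
            report["failed"].append(source)
    return report
```

Checking only for existence means a partially written output from an interrupted run would be skipped. Writing to a temporary name and renaming on success is one way to guard against that.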

🔍 Troubleshooting

EEGLAB Not Found

Error: EEGLAB not found. Please add EEGLAB to your MATLAB path.

Solution: In MATLAB, run:

addpath('/path/to/eeglab')
eeglab  % Initialize EEGLAB

Python Dependencies Missing

Warning: Required Python packages not found.

Solution: Install dependencies:

pip install -r requirements.txt

MATLAB Not in PATH

Error: MATLAB not found. Please ensure MATLAB is in your PATH.

Solution: Add MATLAB to PATH or use full path:

export PATH="/Applications/MATLAB_R2023a.app/bin:$PATH"

📝 Notes

  • Original data is never modified
  • Output preserves complete BIDS structure
  • Processing can be resumed if interrupted
  • Events are validated and cleaned during resampling
  • All conversions maintain signal fidelity

🤝 Contributing

Feel free to submit issues, fork the repository, and create pull requests for any improvements.

📄 License

See LICENSE file for details.

👥 Authors

Developed for processing HBN-EEG datasets for the EEG2025 NeurIPS Challenge.


For more information about the BIDS format: https://bids.neuroimaging.io/

For EEGLAB documentation: https://eeglab.org/
