- Run the notebook: mtkd4ser.ipynb
1. Clone the Repository
git clone https://github.com/aalto-speech/mtkd4ser.git
cd mtkd4ser

2. Create the Environment
conda env create -f environment.yml

3. Activate the Environment
conda activate ser_venv

1. Multi-Teacher Language-Aware Knowledge Distillation for English Speech Emotion Recognition Using the Monolingual Setup
python main.py --LEARNING_RATE 3e-5 --BATCH_SIZE 16 --N_EPOCHS 20 --SESSION 5 --TRAINING 1 --PARADIGM "MTKD" --LANGUAGE "EN" --LINGUALITY "Monolingual"

2. Conventional Knowledge Distillation for Finnish Speech Emotion Recognition Using the Multilingual Setup
python main.py --LEARNING_RATE 3e-5 --BATCH_SIZE 16 --N_EPOCHS 20 --SESSION 9 --TRAINING 1 --PARADIGM "KD" --LANGUAGE "FI" --LINGUALITY "Multilingual"

3. Vanilla Fine-Tuning for French Speech Emotion Recognition Using the Multilingual Setup
python main.py --LEARNING_RATE 3e-5 --BATCH_SIZE 16 --N_EPOCHS 20 --SESSION 1 --TRAINING 1 --PARADIGM "FT" --LANGUAGE "FR" --LINGUALITY "Multilingual"

4. Available Configurations and Choices
The main.py script supports a range of configurable parameters for training, validation, and evaluation. The table below lists each configuration and its options; select the options that fit your use case.
| Configuration | Options |
|---|---|
| LINGUALITY | Monolingual or Multilingual |
| LANGUAGE | EN or FI or FR |
| PARADIGM | MTKD or KD or FT |
| TRAINING | 1 or 0 |
| SESSION | EN: 1-5 or FI: 1-9 or FR: 1 |
| N_EPOCHS | ℤ⁺ |
| BATCH_SIZE | ℤ⁺ |
| LEARNING_RATE | ℝ⁺ |
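
The constraints in the table can be checked before launching a run. The following is a minimal sketch and not part of the repository: `validate_config` and `build_command` are hypothetical helpers, and the allowed values and session ranges are copied from the table above. How main.py validates its own arguments is not shown here.

```python
import shlex

# Allowed values, copied from the configuration table.
ALLOWED = {
    "LINGUALITY": {"Monolingual", "Multilingual"},
    "LANGUAGE": {"EN", "FI", "FR"},
    "PARADIGM": {"MTKD", "KD", "FT"},
    "TRAINING": {0, 1},
}

# Valid SESSION values per language, copied from the table.
SESSIONS = {"EN": range(1, 6), "FI": range(1, 10), "FR": range(1, 2)}

# Flag order used by the example commands in this README.
FLAG_ORDER = [
    "LEARNING_RATE", "BATCH_SIZE", "N_EPOCHS", "SESSION",
    "TRAINING", "PARADIGM", "LANGUAGE", "LINGUALITY",
]


def validate_config(cfg):
    """Raise ValueError if cfg violates the table; return cfg otherwise."""
    for key, allowed in ALLOWED.items():
        if cfg[key] not in allowed:
            raise ValueError(f"{key}={cfg[key]!r} is not one of {sorted(allowed)}")
    if cfg["SESSION"] not in SESSIONS[cfg["LANGUAGE"]]:
        raise ValueError(
            f"SESSION={cfg['SESSION']} is not valid for LANGUAGE={cfg['LANGUAGE']}"
        )
    for key in ("N_EPOCHS", "BATCH_SIZE"):
        if not isinstance(cfg[key], int) or cfg[key] <= 0:
            raise ValueError(f"{key} must be a positive integer")
    if cfg["LEARNING_RATE"] <= 0:
        raise ValueError("LEARNING_RATE must be a positive real number")
    return cfg


def build_command(cfg):
    """Return the argument list for a main.py run after validating cfg."""
    validate_config(cfg)
    cmd = ["python", "main.py"]
    for key in FLAG_ORDER:
        cmd += [f"--{key}", str(cfg[key])]
    return cmd


if __name__ == "__main__":
    example = {
        "LEARNING_RATE": 3e-5, "BATCH_SIZE": 16, "N_EPOCHS": 20, "SESSION": 5,
        "TRAINING": 1, "PARADIGM": "MTKD", "LANGUAGE": "EN",
        "LINGUALITY": "Monolingual",
    }
    # Print a shell-quoted command line for inspection.
    print(shlex.join(build_command(example)))
```

The returned list can be passed to `subprocess.run` directly, which avoids shell-quoting issues when looping over several sessions or languages.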
- MTKD-based monolingual SER methods for English, Finnish, and French.
- Adapt the method for a new language (e.g., Chinese).
- MTKD-based multilingual SER method for English, Finnish, and French.
- Extend the multilingual method to include a resource-scarce language (e.g., Bangla).
- Incorporate heterogeneous Large Audio-Language Models in the MTKD method.
- Distill the internal knowledge of heterogeneous models to the student.