
Speech Emotion Recognition

Speech emotion recognition using LSTM, CNN, SVM and MLP, implemented in Keras.

We have improved the feature extraction method and achieved higher accuracy (about 80%). The original version is backed up in the First-Version branch.

English Document | Chinese Document

 

Environments

  • Python 3.8
  • Keras & TensorFlow 2

 

Structure

├── models/                // models
│   ├── common.py          // base class for all models
│   ├── dnn                // neural networks
│   │   ├── dnn.py         // base class for all neural networks models
│   │   ├── cnn.py         // CNN
│   │   └── lstm.py        // LSTM
│   └── ml.py              // SVM & MLP
├── extract_feats/         // features extraction
│   ├── librosa.py         // extract features using librosa
│   └── opensmile.py       // extract features using Opensmile
├── utils/
│   ├── files.py           // setup dataset (classify and rename)
│   ├── opts.py            // argparse
│   └── plot.py            // plot graphs
├── configs/               // hyperparameter configuration files (.yaml)
├── features/              // store extracted features
├── checkpoints/           // store model weights
├── train.py               // train
├── predict.py             // recognize the emotion of a given audio
└── preprocess.py          // data preprocessing (extract features and store them locally)

 

Requirements

Python

Install the Python dependencies listed in requirements.txt (see Prepare below).

Tools

Opensmile (optional, only needed for the Opensmile feature-extraction pipeline)

 

Datasets

  1. RAVDESS

    English, around 1500 audios from 24 people (12 male and 12 female) including 8 different emotions (the third number of the file name represents the emotional type): 01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised.

  2. SAVEE

    English, around 500 audios from 4 people (male) including 7 different emotions (the first letter of the file name represents the emotional type): a = anger, d = disgust, f = fear, h = happiness, n = neutral, sa = sadness, su = surprise.

  3. EMO-DB

    German, around 500 audios from 10 people (5 male and 5 female) including 7 different emotions (the second to last letter of the file name represents the emotional type): N = neutral, W = angry, A = fear, F = happy, T = sad, E = disgust, L = boredom.

  4. CASIA

    Chinese, around 1200 audios from 4 people (2 male and 2 female) including 6 different emotions: neutral, happy, sad, angry, fearful and surprised.
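As an illustration of how these naming schemes can be decoded, here is a short sketch that maps a RAVDESS or EMO-DB file name to its emotion label. The mapping tables follow the naming conventions described above; the function names are illustrative and not part of this repository.

```python
# Sketch: decode emotion labels from dataset file names.
# The mapping tables follow the naming schemes described above;
# the helper names are illustrative, not part of this repo.

RAVDESS_EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

EMODB_EMOTIONS = {
    "N": "neutral", "W": "angry", "A": "fear", "F": "happy",
    "T": "sad", "E": "disgust", "L": "boredom",
}

def ravdess_emotion(filename: str) -> str:
    """The third dash-separated number encodes the emotion, e.g. 03-01-06-...-12.wav."""
    code = filename.split("-")[2]
    return RAVDESS_EMOTIONS[code]

def emodb_emotion(filename: str) -> str:
    """The second-to-last letter before the extension encodes the emotion, e.g. 03a01Fa.wav."""
    stem = filename.rsplit(".", 1)[0]
    return EMODB_EMOTIONS[stem[-2].upper()]
```

utils/files.py applies this kind of mapping when it classifies and renames the raw dataset files.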

 

Usage

Prepare

Install dependencies:

pip install -r requirements.txt

(Optional) Install Opensmile.

 

Configuration

Parameters can be configured in the config files (YAML) under configs/.

Note that currently only six of the Opensmile standard feature sets are supported.

You may need to modify the FEATURE_NUM item in extract_feats/opensmile.py if you want to use a different feature set.

 

Preprocess

First of all, you should extract the features of each audio clip in the dataset and store them locally. Features extracted by Opensmile are saved in .csv files, and features extracted by librosa are saved in .p files.

python preprocess.py --config configs/example.yaml

where configs/example.yaml is the path to your config file.
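The .p files produced by the librosa pipeline are ordinary pickles, so training can reload them without re-extracting. A minimal, hedged sketch of storing and reloading a feature matrix (the file name, dictionary keys, and feature values here are illustrative; the exact on-disk layout of this repo's .p files may differ):

```python
import pickle
import tempfile
from pathlib import Path

# Illustrative feature matrix: one row of features per audio clip.
features = [[0.12, -0.53, 1.07], [0.44, 0.21, -0.88]]
labels = ["happy", "sad"]

# Store the features locally, as preprocess.py does for the librosa pipeline.
p_file = Path(tempfile.gettempdir()) / "train_example.p"
with open(p_file, "wb") as f:
    pickle.dump({"features": features, "labels": labels}, f)

# Later, training can load them straight from disk.
with open(p_file, "rb") as f:
    data = pickle.load(f)
```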

 

Train

The path of the datasets can be configured in configs/. Audios which express the same emotion should be put in the same folder (you may want to refer to utils/files.py when setting up datasets), for example:

└── datasets
    ├── angry
    ├── happy
    ├── sad
    ...

Then:

python train.py --config configs/example.yaml
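The folder-per-emotion layout above can be created with a few lines of pathlib. This is only a sketch of the kind of setup utils/files.py automates; the emotion names are an example.

```python
import tempfile
from pathlib import Path

# Sketch: lay out one folder per emotion class, as train.py expects.
# utils/files.py automates this for the supported datasets;
# the emotion names here are just an example.
root = Path(tempfile.mkdtemp()) / "datasets"
for emotion in ["angry", "happy", "sad", "neutral"]:
    (root / emotion).mkdir(parents=True, exist_ok=True)

# Audio clips expressing the same emotion then go into the matching
# folder, e.g. root / "happy" / "03-01-03-....wav".
classes = sorted(p.name for p in root.iterdir() if p.is_dir())
```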

 

Predict

Use this to recognize the emotion of a given audio clip with a trained model. Some trained model weights are available in checkpoints/.

First, modify the following line in predict.py:

audio_path = "path_to_your_audio"  # str: path of the audio whose emotion you want to predict

Then:

python predict.py --config configs/example.yaml

 

Functions

Radar Chart

Plot a radar chart for demonstrating predicted probabilities.

Source: Radar

import utils

"""
Args:
    data_prob (np.ndarray): probabilities
    class_labels (list): labels
"""
utils.radar(data_prob, class_labels)
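For intuition, the geometry behind such a radar plot is just evenly spaced angles around a circle, with the polygon closed by repeating the first vertex. A hedged numpy sketch (the probabilities and labels are illustrative; the actual utils.radar implementation may differ):

```python
import numpy as np

# Illustrative predicted probabilities over six emotion classes.
data_prob = np.array([0.05, 0.10, 0.55, 0.10, 0.15, 0.05])
class_labels = ["angry", "fear", "happy", "neutral", "sad", "surprise"]

# One axis per class, evenly spaced around the circle.
angles = np.linspace(0, 2 * np.pi, len(class_labels), endpoint=False)

# Close the polygon by repeating the first vertex, so the outline
# drawn on a polar axis joins back up.
angles_closed = np.concatenate([angles, angles[:1]])
probs_closed = np.concatenate([data_prob, data_prob[:1]])
```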

 

Play Audio

import utils

utils.play_audio(file_path)

 

Plot Curve

Plot loss curve or accuracy curve.

import utils

"""
Args:
    train (list): loss or accuracy on train set
    val (list): loss or accuracy on validation set
    title (str): title of figure
    y_label (str): label of y axis
"""
utils.curve(train, val, title, y_label)

 

Waveform

Plot a waveform for an audio file.

import utils

utils.waveform(file_path)

 

Spectrogram

Plot a spectrogram for an audio file.

import utils

utils.spectrogram(file_path)
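Under the hood, a spectrogram is the magnitude of a short-time Fourier transform: the signal is cut into overlapping windowed frames and each frame is Fourier-transformed. A minimal numpy sketch on a synthetic signal (the window size and hop length are illustrative; utils.spectrogram itself may use a library routine such as librosa's):

```python
import numpy as np

# Synthetic 1-second, 8 kHz sine wave at 440 Hz.
sr = 8000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440 * t)

# Short-time Fourier transform: overlapping Hann-windowed frames -> FFT magnitudes.
n_fft, hop = 256, 128
frames = np.array([signal[i:i + n_fft] * np.hanning(n_fft)
                   for i in range(0, len(signal) - n_fft + 1, hop)])
spec = np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (freq bins, time frames)
```

The 440 Hz tone shows up as a bright horizontal line near frequency bin 440 / (sr / n_fft) ≈ 14.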

 

Other Contributors