Speech-Recognition

based on NeMo

  • Open In Colab

based on espnet

PyTorch implementation of DeepSpeech2 trained with the CTC objective.

differences to deepspeech.pytorch

  • results after 8 epochs (24 hours of training) with the Adam optimizer
python evaluation.py --model epoch=8.ckpt --datasets test-clean
2528 of 2620 samples are suitable for training
100%|█████████████████████████████████████| 127/127 [02:12<00:00,  1.04s/it]
Test Summary    Average WER 9.925       Average CER 3.239

python evaluation.py --model epoch=8.ckpt --datasets test-other
2893 of 2939 samples are suitable for training
100%|███████████████████████████████████████| 145/145 [01:19<00:00,  1.83it/s]
Test Summary    Average WER 27.879      Average CER 11.739
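
For reference, the WER and CER figures above are edit-distance based error rates. The snippet below is a minimal sketch of how such per-utterance scores are commonly computed; it is not taken from `evaluation.py`, and the function names `edit_distance`, `wer` and `cer` are made up for illustration. How the repository normalizes text or averages over samples is not shown here and may differ.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences, using one rolling row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(
                min(
                    prev[j] + 1,             # deletion
                    curr[j - 1] + 1,         # insertion
                    prev[j - 1] + (r != h),  # substitution (or match, cost 0)
                )
            )
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate in percent (hypothetical helper, whitespace tokenization)."""
    ref_words = ref.split()
    return 100.0 * edit_distance(ref_words, hyp.split()) / max(len(ref_words), 1)


def cer(ref: str, hyp: str) -> float:
    """Character error rate in percent (hypothetical helper, spaces count as characters)."""
    return 100.0 * edit_distance(ref, hyp) / max(len(ref), 1)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"
    print(f"WER {wer(reference, hypothesis):.3f}  CER {cer(reference, hypothesis):.3f}")
```

An "Average WER" as printed in the test summary would then be the mean of such per-utterance values, assuming that is how the summary is aggregated.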

Datasets

Librispeech

  • to download the data, see: https://github.com/dertilo/speech-to-text/corpora/download_corpora.py
  • splits
    datasets = [
        ("train", ["train-clean-100", "train-clean-360", "train-other-500"]),
        ("eval", ["dev-clean", "dev-other"]),
        ("test", ["test-clean", "test-other"]),
    ]
    
  • number of samples
    train got 281241 samples
    eval got 5567 samples
    test got 5559 samples
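
As a small sketch of how the split definition and the sample counts above fit together, the following inverts the `datasets` list into a lookup from LibriSpeech subset name to split and prints each split's share of the data. The names `subset_to_split` and `num_samples` are assumptions made for this example and do not come from the repository.

```python
# Split definition, copied from the list above.
datasets = [
    ("train", ["train-clean-100", "train-clean-360", "train-other-500"]),
    ("eval", ["dev-clean", "dev-other"]),
    ("test", ["test-clean", "test-other"]),
]

# Hypothetical lookup: which split does a given LibriSpeech subset belong to?
subset_to_split = {
    subset: split for split, subsets in datasets for subset in subsets
}

# Sample counts as reported above.
num_samples = {"train": 281241, "eval": 5567, "test": 5559}
total = sum(num_samples.values())

if __name__ == "__main__":
    print(subset_to_split["dev-other"])
    for split, n in num_samples.items():
        print(f"{split}: {n} samples ({100 * n / total:.2f}% of all samples)")
```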