Speech-Recognition

based on NeMo

based on espnet

no batch inference yet?

based on deepspeech.pytorch

PyTorch implementation of DeepSpeech2 trained with the CTC objective.
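A model trained with the CTC objective emits one label per audio frame, including a special blank symbol, so its output has to be collapsed into text. The sketch below shows the simplest (greedy) collapse rule: merge consecutive repeats, then drop blanks. The alphabet, the blank index and the function name are hypothetical and not taken from this repository, whose decoder may work differently.

```python
from itertools import groupby

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_ids, alphabet, blank=BLANK):
    """Collapse consecutive repeated labels, then remove blanks."""
    # groupby merges runs of identical labels: [2, 2, 0, 1] -> [2, 0, 1]
    collapsed = [label for label, _ in groupby(frame_ids)]
    return "".join(alphabet[i] for i in collapsed if i != blank)


if __name__ == "__main__":
    alphabet = ["_", "a", "c", "t"]  # hypothetical toy alphabet
    frames = [2, 2, 0, 1, 1, 0, 3, 3, 3]  # per-frame argmax labels
    print(ctc_greedy_decode(frames, alphabet))  # prints "cat"
```

A blank between two identical labels keeps them apart, so frames `[1, 0, 1]` decode to "aa" while `[1, 1]` decodes to "a".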

differences to deepspeech.pytorch

uses torch.nn.CTCLoss instead of warp-ctc
training loop powered by pytorch-lightning

results

after 8 epochs (24 hours) of training with the Adam optimizer

python evaluation.py --model epoch=8.ckpt --datasets test-clean
2528 of 2620 samples are suitable for training
100%|█████████████████████████████████████| 127/127 [02:12<00:00,  1.04s/it]
Test Summary    Average WER 9.925       Average CER 3.239

python evaluation.py --model epoch=8.ckpt --datasets test-other
2893 of 2939 samples are suitable for training
100%|███████████████████████████████████████| 145/145 [01:19<00:00,  1.83it/s]
Test Summary    Average WER 27.879      Average CER 11.739
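For reference, WER and CER are edit-distance-based error rates over words and characters respectively. The following is a minimal sketch of the per-utterance computation, expressed in percent like the numbers above. It is not the code in evaluation.py: the function names are hypothetical, and how that script normalizes text or averages across utterances is not shown here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (0 if equal)
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent; assumes a non-empty reference."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent; spaces count as characters here."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    ref, hyp = "the cat sat", "the cat sit"
    print(f"WER {wer(ref, hyp):.3f}  CER {cer(ref, hyp):.3f}")
```

One substituted word out of three gives a WER of about 33.3, while the single wrong character out of eleven gives a CER of about 9.1, which illustrates why CER is typically much lower than WER.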

Datasets

Librispeech

to download data see: https://github.com/dertilo/speech-to-text/corpora/download_corpora.py

splits

datasets = [
    ("train", ["train-clean-100", "train-clean-360", "train-other-500"]),
    ("eval", ["dev-clean", "dev-other"]),
    ("test", ["test-clean", "test-other"]),
]

number of samples

train got 281241 samples
eval got 5567 samples
test got 5559 samples
