Esperanto STT

Using DeepSpeech / Coqui AI and the Common Voice dataset

Tools

Possible data sources

Dataset	Version	Size	License
Common Voice	CV Corpus 7.0	17 GB, 748 h	CC0
Tatoeba	03.06.20	4,063 audio files	CC BY
Lingua Libre	03.06.20	425 MB	CC BY-SA

Dataset	Parameters	GPU	Results
eo_41h_2019-12-10	?	2 x 1080 Ti, 32 GB RAM (leadertelecom)	WER 0.5
eo_844h_2021-07-21	English checkpoints, n_depth 2048, dropout_rate 0.3, learning_rate 0.0001	Google Colab Pro Plus	WER 24.7% (test set was part of the training dataset)
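
For readers unfamiliar with the metric in the results column: word error rate (WER) is the word-level edit distance between the reference transcript and the recognized text, divided by the number of reference words. It can be written as a fraction (0.5) or as a percentage (50%). The following is a minimal sketch in plain Python, not the evaluation code used for the runs above; the function name and the example sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one deletion in four words.
example_wer = word_error_rate("la hundo kuras rapide", "la hundo kuris")
print(f"WER {example_wer:.2f} ({example_wer:.1%})")
```

Note that a WER measured on a test set that overlaps with the training data, as in the second row, tends to look better than it would on unseen speech.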

There is an Esperanto Vosk Model that can be used in many tools such as Kdenlive to create subtitles: https://alphacephei.com/vosk/models

Run the SSH training process in the background next time (background the process and disown it).
Experiment with different data tables, e.g. ignore sentences that have one "no" vote.
Extract the Tatoeba corpus with the script from https://github.com/DanBmh/deepspeech-german
Extract the Lingua Libre files with https://github.com/mozilla/DeepSpeech/blob/master/bin/import_lingua_libre.py
Create a KenLM language model (scorer): https://tiefenauer.github.io/blog/wiki-n-gram-lm/ https://github.com/kpu/kenlm
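
The data-table experiment in the list above (dropping clips that received a "no" vote) could look roughly like the sketch below. It assumes the Common Voice TSV layout with `path`, `sentence`, `up_votes` and `down_votes` columns; the helper names and the inline sample rows are hypothetical, and a real run would read a file such as `validated.tsv` instead of the sample string.

```python
import csv
import io


def read_tsv(handle):
    """Read a tab-separated Common Voice style table into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


def filter_by_down_votes(rows, max_down_votes=0):
    """Keep only rows whose down_votes count is at most max_down_votes."""
    kept = []
    for row in rows:
        # An empty cell is treated as zero votes (an assumption of this sketch).
        down_votes = int(row["down_votes"] or 0)
        if down_votes <= max_down_votes:
            kept.append(row)
    return kept


# Hypothetical sample data standing in for a real TSV file.
SAMPLE_TSV = (
    "path\tsentence\tup_votes\tdown_votes\n"
    "a.mp3\tSaluton mondo\t2\t0\n"
    "b.mp3\tBonan tagon\t2\t1\n"
    "c.mp3\tĜis revido\t3\t0\n"
)

rows = read_tsv(io.StringIO(SAMPLE_TSV))
clean_rows = filter_by_down_votes(rows, max_down_votes=0)
print([row["path"] for row in clean_rows])
```

Raising `max_down_votes` makes the filter more permissive, so the same helper can produce several table variants to compare in training.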
