ASR low‑resource language training #2757

libby-97 · 2026-04-02T04:01:35Z

I'd like to ask whether the current ASR models can be fine-tuned to improve recognition accuracy for an already supported low-resource language, specifically Dutch.
If such fine-tuning is feasible, could you provide a general list of the datasets that have already been used for training? This would help us avoid duplicating data in further Dutch training and mitigate issues such as overfitting or catastrophic forgetting.
Thanks!

MukundaKatta · 2026-04-21T17:55:01Z

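Fine-tuning Whisper on Dutch is very doable. mozilla-foundation/common_voice_17_0 has solid Dutch coverage, and mixing it with domain-specific recordings is a common approach. One caveat on duplication: OpenAI has not published a list of the datasets used to train Whisper (the paper describes only a large corpus of audio collected from the web), and Common Voice 17 postdates the original Whisper release, so overlap with the training data cannot be verified. Holding out your own domain-specific test set is the safer guard against overfitting. Hugging Face's blog post on fine-tuning Whisper walks through the full loop. To limit catastrophic forgetting, freeze the encoder and LoRA-fine-tune only the decoder's cross-attention layers, which helps preserve multilingual capability while adapting to Dutch.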
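A minimal sketch of the cross-attention-only LoRA setup, assuming the Hugging Face `transformers` and `peft` packages. The checkpoint name `openai/whisper-small`, the helper names, and the LoRA hyperparameters are illustrative assumptions, not values from this thread. It relies on the Transformers Whisper implementation naming the decoder's cross-attention submodule `encoder_attn`.

```python
import re

# Regex for the decoder cross-attention query/value projections.
# In the Transformers Whisper implementation these modules are named like
# "model.decoder.layers.0.encoder_attn.q_proj".
CROSS_ATTN_PATTERN = r".*decoder\.layers\.\d+\.encoder_attn\.(q_proj|v_proj)"


def is_lora_target(module_name: str) -> bool:
    """Return True if a module name would receive a LoRA adapter."""
    return re.fullmatch(CROSS_ATTN_PATTERN, module_name) is not None


def build_lora_whisper(checkpoint: str = "openai/whisper-small"):
    """Wrap a Whisper checkpoint so only cross-attention adapters train.

    The checkpoint and hyperparameters below are assumptions for
    illustration; tune them for your data.
    """
    # Imported lazily so the helpers above work without these packages.
    from peft import LoraConfig, get_peft_model
    from transformers import WhisperForConditionalGeneration

    model = WhisperForConditionalGeneration.from_pretrained(checkpoint)
    config = LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        # A string is treated by PEFT as a regex matched against module names.
        target_modules=CROSS_ATTN_PATTERN,
    )
    # get_peft_model freezes all base weights (encoder included), so only
    # the injected LoRA matrices remain trainable.
    peft_model = get_peft_model(model, config)
    peft_model.print_trainable_parameters()
    return peft_model
```

Calling `build_lora_whisper()` downloads the checkpoint and prints the trainable-parameter count, which should be a small fraction of the total. The returned model can then be passed to the training loop from the Hugging Face blog post.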
