Recommended parameters for low resource language #1875

Open
spacetronics opened this issue Feb 9, 2025 · 0 comments
spacetronics commented Feb 9, 2025

Hi, I'm trying to train a streaming zipformer transducer model for Indonesian. I'm training on Colab and need some guidance on how to train this model. I'll be using four datasets:

  1. Common Voice 17 (7h train, 4h dev, 4h test)
  2. Google Fleurs ID (9h train, 1h dev, 2h test)
  3. Librivox Indonesia (6h train, 1h test)
  4. TITML IDN (14h train)
These total 37 hours of training data, 5 hours of validation data, and 7 hours of test data.

I normalized the text, prepared the manifests and cuts for all datasets, and combined them by concatenating the cut sets. I also prepared the MUSAN cuts and computed features for all cuts, so the next step is training the model. I customized the pruned_transducer_stateless7_streaming LibriSpeech recipe a little bit by modifying the LibriSpeechAsrModule class and the dataloader.
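Roughly, the data preparation looked like the following (a simplified lhotse sketch; the paths and the normalize_text helper are placeholders, not my exact code):

```python
from lhotse import CutSet, Fbank, FbankConfig, LilcomChunkyWriter, combine

# Load the per-dataset cut manifests (placeholder paths).
cut_sets = [
    CutSet.from_file(p)
    for p in [
        "fbank/commonvoice_cuts_train.jsonl.gz",
        "fbank/fleurs_cuts_train.jsonl.gz",
        "fbank/librivox_cuts_train.jsonl.gz",
        "fbank/titml_cuts_train.jsonl.gz",
    ]
]

# Concatenate the datasets into a single training CutSet.
train_cuts = combine(*cut_sets)

# Text normalization (normalize_text stands in for my actual cleanup).
def normalize_text(cut):
    for sup in cut.supervisions:
        sup.text = sup.text.upper()  # placeholder normalization
    return cut

train_cuts = train_cuts.map(normalize_text)

# Compute 80-dim fbank features, matching feature_dim=80 in the config.
train_cuts = train_cuts.compute_and_store_features(
    extractor=Fbank(FbankConfig(num_mel_bins=80)),
    storage_path="fbank/feats_train",
    storage_type=LilcomChunkyWriter,
    num_jobs=2,
)
train_cuts.to_file("fbank/cuts_train.jsonl.gz")
```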

I use a shuffled SimpleCutSampler, and the model and training arguments are all defaults except the following (a sketch of the sampler setup is included after the parameter dump below):

  1. base-lr = 0.025
  2. use_fp16 = True
  3. max_duration = 60

Here are the parameters:
2025-02-09 09:21:20,721 INFO [train.py:961] Training started
2025-02-09 09:21:20,730 INFO [train.py:971] Device: cuda:0
2025-02-09 09:21:20,739 INFO [train.py:980] {
  "am_scale": 0.0,
  "attention_dims": "192,192,192,192,192",
  "average_period": 200,
  "base_lr": 0.025,
  "batch_idx_train": 0,
  "best_train_epoch": -1,
  "best_train_loss": Infinity,
  "best_valid_epoch": -1,
  "best_valid_loss": Infinity,
  "blank_id": 0,
  "bpe_model": "/content/drive/MyDrive/AI/lang_bpe_500/bpe.model",
  "bucketing_sampler": false,
  "cnn_module_kernels": "31,31,31,31,31",
  "concatenate_cuts": false,
  "context_size": 2,
  "decode_chunk_len": 32,
  "decoder_dim": 512,
  "drop_last": true,
  "duration_factor": 1.0,
  "enable_musan": true,
  "enable_spec_aug": true,
  "encoder_dims": "384,384,384,384,384",
  "encoder_unmasked_dims": "256,256,256,256,256",
  "env_info": {
    ...
  },
  "exp_dir": "/content/drive/MyDrive/pruned_transducer_stateless7_streaming/exp",
  "feature_dim": 80,
  "feedforward_dims": "1024,1024,2048,2048,1024",
  "full_libri": false,
  "gap": 1.0,
  "inf_check": false,
  "input_strategy": "PrecomputedFeatures",
  "joiner_dim": 512,
  "keep_last_k": 30,
  "lm_scale": 0.25,
  "log_interval": 50,
  "lr_batches": 5000,
  "lr_epochs": 3.5,
  "manifest_dir": "/content/drive/MyDrive/AI/fbank",
  "master_port": 12354,
  "max_duration": 60,
  "mini_libri": false,
  "nhead": "8,8,8,8,8",
  "num_buckets": 30,
  "num_encoder_layers": "2,4,3,2,4",
  "num_epochs": 30,
  "num_left_chunks": 4,
  "num_workers": 2,
  "on_the_fly_feats": false,
  "print_diagnostics": false,
  "prune_range": 5,
  "reset_interval": 200,
  "return_cuts": true,
  "save_every_n": 2000,
  "seed": 42,
  "short_chunk_size": 50,
  "shuffle": true,
  "simple_loss_scale": 0.5,
  "spec_aug_time_warp_factor": 80,
  "start_batch": 60000,
  "start_epoch": 1,
  "subsampling_factor": 4,
  "tensorboard": true,
  "use_fp16": true,
  "valid_interval": 1600,
  "vocab_size": 500,
  "warm_step": 2000,
  "world_size": 1,
  "zipformer_downsampling_factors": "1,2,4,8,2"
}
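
For reference, the sampler and dataloader were set up roughly like this (a simplified lhotse sketch; variable names and paths are placeholders, not my exact code):

```python
import torch
from lhotse import CutSet
from lhotse.dataset import K2SpeechRecognitionDataset, SimpleCutSampler

train_cuts = CutSet.from_file("fbank/cuts_train.jsonl.gz")

# SimpleCutSampler with shuffling; max_duration=60 keeps batches small
# enough for the Colab GPU.
train_sampler = SimpleCutSampler(
    train_cuts,
    max_duration=60.0,
    shuffle=True,
)

# Features are precomputed (input_strategy=PrecomputedFeatures is the
# default), so the dataset just loads the stored fbank matrices.
train_dataset = K2SpeechRecognitionDataset(return_cuts=True)

train_dataloader = torch.utils.data.DataLoader(
    train_dataset,
    sampler=train_sampler,
    batch_size=None,  # batching is handled by the sampler
    num_workers=2,
)
```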

The training script works perfectly, but I don't really know whether these are the best parameters for the model:

[Screenshots: training and validation loss curves]

My validation loss is much higher than my training loss; is that because my dev set is small? I decoded the model and got a WER of 39%, which is not great. I also noticed after training that the "zipformer" recipe uses the latest Zipformer, while this pruned transducer recipe uses the older one. Should I switch to the new recipe, and which parameters should I change for the new model?

Thank you!
