Recommended parameters for low resource language #1875

Open
spacetronics opened this issue Feb 9, 2025 · 0 comments
spacetronics commented Feb 9, 2025

Hi, I'm trying to train a streaming zipformer transducer model for Indonesian. I'm training on Colab and need some guidance on how to train this model. I'll be using four datasets:

  1. Common Voice 17 (7h train, 4h dev, 4h test)
  2. Google Fleurs ID (9h train, 1h dev, 2h test)
  3. Librivox Indonesia (6h train, 1h test)
  4. TITML IDN (14h train)
These total 37 hours of training data, 5 hours of validation data, and 7 hours of test data.

I normalized the text, prepared the manifests and cuts for all datasets, and combined them by concatenating the cut sets. I also prepared the MUSAN cuts and computed features for all cuts, so the next step is training the model. I customized the pruned_transducer_stateless7_streaming LibriSpeech recipe a little bit by modifying the LibriSpeechAsrModule class and the dataloader.
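Roughly, the data preparation looked like the following (a simplified lhotse sketch; the paths and the normalize_text helper are placeholders, not my exact code):

```python
from lhotse import CutSet, Fbank, FbankConfig, LilcomChunkyWriter, combine

# Load the per-dataset cut manifests (placeholder paths).
cut_sets = [
    CutSet.from_file(p)
    for p in [
        "fbank/commonvoice_cuts_train.jsonl.gz",
        "fbank/fleurs_cuts_train.jsonl.gz",
        "fbank/librivox_cuts_train.jsonl.gz",
        "fbank/titml_cuts_train.jsonl.gz",
    ]
]

# Concatenate the datasets into a single training CutSet.
train_cuts = combine(*cut_sets)

# Text normalization (normalize_text stands in for my actual cleanup).
def normalize_text(cut):
    for sup in cut.supervisions:
        sup.text = sup.text.upper()  # placeholder normalization
    return cut

train_cuts = train_cuts.map(normalize_text)

# Compute 80-dim fbank features, matching feature_dim=80 in the config.
train_cuts = train_cuts.compute_and_store_features(
    extractor=Fbank(FbankConfig(num_mel_bins=80)),
    storage_path="fbank/feats_train",
    storage_type=LilcomChunkyWriter,
    num_jobs=2,
)
train_cuts.to_file("fbank/cuts_train.jsonl.gz")
```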

I use a shuffled SimpleCutSampler, and the model and training arguments are all defaults except the following (a sketch of the sampler setup is included after the parameter dump below):

  1. base-lr = 0.025
  2. use_fp16 = True
  3. max_duration = 60

Here are the parameters:
2025-02-09 09:21:20,721 INFO [train.py:961] Training started
2025-02-09 09:21:20,730 INFO [train.py:971] Device: cuda:0
2025-02-09 09:21:20,739 INFO [train.py:980] {
  "am_scale": 0.0,
  "attention_dims": "192,192,192,192,192",
  "average_period": 200,
  "base_lr": 0.025,
  "batch_idx_train": 0,
  "best_train_epoch": -1,
  "best_train_loss": Infinity,
  "best_valid_epoch": -1,
  "best_valid_loss": Infinity,
  "blank_id": 0,
  "bpe_model": "/content/drive/MyDrive/AI/lang_bpe_500/bpe.model",
  "bucketing_sampler": false,
  "cnn_module_kernels": "31,31,31,31,31",
  "concatenate_cuts": false,
  "context_size": 2,
  "decode_chunk_len": 32,
  "decoder_dim": 512,
  "drop_last": true,
  "duration_factor": 1.0,
  "enable_musan": true,
  "enable_spec_aug": true,
  "encoder_dims": "384,384,384,384,384",
  "encoder_unmasked_dims": "256,256,256,256,256",
  "env_info": {
    ...
  },
  "exp_dir": "/content/drive/MyDrive/pruned_transducer_stateless7_streaming/exp",
  "feature_dim": 80,
  "feedforward_dims": "1024,1024,2048,2048,1024",
  "full_libri": false,
  "gap": 1.0,
  "inf_check": false,
  "input_strategy": "PrecomputedFeatures",
  "joiner_dim": 512,
  "keep_last_k": 30,
  "lm_scale": 0.25,
  "log_interval": 50,
  "lr_batches": 5000,
  "lr_epochs": 3.5,
  "manifest_dir": "/content/drive/MyDrive/AI/fbank",
  "master_port": 12354,
  "max_duration": 60,
  "mini_libri": false,
  "nhead": "8,8,8,8,8",
  "num_buckets": 30,
  "num_encoder_layers": "2,4,3,2,4",
  "num_epochs": 30,
  "num_left_chunks": 4,
  "num_workers": 2,
  "on_the_fly_feats": false,
  "print_diagnostics": false,
  "prune_range": 5,
  "reset_interval": 200,
  "return_cuts": true,
  "save_every_n": 2000,
  "seed": 42,
  "short_chunk_size": 50,
  "shuffle": true,
  "simple_loss_scale": 0.5,
  "spec_aug_time_warp_factor": 80,
  "start_batch": 60000,
  "start_epoch": 1,
  "subsampling_factor": 4,
  "tensorboard": true,
  "use_fp16": true,
  "valid_interval": 1600,
  "vocab_size": 500,
  "warm_step": 2000,
  "world_size": 1,
  "zipformer_downsampling_factors": "1,2,4,8,2"
}
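
For reference, the sampler and dataloader were set up roughly like this (a simplified lhotse sketch; variable names and paths are placeholders, not my exact code):

```python
import torch
from lhotse import CutSet
from lhotse.dataset import K2SpeechRecognitionDataset, SimpleCutSampler

train_cuts = CutSet.from_file("fbank/cuts_train.jsonl.gz")

# SimpleCutSampler with shuffling; max_duration=60 keeps batches small
# enough for the Colab GPU.
train_sampler = SimpleCutSampler(
    train_cuts,
    max_duration=60.0,
    shuffle=True,
)

# Features are precomputed (input_strategy=PrecomputedFeatures is the
# default), so the dataset just loads the stored fbank matrices.
train_dataset = K2SpeechRecognitionDataset(return_cuts=True)

train_dataloader = torch.utils.data.DataLoader(
    train_dataset,
    sampler=train_sampler,
    batch_size=None,  # batching is handled by the sampler
    num_workers=2,
)
```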

The training script works perfectly, but I don't really know whether these are the best parameters for the model:

[Screenshots: training and validation loss curves]

My validation loss is much higher than my training loss; is that because my dev set is small? I decoded the model and got a WER of 39%, which is not great. I also noticed after training that the "zipformer" recipe uses the latest Zipformer, while this pruned transducer recipe uses the older one. Should I switch to the new recipe, and which parameters should I change for the new model?

Thank you!
