Hi, how can I fine-tune the pretrained speaker encoder on a Mandarin dataset?

Thanks for your great work! I haven't been working on voice conversion (VC) for very long, but I've found that the generalization ability of the speaker encoder is quite important for voice cloning with unseen speakers. Could you please explain how to fine-tune the pretrained speaker encoder you provided? Any pointers to a relevant repo or the commands to run would be much appreciated.