Potential issue excluding silent speaker #3

Open
Zenglinxiao opened this issue Sep 17, 2021 · 0 comments

Comments

@Zenglinxiao

Hello there,

Thanks for your efforts in open-sourcing the code; it's vital for those of us trying to reproduce the results presented in the paper.

Problem

However, I've come across a RuntimeError when adapting the model to our private data:

/*/EEND-vector-clustering/eend/pytorch_backend/train.py:186: RuntimeWarning: invalid value encountered in true_divide
  fet_arr[spk] = org / norm
...
Traceback (most recent call last):
...
RuntimeError: The loss (nan) is not finite.
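
The warning and the error are consistent with each other: dividing an all-zero embedding row by its L2 norm is a 0/0 division, which yields NaNs that later surface as the non-finite loss. A minimal NumPy sketch of the failure mode (not the project's code):

```python
import numpy as np

# A speaker row that never received any accumulated embedding stays all-zero.
org = np.zeros(4)
norm = np.linalg.norm(org, ord=2)

with np.errstate(invalid="ignore"):  # suppress the same RuntimeWarning
    result = org / norm  # 0/0 -> nan in every component

print(norm)                    # 0.0
print(np.isnan(result).all())  # True
```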

Detail

After some debugging, I found that the problem actually occurs during the backpropagation step when an entry of the embedding layer is left as all zeros:

fet_arr = np.zeros([spk_num, fet_dim])
# sum
bs = spklabs.shape[0]
for i in range(bs):
    if spkidx_tbl[spklabs[i]] == -1:
        raise ValueError(spklabs[i])
    fet_arr[spkidx_tbl[spklabs[i]]] += spkvecs[i]
# normalize
for spk in range(spk_num):
    org = fet_arr[spk]
    norm = np.linalg.norm(org, ord=2)
    fet_arr[spk] = org / norm
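
A defensive variant of the normalization loop could skip (near-)zero rows instead of producing NaNs. This is only a sketch of such a guard, not the project's code, and it masks the symptom rather than fixing the stray labels:

```python
import numpy as np

def l2_normalize_rows(fet_arr, eps=1e-12):
    """L2-normalize each row in place, leaving (near-)zero rows untouched."""
    for spk in range(fet_arr.shape[0]):
        norm = np.linalg.norm(fet_arr[spk], ord=2)
        if norm > eps:  # guard against the 0/0 division
            fet_arr[spk] = fet_arr[spk] / norm
    return fet_arr

arr = np.array([[3.0, 4.0], [0.0, 0.0]])
normed = l2_normalize_rows(arr)
# First row becomes the unit vector [0.6, 0.8]; the all-zero row stays zero.
```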

Since these embeddings are loaded from the speaker embeddings dumped by the save_spkv_lab.py script when adapting the model, I suspected there might be an issue in the save_spkv_lab function.

After some careful step-by-step checking with pdb, I found that silent speaker labels are actually being added to the all_labels variable when the speaker embeddings are dumped:

for i in range(args.num_speakers):
    # Exclude samples corresponding to silent speaker
    if torch.sum(t_chunked_t[sigma[i]]) > 0:
        vec = outputs[i+1][0].cpu().detach().numpy()
        lab = chunk_data[2][sigma[i]]
        all_outputs.append(vec)
        all_labels.append(lab)

Even when torch.sum(t_chunked_t[sigma[i]]) > 0, lab can still be -1, which denotes a silent speaker according to the code in:

S_arr = -1 * np.ones(n_speakers).astype(np.int64)
for seg in filtered_segments:
    speaker_index = speakers.index(self.data.utt2spk[seg['utt']])
    all_speaker_index = self.all_speakers.index(
        self.data.utt2spk[seg['utt']])
    S_arr[speaker_index] = all_speaker_index

(This is what confuses me: it should not happen, since both lab and T/t_chunked are produced from the kaldi_obj.utt2spk information.)
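
To make the placeholder behaviour concrete, here is a toy illustration (with made-up indices, not data from the repo) of how a chunk speaker with no segments keeps the -1 initial value:

```python
import numpy as np

n_speakers = 3
S_arr = -1 * np.ones(n_speakers).astype(np.int64)

# Hypothetical result of the segment loop: only speakers 0 and 2
# appear in filtered_segments, so speaker 1 is never assigned.
for speaker_index, all_speaker_index in [(0, 10), (2, 11)]:
    S_arr[speaker_index] = all_speaker_index

print(S_arr)  # [10 -1 11] -> the middle speaker keeps the -1 placeholder
```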

Since these silent speaker labels are -1, and Python sequences support negative indexing, the issue is silently ignored when the embeddings are dumped but raises an exception once training begins.
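
For example, a -1 label used as an index does not raise; it silently addresses the last row (a toy demonstration, not the project's code):

```python
import numpy as np

fet_arr = np.zeros((3, 2))
spkvec = np.array([1.0, 2.0])

lab = -1  # a silent-speaker placeholder that slipped into the dumped labels
fet_arr[lab] += spkvec  # no IndexError: -1 addresses the LAST row

print(fet_arr[2])  # [1. 2.] -> the wrong speaker row was updated
```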

Question

I could simply fix this issue by appending a speaker label to all_labels only if lab >= 0 when saving the speaker embeddings; the subsequent training then runs smoothly and yields a well-performing model.
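
The filtering I have in mind mirrors the dumping loop shown above; here is a self-contained sketch with hypothetical stand-in labels and vectors:

```python
# Hypothetical per-chunk data: one label is still the -1 silent placeholder.
labels = [3, -1, 7]
vectors = [[0.1, 0.2], [0.9, 0.9], [0.3, 0.4]]

all_outputs, all_labels = [], []
for vec, lab in zip(vectors, labels):
    if lab < 0:
        continue  # skip silent-speaker placeholders instead of dumping them
    all_outputs.append(vec)
    all_labels.append(lab)

print(all_labels)  # [3, 7]
```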

But before opening any PR, I would like to know whether you have ever come across such an issue, or whether you have any idea why this happens.

Thanks!
