Potential issue excluding silent speaker #3

Open
Zenglinxiao opened this issue Sep 17, 2021 · 0 comments

Comments

@Zenglinxiao

Hello there,

Thanks for your efforts in open-sourcing the code; it's vital for those of us trying to reproduce the results presented in the paper.

Problem

However, I've come across a RuntimeError when adapting the model to our private data:

/*/EEND-vector-clustering/eend/pytorch_backend/train.py:186: RuntimeWarning: invalid value encountered in true_divide
  fet_arr[spk] = org / norm
...
Traceback (most recent call last):
...
RuntimeError: The loss (nan) is not finite.
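
The warning and the error are consistent with each other: dividing an all-zero embedding row by its L2 norm is a 0/0 division, which yields NaNs that later surface as the non-finite loss. A minimal NumPy sketch of the failure mode (not the project's code):

```python
import numpy as np

# A speaker row that never received any accumulated embedding stays all-zero.
org = np.zeros(4)
norm = np.linalg.norm(org, ord=2)

with np.errstate(invalid="ignore"):  # suppress the same RuntimeWarning
    result = org / norm  # 0/0 -> nan in every component

print(norm)                    # 0.0
print(np.isnan(result).all())  # True
```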

Detail

After some debugging, I found that the problem actually occurs during the backpropagation step when an entry of the embedding layer is left as all zeros:

fet_arr = np.zeros([spk_num, fet_dim])
# sum
bs = spklabs.shape[0]
for i in range(bs):
    if spkidx_tbl[spklabs[i]] == -1:
        raise ValueError(spklabs[i])
    fet_arr[spkidx_tbl[spklabs[i]]] += spkvecs[i]
# normalize
for spk in range(spk_num):
    org = fet_arr[spk]
    norm = np.linalg.norm(org, ord=2)
    fet_arr[spk] = org / norm
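
A defensive variant of the normalization loop could skip (near-)zero rows instead of producing NaNs. This is only a sketch of such a guard, not the project's code, and it masks the symptom rather than fixing the stray labels:

```python
import numpy as np

def l2_normalize_rows(fet_arr, eps=1e-12):
    """L2-normalize each row in place, leaving (near-)zero rows untouched."""
    for spk in range(fet_arr.shape[0]):
        norm = np.linalg.norm(fet_arr[spk], ord=2)
        if norm > eps:  # guard against the 0/0 division
            fet_arr[spk] = fet_arr[spk] / norm
    return fet_arr

arr = np.array([[3.0, 4.0], [0.0, 0.0]])
normed = l2_normalize_rows(arr)
# First row becomes the unit vector [0.6, 0.8]; the all-zero row stays zero.
```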

Since these embeddings are loaded from the speaker embeddings dumped by the save_spkv_lab.py script when adapting the model, I suspected there might be an issue in the save_spkv_lab function.

After some careful step-by-step checking with pdb, I found that silent speaker labels are actually being added to the all_labels variable when the speaker embeddings are dumped:

for i in range(args.num_speakers):
    # Exclude samples corresponding to silent speaker
    if torch.sum(t_chunked_t[sigma[i]]) > 0:
        vec = outputs[i+1][0].cpu().detach().numpy()
        lab = chunk_data[2][sigma[i]]
        all_outputs.append(vec)
        all_labels.append(lab)

Even when torch.sum(t_chunked_t[sigma[i]]) > 0, lab can still be -1, which denotes a silent speaker according to the code in:

S_arr = -1 * np.ones(n_speakers).astype(np.int64)
for seg in filtered_segments:
    speaker_index = speakers.index(self.data.utt2spk[seg['utt']])
    all_speaker_index = self.all_speakers.index(
        self.data.utt2spk[seg['utt']])
    S_arr[speaker_index] = all_speaker_index

(This is what confuses me: it should not happen, since both lab and T/t_chunked are produced from the kaldi_obj.utt2spk information.)
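
To make the placeholder behaviour concrete, here is a toy illustration (with made-up indices, not data from the repo) of how a chunk speaker with no segments keeps the -1 initial value:

```python
import numpy as np

n_speakers = 3
S_arr = -1 * np.ones(n_speakers).astype(np.int64)

# Hypothetical result of the segment loop: only speakers 0 and 2
# appear in filtered_segments, so speaker 1 is never assigned.
for speaker_index, all_speaker_index in [(0, 10), (2, 11)]:
    S_arr[speaker_index] = all_speaker_index

print(S_arr)  # [10 -1 11] -> the middle speaker keeps the -1 placeholder
```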

Since these silent speaker labels are -1, and Python sequences support negative indexing, the issue is silently ignored when the embeddings are dumped but raises an exception once training begins.
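
For example, a -1 label used as an index does not raise; it silently addresses the last row (a toy demonstration, not the project's code):

```python
import numpy as np

fet_arr = np.zeros((3, 2))
spkvec = np.array([1.0, 2.0])

lab = -1  # a silent-speaker placeholder that slipped into the dumped labels
fet_arr[lab] += spkvec  # no IndexError: -1 addresses the LAST row

print(fet_arr[2])  # [1. 2.] -> the wrong speaker row was updated
```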

Question

I could simply fix this issue by appending a speaker label to all_labels only if lab >= 0 when saving the speaker embeddings; the subsequent training then runs smoothly and yields a well-performing model.
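
The filtering I have in mind mirrors the dumping loop shown above; here is a self-contained sketch with hypothetical stand-in labels and vectors:

```python
# Hypothetical per-chunk data: one label is still the -1 silent placeholder.
labels = [3, -1, 7]
vectors = [[0.1, 0.2], [0.9, 0.9], [0.3, 0.4]]

all_outputs, all_labels = [], []
for vec, lab in zip(vectors, labels):
    if lab < 0:
        continue  # skip silent-speaker placeholders instead of dumping them
    all_outputs.append(vec)
    all_labels.append(lab)

print(all_labels)  # [3, 7]
```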

But before opening any PR, I would like to know whether you have ever come across such an issue, or whether you have any idea why this happens.

Thanks!
