As we know, when training DiffSinger, the token sequences are zero-padded at the end so that every sample in the batch reaches the maximum frame length.
Because of this, the FFT (feed-forward Transformer) block takes a key_padding_mask input so that attention ignores the padded positions.
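For context, here is a minimal sketch (not the DiffSinger source) of how such a mask is typically built from the per-sample lengths and passed to attention; `lengths`, `max_len`, `tokens`, and the attention dimensions are hypothetical names chosen for illustration.

```python
import torch

# Hypothetical batch: true token lengths and a zero-padded token tensor
lengths = torch.tensor([7, 5, 3])                    # real lengths per sample
max_len = 8                                          # padded (maximum) length
tokens = torch.zeros(3, max_len, dtype=torch.long)   # zero-padded token batch

# True where a position is padding, so attention skips those keys
key_padding_mask = torch.arange(max_len)[None, :] >= lengths[:, None]  # (B, T)

# Standard PyTorch attention accepts this mask directly
attn = torch.nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)
x = torch.randn(3, max_len, 256)                     # token embeddings
out, _ = attn(x, x, x, key_padding_mask=key_padding_mask)
```

The question is whether the same masking should be applied inside ESM as well.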
My question is, does the ESM module also need to address this issue?
We understand that ESM learns latent representations of the different language arrangements within the token sequence. Could leaving the padded zeros unmasked negatively impact performance?
@hualizhou167 @linyueqian