Issue with Handling key_padding_mask in ESM Modules #7

@KakaruHayate

As we know, when training DiffSinger, the token sequences are zero-padded at the end to match the maximum frame length in the batch.

Therefore, the FFT block takes a key_padding_mask input so that its attention ignores the padded positions.
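For reference, here is a minimal sketch of how such a mask is typically built from sequence lengths and passed to attention (PyTorch; the names and module here are illustrative, not DiffSinger's actual FFT block):

```python
import torch
import torch.nn as nn

def build_key_padding_mask(lengths: torch.Tensor, max_len: int) -> torch.Tensor:
    # True at padded positions, which nn.MultiheadAttention will ignore
    positions = torch.arange(max_len, device=lengths.device)   # (max_len,)
    return positions.unsqueeze(0) >= lengths.unsqueeze(1)      # (batch, max_len)

attn = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)
x = torch.randn(2, 10, 256)        # (batch, frames, dim), zero-padded at the end
lengths = torch.tensor([10, 6])    # second sequence has 4 padded frames
mask = build_key_padding_mask(lengths, x.size(1))
out, _ = attn(x, x, x, key_padding_mask=mask)  # padded keys get zero attention weight
```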

My question is: does the ESM module also need to handle this?

We understand that the ESM learns latent representations of the different language arrangements in the token sequence. Could the unmasked padding zeros negatively impact its performance?
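To illustrate the concern, here is a toy dot-product attention example (not taken from the ESM implementation) showing that unmasked padding does change the output, since zero-valued keys still receive nonzero softmax weight:

```python
import torch

x = torch.zeros(1, 10, 4)
x[:, :6] = torch.randn(1, 6, 4)    # frames 6..9 are zero padding

scores = x @ x.transpose(1, 2) / x.size(-1) ** 0.5    # toy scaled dot-product attention

unmasked = torch.softmax(scores, dim=-1) @ x          # padding left untreated

pad = torch.arange(10) >= 6                           # True at padded key positions
masked = torch.softmax(
    scores.masked_fill(pad[None, None, :], float("-inf")), dim=-1
) @ x                                                 # padding excluded from attention

print((unmasked - masked).abs().max())  # nonzero: the padded zeros shifted the result
```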

@hualizhou167 @linyueqian
