Improve position embedding #2223

Open · pass-lin wants to merge 5 commits into master
Conversation

pass-lin (Contributor)

Split from #2192.

Removed the changes related to fixing the RoFormer export.

pass-lin (Contributor, Author)

ruff.....................................................................Passed
ruff-format..............................................................Passed
Error: Process completed with exit code 1.

What is the cause of this error? I need help.
@madhusshivakumar @divyashreepathihalli

mattdangerw (Member)

@pass-lin try just rebasing/syncing to latest master. Some tooling changes just went out in #2222.

mattdangerw (Member) left a comment

Thanks! Just a question for now. What models use this?

@@ -61,6 +61,7 @@ def __init__(
     self,
     sequence_length,
     initializer="glorot_uniform",
+    hierarchical_alpha=0.4,
mattdangerw (Member)

We'd need a docstring to add this. But before we do, is this used by more than one model architecture? If this is just used in one, we probably would prefer to keep the layer simple and just leave the customization per model arch.

pass-lin (Contributor, Author)

> Thanks! Just a question for now. What models use this?

The main function of this feature is to non-destructively extend the maximum sequence length of models such as BERT and ViT.
For example, in BERT this feature is only activated when the sequence length is greater than 512.

pass-lin (Contributor, Author) commented Apr 26, 2025

@mattdangerw
At present, the following KerasHub models can probably benefit from this feature: BERT, RoBERTa, ALBERT, and GPT-2.
Also, in the future I plan to add ESM2.
The current maximum length of ESM2 is only 1024, which is far from enough in bioinformatics.

mattdangerw (Member)

@pass-lin how would you expect to use this feature with BERT currently? It seems tricky to actually use, given that BERT will be loaded from a config without this option.

pass-lin (Contributor, Author)

> @pass-lin how would you expect to use this feature with BERT currently? It seems tricky to actually use, given that BERT will be loaded from a config without this option.

I hope it can be used transparently, without the user noticing anything: it only activates when the input length exceeds the preset maximum length.

If you think this is unnecessary, you can close this PR.
