Deepspeed Ulysses/ALST integration #3817
Conversation
SunMarc left a comment
Thanks a lot, this is already looking quite nice! Left some minor comments. Please ping me when you have finished the integration!
Ulysses/ALST integration with HF Accelerate:
- Allow `UlyssesSPAttentionHF.register_with_transformers` to take a `model` obj as an argument, to match HF Accelerate's workflow
- Fix existing Ulysses tests to test z2 instead of z1
- Improve documentation
- Add a defensive check

The HF Accelerate PR that depends on this PR is huggingface/accelerate#3817.
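For readers following along, here is a rough sketch of what the new `model`-object form of that call might look like. Everything except the `model` argument itself (the other kwargs and the import path) is an assumption about the DeepSpeed-side signature, not a confirmed API:

```python
# Illustrative sketch only: kwargs other than `model` are assumptions about the
# DeepSpeed signature; check the installed DeepSpeed version for the real one.
from transformers import AutoModelForCausalLM
from deepspeed.runtime.sequence_parallel.ulysses_sp import UlyssesSPAttentionHF

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")

# Per the commit above, `register_with_transformers` can now receive the already
# instantiated `model` object (rather than only a name/path), which matches
# Accelerate's flow where the model exists before DeepSpeed is configured.
mpu = UlyssesSPAttentionHF.register_with_transformers(
    model=model,                      # new: pass the model object directly
    core_attn_implementation="sdpa",  # assumed kwarg
    sequence_parallel_size=4,         # assumed kwarg: Ulysses SP degree
    max_length=32768,                 # assumed kwarg: maximum sequence length
    micro_batch_size=1,               # assumed kwarg
)
```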
So we just have a few conversations above to complete, and otherwise I'm waiting for DeepSpeed to release a new version so that we can anchor on it here. Otherwise it's ready for your complete review, but don't merge just yet until we get the new DeepSpeed version in. And then we can discuss the HF Trainer integration. Should we somehow mark this API as experimental, to let users try it for a bit while we possibly adjust things? If so, please give me an example to follow.
There are some e2e examples in the example/torch_native_parallelism folder but we are not running them in the CI.
Let's try to integrate it into HF Trainer before merging this PR. Once it is tightly coupled to Trainer, even if the API is marked as experimental, we will most likely try to limit breaking changes. For experimental features, we just note it in the docs, like for big model inference (we probably need to remove the warning for that feature).
SunMarc left a comment
Thanks for adding the docs and the tests, this looks really nice. Just some minor nits
Taking this out of code comments so that it doesn't disappear with 'Resolve conversation'
Those features belong to
Can we merge this PR, @stas00?
Thank you for the further improvements/fixes, Kashif. Yes, let's merge it. Thank you.
Merging this PR seems to have invalidated the original CP implementation (accelerate/src/accelerate/accelerator.py, line 1639 in d1c96ba). However, `parallelism_config.sp_backend` can only be set to "deepspeed".
@egangu so for
You're right. But what I mean is that the original CP cannot be enabled in the current version, no matter how the user sets the
Thanks for the report @egangu, let me test and fix.
This is the completion of the work started by @S1ro1 in #3782 to integrate ALST/Ulysses long-sequence training into HF Accelerate. Paper: https://arxiv.org/abs/2506.13996. This is Matej's original code with lots of additional work on top, plus docs and tests from me.
Here is the corresponding HF Trainer integration PR: huggingface/transformers#41832
If you want to try it out, please first install deepspeed from `deepspeed@master`, as deepspeed needed some tweaks to make this integration work.

To use this feature a user needs (see the sketch after this list):
- `ParallelismConfig`
- `shift_labels` and an aggregation of loss across ranks
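A minimal sketch of what wiring this up through Accelerate might look like. `sp_backend` is named later in this thread; `sp_size`, the import paths, and the exact way the config is handed to `Accelerator` are assumptions rather than the confirmed API:

```python
# Hypothetical configuration sketch -- field names `sp_size`/`sp_backend` and the
# exact wiring are assumptions based on this thread, not the confirmed API.
from accelerate import Accelerator
from accelerate.parallelism_config import ParallelismConfig
from accelerate.utils import DeepSpeedPlugin

pc = ParallelismConfig(
    sp_size=4,               # assumed: sequence-parallel (Ulysses) degree
    sp_backend="deepspeed",  # assumed: route SP to the DeepSpeed ALST/Ulysses backend
)

accelerator = Accelerator(
    parallelism_config=pc,
    deepspeed_plugin=DeepSpeedPlugin(hf_ds_config="ds_config.json"),  # ZeRO config file
)
```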
Quality validation
I wrote 3 accelerate-based scripts (attached at the end of the OP):
The loss checks out, with very small variations due to the precision loss in aggregation.
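For context, the aggregation referred to above is roughly a token-count-weighted average of the per-rank losses over the sequence-parallel group. This is a sketch of the idea, not the code from the attached scripts; all names are illustrative:

```python
# Sketch: average the loss across sequence-parallel ranks, weighted by how many
# unmasked label tokens each rank's sequence shard contributed.
import torch
import torch.distributed as dist

def aggregate_sp_loss(loss: torch.Tensor, num_valid_tokens: torch.Tensor, sp_group) -> torch.Tensor:
    # Each rank contributes loss * its token count; sum both across the SP group.
    weighted_loss = loss * num_valid_tokens
    dist.all_reduce(weighted_loss, op=dist.ReduceOp.SUM, group=sp_group)
    dist.all_reduce(num_valid_tokens, op=dist.ReduceOp.SUM, group=sp_group)
    # The reduction reorders floating-point additions, which is where the small
    # loss variations mentioned above come from.
    return weighted_loss / num_valid_tokens
```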
TODO
These are really needed for the HF Trainer PR huggingface/transformers#41832, but since it anchors on accelerate, let's make the dependency here instead.
Scripts used to perform the quality validation
The scripts and config files are:
You run the `.sh` files.

cc: @SunMarc