kashif (Contributor) commented Nov 27, 2025

What does this PR do?

Fix the sequence parallelism (SP) loss example based on huggingface/transformers#42444.

Add a fix for context parallelism (CP), as reported in #3817 (comment) by @egangu.

Fixes #3856

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@kashif kashif requested a review from SunMarc November 27, 2025 16:03
@kashif kashif added the bug ("Something isn't working") label Nov 27, 2025
kashif (Contributor, Author) commented Nov 27, 2025

We will need a patch for this, I believe.

SunMarc (Member) left a comment

Thanks for the fix!

@SunMarc SunMarc merged commit b923f65 into huggingface:main Nov 28, 2025
22 of 25 checks passed
@kashif kashif deleted the fix-sp-docs branch November 28, 2025 10:43
SunMarc (Member) commented Nov 28, 2025

Indeed, I will do a patch early next week, if that is fine!

Comment on lines -1639 to -1640
if self.parallelism_config.sp_backend == "deepspeed":
    # deepspeed handles cp in a different way, configured in _prepare_deepspeed
SunMarc (Member) Nov 28, 2025

prepare_cp is only called when cp_enabled is true, no? I don't think this should have an impact.

kashif (Contributor, Author)

yes
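
The point made in this review thread can be illustrated with a toy sketch. It is not Accelerate's actual code: `ToyParallelismConfig`, `ToyAccelerator`, and the placeholder context value are hypothetical stand-ins, and only the names `cp_enabled`, `sp_backend`, and `_cp_context` come from the thread and the linked issue title. The sketch assumes, as the reviewer suggests, that the CP preparation step is reached only when `cp_enabled` is true, so it needs no extra `sp_backend == "deepspeed"` check in order to create the context.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyParallelismConfig:
    """Hypothetical config; mirrors only the two fields named in the thread."""
    cp_enabled: bool = False
    sp_backend: Optional[str] = None


class ToyAccelerator:
    """Hypothetical stand-in used to illustrate the call-site guard."""

    def __init__(self, parallelism_config: ToyParallelismConfig):
        self.parallelism_config = parallelism_config
        self._cp_context = None

    def _prepare_cp(self):
        # No sp_backend check here: the caller already gates on cp_enabled,
        # so the context is created whenever this method runs.
        self._cp_context = "cp-context"  # placeholder for a real context object

    def prepare(self):
        # The CP path is entered only when context parallelism is enabled.
        if self.parallelism_config.cp_enabled:
            self._prepare_cp()
        return self._cp_context
```

Under these assumptions, the value of `sp_backend` does not change whether the context is created; only `cp_enabled` does.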

Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

context parallel _cp_context not created

3 participants