Default gradient_clipping to 1.0 #8068

Merged
sfc-gh-truwase merged 4 commits into master from grad-clip-default-1.0 on Jun 23, 2026

@sfc-gh-truwase

Collaborator

Summary

  • Change GRADIENT_CLIPPING_DEFAULT from 0. (disabled) to 1.0.
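
Configs that relied on the old behavior can keep training unclipped by setting the key explicitly. The sketch below assumes the DeepSpeed config is built as a Python dict; `pin_legacy_clipping` is a hypothetical helper used only for illustration and is not part of DeepSpeed.

```python
import copy


def pin_legacy_clipping(ds_config: dict) -> dict:
    """Hypothetical helper: keep the pre-change behavior (no clipping)
    for configs that never set gradient_clipping."""
    pinned = copy.deepcopy(ds_config)
    # setdefault only writes the key when it is absent, so a value the
    # user already chose is left untouched.
    pinned.setdefault("gradient_clipping", 0.0)
    return pinned


legacy = pin_legacy_clipping({"train_batch_size": 8})
tuned = pin_legacy_clipping({"train_batch_size": 8, "gradient_clipping": 0.5})
```

Here `legacy` gains an explicit `"gradient_clipping": 0.0`, while `tuned` keeps its own value of `0.5`.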

Motivation

With the old default, configs that omit gradient_clipping run unclipped. Most RL/LLM training (and the FSDP2 reference) clip at 1.0; this avoids silently-unclipped runs. Isolated into its own PR since it is a default behavior change.
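
To make the effect of a clip norm of 1.0 concrete, here is a minimal pure-Python sketch of global-norm clipping. It is not DeepSpeed's implementation, which works on tensors across ranks; `clip_by_global_norm` is a hypothetical name, and treating a non-positive `max_norm` as "disabled" is an assumption that mirrors the convention described in this PR.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Scale gradients so their global L2 norm is at most max_norm.

    A max_norm of 0.0 (or less) is treated as "clipping disabled".
    Returns the (possibly scaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if max_norm <= 0.0 or total_norm <= max_norm:
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm


# Gradients [3.0, 4.0] have a global norm of 5.0.
clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
unclipped, _ = clip_by_global_norm([3.0, 4.0], max_norm=0.0)
```

With `max_norm=1.0` the gradients are rescaled to unit norm; with `max_norm=0.0` they pass through unchanged, which is what a config without the key used to get.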

Test plan

  • Init without gradient_clipping -> effective clip norm is 1.0.
  • Explicit gradient_clipping: 0.0 still disables clipping (override respected).
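
The two checks above can be sketched against a plain dictionary lookup. This is a simplified stand-in, not DeepSpeed's config code: only the key `gradient_clipping` and the constant name `GRADIENT_CLIPPING_DEFAULT` come from this PR, and both functions are hypothetical. The second function is a counter-example showing why the explicit `0.0` case deserves its own test.

```python
GRADIENT_CLIPPING = "gradient_clipping"
GRADIENT_CLIPPING_DEFAULT = 1.0  # value introduced by this PR


def effective_clip_norm(config: dict) -> float:
    """Hypothetical lookup: use the default only when the key is absent."""
    return float(config.get(GRADIENT_CLIPPING, GRADIENT_CLIPPING_DEFAULT))


def effective_clip_norm_buggy(config: dict) -> float:
    """Counter-example: `or` treats an explicit 0.0 as if it were missing."""
    return float(config.get(GRADIENT_CLIPPING) or GRADIENT_CLIPPING_DEFAULT)


# Key omitted: the new default applies.
assert effective_clip_norm({}) == 1.0
# Explicit 0.0: the override is respected and clipping stays disabled.
assert effective_clip_norm({"gradient_clipping": 0.0}) == 0.0
# The `or` fallback would silently re-enable clipping at 1.0.
assert effective_clip_norm_buggy({"gradient_clipping": 0.0}) == 1.0
```

A truthiness-based fallback is an easy mistake here because `0.0` is falsy in Python, so the lookup has to distinguish "absent" from "zero".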

Made with Cursor

@tohtana tohtana left a comment

Collaborator

Thank you Tunji for the update. I agree that we should enable gradient clipping by default. I left a few comments.

Comment thread deepspeed/runtime/constants.py Outdated
Comment thread deepspeed/runtime/constants.py
@sfc-gh-truwase sfc-gh-truwase requested a review from loadams as a code owner June 23, 2026 14:15
sfc-gh-truwase and others added 4 commits June 23, 2026 14:19
Change GRADIENT_CLIPPING_DEFAULT from 0. (disabled) to 1.0 so configs that omit
the key clip at 1.0 by default, matching common RL/LLM training and the FSDP2
reference. Explicit "gradient_clipping": 0.0 still disables clipping.

Signed-off-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Signed-off-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
CPU tests cover config default and explicit 0.0 override. GPU e2e tests verify
deepspeed.initialize surfaces clip norm 1.0 when omitted and 0.0 when set.

Signed-off-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@sfc-gh-truwase sfc-gh-truwase force-pushed the grad-clip-default-1.0 branch from 53ecde5 to 3cd7246 Compare June 23, 2026 14:19
@PKUWZP PKUWZP self-requested a review June 23, 2026 14:50
@PKUWZP

PKUWZP commented Jun 23, 2026

Collaborator

@sfc-gh-truwase Thanks for the fix Tunji! Can I know why we want to hard-code gradient clipping to be 1.0?

@sfc-gh-truwase

Collaborator Author

> @sfc-gh-truwase Thanks for the fix Tunji! Can I know why we want to hard-code gradient clipping to be 1.0?

This seems to be the standard expectation among the modeling folks I have interacted with. It also brings parity with FSDP and Megatron-LM.

@tohtana tohtana left a comment

Collaborator

Thank you for the update!

@sfc-gh-truwase sfc-gh-truwase added this pull request to the merge queue Jun 23, 2026
Merged via the queue into master with commit 4421665 Jun 23, 2026
15 checks passed
@sfc-gh-truwase sfc-gh-truwase deleted the grad-clip-default-1.0 branch June 23, 2026 17:58
xylian86 pushed a commit to xylian86/DeepSpeed that referenced this pull request Jun 25, 2026
## Summary
- Change `GRADIENT_CLIPPING_DEFAULT` from `0.` (disabled) to `1.0`.

## Motivation
With the old default, configs that omit `gradient_clipping` run
unclipped. Most RL/LLM training (and the FSDP2 reference) clip at `1.0`;
this avoids silently-unclipped runs. Isolated into its own PR since it
is a default behavior change.

## Test plan
- [ ] Init without `gradient_clipping` -> effective clip norm is `1.0`.
- [ ] Explicit `gradient_clipping: 0.0` still disables clipping
(override respected).

Made with [Cursor](https://cursor.com)

---------

Signed-off-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Cursor <cursoragent@cursor.com>