@delock delock commented Oct 30, 2025

This PR allows separate learning rates for the Muon and Adam parts of the Muon optimizer. Following up on #7657.

@PKUWZP PKUWZP self-requested a review October 30, 2025 14:10
 if muon_params:
     accepted_parameters = dict()
-    for key in ["lr", "momentum", "weight_decay"]:
+    for key in ["lr", "momentum", "weight_decay", "muon_lr"]:
Collaborator

Does the user need to specify "muon_lr" for their training jobs?

Collaborator Author

Not mandatory. 'lr' applies to both the Muon and Adam parts by default. If 'muon_lr' is present, it is used for the Muon part instead, and likewise 'adam_lr' for the Adam part. Think of them as advanced user settings.
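
For readers skimming the thread, the fallback described in this reply can be sketched roughly as follows. This is an illustration of the stated behavior only, not the code merged in this PR: the helper name `resolve_learning_rates` and the example values are hypothetical, and only the keys 'lr', 'muon_lr', and 'adam_lr' come from the discussion.

```python
def resolve_learning_rates(params):
    """Return (muon_lr, adam_lr) from a dict of optimizer params.

    Hypothetical helper: 'lr' is the shared default, and 'muon_lr' or
    'adam_lr' take precedence for their own part when present.
    """
    base_lr = params.get("lr")
    muon_lr = params.get("muon_lr", base_lr)  # falls back to 'lr'
    adam_lr = params.get("adam_lr", base_lr)  # falls back to 'lr'
    return muon_lr, adam_lr


# Example values are made up for illustration.
shared = {"lr": 1e-3, "momentum": 0.95, "weight_decay": 0.0}
split = {"lr": 1e-3, "muon_lr": 2e-2, "adam_lr": 3e-4}

print(resolve_learning_rates(shared))  # both parts use the shared 'lr'
print(resolve_learning_rates(split))   # each part uses its own override
```

With only 'lr' set, both parts share one rate, which matches the "not mandatory" answer above. If none of the keys is present, this sketch returns `None` for the missing rate; how the actual optimizer handles that case is not covered in this thread.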

delock commented Nov 3, 2025

@PKUWZP @tjruwase I'll put this PR into auto-merge if everything looks fine.

@delock delock merged commit df59f20 into master Nov 11, 2025
12 checks passed
@delock delock deleted the gma/muon_sep_lr branch November 11, 2025 05:26
LckyLke pushed a commit to LckyLke/DeepSpeed that referenced this pull request Nov 11, 2025
…er (deepspeedai#7658)

This PR allows separate learning rates for the Muon and Adam parts of the Muon
optimizer. Following up
deepspeedai#7657

Signed-off-by: Guokai Ma <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Signed-off-by: Luke Friedrichs <[email protected]>