Question Regarding Reward and Advantage Handling in Mathematical Reasoning #10

Hi authors,

Thank you for open-sourcing your code and for your interesting work on MORL! I am trying to replicate the experiment with two reward settings in the paper. However, the shared code seems to be mostly for RLLA. Could you point me to the mathematical reasoning training script and the reward-handling code? I am particularly interested in how different-length rewards are handled.

Thank you!
