Question Regarding the Rewards and Advantage Handling in Mathematical Reasoning #10

Description

@littlestone111

Hi authors,

Thank you for open-sourcing your code and for your interesting work on MORL! I am trying to replicate the experiment with two reward settings in the paper. However, it seems the shared code is mostly for RLLA. Could you please provide suggestions on where to find the mathematical reasoning training script and reward handling? I am particularly interested in how different-length rewards are handled.

Thank you!
