Hi authors,
Thank you for open-sourcing your code and for your interesting work on MORL! I am trying to replicate the experiment with two reward settings from the paper. However, the shared code seems to cover mostly RLLA. Could you point me to the training script and the reward-handling code for the mathematical reasoning experiments? I am particularly interested in how different-length rewards are handled.
Thank you!