-
Notifications
You must be signed in to change notification settings - Fork 43
Closed
Labels
good first issueGood for newcomersGood for newcomersoutreachyIssues targeted at Outreachy applicantsIssues targeted at Outreachy applicants
Description
Why it matters
Generalised advantage methods like PPO rely on a clipped value loss to keep critics stable. Fehu.Training.value_loss exposes a ~clip_range argument, but the implementation at fehu/lib/training.ml:86 just ignores it and falls back to plain MSE. Outreachy applicants working on policy-gradient baselines hit a silent footgun: the API promises clipping, yet nothing happens.
How to see the gap
Open fehu/lib/training.ml and inspect value_loss; the comment even says “interface may need adjustment”. Run dune runtest fehu and note that fehu/test/test_training.ml only checks the unclipped path—no scenario exercises clipping.
Your task
- Extend
value_lossto accept an optional~old_valuesarray. When~clip_rangeis provided you must also haveold_values; otherwise raiseInvalid_argumentwith a helpful message. - Implement the PPO-style clipping: compute
value_clipped = old_values + clamp(values - old_values, ±clip_range)and take the max between the unclipped and clipped squared error for each element. - Update the interface and documentation in
fehu/lib/training.mlito describe the new parameter and behaviour. - Enhance
fehu/test/test_training.mlwith a regression test that feeds in toyvalues,old_values,returns, andclip_range, then asserts the clipped result matches the hand-computed expectation. Keep the existing unclipped test passing.
Tips
- Use
Array.lengthchecks similar to the existing code to ensure all arrays align; fail fast if they don’t. - To keep numerical comparisons stable in the test, compare against a precomputed float value with
Alcotest.(check (float 1e-6)). - Document the requirement in the
.mlithat callers must passold_valueswhen using clipping so future users understand the contract.
Done when
value_lossclips correctly and raises if~clip_rangeis used without~old_values.- The new unit test proves the clipped branch works.
dune runtest fehucontinues to succeed.
Metadata
Metadata
Assignees
Labels
good first issueGood for newcomersGood for newcomersoutreachyIssues targeted at Outreachy applicantsIssues targeted at Outreachy applicants