Finish clipped value loss support in Fehu.Training #107

@tmattio

Description

Why it matters

Many PPO implementations use a clipped value loss to keep the critic stable. Fehu.Training.value_loss exposes a ~clip_range argument, but the implementation at fehu/lib/training.ml:86 ignores it and falls back to plain MSE. Outreachy applicants working on policy-gradient baselines hit a silent footgun: the API promises clipping, yet nothing happens.

How to see the gap

Open fehu/lib/training.ml and inspect value_loss; the comment even says “interface may need adjustment”. Run dune runtest fehu and note that fehu/test/test_training.ml only checks the unclipped path—no scenario exercises clipping.

Your task

  • Extend value_loss to accept an optional ~old_values array. When ~clip_range is provided, ~old_values must be provided too; otherwise raise Invalid_argument with a helpful message.
  • Implement the PPO-style clipping: compute value_clipped = old_values + clamp(values - old_values, -clip_range, +clip_range) and, for each element, take the max of the unclipped and clipped squared errors.
  • Update the interface and documentation in fehu/lib/training.mli to describe the new parameter and behaviour.
  • Enhance fehu/test/test_training.ml with a regression test that feeds in toy values, old_values, returns, and clip_range, then asserts the clipped result matches the hand-computed expectation. Keep the existing unclipped test passing.
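
The clipping rule described above can be sketched as follows. This is a minimal illustration, not Fehu's actual code: the name value_loss_sketch, the plain float array inputs, the trailing unit argument, the mean reduction, and the error messages are all assumptions made for the example, so adapt them to whatever fehu/lib/training.ml really uses.

```ocaml
(* Hypothetical stand-in for Fehu.Training.value_loss. Assumes plain
   float arrays and a mean reduction; the real signature may differ. *)
let clamp ~lo ~hi x = Float.max lo (Float.min hi x)

let sq x = x *. x

let value_loss_sketch ?clip_range ?old_values ~values ~returns () =
  let n = Array.length values in
  if n = 0 then invalid_arg "value_loss_sketch: empty input";
  if Array.length returns <> n then
    invalid_arg "value_loss_sketch: values and returns differ in length";
  let per_element =
    match clip_range, old_values with
    | None, _ ->
        (* No clipping requested: plain squared error. *)
        (fun i -> sq (values.(i) -. returns.(i)))
    | Some _, None ->
        invalid_arg "value_loss_sketch: ~clip_range requires ~old_values"
    | Some eps, Some old ->
        if Array.length old <> n then
          invalid_arg "value_loss_sketch: old_values and values differ in length";
        (fun i ->
          (* Limit how far the new estimate may move from the old one. *)
          let delta = clamp ~lo:(-.eps) ~hi:eps (values.(i) -. old.(i)) in
          let v_clipped = old.(i) +. delta in
          (* Pessimistic choice: keep the larger of the two errors. *)
          Float.max (sq (values.(i) -. returns.(i))) (sq (v_clipped -. returns.(i))))
  in
  let sum = ref 0.0 in
  for i = 0 to n - 1 do
    sum := !sum +. per_element i
  done;
  !sum /. float_of_int n
```

As a hand check under these assumptions: with values = [|1.5; 0.2|], old_values = [|1.0; 0.0|], returns = [|2.0; 0.0|] and clip_range = 0.2, the per-element losses are max(0.25, 0.64) = 0.64 and max(0.04, 0.04) = 0.04, so the mean is 0.34, while the unclipped mean is 0.145.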

Tips

  • Use Array.length checks similar to the existing code to ensure all arrays align; fail fast if they don’t.
  • To keep numerical comparisons stable in the test, compare against a precomputed float value with Alcotest.(check (float 1e-6)).
  • Document the requirement in the .mli that callers must pass old_values when using clipping so future users understand the contract.
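
When picking toy values for the regression test, it helps to choose at least one element where the clipped term actually wins, since otherwise the test would also pass against plain MSE. The sketch below works through three single-element cases. The helper clipped_sq_error and the tolerance check close are hypothetical, stdlib-only stand-ins for illustration; in the repository the comparison would go through Alcotest as suggested above.

```ocaml
(* Hypothetical per-element helper, used only to design test cases. *)
let clipped_sq_error ~eps ~old ~value ~ret =
  let delta = Float.max (-.eps) (Float.min eps (value -. old)) in
  let v_clipped = old +. delta in
  let sq x = x *. x in
  Float.max (sq (value -. ret)) (sq (v_clipped -. ret))

(* Stdlib-only stand-in for a tolerance-based float comparison. *)
let close ?(eps = 1e-6) expected actual = Float.abs (expected -. actual) <= eps

let () =
  let eps = 0.2 and old = 1.0 and ret = 2.0 in
  (* Value moves toward the return by more than eps:
     clipped estimate is 1.2, so the clipped error 0.64 beats 0.25. *)
  assert (close 0.64 (clipped_sq_error ~eps ~old ~value:1.5 ~ret));
  (* Value moves away from the return by more than eps:
     clipped estimate is 0.8, so the unclipped error 2.25 beats 1.44. *)
  assert (close 2.25 (clipped_sq_error ~eps ~old ~value:0.5 ~ret));
  (* Value stays within eps of the old estimate: both errors are 0.81. *)
  assert (close 0.81 (clipped_sq_error ~eps ~old ~value:1.1 ~ret))
```

Only the first case distinguishes a working clipped branch from the unclipped fallback, so a regression test probably wants at least one element like it.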

Done when

  • value_loss clips correctly and raises if ~clip_range is used without ~old_values.
  • The new unit test proves the clipped branch works.
  • dune runtest fehu continues to succeed.


Labels: good first issue (Good for newcomers), outreachy (Issues targeted at Outreachy applicants)
