
Conversation

@gustavorubim
Introduce CISPO (truncated importance-sampling REINFORCE) as a new loss_type in GRPOConfig, and document the objective, usage, related parameters, and citation. The docs describe the CISPO formulation, the cispo_clip_max parameter (default 5.0), and how to enable the loss via loss_type="cispo". New logging metrics (cispo/importance_ratio/{mean,truncated_mean,max,max_truncated} and cispo/clip_fraction) make importance ratios and truncation behavior observable during training. The GRPOConfig help text is updated to mention CISPO and explain its behavior. This enables the ScaleRL-inspired variant and improves observability during training.
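As a rough illustration of the objective described above, the truncated importance-sampling REINFORCE weighting can be sketched as follows. This is a minimal sketch, not the PR's implementation: the function name and tensor shapes are hypothetical, and clip_max mirrors the cispo_clip_max default of 5.0.

```python
import torch

def cispo_per_token_loss(logps, old_logps, advantages, clip_max=5.0):
    # Per-token importance ratio between the current and behavior policy.
    ratio = torch.exp(logps - old_logps)
    # Truncate the ratio from above and stop its gradient: gradients flow
    # only through logps, REINFORCE-style, weighted by the truncated ratio.
    weight = torch.clamp(ratio, max=clip_max).detach()
    # Negative sign: minimizing this loss maximizes the weighted objective.
    return -weight * advantages * logps
```

Under this sketch, the fraction of tokens whose ratio exceeds clip_max is what a metric like cispo/clip_fraction would summarize.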
@pramodith (Collaborator) left a comment:


Thanks for the PR! Could you also add a test case in test_grpo_trainer to train a tiny model with this loss_type?

},
)

cispo_clip_max: float = field(

I'd recommend re-using the epsilon-high config instead of creating a new one here.

if self.loss_type == "cispo":
    cispo_cap = torch.full_like(coef_1, self.cispo_clip_max)
    cispo_truncated_weights = torch.minimum(coef_1, cispo_cap)
    cispo_clipped_mask = coef_1 > cispo_cap

Can be simplified by:

cispo_weights = torch.where(coef_1 < self.epsilon_high, coef_1, self.epsilon_high).detach()

@pramodith (Collaborator)

@gustavorubim just wanted to check if you've had the chance to make the requested changes.
