@bohdan-nd

Summary

Refactors ReinforceLoss to improve code clarity and correctness.

Specifically:

  • Renamed variables to follow standard RL terminology
  • Fixed normalization: normalize by sequence length instead of total batch tokens
  • Cleaned up the code
  • Made importance sampling optional and moved configuration to the constructor
  • Reordered function parameters to match SimpleGRPOLoss
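
For reviewers, here is a rough illustration of the normalization change and the constructor-level importance sampling switch. It is a toy sketch in plain Python, not the code in this PR: `ToyReinforceLoss`, `mean_per_sequence`, `mean_over_batch_tokens`, the argument names, and the argument order are all hypothetical, and the importance weight is assumed to be a plain ratio `exp(logprob - sampling_logprob)` applied to the log-probability term.

```python
import math
from typing import Sequence

Rows = Sequence[Sequence[float]]


def mean_per_sequence(token_loss: Rows, mask: Rows) -> float:
    """Average each sequence over its own valid tokens, then over sequences."""
    seq_means = []
    for losses, m in zip(token_loss, mask):
        total = sum(l * k for l, k in zip(losses, m))
        seq_means.append(total / max(sum(m), 1))
    return sum(seq_means) / len(seq_means)


def mean_over_batch_tokens(token_loss: Rows, mask: Rows) -> float:
    """Sum every valid token in the batch and divide by the total token count."""
    total = sum(l * k for losses, m in zip(token_loss, mask) for l, k in zip(losses, m))
    count = sum(sum(m) for m in mask)
    return total / max(count, 1)


class ToyReinforceLoss:
    """Hypothetical stand-in for illustration only (not the PR's ReinforceLoss)."""

    def __init__(self, use_importance_sampling: bool = False):
        # Assumption: the switch lives on the constructor, not on each call.
        self.use_importance_sampling = use_importance_sampling

    def __call__(self, logprobs: Rows, sampling_logprobs: Rows,
                 advantages: Sequence[float], mask: Rows) -> float:
        token_loss = []
        for lp_row, old_row, adv in zip(logprobs, sampling_logprobs, advantages):
            row = []
            for lp, old in zip(lp_row, old_row):
                # Assumed weight: policy / sampling-policy probability ratio.
                w = math.exp(lp - old) if self.use_importance_sampling else 1.0
                row.append(-adv * w * lp)
            token_loss.append(row)
        # Per-sequence normalization, as described in the summary.
        return mean_per_sequence(token_loss, mask)


# A short sequence (2 valid tokens) and a long one (4 valid tokens).
example_loss = [[1.0, 1.0, 0.0, 0.0], [3.0, 3.0, 3.0, 3.0]]
example_mask = [[1, 1, 0, 0], [1, 1, 1, 1]]
per_seq = mean_per_sequence(example_loss, example_mask)         # 2.0
per_batch = mean_over_batch_tokens(example_loss, example_mask)  # 14 / 6
```

In this toy example, normalizing over all batch tokens lets the longer sequence dominate (14 / 6 ≈ 2.33), while per-sequence normalization weights both sequences equally (2.0).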

@meta-cla (bot) added the CLA Signed label on Nov 17, 2025