Hello, thank you for your great work.
Why can't I find the implementation of alpha in the Visual-RFT code? Looking at the line of code linked below, does the GPG version of Visual-RFT simply remove the KL divergence term?
https://github.com/AMAP-ML/GPG/blob/main/Visual-RFT/src/virft/src/open_r1/trainer/grpo_trainer.py#L493