Open
Description
Hello! Thanks a lot for your awesome work.
I'm trying to reproduce the GRPO results using a multi-GPU setup. I did not change any of your code and just ran the training using the provided notebook. However, the model doesn't seem to learn anything: the reward stays near zero and never improves.
Here is the wandb run: https://wandb.ai/tarasovd/GRPO-Qwen-1.5-Instruct-Multi-GPU/runs/ukdas6bn
Do you have any idea why this might happen? Was the code changed after the run you reported (sorry, it's hard to track changes in notebooks with git diff)?
My best guess is that library versions might be the issue, and I couldn't find them in your repository. Could you please share them or say where they can be found?
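To make the comparison easier, here is a minimal sketch of how the installed versions could be dumped on both sides. The package list is only my assumption about what the notebook depends on (it is not taken from the repository), and `collect_versions` is a hypothetical helper name.

```python
from importlib import metadata

# Assumed package list: a guess at the notebook's dependencies,
# not something confirmed by the repository. Adjust as needed.
PACKAGES = ["torch", "transformers", "accelerate", "datasets", "wandb"]


def collect_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in collect_versions(PACKAGES).items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Running this in the environment that produced the reported results and in mine would show directly whether any versions differ.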