GRPO multi-GPU reproduction issue #9

@DT6A

Hello! Thanks a lot for your awesome work.

I'm trying to reproduce the GRPO results using a multi-GPU setup. I did not change any of your code and just ran the training using the provided notebook. However, the model doesn't seem to learn anything: the reward stays near zero and never improves.

Here is the wandb run: https://wandb.ai/tarasovd/GRPO-Qwen-1.5-Instruct-Multi-GPU/runs/ukdas6bn

Do you have any idea why this might happen? Has the code changed since the run you reported? (Sorry, it's hard to track notebook changes with git diff.)

My best guess is that library versions are the issue, but I couldn't find them listed in your repository. Could you please share them or point me to where they can be found?
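To make the comparison easier, here is a minimal sketch of how the versions could be dumped on both sides. The package list is my assumption (the repository doesn't state which libraries matter), so adjust it to whatever the notebook actually imports.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical list: these names are a guess, not taken from the repository.
PACKAGES = ["torch", "transformers", "trl", "accelerate", "peft"]


def collect_versions(packages):
    """Return {package name: installed version, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in collect_versions(PACKAGES).items():
        print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

Running this in both environments and diffing the output should show any version mismatch directly.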
