Description
Hi, thanks a lot for this work and for open-sourcing the code!
I’ve been trying to reproduce the results for a course project this semester and ran into an issue during evaluation.
What I did:
• Ran submit_rl_exp_s1 to train with RL, using the base model Parallel-R1/Parallel-SFT-Unseen from Hugging Face.
• Evaluated the resulting checkpoint using verl/recipe/r1/run_r1_distill_qwen.sh.
• I also tried evaluating the released checkpoint Parallel-R1/Parallel-R1-Unseen_Step_200.
Issue:
• I noticed that I had to manually modify the evaluation prompt to match the Parallel Thinking prompt described in the report.
• Even after doing so, the evaluation outputs (generated with run_r1_distill_qwen.sh) do not contain any parallel path tags, which I did not expect.
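For reference, this is a minimal sketch of the kind of check one could run over the decoded evaluation responses to confirm whether any parallel structure is present. The tag names `<Parallel>`, `<Path>` and `<Summary>` are an assumption based on my reading of the report's Parallel Thinking format, and `count_parallel_tags` / `fraction_with_parallel` are hypothetical helpers, not part of the repo or verl.

```python
import re
from collections import Counter

# ASSUMPTION: tag names taken from my reading of the report's prompt format;
# adjust if the released prompt/template uses different tags.
PARALLEL_TAGS = ("Parallel", "Path", "Summary")
OPEN_TAG = re.compile(r"<(" + "|".join(PARALLEL_TAGS) + r")>")


def count_parallel_tags(outputs):
    """Count opening tags such as <Path> across a list of generated responses."""
    counts = Counter()
    for text in outputs:
        # findall returns the captured tag name for every opening tag found
        counts.update(OPEN_TAG.findall(text))
    return counts


def fraction_with_parallel(outputs):
    """Fraction of responses that contain at least one <Parallel> block."""
    if not outputs:
        return 0.0
    hits = sum(1 for text in outputs if "<Parallel>" in text)
    return hits / len(outputs)


if __name__ == "__main__":
    # Hypothetical sample responses, for illustration only
    sample = [
        "<Parallel><Path>Try substitution</Path><Path>Try factoring</Path></Parallel>"
        "<Summary>Both give 4.</Summary> Answer: 4",
        "Plain chain-of-thought with no parallel structure. Answer: 7",
    ]
    print(count_parallel_tags(sample))
    print(fraction_with_parallel(sample))
```

On my checkpoints, a check like this would report zero parallel tags for every response.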
Questions:
1. Are parallel path tags supposed to appear automatically during evaluation, or is there an additional configuration or prompt formatting step required?
2. Is there a specific evaluation script or flag needed to enable parallel path generation?
Any guidance on how to properly reproduce the parallel reasoning behavior and evaluation would be greatly appreciated.
Thank you!