According to recipes/OlympicCoder-7B/sft/config_v00.00.yaml, OlympicCoder-7B is obtained by fine-tuning Qwen/Qwen2.5-Coder-7B-Instruct directly on open-r1/codeforces-cots. Am I reading that correctly, i.e., a single SFT run produces the released model with no further training stage? Why was SFT alone sufficient?
Do you also have a recipe for training a GRPO version of OlympicCoder? If so, could you upload it as well? And what reward function did you use during GRPO training?
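To make the question concrete, here is the kind of reward I would guess at: a verifiable pass-rate reward that executes each completion's code against the problem's test cases, written in the style of a TRL `GRPOTrainer` reward function (completions plus dataset columns as keyword arguments). The `test_cases` column name, plain-text completions, and the unsandboxed `subprocess` execution are all my own assumptions for illustration, not the actual open-r1 implementation:

```python
import re
import subprocess
from typing import Dict, List


def extract_code(completion: str) -> str:
    """Pull the last fenced code block out of a model completion."""
    blocks = re.findall(r"```(?:python)?\n(.*?)```", completion, re.DOTALL)
    return blocks[-1] if blocks else ""


def run_test_case(code: str, stdin: str, expected: str, timeout: float = 5.0) -> bool:
    """Run the candidate program on one test case and compare stdout.

    A real trainer would run this in a sandbox; a bare subprocess call
    is used here purely for illustration.
    """
    try:
        result = subprocess.run(
            ["python", "-c", code],
            input=stdin,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.stdout.strip() == expected.strip()


def pass_rate_reward(
    completions: List[str],
    test_cases: List[List[Dict[str, str]]],
    **kwargs,
) -> List[float]:
    """Reward each completion by the fraction of its test cases it passes.

    Assumes the dataset carries a `test_cases` column of
    {"input": ..., "output": ...} dicts per prompt (a guess on my part).
    """
    rewards = []
    for completion, tests in zip(completions, test_cases):
        code = extract_code(completion)
        if not code or not tests:
            rewards.append(0.0)
            continue
        passed = sum(run_test_case(code, t["input"], t["output"]) for t in tests)
        rewards.append(passed / len(tests))
    return rewards
```

Is the actual reward something like this pass rate, a binary all-tests-pass signal, or something shaped differently (e.g., with a format or length term)?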