[NeurIPS 2025 Spotlight] LLM post-training suite — featuring ReasonFlux, ReasonFlux-PRM, and ReasonFlux-Coder.
reinforcement-learning code-generation post-training chain-of-thought llm-rlhf gemini-pro sft-data process-reward-model deepseek-r1 o3-mini clawdbot-skill
Updated Sep 27, 2025 - Python