QED-Nano is a compact 4B-parameter language model explicitly post-trained for Olympiad-level mathematical proof generation. By combining high-quality supervised fine-tuning with long-horizon reinforcement learning and structured test-time compute, QED-Nano significantly strengthens proof-writing capabilities in small models.
Despite its size, QED-Nano closes much of the gap to large generalist systems by learning to reason over long horizons and effectively utilize additional computation at inference time. The training pipeline emphasizes data quality, curriculum design, and reinforcement learning objectives that directly optimize for rigorous step-by-step proof correctness.
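As a rough illustration of how structured test-time compute can help a small prover, consider a best-of-n strategy: sample several candidate proofs and keep the one a scorer prefers. The sketch below is purely illustrative — the `generate` and `score` callables are stand-ins for QED-Nano's sampler and a verifier/reward model, not APIs from this repository.

```python
def best_of_n(generate, score, problem, n=8):
    """Sample n candidate proofs and return the highest-scoring one.

    `generate` and `score` are caller-supplied stand-ins (assumptions of
    this sketch) for the model's sampler and a proof verifier or reward
    model; they are not part of the QED-Nano codebase.
    """
    candidates = [generate(problem) for _ in range(n)]
    return max(candidates, key=score)


# Toy stand-ins: "generation" cycles through canned attempts and the
# "scorer" simply prefers longer, more detailed attempts.
attempts = ["sketch", "partial proof", "complete rigorous proof"]
gen = iter(attempts).__next__
pick = best_of_n(lambda p: gen(), score=len, problem="prove x", n=3)
print(pick)  # the longest attempt wins under this toy scorer
```

In practice the scorer would be a learned verifier or a formal proof checker rather than a string-length heuristic, and `n` trades inference cost against proof quality.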
This repository contains the full training and evaluation stack for QED-Nano, including curated Olympiad proof datasets, supervised fine-tuning and reinforcement learning code, and benchmarking tools for proof and answer-based evaluations. The goal is to provide a reproducible framework for studying long-horizon mathematical reasoning in compact language models and for enabling future research on compute-efficient reasoning systems.
- Model: lm-provers/QED-Nano on Hugging Face
- Blog Post: QED-Nano: Teaching a Tiny Model to Prove Hard Theorems
- SFT Data: lm-provers/FineProofs-SFT
- RL Data: lm-provers/FineProofs-RL
The repository is organized as follows:
- `data/` - Data generation scripts and SLURM configurations for creating the SFT and RL training datasets
- `training/` - Training code for supervised fine-tuning (SFT) and reinforcement learning (RL) with reasoning cache
- `eval/` - Evaluation code for benchmarking models on IMOProofBench, IMOAnswerBench, and ProofBench
For instructions on generating the datasets used in this codebase, see the data README.
For instructions on training your own model using our codebase and datasets, see the training README.
For instructions on evaluating models on our benchmarks, see the evaluation README.
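Answer-based benchmarks such as IMOAnswerBench typically normalize a model's final answer before comparing it to the reference. The minimal sketch below illustrates the idea; the normalization rules and the `is_correct` helper are hypothetical, not the repository's actual grading logic.

```python
def normalize_answer(ans: str) -> str:
    """Strip whitespace, surrounding $...$ delimiters, and a leading
    'Answer:' prefix. These rules are illustrative only."""
    ans = ans.strip()
    if ans.lower().startswith("answer:"):
        ans = ans[len("answer:"):].strip()
    return ans.strip("$").strip()


def is_correct(prediction: str, reference: str) -> bool:
    """Exact match after normalization (a deliberately simple criterion)."""
    return normalize_answer(prediction) == normalize_answer(reference)


print(is_correct("Answer: $42$", "42"))  # True
```

Real graders usually go further (e.g., symbolic equivalence checking for algebraic expressions), but the normalize-then-compare shape is the common core.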
If you use QED-Nano in your research, please cite:
@misc{qednano2026,
  title = {QED-Nano: Teaching a Tiny Model to Prove Hard Theorems},
  author = {LM-Provers and Yuxiao Qu and Amrith Setlur and Jasper Dekoninck and Edward Beeching and Jia Li and Ian Wu and Lewis Tunstall and Aviral Kumar},
  year = {2026},
  howpublished = {https://huggingface.co/spaces/lm-provers/qed-nano-blogpost},
  note = {Blog post}
}