CMU-AIRe/QED-Nano
QED-Nano

Build · License: Apache 2.0 · Hugging Face

QED-Nano: Teaching a Tiny Model to Prove Hard Theorems

QED-Nano is a compact 4B-parameter language model explicitly post-trained for Olympiad-level mathematical proof generation. By combining high-quality supervised fine-tuning with long-horizon reinforcement learning and structured test-time compute, QED-Nano significantly strengthens proof-writing capabilities in small models.

Despite its size, QED-Nano closes much of the gap to large generalist systems by learning to reason over long horizons and effectively utilize additional computation at inference time. The training pipeline emphasizes data quality, curriculum design, and reinforcement learning objectives that directly optimize for rigorous step-by-step proof correctness.
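The structured test-time compute described above can be sketched as best-of-n sampling: draw several candidate proofs and keep the one a scorer prefers. The `generate` and `score` functions below are hypothetical stand-ins for the model's sampler and a proof scorer, not the repository's actual API; the toy implementations only make the sketch runnable end to end.

```python
def best_of_n(problem, generate, score, n=8):
    """Sample n candidate proofs and return the highest-scoring one.

    `generate` and `score` are stand-ins for a sampler and a
    verifier/reward model; both are hypothetical here.
    """
    candidates = [generate(problem, seed=i) for i in range(n)]
    return max(candidates, key=score)

# Toy stand-ins so the sketch runs without a model.
def toy_generate(problem, seed=0):
    return f"proof attempt {seed} for {problem}"

def toy_score(proof):
    # Pretend higher seed means a better "proof"; a real scorer would
    # check step-by-step correctness instead.
    return int(proof.split()[2])

best = best_of_n("P1", toy_generate, toy_score, n=4)
print(best)  # proof attempt 3 for P1
```

Spending inference compute this way trades latency for accuracy, which is exactly the axis a compact model can exploit.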

This repository contains the full training and evaluation stack for QED-Nano, including curated Olympiad proof datasets, supervised fine-tuning and reinforcement learning code, and benchmarking tools for proof and answer-based evaluations. The goal is to provide a reproducible framework for studying long-horizon mathematical reasoning in compact language models and for enabling future research on compute-efficient reasoning systems.
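The curriculum design mentioned in the training pipeline can be pictured as ordering problems from easy to hard and training in stages. The `difficulty` field and the staging scheme below are illustrative assumptions, not the repository's actual data schema.

```python
def curriculum_batches(problems, n_stages=3):
    """Order training problems easy-to-hard and split them into stages.

    Assumes each problem dict carries a numeric 'difficulty' score;
    the field name and staging scheme are illustrative only.
    """
    ordered = sorted(problems, key=lambda p: p["difficulty"])
    stage_size = -(-len(ordered) // n_stages)  # ceiling division
    return [ordered[i:i + stage_size]
            for i in range(0, len(ordered), stage_size)]

pool = [{"id": "imo-q3", "difficulty": 9},
        {"id": "amc-q1", "difficulty": 2},
        {"id": "aime-q7", "difficulty": 5}]
stages = curriculum_batches(pool, n_stages=3)
print([s[0]["id"] for s in stages])  # ['amc-q1', 'aime-q7', 'imo-q3']
```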

🚀 Quick Links

📦 Training Data

📂 Repository Structure

This repository contains the code and resources for training and evaluating QED-Nano:

  • data/ - Data generation scripts and SLURM configurations for creating SFT and RL training datasets
  • training/ - Training code for supervised fine-tuning (SFT) and reinforcement learning (RL) with reasoning cache
  • eval/ - Evaluation code for benchmarking models on IMOProofBench, IMOAnswerBench, and ProofBench
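As a rough illustration of what an answer-based benchmark such as IMOAnswerBench might check, here is a minimal grader that extracts a final `\boxed{}` answer and compares it to a reference. The boxed-answer convention and exact-match rule are assumptions for this sketch, not necessarily how the `eval/` code grades.

```python
import re

def extract_boxed(text):
    """Pull the last \\boxed{...} answer out of a model response.

    Assumes answers are reported in \\boxed{}; this convention is an
    assumption, not necessarily the repo's output format.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None

def answer_match(response, reference):
    # Exact string match after whitespace normalisation; real graders
    # are usually more forgiving (equivalent forms, units, etc.).
    pred = extract_boxed(response)
    return pred is not None and pred == reference.strip()

print(answer_match("Thus the answer is \\boxed{42}.", "42"))  # True
```

Proof-based benchmarks like IMOProofBench need far richer grading than this, since an answer can be right while the argument is not.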

🏃 Getting Started

Data Generation

For instructions on generating the datasets used in this codebase, see the data README.

Training

For instructions on training your own model using our codebase and datasets, see the training README.

Evaluation

For instructions on evaluating models on our benchmarks, see the evaluation README.

📊 Citation

If you use QED-Nano in your research, please cite:

@misc{qednano2026,
  title        = {QED-Nano: Teaching a Tiny Model to Prove Hard Theorems},
  author       = {LM-Provers and Yuxiao Qu and Amrith Setlur and Jasper Dekoninck and Edward Beeching and Jia Li and Ian Wu and Lewis Tunstall and Aviral Kumar},
  year         = {2026},
  howpublished = {https://huggingface.co/spaces/lm-provers/qed-nano-blogpost},
  note         = {Blog post}
}