- [2026.03.23] Agent-R1 v0.1.0 marks the first official version of the project. It introduces a fully refactored codebase, the Step-level MDP foundation, and new Layered Abstractions. The previous version has been archived to the `legacy` branch.
- [2026.03.04] We've launched Claw-R1, a more advanced framework designed to empower General Agents (OpenClaw etc.) with Agentic RL through a Middleware design. Check it out at AgentR1/Claw-R1.
Agent-R1 is an open-source framework for training powerful language agents with end-to-end reinforcement learning. It is designed for multi-step agent tasks, where the model interacts with environments and tools across multiple rounds instead of producing a single final answer.
The core idea behind Agent-R1 is Step-level MDP: each interaction step is treated as a proper RL transition, with an environment-defined state, an LLM action, and the next observation produced by the environment. This replaces the usual "append everything into one ever-growing token sequence" view with a more principled and more flexible training abstraction.
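To make this concrete, here is a minimal, self-contained sketch of a step-level rollout loop. All names here (`Transition`, `rollout`, `env`, `policy`) are illustrative stand-ins for the idea, not the actual Agent-R1 API:

```python
from dataclasses import dataclass

@dataclass
class Transition:
    state: str       # environment-defined state, e.g. the current prompt
    action: str      # the LLM's response at this step
    next_state: str  # the next observation produced by the environment
    reward: float
    done: bool

def rollout(env, policy, max_steps=8):
    """Collect one trajectory as a list of step-level transitions."""
    state, trajectory = env.reset(), []
    for _ in range(max_steps):
        action = policy(state)                       # one LLM call is one RL action
        next_state, reward, done = env.step(action)  # the environment builds the next observation
        trajectory.append(Transition(state, action, next_state, reward, done))
        if done:
            break
        state = next_state
    return trajectory
```

Because each transition is stored separately, advantages and loss masks can be computed per step rather than over one monolithic token sequence.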
With Agent-R1, you can build custom agent workflows, define interactive environments and tools, and train multi-step agents in a unified RL pipeline.
Also check out Awesome-Agent-RL: Our curated collection of papers and resources on unlocking the potential of Agents through Reinforcement Learning.
Agent-R1 v0.1.0 is the first official release of the new architecture. It is built to address two common failure modes in RL training for LLM agents:
- Retokenization drift in text-based pipelines: if rollout data is collected as text and later tokenized again for training, the `Token -> Text -> Token` conversion is not reversible (see the toy example below).
- Rigid token-only trajectory construction: if the whole interaction is represented as a single growing token list, context handling becomes hard-wired to simple append-only logic.
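To see why the round trip is lossy, consider a toy byte-pair-style vocabulary (purely illustrative, not tied to any real tokenizer or to Agent-R1's internals). A model can sample two tokens whose decoded text re-encodes to a single merged token, so the ids used for training no longer match the ids that were sampled:

```python
# Toy vocabulary: "ab" has a merged token alongside "a" and "b".
vocab = {"a": 0, "b": 1, "ab": 2}
inv = {i: s for s, i in vocab.items()}

def decode(ids):
    return "".join(inv[i] for i in ids)

def encode(text):
    # Greedy longest-match encoding, mimicking a canonical BPE merge order.
    ids, i = [], 0
    while i < len(text):
        piece = text[i:i + 2] if text[i:i + 2] in vocab else text[i]
        ids.append(vocab[piece])
        i += len(piece)
    return ids

sampled = [0, 1]                     # the model sampled "a" then "b"
roundtrip = encode(decode(sampled))  # Token -> Text -> Token
print(sampled, roundtrip)            # [0, 1] vs [2]: log-probs and loss masks no longer align
```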
Agent-R1 addresses these issues with a step-level trajectory representation:
- each step stores its own prompt and response
- the environment, not raw token concatenation, controls the next observation
- context can be truncated, summarized, rewritten, or augmented between steps
- standard RL loops such as `obs -> action -> step -> next_obs` map naturally onto agent training
This makes Agent-R1 a better fit for real multi-step agent tasks with tool use, environment feedback, and flexible context management.
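As a sketch of what environment-controlled observations allow, the hypothetical environment below truncates the accumulated context before building the next prompt; a real environment could just as well summarize or rewrite it. `SummarizingEnv` and all of its details are assumptions for illustration, not part of Agent-R1:

```python
MAX_CHARS = 2000  # illustrative context budget

class SummarizingEnv:
    """Toy environment: the *environment* decides what the next prompt looks like."""

    def __init__(self, task):
        self.task = task
        self.history = []

    def reset(self):
        self.history = []
        return self.task

    def step(self, response):
        self.history.append(response)
        feedback = f"Observation for: {response[:40]}"  # e.g. a tool result or env signal
        context = "\n".join(self.history)
        if len(context) > MAX_CHARS:
            context = context[-MAX_CHARS:]  # truncate; could summarize or rewrite instead
        next_obs = f"{self.task}\n{context}\n{feedback}"
        done = response.strip().endswith("DONE")
        return next_obs, 0.0, done
```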
- The default `main` branch contains the new v0.1.0 architecture based on Step-level MDP and Layered Abstractions.
- The previous implementation is preserved in the `legacy` branch for reference.
- The current version uses the same runtime environment as `verl` and requires `verl==0.7.0`.
Agent-R1 uses the same environment setup as `verl`, and the current version requires `verl==0.7.0`. You only need to clone this repository; there is no separate Agent-R1 installation step.
The recommended path is:
- Read the Getting Started page for the minimal setup flow.
- Use `examples/data_preprocess/gsm8k.py` and `examples/run_qwen2.5-3b.sh` as a sanity check that the environment is wired correctly.
- Move to the Agent Task Tutorial for the main Agent-R1 workflow based on multi-step interaction and tool use.
Prepare a minimal GSM8K dataset and run the single-step script:
```bash
python3 examples/data_preprocess/gsm8k.py --local_save_dir ~/data/gsm8k
bash examples/run_qwen2.5-3b.sh
```

This stage is only a setup check. It helps confirm that your environment, model path, dataset path, and training stack are wired correctly.
Prepare the tool-augmented dataset and launch the multi-step agent training script:
```bash
python3 examples/data_preprocess/gsm8k_tool.py --local_save_dir ~/data/gsm8k_tool
bash examples/run_qwen3-4b_gsm8k_tool.sh
```

This is the main Agent-R1 path, where `AgentEnvLoop` drives multi-step rollout and `ToolEnv` handles tool calls and environment feedback.
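For intuition, here is a rough, self-contained sketch of the kind of loop this pairing implements: the policy either emits a tool call or a final answer, and the environment executes the tool and folds the result into the next observation. The tool-call format, the `calculator` tool, and every function name below are assumptions for illustration, not the actual `AgentEnvLoop`/`ToolEnv` interfaces:

```python
import json
import re

def calculator(expr: str) -> str:
    # Toy arithmetic tool; never eval untrusted input in real code.
    return str(eval(expr, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def extract_tool_call(response: str):
    # Assume the model emits calls like: <tool>{"name": "calculator", "args": "3*7"}</tool>
    match = re.search(r"<tool>(.*?)</tool>", response, re.S)
    return json.loads(match.group(1)) if match else None

def agent_loop(policy, question: str, max_steps: int = 4):
    obs = question
    response = ""
    for _ in range(max_steps):
        response = policy(obs)
        call = extract_tool_call(response)
        if call is None:  # no tool call means the model gave its final answer
            return response
        result = TOOLS[call["name"]](call["args"])
        obs = f"{obs}\n{response}\nTool result: {result}"  # feedback becomes the next observation
    return response
```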
Core concepts: Step-level MDP, Layered Abstractions, `AgentEnvLoop` for multi-step rollout, and `ToolEnv` for tool calls and environment feedback.
Here are some representative projects built on top of Agent-R1:
- TableMind: An autonomous programmatic agent for tool-augmented table reasoning. TableMind is built upon the Agent-R1 framework and leverages its end-to-end reinforcement learning pipeline to train a specialized agent for structured table understanding.
- PaperScout: An autonomous agent for academic paper search built with Agent-R1. It introduces Proximal Sequence Policy Optimization (PSPO), a process-aware method for aligning token-level optimization with sequence-level agent interactions.
This work is conducted at the State Key Laboratory of Cognitive Intelligence, USTC. We gratefully acknowledge the inspiring ideas and early insights from DeepSeek-R1, veRL, and RAGEN, which have significantly influenced the development of Agent-R1. We also sincerely thank Prof. Qi Liu and Prof. Mingyue Cheng for their guidance and support.
If you find Agent-R1 useful in your research, please cite:
```bibtex
@misc{cheng2025agentr1trainingpowerfulllm,
      title={Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning},
      author={Mingyue Cheng and Jie Ouyang and Shuo Yu and Ruiran Yan and Yucong Luo and Zirui Liu and Daoyu Wang and Qi Liu and Enhong Chen},
      year={2025},
      eprint={2511.14460},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2511.14460}
}
```