MARFT stands for Multi-Agent Reinforcement Fine-Tuning. This repository provides a foundational MARFT framework: an LLM-based multi-agent reinforcement fine-tuning framework for general agentic tasks.
Check out our paper [MARFT: Multi-Agent Reinforcement Fine-Tuning](https://arxiv.org/abs/2504.16129)!
- About
- Features
- Getting Started
- Environment Extension
- Multi-Adapter
- Agent-by-Agent Training
- Resume Training
- Contributing
- License
- Citation
## About

This repository aims to help researchers in academia and industry transition into the world of reinforcement learning. The power of multi-agent systems is vast and often surprising, which is why we provide a comprehensive framework for MARFT. The framework supports both action-level and token-level optimization, and is designed to scale to various agentic tasks by allowing users to craft new environments tailored to their specific needs.
## Features

- Action and Token Optimization: Supports both action-level and token-level optimization.
- Environment Extension: Easy-to-use tools for creating custom environments for agentic tasks.
- Multi-Adapter Support: Agents use the same base model but have different LoRA adapters.
- Agent-by-Agent Training: Training individual agents while freezing others for efficient learning.
- Resume Training: Resume training from an existing checkpoint.
## Getting Started

- Create a virtual environment:

  ```shell
  conda create -n marft
  conda activate marft
  ```

- Clone the repository and install dependencies:

  ```shell
  git clone https://github.com/jwliao-ai/MARFT.git
  cd MARFT
  pip install -r requirements.txt
  ```

  Note: You may need to adjust package versions to match your CUDA version.
## Environment Extension

To create a custom environment for your specific agentic task:

- Navigate to `marft/envs` and create a folder for your environment.
- Create a Python file (e.g., `env_name.py`) and implement the necessary environment components:
  - `__init__`: Initialize the environment.
  - `reset`: Reset the environment state.
  - `step`: Define the agent's action step.
  - `transition`: Define state transitions.
- Create a corresponding `runner` and `train` entry in `runner/shared` and `scripts`, respectively.
Example:

```python
class CustomEnv:
    def __init__(self):
        # Initialize your environment
        pass

    def reset(self):
        # Reset the environment state
        pass

    def step(self, action):
        # Define how the environment responds to actions
        pass

    def transition(self, state):
        # Define state transitions
        pass
```

## Multi-Adapter

The framework supports a multi-agent system (MAS) in which every agent shares the same base model but uses a different LoRA (Low-Rank Adaptation) adapter. This allows agents to specialize in different tasks while maintaining a shared foundation. Checkpoint loading is also supported for seamless model resumption.
## Agent-by-Agent Training

The repository supports agent-by-agent training, where a single agent is trained while the others are frozen. This is controlled by the `--agent_iteration_interval` argument, which defines how many iterations each agent is trained before switching to the next.
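The scheduling can be sketched as a simple round-robin over agents. This is an illustrative simplification, not the framework's exact implementation; only the argument name `agent_iteration_interval` mirrors the actual CLI flag.

```python
def trainable_agent(iteration, num_agents, agent_iteration_interval):
    """Return the index of the single agent being trained at this iteration;
    all other agents are frozen."""
    return (iteration // agent_iteration_interval) % num_agents

# With 3 agents and an interval of 2, the schedule cycles 0,0,1,1,2,2,0,0,...
schedule = [trainable_agent(i, num_agents=3, agent_iteration_interval=2)
            for i in range(8)]
print(schedule)  # [0, 0, 1, 1, 2, 2, 0, 0]
```

Freezing all agents but one keeps each update stable and cheap, at the cost of slower joint progress.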
## Resume Training

LLMs are hard to train, and the training process often crashes when the LLM explores exotic tokens, which is quite normal. Resume training therefore lets you restart from a checkpoint if the LaMAS performance starts to collapse. To use it, specify the `--load_path` argument; the given path should contain multiple folders, each holding a different LoRA adapter's parameters and configuration. It should also contain a critic model `critic.pth`, which will be loaded automatically.
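The expected checkpoint layout under `--load_path` can be validated with a small helper. This is a sketch of the structure described above: one subdirectory per LoRA adapter plus a `critic.pth`; the function name and the adapter folder names are illustrative, only `critic.pth` comes from the text.

```python
from pathlib import Path
import tempfile


def discover_checkpoint(load_path):
    """Return (adapter_dirs, critic_path) for a resume-training checkpoint."""
    root = Path(load_path)
    # Each subdirectory is assumed to hold one LoRA adapter's parameters/config.
    adapter_dirs = sorted(p for p in root.iterdir() if p.is_dir())
    critic_path = root / "critic.pth"
    if not critic_path.exists():
        raise FileNotFoundError("critic.pth is missing; it is required to resume")
    return adapter_dirs, critic_path


# Build a dummy checkpoint directory to show the expected structure.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "agent_0_adapter").mkdir()
    (root / "agent_1_adapter").mkdir()
    (root / "critic.pth").touch()
    adapters, critic = discover_checkpoint(root)
    print([p.name for p in adapters])  # ['agent_0_adapter', 'agent_1_adapter']
```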
## Contributing

We welcome contributions to improve the framework. To contribute:

- Fork the repository.
- Create a new branch: `git checkout -b feature/new-feature`.
- Commit your changes: `git commit -m "Add new feature"`.
- Push to your branch: `git push origin feature/new-feature`.
- Submit a pull request.
## License

This project is licensed under the MIT License. For more details, see the LICENSE file.
## Citation

If you find this repository helpful, please consider citing our paper:

```bibtex
@misc{liao2025marftmultiagentreinforcementfinetuning,
      title={MARFT: Multi-Agent Reinforcement Fine-Tuning},
      author={Junwei Liao and Muning Wen and Jun Wang and Weinan Zhang},
      year={2025},
      eprint={2504.16129},
      archivePrefix={arXiv},
      primaryClass={cs.MA},
      url={https://arxiv.org/abs/2504.16129},
}
```