MARFT stands for Multi-Agent Reinforcement Fine-Tuning. This repository provides a foundational MARFT framework: an LLM-based multi-agent reinforcement fine-tuning framework for general agentic tasks.
Check out our paper [MARFT: Multi-Agent Reinforcement Fine-Tuning](https://arxiv.org/abs/2504.16129)!
- About
- Features
- Getting Started
- Environment Extension
- Multi-Adapter
- Agent-by-Agent Training
- Resume Training
- Contributing
- License
- Citation
## About

This repository aims to help researchers in academia and industry transition into the world of reinforcement learning. The power of multi-agent systems is vast and often surprising, which is why we provide a comprehensive framework for MARFT. The framework supports both action-level and token-level optimization, and is designed to scale to various agentic tasks by allowing users to craft new environments tailored to their specific needs.
## Features

- Action and Token Optimization: Supports both action-level and token-level optimization.
- Environment Extension: Easy-to-use tools for creating custom environments for agentic tasks.
- Multi-Adapter Support: Agents use the same base model but have different LoRA adapters.
- Agent-by-Agent Training: Training individual agents while freezing others for efficient learning.
- Resume Training: Resume training from an existing checkpoint.
## Getting Started

- Create a virtual environment:

  ```shell
  conda create -n marft
  conda activate marft
  ```

- Clone the repository and install dependencies:

  ```shell
  git clone https://github.com/jwliao-ai/MARFT.git
  cd MARFT
  pip install -r requirements.txt
  ```

  Note: You may need to adjust package versions to match your CUDA version.
## Environment Extension

To create a custom environment for your specific agentic task:

- Navigate to `marft/envs` and create a folder for your environment.
- Create a Python file (e.g., `env_name.py`) and implement the necessary environment components:
  - `__init__`: Initialize the environment.
  - `reset`: Reset the environment state.
  - `step`: Define the agent's action step.
  - `transition`: Define state transitions.
- Create a corresponding `runner` and `train` entry in `runner/shared` and `scripts`, respectively.
Example:

```python
class CustomEnv:
    def __init__(self):
        # Initialize your environment
        pass

    def reset(self):
        # Reset the environment state
        pass

    def step(self, action):
        # Define how the environment responds to actions
        pass

    def transition(self, state):
        # Define state transitions
        pass
```

## Multi-Adapter

The framework supports a multi-agent system (MAS) in which every agent shares the same base model but uses a different LoRA (Low-Rank Adaptation) adapter. This allows agents to specialize in different tasks while maintaining a shared foundation. Checkpoint loading is also supported for seamless model resumption.
## Agent-by-Agent Training

The repository supports agent-by-agent training, where a single agent is trained while the others are frozen. This is controlled by the `--agent_iteration_interval` argument, which defines how many iterations each agent is trained before switching to the next.
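The scheduling can be sketched as a simple round-robin over agents. This is an illustrative simplification, not the framework's exact implementation; only the argument name `agent_iteration_interval` mirrors the actual CLI flag.

```python
def trainable_agent(iteration, num_agents, agent_iteration_interval):
    """Return the index of the single agent being trained at this iteration;
    all other agents are frozen."""
    return (iteration // agent_iteration_interval) % num_agents

# With 3 agents and an interval of 2, the schedule cycles 0,0,1,1,2,2,0,0,...
schedule = [trainable_agent(i, num_agents=3, agent_iteration_interval=2)
            for i in range(8)]
print(schedule)  # [0, 0, 1, 1, 2, 2, 0, 0]
```

Freezing all agents but one keeps each update stable and cheap, at the cost of slower joint progress.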
## Resume Training

LLMs are hard to train, and the training process often crashes when the LLM explores exotic tokens, which is quite normal. Resume training therefore lets you restart from a checkpoint if the LaMAS performance starts to collapse. To use it, specify the `--load_path` argument; the given path should contain multiple folders, each holding a different LoRA adapter's parameters and configuration. It should also contain a critic model `critic.pth`, which will be loaded automatically.
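The expected checkpoint layout under `--load_path` can be validated with a small helper. This is a sketch of the structure described above: one subdirectory per LoRA adapter plus a `critic.pth`; the function name and the adapter folder names are illustrative, only `critic.pth` comes from the text.

```python
from pathlib import Path
import tempfile


def discover_checkpoint(load_path):
    """Return (adapter_dirs, critic_path) for a resume-training checkpoint."""
    root = Path(load_path)
    # Each subdirectory is assumed to hold one LoRA adapter's parameters/config.
    adapter_dirs = sorted(p for p in root.iterdir() if p.is_dir())
    critic_path = root / "critic.pth"
    if not critic_path.exists():
        raise FileNotFoundError("critic.pth is missing; it is required to resume")
    return adapter_dirs, critic_path


# Build a dummy checkpoint directory to show the expected structure.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "agent_0_adapter").mkdir()
    (root / "agent_1_adapter").mkdir()
    (root / "critic.pth").touch()
    adapters, critic = discover_checkpoint(root)
    print([p.name for p in adapters])  # ['agent_0_adapter', 'agent_1_adapter']
```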
## Contributing

We welcome contributions to improve the framework. To contribute:

- Fork the repository.
- Create a new branch: `git checkout -b feature/new-feature`.
- Commit your changes: `git commit -m "Add new feature"`.
- Push to your branch: `git push origin feature/new-feature`.
- Submit a pull request.
## License

This project is licensed under the MIT License. For more details, see the LICENSE file.
## Citation

If you find this repository helpful, please consider citing our paper:

```bibtex
@misc{liao2025marftmultiagentreinforcementfinetuning,
      title={MARFT: Multi-Agent Reinforcement Fine-Tuning},
      author={Junwei Liao and Muning Wen and Jun Wang and Weinan Zhang},
      year={2025},
      eprint={2504.16129},
      archivePrefix={arXiv},
      primaryClass={cs.MA},
      url={https://arxiv.org/abs/2504.16129},
}
```