A fully asynchronous agentic RL training framework for search agents.
- Fully Asynchronous RL Training: trajectory generation and model updates are fully decoupled, speeding up training and reducing training cost.
- Diverse Choices of Search Tools: search agents can be trained with a local knowledge base, web search APIs, or MCP clients.
- Async RL Training is especially suitable for cases where:
  - Execution time of a trajectory is very long.
  - Trajectories cannot be stopped, e.g., when the server state is hard to save and load.
- User-friendly Development: users can implement their own agents without touching any system-level code.
Step 1: Prepare the runtime environment and install AReaL.
Please refer to https://inclusionai.github.io/AReaL/tutorial/installation.html.
Step 2: Download the training data from ASearcher-train-data.
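If the data is hosted on Hugging Face, a download sketch like the following should work (the repo id is an assumption; point it at wherever ASearcher-train-data actually lives):

```python
# Sketch: fetch the training data with huggingface_hub.
# The repo id below is an assumption; adjust it to the actual
# location of ASearcher-train-data.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="inclusionAI/ASearcher-train-data",  # assumed repo id
    repo_type="dataset",
    local_dir="./data",
)
```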
To train a search agent with web search, follow the steps below.

Step 1. Set Up Environment Variables
```bash
export SERPER_API_KEY=YOUR_SERPER_API_KEY
export JINA_API_KEY=YOUR_JINA_API_KEY
```

Here SERPER_API_KEY is for the Serper API used for web search (the underlying search engine is Google Search), and JINA_API_KEY is for the Jina API used to read the content of URLs.
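Before launching a long run, it can help to sanity-check both keys. The snippet below is a sketch against the public Serper and Jina Reader endpoints as we understand them; verify the URLs against the providers' current docs:

```python
# Sketch: verify SERPER_API_KEY and JINA_API_KEY with one cheap call each.
# Endpoints are assumptions based on the public Serper / Jina Reader docs.
import os
import requests

# Serper: Google-backed web search.
resp = requests.post(
    "https://google.serper.dev/search",
    headers={"X-API-KEY": os.environ["SERPER_API_KEY"]},
    json={"q": "hello world"},
)
print("serper:", resp.status_code)

# Jina Reader: fetches and cleans the content of a URL.
resp = requests.get(
    "https://r.jina.ai/https://example.com",
    headers={"Authorization": f"Bearer {os.environ['JINA_API_KEY']}"},
)
print("jina:", resp.status_code)
```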
Step 2. Launch Training
Run the following command to launch training on a single node:
```bash
cd AReaL
python3 -m areal.launcher.local ASearcher/train/asearcher.py \
    --config ASearcher/configs/asearcher_web.yaml \
    experiment_name=<your experiment name> \
    trial_name=<your trial name> \
    actor.path=Qwen/Qwen2.5-7B \
    train_dataset.path=/path/to/training_data.jsonl
```

You can also run distributed experiments with Ray or Slurm:
```bash
cd AReaL
python3 -m areal.launcher.ray ASearcher/train/asearcher.py \
    --config ASearcher/configs/asearcher_web_16nodes.yaml \
    experiment_name=<your experiment name> \
    trial_name=<your trial name> \
    actor.path=Qwen/Qwen2.5-7B \
    train_dataset.path=/path/to/training_data.jsonl \
    allocation_mode=sglang.d96p1t1+d32p1t1 \
    cluster.n_nodes=16 \
    cluster.n_gpus_per_node=8
```

Here allocation_mode=sglang.d96p1t1+d32p1t1 splits the 16x8 GPUs into 96 for SGLang rollout and 32 for training (d/p/t denote the data-, pipeline-, and tensor-parallel degrees).

To train a search agent with a local knowledge base, follow the steps below.

Step 1. Set Up Environment Variables
```bash
export RAG_SERVER_ADDR_DIR=PATH_TO_DUMP_LOCAL_SERVER_ADDRESS
```

Here RAG_SERVER_ADDR_DIR is the directory to which the launched local RAG server dumps its address; the address is loaded from there during training.
Step 2. Set up and launch the local RAG server
- Step 2.1. Download the e5-base-v2 model, the corpus file, and the webpage file.
- Step 2.2. Build the index (requires the e5-base-v2 model and the wiki corpus):

```bash
bash scripts/build_index.sh
```

- Step 2.3. Launch the local RAG server (you can sanity-check the dumped address afterwards; see the sketch below):

```bash
bash scripts/launch_local_server.sh $PORT $RAG_SERVER_ADDR_DIR
```
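Before moving on to training, you can confirm the server address was dumped where training expects it. This sketch assumes only what Step 1 states, i.e. that the launch script writes the server address into $RAG_SERVER_ADDR_DIR:

```python
# Sketch: check that the local RAG server dumped its address into
# RAG_SERVER_ADDR_DIR. The file name/format is whatever the launch
# script writes; we only assume the directory is non-empty.
import os
from pathlib import Path

addr_dir = Path(os.environ["RAG_SERVER_ADDR_DIR"])
entries = list(addr_dir.iterdir())
assert entries, f"no server address dumped under {addr_dir}"
for entry in entries:
    print(entry.name, "->", entry.read_text().strip())
```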
Step 3. Launch Training
Run the following command to launch training on a single node:
```bash
cd AReaL
python3 -m areal.launcher.local ASearcher/train/asearcher.py \
    --config ASearcher/configs/asearcher_local.yaml \
    experiment_name=<your experiment name> \
    trial_name=<your trial name> \
    actor.path=Qwen/Qwen2.5-7B/ \
    train_dataset.path=/path/to/training_data.jsonl
```

You can also run distributed experiments with Ray or Slurm:
```bash
cd AReaL
python3 -m areal.launcher.slurm ASearcher/train/asearcher.py \
    --config ASearcher/configs/asearcher_local.yaml \
    experiment_name=<your experiment name> \
    trial_name=<your trial name> \
    actor.path=Qwen/Qwen2.5-7B/ \
    train_dataset.path=/path/to/training_data.jsonl \
    allocation_mode=sglang.d96p1t1+d32p1t1 \
    cluster.n_nodes=16 \
    cluster.n_gpus_per_node=8
```

To train a reasoning search agent such as QwQ-32B with an LLM-as-Judge, follow the steps below.

Step 1. Launch Qwen2.5-72B-Instruct for LLM-as-Judge:
```bash
python3 -m areal.launcher.ray ASearcher/train/asearcher_reasoning.py \
    --config ASearcher/configs/asearcher_web_qwq.yaml \
    experiment_name=asearcher-qwen72b-inst-server-only \
    trial_name=run1 \
    cluster.n_nodes=1 allocation_mode=sglang.d2t4p1 \
    actor.path=Qwen/Qwen2.5-72B-Instruct
```

Step 2. Launch QwQ-32B agent training:
```bash
python3 -m areal.launcher.ray \
    ASearcher/train/asearcher_reasoning.py \
    --config ASearcher/configs/asearcher_web_qwq.yaml \
    experiment_name=asearcher-qwq-train \
    trial_name=run1 cluster.n_nodes=6 allocation_mode=sglang.d2t8+d4t8 \
    actor.path=Qwen/QwQ-32B \
    train_dataset.path=path_to_ASearcher-LRM-35k.jsonl \
    judge_engine.experiment_name=asearcher-qwen72b-inst-server-only \
    judge_engine.trial_name=run1
```

P.S. You could also try using smaller models, e.g. <=8B, to train a search agent with limited compute.
P.S. Users can run RL training with a user-defined agent workflow with only minimal modifications: simply replace OpenAIClient with AReaLOpenAIClient. See ASearcher/train/reasoning_agent.py for a concrete example.
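For illustration, here is a minimal sketch of such a workflow. Only the AReaLOpenAIClient name comes from this repo; the import path, constructor arguments, and model name are assumptions, so treat ASearcher/train/reasoning_agent.py as the authoritative reference:

```python
# Hypothetical sketch of a user-defined agent workflow.
# Import path and constructor arguments are assumptions; see
# ASearcher/train/reasoning_agent.py for the real usage.
from areal.experimental.openai import AReaLOpenAIClient  # assumed path


async def answer(question: str) -> str:
    # AReaLOpenAIClient is a drop-in replacement for OpenAIClient, so the
    # familiar chat-completions interface applies while the rollout is
    # recorded for RL training.
    client = AReaLOpenAIClient()  # assumed constructor
    resp = await client.chat.completions.create(
        model="Qwen/QwQ-32B",  # placeholder model name
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content
```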
Please refer to our guide for more information about building a custom agent.