ADMIRE

Adaptive Milestone Reward for GUI Agents

中文 | English

ADMIRE is a reinforcement learning framework that uses adaptive milestone rewards to train GUI agents. It automatically generates task milestones from successful trajectories and provides dense rewards to guide agent learning.

Installation

Requirements

Python 3.11
PyTorch 2.2.0
CUDA 12.6
8× A800(A100) GPUs (recommended)

Setup

# Create conda environment
conda create -n admire python=3.11 -y
conda activate admire

# Install PyTorch
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cu126

# Clone repository
git clone https://github.com/your-repo/ADMIRE.git
cd ADMIRE

# Install verl
mkdir 3rdparty && git clone https://github.com/volcengine/verl 3rdparty/verl
cd 3rdparty/verl && pip install -e . && cd ../..

# Install android_world
git clone https://github.com/google-research/android_world 3rdparty/android_world
cd 3rdparty/android_world && pip install -e . && cd ../..

# Install ADMIRE
pip install -e .

# Install additional dependencies
pip install swanlab scikit-image spacy ray
python -m spacy download en_core_web_sm

# (Optional) Login to SwanLab for experiment tracking
swanlab login -k "your-api-key"

Android Environment

Follow AndroidWorld setup guide to configure the Android emulator.

Quick Start

1. Start Ray Cluster

ray stop
ray start --head --port=6379 --dashboard-port=8265

2. Start Environment Server

python src/hammer_server/gradio_web_server.py \
    --num-devices 8 \
    --max-devices 8 \
    --crashed-device-restart \
    --concurrency-limit 8

3. Run Training

bash run_hrpo_stepwise.sh

Or with custom config:

export HYDRA_FULL_ERROR=1
python -m hammer_trainer_stepwise.main_ppo \
    --config-path=./scripts \
    --config-name=config_stepwise_32.yaml \
    actor_rollout_ref.model.path="Qwen/Qwen2.5-VL-7B-Instruct" \
    trainer.total_epochs=20 \
    trainer.n_gpus_per_node=8

Project Structure

ADMIRE/
├── src/
│   ├── hammer_agent/              # Agent implementation
│   ├── hammer_server/             # Environment server (Gradio)
│   ├── hammer_trainer/            # Base trainer
│   └── hammer_trainer_stepwise/   # Stepwise RL trainer with milestone rewards
├── scripts/                       # Training configs
│   ├── config_stepwise_32.yaml    # Default stepwise config
│   └── config_grpo.yaml           # GRPO config
├── 3rdparty/
│   ├── verl/                      # RL training framework
│   └── android_world/             # Android environment
└── notebooks/
    └── visualize_step.ipynb       # Trajectory visualization

Configuration

Key parameters in scripts/config_stepwise_32.yaml:

# Environment
env:
  src: [" "]                       
  max_envs: [16]
  max_steps: 30

# Model
actor_rollout_ref:
  model:
    path: "Qwen/Qwen2.5-VL-7B-Instruct"
  rollout:
    n: 8                           # Number of rollouts per prompt

# Training
trainer:
  total_epochs: 20
  n_gpus_per_node: 8

# Milestone Reward
milestone_reward:
  enable: true
  threshold: 0.75                  # Similarity threshold for matching
  weight: 0.3

process_reward:
  enable: true
  weight: 1.0
  decay_gamma: 0.99

Reward Components

The total reward is computed as:

$$\mathcal{R}_{total} = \mathcal{R}_{outcome} + \eta \cdot \mathcal{R}_{format} + \lambda(t) \cdot \mathcal{R}_{mil}$$

Configure via:

milestone_reward:
  enable: true
  weight: 0.3
  strategy: "mix"

process_reward:
  enable: true
  weight: 1.0

License

Apache License 2.0. See LICENSE.

Acknowledgments

verl - ByteDance Seed Team
AndroidWorld - Google Research
Qwen2.5-VL - Alibaba

Citation

Paper Link

@article{zheng2026adaptive,
  title={Adaptive Milestone Reward for GUI Agents},
  author={Zheng, Congmin and Mo, Xiaoyun and Ma, Xinbei and Lin, Qiqiang and Zhao, Yin and Zhu, Jiachen and Lou, Xingyu and Wang, Jun and Wang, Zhaoxiang and Liu, Weiwen and others},
  journal={arXiv preprint arXiv:2602.11524},
  year={2026}
}

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
scripts		scripts
src		src
LICENSE		LICENSE
README.md		README.md
README_zh.md		README_zh.md
framework.jpg		framework.jpg
judge_step_helpfulness.py		judge_step_helpfulness.py
openai_client.py		openai_client.py
replayTomilestone.py		replayTomilestone.py
run_gradio_web_server.sh		run_gradio_web_server.sh
run_hrpo_stepwise.sh		run_hrpo_stepwise.sh
run_release_all_devices.sh		run_release_all_devices.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ADMIRE

Adaptive Milestone Reward for GUI Agents

Installation

Requirements

Setup

Android Environment

Quick Start

1. Start Ray Cluster

2. Start Environment Server

3. Run Training

Project Structure

Configuration

Reward Components

License

Acknowledgments

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ADMIRE

Adaptive Milestone Reward for GUI Agents

Installation

Requirements

Setup

Android Environment

Quick Start

1. Start Ray Cluster

2. Start Environment Server

3. Run Training

Project Structure

Configuration

Reward Components

License

Acknowledgments

Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages