This repository contains the code and data for the ACL 2025 paper OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis.
We introduce OS-Genesis, an interaction-driven pipeline for synthesizing high-quality and diverse GUI agent trajectory data without human supervision or predefined tasks. By leveraging reverse task synthesis and a trajectory reward model, OS-Genesis enables effective end-to-end training of GUI agents.
We provide scripts and instructions to help you build trajectories in collection.
For training details and procedures, please refer to the InternVL2 documentation and Qwen2-VL.
To evaluate on the AndroidControl benchmark, follow the steps below:
- Clone the GitHub repository:

      git clone https://github.com/OS-Copilot/OS-Genesis.git

- Inference:

      cd OS-Genesis/evaluation/android_control
      bash run_ac_inference.sh $dataset $checkpoint

- Evaluation:

      python ac_eval.py

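
The three steps above can also be scripted. The following is a minimal sketch, not part of the repository: the helper name `build_eval_commands` and the placeholder arguments are hypothetical, and it assumes that `ac_eval.py` is run from the same `evaluation/android_control` directory as the inference script. It only builds and prints the commands; nothing is executed.

```python
import shlex
from pathlib import Path

REPO_URL = "https://github.com/OS-Copilot/OS-Genesis.git"


def build_eval_commands(dataset, checkpoint, workdir="."):
    """Return (cwd, argv) pairs mirroring the clone, inference, and evaluation steps.

    Hypothetical helper: it assumes ac_eval.py sits next to run_ac_inference.sh.
    """
    repo = Path(workdir) / "OS-Genesis"
    eval_dir = repo / "evaluation" / "android_control"
    return [
        (Path(workdir), ["git", "clone", REPO_URL]),
        (eval_dir, ["bash", "run_ac_inference.sh", dataset, checkpoint]),
        (eval_dir, ["python", "ac_eval.py"]),
    ]


if __name__ == "__main__":
    # Placeholder arguments; substitute your own dataset and checkpoint.
    for cwd, argv in build_eval_commands("DATASET", "CHECKPOINT"):
        print(f"(cd {cwd} && {shlex.join(argv)})")
```

To actually run a step, each pair could be passed to `subprocess.run(argv, cwd=cwd, check=True)`, which stops at the first failing command.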
Model Name | Base Model | Training Data | HF Link |
---|---|---|---|
OS-Genesis-4B-AC | InternVL2-4B | OS-Genesis-ac-training-data | 🤗 link |
OS-Genesis-7B-AC | Qwen2-VL-7B-Instruct | OS-Genesis-ac-training-data | 🤗 link |
OS-Genesis-8B-AC | InternVL2-8B | OS-Genesis-ac-training-data | 🤗 link |
Model Name | Base Model | Training Data | HF Link |
---|---|---|---|
OS-Genesis-4B-AW | InternVL2-4B | OS-Genesis-aw-training-data | 🤗 link |
OS-Genesis-7B-AW | Qwen2-VL-7B-Instruct | OS-Genesis-aw-training-data | 🤗 link |
OS-Genesis-8B-AW | InternVL2-8B | OS-Genesis-aw-training-data | 🤗 link |
Model Name | Base Model | Training Data | HF Link |
---|---|---|---|
OS-Genesis-4B-WA | InternVL2-4B | OS-Genesis-web-training-data | 🤗 link |
OS-Genesis-7B-WA | Qwen2-VL-7B-Instruct | OS-Genesis-web-training-data | 🤗 link |
OS-Genesis-8B-WA | InternVL2-8B | OS-Genesis-web-training-data | 🤗 link |
In addition to our complete trajectory data on HuggingFace, we also provide collected raw <s_pre, a, s_post> triples. You can use them to reproduce the process of reverse task synthesis directly, without re-collecting them from emulators yourself 😄. The screenshots and corresponding texts (with SoM info contained) are provided below:
Data Type | Screenshots | Data JSON |
---|---|---|
Mobile | Screenshots | Data JSON |
Web | Screenshots | Data JSON |
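
As an illustration of how one <s_pre, a, s_post> triple might be represented in code, here is a minimal sketch. The schema of the released Data JSON is not described here, so every field name below (`s_pre`, `action`, `s_post`) and the sample record are hypothetical. Check the actual files for the real keys before adapting it.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Triple:
    # All field names are hypothetical; the released Data JSON may use
    # different keys and carry extra information (e.g. SoM annotations).
    s_pre: str              # screenshot of the state before the action
    action: Dict[str, Any]  # the action taken on s_pre
    s_post: str             # screenshot of the state after the action


def parse_triples(json_text: str) -> List[Triple]:
    """Parse a JSON array of records into Triple objects."""
    records = json.loads(json_text)
    return [
        Triple(s_pre=r["s_pre"], action=r["action"], s_post=r["s_post"])
        for r in records
    ]


if __name__ == "__main__":
    # Invented sample record, for illustration only.
    sample = json.dumps([
        {
            "s_pre": "0001_pre.png",
            "action": {"type": "click", "element_id": 7},
            "s_post": "0001_post.png",
        }
    ])
    for t in parse_triples(sample):
        print(t.s_pre, t.action["type"], t.s_post)
```

Keeping the pre-state, action, and post-state together in one record reflects the idea that reverse task synthesis works from an observed interaction rather than from a predefined task.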
Feel free to email me if you require additional data of this kind.
We have collected some questions from emails, Hugging Face, and WeChat communications. Please check the FAQ 🤖
- OS-Atlas 🤖
- ScienceBoard 🧪
- GUIMid 📊
🫶 If you are interested in our work or find this repository / our data helpful, please consider using the following citation format when referencing our paper:
@article{sun2024genesis,
title={OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis},
author={Sun, Qiushi and Cheng, Kanzhi and Ding, Zichen and Jin, Chuanyang and Wang, Yian and Xu, Fangzhi and Wu, Zhenyu and Jia, Chengyou and Chen, Liheng and Liu, Zhoumianze and others},
journal={arXiv preprint arXiv:2412.19723},
year={2024}
}