Hao Luo1,3, Ye Wang2,3, Wanpeng Zhang1,3, Haoqi Yuan1,3, Yicheng Feng1,3, Haiweng Xu3,
Sipeng Zheng3, Zongqing Lu1,3†
1Peking University
2Renmin University of China
3BeingBeyond
JALA is a Transformer-based VLA pretraining framework that turns large-scale human manipulation videos into action-centric supervision without pixel-level reconstruction, using Joint Alignment to bridge lab-annotated motion data and the diversity of in-the-wild videos.
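
For intuition only, below is a minimal sketch of what "action-centric supervision without pixel-level reconstruction" combined with "Joint Alignment" could look like. This is our illustrative assumption, not the JALA implementation: the module name `LatentActionSketch`, the dimensions, the frozen-feature inputs, and the MSE loss forms are all hypothetical. The idea sketched is that a latent action encoded from a frame pair is trained to predict future frame *features* rather than pixels, with an auxiliary joint-regression loss on clips that carry motion annotations.

```python
# Conceptual sketch only -- NOT the official JALA code.
# Assumptions: f_t / f_tk are features of frames t and t+k from a frozen
# vision encoder; `joints` are joint annotations available for lab data.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentActionSketch(nn.Module):
    def __init__(self, feat_dim=768, latent_dim=32, joint_dim=22):
        super().__init__()
        # Encode a (frame_t, frame_t+k) feature pair into a compact latent action.
        self.encoder = nn.Sequential(
            nn.Linear(2 * feat_dim, 512), nn.GELU(), nn.Linear(512, latent_dim)
        )
        # Predict future frame features (not pixels) from frame_t plus the latent action.
        self.predictor = nn.Sequential(
            nn.Linear(feat_dim + latent_dim, 512), nn.GELU(), nn.Linear(512, feat_dim)
        )
        # Tie the latent action to joint annotations when such labels exist.
        self.joint_head = nn.Linear(latent_dim, joint_dim)

    def forward(self, f_t, f_tk, joints=None):
        z = self.encoder(torch.cat([f_t, f_tk], dim=-1))
        pred = self.predictor(torch.cat([f_t, z], dim=-1))
        # Action-centric objective: feature prediction, no pixel reconstruction.
        loss = F.mse_loss(pred, f_tk.detach())
        if joints is not None:
            # "Joint Alignment" stand-in: regress joints from the latent action.
            loss = loss + F.mse_loss(self.joint_head(z), joints)
        return loss
```

In-the-wild clips would contribute only the first term, while lab-annotated clips would add the alignment term, which is one plausible way the two data sources could share a single latent action space.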
- [2026-02-28]: JALA accepted to CVPR 2026. Project page is live.
If you find our work useful, please consider citing us and starring our repository! 🌟🌟🌟
@inproceedings{luo2026jointalignedlatentaction,
  title={Joint-Aligned Latent Action: Towards Scalable VLA Pretraining in the Wild},
  author={Hao Luo and Ye Wang and Wanpeng Zhang and Haoqi Yuan and Yicheng Feng and Haiweng Xu and Sipeng Zheng and Zongqing Lu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}
