Hao Luo1,3, Ye Wang2,3, Wanpeng Zhang1,3, Haoqi Yuan1,3, Yicheng Feng1,3, Haiweng Xu3,
Sipeng Zheng3, Zongqing Lu1,3†
1Peking University
2Renmin University of China
3BeingBeyond
JALA is a Transformer-based VLA pretraining framework that turns large-scale human manipulation videos into action-centric supervision without pixel-level reconstruction, using Joint Alignment to bridge lab-annotated motion data and the diversity of in-the-wild videos.
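
For intuition only, below is a minimal sketch of what "action-centric supervision without pixel-level reconstruction" combined with "Joint Alignment" could look like. This is our illustrative assumption, not the JALA implementation: the module name `LatentActionSketch`, the dimensions, the frozen-feature inputs, and the MSE loss forms are all hypothetical. The idea sketched is that a latent action encoded from a frame pair is trained to predict future frame *features* rather than pixels, with an auxiliary joint-regression loss on clips that carry motion annotations.

```python
# Conceptual sketch only -- NOT the official JALA code.
# Assumptions: f_t / f_tk are features of frames t and t+k from a frozen
# vision encoder; `joints` are joint annotations available for lab data.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentActionSketch(nn.Module):
    def __init__(self, feat_dim=768, latent_dim=32, joint_dim=22):
        super().__init__()
        # Encode a (frame_t, frame_t+k) feature pair into a compact latent action.
        self.encoder = nn.Sequential(
            nn.Linear(2 * feat_dim, 512), nn.GELU(), nn.Linear(512, latent_dim)
        )
        # Predict future frame features (not pixels) from frame_t plus the latent action.
        self.predictor = nn.Sequential(
            nn.Linear(feat_dim + latent_dim, 512), nn.GELU(), nn.Linear(512, feat_dim)
        )
        # Tie the latent action to joint annotations when such labels exist.
        self.joint_head = nn.Linear(latent_dim, joint_dim)

    def forward(self, f_t, f_tk, joints=None):
        z = self.encoder(torch.cat([f_t, f_tk], dim=-1))
        pred = self.predictor(torch.cat([f_t, z], dim=-1))
        # Action-centric objective: feature prediction, no pixel reconstruction.
        loss = F.mse_loss(pred, f_tk.detach())
        if joints is not None:
            # "Joint Alignment" stand-in: regress joints from the latent action.
            loss = loss + F.mse_loss(self.joint_head(z), joints)
        return loss
```

In-the-wild clips would contribute only the first term, while lab-annotated clips would add the alignment term, which is one plausible way the two data sources could share a single latent action space.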
- [2026-02-28]: JALA accepted to CVPR 2026. Project page is live.
If you find our work useful, please consider citing us and starring our repository! 🌟🌟🌟
@inproceedings{luo2026jointalignedlatentaction,
  title={Joint-Aligned Latent Action: Towards Scalable VLA Pretraining in the Wild},
  author={Hao Luo and Ye Wang and Wanpeng Zhang and Haoqi Yuan and Yicheng Feng and Haiweng Xu and Sipeng Zheng and Zongqing Lu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}
