# MVR: Multi-view Video Reward Shaping for Reinforcement Learning

Official JAX implementation.

Lirui Luo · Guoxi Zhang · Hongming Xu · Yaodong Yang · Cong Fang · Qing Li
**TL;DR:** MVR is centered on state-dependent reward shaping: it learns state relevance from multi-view videos with a frozen vision-language model, then uses that relevance to provide dense guidance early in training, fading automatically as the target behavior emerges.
The main idea of MVR is a state-dependent reward shaping formulation that integrates task rewards with VLM-based visual guidance without persistently distorting the task objective. Multi-view videos and learned state relevance are the key ingredients that make this shaping signal reliable for dynamic behaviors, robust to occlusion, and naturally self-decaying as performance improves. This repository contains the training code used to learn MVR on HumanoidBench and MetaWorld with ViCLIP and TQC in JAX.
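The shaping idea above can be sketched in a few lines. This is an illustrative sketch only, not the paper's exact formulation: the function name, the scalar `progress` signal, and the linear decay are assumptions made for the example.

```python
def shaped_reward(r_task, vlm_score, relevance, progress, beta=1.0):
    """Illustrative sketch of state-dependent reward shaping.

    Hypothetical names, not the exact MVR formulation:
      r_task:    environment task reward
      vlm_score: VLM-based visual guidance for the current state
      relevance: learned state relevance from multi-view videos, in [0, 1]
      progress:  proxy for how close the policy is to the target behavior, in [0, 1]
    """
    # The shaping weight is state-dependent (via relevance) and decays to
    # zero as progress approaches 1, so the task objective is not
    # persistently distorted once the behavior emerges.
    weight = beta * relevance * (1.0 - progress)
    return r_task + weight * vlm_score
```

Early in training (`progress` near 0) the guidance term dominates; as the target behavior emerges, the shaping term vanishes and only the task reward remains.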
## Quick Start

```bash
uv venv --python 3.11
source .venv/bin/activate
uv sync --frozen --no-install-project
python download_viclip.py
bash scripts/mvr/run_mvr-metaworld.sh
```

## Features

- State-dependent reward shaping that combines task rewards with learned visual guidance
- Automatic decay of the shaping term as the policy approaches the target behavior
- Multi-view video relevance learning with ViCLIP to handle dynamic motions and occlusion
- TQC-based reinforcement learning in JAX with experiment scripts for MetaWorld and HumanoidBench
## Requirements

- Python 3.11
- uv for environment management: https://github.com/astral-sh/uv
- NVIDIA GPU with CUDA-compatible drivers for accelerated training
- A C/C++ toolchain for packages that build native extensions
## Installation

From the repository root:

```bash
uv venv --python 3.11
source .venv/bin/activate
uv sync --frozen --no-install-project
```

`--no-install-project` avoids installing this repository as an editable package, which prevents a module-name collision with ViCLIP's `utils` package during checkpoint loading.
## Verify the JAX Installation

```bash
python -c "import jax; print('JAX version:', jax.__version__); print('JAX devices:', jax.devices()); print('JAX backend:', jax.default_backend())"
```

The output should show a CUDA device and the `gpu` backend.
## Download the ViCLIP Checkpoint

```bash
python download_viclip.py
```

The default checkpoint path is `ckpts/ViCLIP/ViCLIP-L_InternVid-FLT-10M.pth`.
## Run Experiments

MVR:

```bash
bash scripts/mvr/run_mvr-metaworld.sh
bash scripts/mvr/run_mvr.sh
```

TQC baselines:

```bash
bash scripts/tqc/run_tqc_metaworld.sh
bash scripts/tqc/run_tqc.sh
```

Outputs are written under `outputs/<algo>/<timestamp>/<run_name>/<env_name>/`.
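Given that directory layout, a small helper can locate the most recent run for an algorithm. The helper below is a hypothetical convenience, not part of the repository, and assumes timestamps sort lexicographically (e.g. zero-padded `YYYY-MM-DD_HHMMSS`):

```python
from pathlib import Path


def latest_run(outputs_root="outputs", algo="mvr"):
    """Hypothetical helper: return the newest timestamped run directory
    under <outputs_root>/<algo>/, or None if no runs exist yet."""
    algo_dir = Path(outputs_root) / algo
    if not algo_dir.is_dir():
        return None
    # Lexicographic sort matches chronological order for zero-padded timestamps.
    runs = sorted(p for p in algo_dir.iterdir() if p.is_dir())
    return runs[-1] if runs else None
```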
## Citation

If you find this repository useful, please cite:

```bibtex
@inproceedings{luo2026mvr,
  title     = {MVR: Multi-view Video Reward Shaping for Reinforcement Learning},
  author    = {Luo, Lirui and Zhang, Guoxi and Xu, Hongming and Yang, Yaodong and Fang, Cong and Li, Qing},
  booktitle = {International Conference on Learning Representations},
  year      = {2026}
}
```