# MVR: Multi-view Video Reward Shaping for Reinforcement Learning

Official JAX implementation.

Lirui Luo · Guoxi Zhang · Hongming Xu · Yaodong Yang · Cong Fang · Qing Li
**TL;DR:** MVR is centered on state-dependent reward shaping: it learns state relevance from multi-view videos with a frozen vision-language model, then uses that relevance to provide dense guidance early in training, fading automatically as the target behavior emerges.
The main idea of MVR is a state-dependent reward shaping formulation that integrates task rewards with VLM-based visual guidance without persistently distorting the task objective. Multi-view videos and learned state relevance are the key ingredients that make this shaping signal reliable for dynamic behaviors, robust to occlusion, and naturally self-decaying as performance improves. This repository contains the training code used to learn MVR on HumanoidBench and MetaWorld with ViCLIP and TQC in JAX.
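The shaping idea above can be sketched in a few lines. This is an illustrative sketch only, not the paper's exact formulation: the function name, the scalar `progress` signal, and the linear decay are assumptions made for the example.

```python
def shaped_reward(r_task, vlm_score, relevance, progress, beta=1.0):
    """Illustrative sketch of state-dependent reward shaping.

    Hypothetical names, not the exact MVR formulation:
      r_task:    environment task reward
      vlm_score: VLM-based visual guidance for the current state
      relevance: learned state relevance from multi-view videos, in [0, 1]
      progress:  proxy for how close the policy is to the target behavior, in [0, 1]
    """
    # The shaping weight is state-dependent (via relevance) and decays to
    # zero as progress approaches 1, so the task objective is not
    # persistently distorted once the behavior emerges.
    weight = beta * relevance * (1.0 - progress)
    return r_task + weight * vlm_score
```

Early in training (`progress` near 0) the guidance term dominates; as the target behavior emerges, the shaping term vanishes and only the task reward remains.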
## Quick Start

```bash
uv venv --python 3.11
source .venv/bin/activate
uv sync --frozen --no-install-project
python download_viclip.py
bash scripts/mvr/run_mvr-metaworld.sh
```

## Features

- State-dependent reward shaping that combines task rewards with learned visual guidance
- Automatic decay of the shaping term as the policy approaches the target behavior
- Multi-view video relevance learning with ViCLIP to handle dynamic motions and occlusion
- TQC-based reinforcement learning in JAX with experiment scripts for MetaWorld and HumanoidBench
## Requirements

- Python 3.11
- uv for environment management: https://github.com/astral-sh/uv
- NVIDIA GPU with CUDA-compatible drivers for accelerated training
- A C/C++ toolchain for packages that build native extensions
## Installation

From the repository root:

```bash
uv venv --python 3.11
source .venv/bin/activate
uv sync --frozen --no-install-project
```

`--no-install-project` avoids installing this repository as an editable package, which prevents a module-name collision with ViCLIP's `utils` package during checkpoint loading.
## Verify the JAX Installation

```bash
python -c "import jax; print('JAX version:', jax.__version__); print('JAX devices:', jax.devices()); print('JAX backend:', jax.default_backend())"
```

The output should show a CUDA device and the `gpu` backend.
## Download the ViCLIP Checkpoint

```bash
python download_viclip.py
```

The default checkpoint path is `ckpts/ViCLIP/ViCLIP-L_InternVid-FLT-10M.pth`.
## Run Experiments

MVR:

```bash
bash scripts/mvr/run_mvr-metaworld.sh
bash scripts/mvr/run_mvr.sh
```

TQC baselines:

```bash
bash scripts/tqc/run_tqc_metaworld.sh
bash scripts/tqc/run_tqc.sh
```

Outputs are written under `outputs/<algo>/<timestamp>/<run_name>/<env_name>/`.
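Given that directory layout, a small helper can locate the most recent run for an algorithm. The helper below is a hypothetical convenience, not part of the repository, and assumes timestamps sort lexicographically (e.g. zero-padded `YYYY-MM-DD_HHMMSS`):

```python
from pathlib import Path


def latest_run(outputs_root="outputs", algo="mvr"):
    """Hypothetical helper: return the newest timestamped run directory
    under <outputs_root>/<algo>/, or None if no runs exist yet."""
    algo_dir = Path(outputs_root) / algo
    if not algo_dir.is_dir():
        return None
    # Lexicographic sort matches chronological order for zero-padded timestamps.
    runs = sorted(p for p in algo_dir.iterdir() if p.is_dir())
    return runs[-1] if runs else None
```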
## Citation

If you find this repository useful, please cite:

```bibtex
@inproceedings{luo2026mvr,
  title     = {MVR: Multi-view Video Reward Shaping for Reinforcement Learning},
  author    = {Luo, Lirui and Zhang, Guoxi and Xu, Hongming and Yang, Yaodong and Fang, Cong and Li, Qing},
  booktitle = {International Conference on Learning Representations},
  year      = {2026}
}
```