This repository contains the training code for the Gaussian World Model (GWM), a latent Diffusion Transformer (DiT) combined with a 3D variational autoencoder that enables fine-grained, scene-level reconstruction of future states with Gaussian Splatting.
# Clone this repo, then run the following from the repository root:
export GWM_PATH=$(pwd)
echo "export GWM_PATH=$(pwd)" >> ~/.bashrc
# Install uv (if not already installed):
pip install uv
# Ensure the CUDA toolkit is installed
nvcc -V
# Install dependencies
uv sync
source .venv/bin/activate
uv pip install git+https://github.com/dcharatan/diff-gaussian-rasterization-modified --no-build-isolation
uv pip install git+https://github.com/facebookresearch/pytorch3d.git --no-build-isolation
# (Optional) Compile the CUDA kernels for Splatt3r
cd third_party/splatt3r/src/mast3r_src/dust3r/croco/models/curope/
python setup.py build_ext --inplace
cd ../../../../../../../..
# Download the Splatt3r checkpoint
mkdir -p third_party/splatt3r/checkpoints/splatt3r_v1.0
cd third_party/splatt3r/checkpoints/splatt3r_v1.0
wget https://huggingface.co/brandonsmart/splatt3r_v1.0/resolve/main/epoch%3D19-step%3D1200.ckpt
cd ../../../..
See docs/pretraining.md for pretraining instructions.
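To sanity-check the installation steps above before training, a small helper along these lines may be useful. It is only a sketch and not part of the repository: the function name `check_gwm_setup` is hypothetical, and it assumes the virtual environment lives in `.venv` under the repository root and that the checkpoint was saved as some `.ckpt` file in `third_party/splatt3r/checkpoints/splatt3r_v1.0` (the exact file name can depend on how `wget` handles the URL-encoded `=`).

```shell
# Hypothetical helper (not shipped with this repository): report which
# artifacts from the installation steps are present under a repo root.
check_gwm_setup() {
    # Use the first argument, else $GWM_PATH, else the current directory.
    local root="${1:-${GWM_PATH:-$PWD}}"
    local missing=0

    # Virtual environment created by `uv sync` (assumed to be .venv).
    if [ -f "$root/.venv/bin/activate" ]; then
        echo "ok: virtual environment"
    else
        echo "missing: virtual environment"
        missing=$((missing + 1))
    fi

    # Splatt3r checkpoint: accept any .ckpt file in the checkpoint directory.
    local ckpt_dir="$root/third_party/splatt3r/checkpoints/splatt3r_v1.0"
    if [ -n "$(find "$ckpt_dir" -maxdepth 1 -name '*.ckpt' 2>/dev/null)" ]; then
        echo "ok: splatt3r checkpoint"
    else
        echo "missing: splatt3r checkpoint"
        missing=$((missing + 1))
    fi

    # CUDA compiler, needed to build the rasterizer and pytorch3d from source.
    if command -v nvcc >/dev/null 2>&1; then
        echo "ok: nvcc"
    else
        echo "missing: nvcc"
        missing=$((missing + 1))
    fi

    # The exit status is the number of missing items (0 means all found).
    return "$missing"
}
```

After defining the function in your shell, run `check_gwm_setup "$GWM_PATH"`; a zero exit status means all three checks passed. It does not verify that the compiled extensions import correctly.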
This repository is released under the MIT license.
Our code builds on iVideoGPT and diamond; thanks to the authors for their great work!
If you find this repository helpful, please consider citing:
@inproceedings{lu2025gwm,
title={GWM: Towards Scalable Gaussian World Models for Robotic Manipulation},
author={Lu, Guanxing and Jia, Baoxiong and Li, Puhao and Chen, Yixin and Wang, Ziwei and Tang, Yansong and Huang, Siyuan},
booktitle={ICCV},
year={2025},
organization={IEEE}
}