This repository presents a unified and modular pipeline that couples large-scale vision–language reasoning, high-fidelity 2D segmentation, and multi-object 3D Gaussian splatting. It integrates three independent systems—vLLM (for Qwen3-VL inference), SAM3 (for multi-object 2D segmentation), and SAM-3D-Objects (for 3D reconstruction from RGB + masks)—into a complete, end-to-end workflow. To ensure reproducibility, each module runs inside its own Conda environment. The pipeline supports both staged execution and a fully automated one-click execution, with built-in HuggingFace authentication, checkpoint management, and environment initialization.
git clone --recursive https://github.com/xyys2003/sam3d_gs.git
cd sam3d_gs
If cloned without submodules:
git submodule update --init --recursive
| Environment | Purpose | Path |
|---|---|---|
vllm |
Serve Qwen3-VL-8B-Thinking via vLLM | — |
sam3 |
Multi-object segmentation (SAM3) | sam3/ |
sam3d-objects |
RGB + masks → 3D Gaussian reconstruction | sam-3d-objects/ |
conda create -n vllm python=3.10 -y
conda activate vllm
Install PyTorch (CUDA 12.x):
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 \
--index-url https://download.pytorch.org/whl/cu124
Install vLLM:
pip install vllm --extra-index-url https://download.pytorch.org/whl/cu124
pip install transformers tiktoken sentencepiece xformers flashinfer-python
pip install huggingface_hub
Reference implementation: 🔗 https://github.com/facebookresearch/sam3 🔗 https://huggingface.co/facebook/sam3
cd sam3
conda create -n sam3 python=3.10 -y
conda activate sam3
Install SAM3:
git clone https://github.com/facebookresearch/sam3.git
cd sam3
pip install -e .
Optional:
pip install -e ".[notebooks]"
pip install -e ".[train,dev]"
Reference implementation: 🔗 https://github.com/facebookresearch/sam3d 🔗 https://huggingface.co/facebook/sam-3d-objects
conda create -n sam_3d_body python=3.10 -y
conda activate sam_3d_body
Install dependencies (excerpt):
pip install pytorch-lightning pyrender opencv-python yacs scikit-image einops timm dill pandas hydra-core ...
Install Detectron2:
pip install 'git+https://github.com/facebookresearch/detectron2.git@a1ce2f9' \
--no-build-isolation --no-deps
Optional: MoGe
pip install git+https://github.com/microsoft/MoGe.git
The pipeline requires access to the following models:
- SAM3 🔗 https://huggingface.co/facebook/sam3
- SAM-3D-Objects 🔗 https://huggingface.co/facebook/sam-3d-objects
Log in after requesting access:
hf auth login
Ensure the Conda activation path is correct:
CONDA_SH="/your_path/miniconda3/etc/profile.d/conda.sh"
bash run_agent_with_vllm.sh
Outputs:
outputs/master_with_vllm/masks/
bash run_sam3d_from_masks.sh
Outputs:
sam-3d-objects/outputs/torch_save_pt/
sam-3d-objects/gaussians/multi/
bash run_pipeline.sh
Cause: corrupted model shards in the HuggingFace cache due to unstable network.
Fix:
rm -rf sam-3d-objects/checkpoints/hf
rm -rf ~/.cache/huggingface/hub # optional
bash run_sam3d_from_masks.sh
Force fresh download:
force_download=True
The 3D Gaussian .ply files exported by SAM-3D-Objects are expressed in the camera coordinate system, where:
- +Z axis points forward from the camera
- +X axis points right
- +Y axis points downward (typical computer vision convention)
This means the reconstructed objects are aligned using camera-forward Z-axis rather than a world coordinate frame.
If you want to visualize or place the objects in a global world coordinate system, you must apply a camera-to-world transformation: $$ \mathbf{X}{world} = \mathbf{R}{c2w}\ \mathbf{X}{camera} \ + \ \mathbf{t}{c2w} $$ Where:
-
$\mathbf{R}_{c2w}$ is the rotation matrix from camera to world -
$\mathbf{t}_{c2w}$ is the translation vector -
$\mathbf{X}_{camera}$ is the Gaussian center in camera coordinates -
$\mathbf{X}_{world}$ is the desired world coordinate position
After applying this transformation, the .ply will correctly align with your global scene, robotics simulator, or NeRF / COLMAP world frame.
@article{kirillov2024sam3,
title={SAM 3: Segment Anything in Images and Videos},
author={Kirillov, Alexander and Ravi, Nikhila and Mao, Weiyao and others},
year={2024},
url={https://github.com/facebookresearch/sam3}
}
@article{wu2024sam3dobjects,
title={SAM-3D-Objects: Segment Anything in 3D Using 2D Masks},
author={Wu, Yu and Mao, Weiyao and Kirillov, Alexander and others},
year={2024},
url={https://github.com/facebookresearch/sam3d}
}
This project is built upon and integrates:
- SAM3 GitHub: https://github.com/facebookresearch/sam3 HuggingFace: https://huggingface.co/facebook/sam3
- SAM-3D-Objects GitHub: https://github.com/facebookresearch/sam3d HuggingFace: https://huggingface.co/facebook/sam-3d-objects
We sincerely thank the authors for making their research and implementations publicly available.