Zhuoguang Chen*, Kenan Li*, Xiuyu Yang, Tao Jiang, Yiming Li, Hang Zhao†
*: equal contribution, †: corresponding author
ICRA 2025
Same color = same instance, across 3D space and time.
- [2025/04]: We release the source code!
- [2025/03]: The preprint version is available on arXiv.
- [2025/01]: Our work is accepted to the 2025 IEEE International Conference on Robotics and Automation (ICRA).
To the best of our knowledge, we make the first attempt to explore a camera-based 4D panoptic occupancy tracking task, which jointly tackles occupancy panoptic segmentation and object tracking from camera input. For fair evaluation, we propose the OccSTQ metric and build a set of baselines adapted from other domains.
We propose TrackOcc, which uses 4D panoptic queries to perform the proposed task in a streaming, end-to-end manner. We also introduce a localization-aware loss to enhance the tracking performance.
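For intuition, a 4D panoptic occupancy result can be thought of as a per-frame voxel grid carrying a semantic class and an instance ID for each occupied voxel, with IDs kept consistent over time. The sketch below uses made-up shapes, class IDs, and a dict layout purely for illustration; it is not the actual TrackOcc output format.

```python
import numpy as np

# Hypothetical grid size; the real resolution comes from the config.
GRID = (200, 200, 16)

def make_frame() -> dict:
    """One toy panoptic occupancy frame: per-voxel semantics + instance IDs."""
    semantics = np.zeros(GRID, dtype=np.uint8)   # 0 = free space
    instances = np.zeros(GRID, dtype=np.int32)   # 0 = no instance
    # A small block labeled class 1 ("vehicle") with persistent instance ID 7.
    semantics[50:60, 50:60, 2:6] = 1
    instances[50:60, 50:60, 2:6] = 7
    return {"semantics": semantics, "instances": instances}

# Two consecutive frames: the same object keeps instance ID 7, which is
# what "same color = same instance, across 3D space and time" refers to.
f0, f1 = make_frame(), make_frame()
ids0 = set(np.unique(f0["instances"])) - {0}
ids1 = set(np.unique(f1["instances"])) - {0}
print(ids0 == ids1)  # True: the instance ID set is stable across frames
```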
Clone TrackOcc
git clone https://github.com/Tsinghua-MARS-Lab/TrackOcc.git
cd TrackOcc
Create conda environment
conda create -n trackocc python=3.8
conda activate trackocc
# PyTorch 1.12.1 + CUDA 11.3
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
Install other dependencies
pip install openmim
mim install mmcv-full==1.6.0
mim install mmdet==2.28.2
mim install mmsegmentation==0.30.0
mim install mmdet3d==1.0.0rc6
pip install setuptools==59.5.0
pip install numpy==1.23.5
pip install yapf==0.40.1
Compile CUDA extensions
pip install -v -e .
TrackOcc's data (including data sampled at 5-frame intervals) and labels are now available on Hugging Face. Remember to unzip the compressed files:
cd data/TrackOcc-waymo/
unzip pano_voxel04.zip
cd kitti_format/training
cat velodyne.zip.part* > velodyne.zip
unzip velodyne.zip
unzip 'image_*.zip'
Then organize the data folder into the following structure:
data/TrackOcc-waymo
├── kitti_format
│   ├── waymo_infos_train_jpg.pkl
│   ├── waymo_infos_val_jpg.pkl
│   └── training
│       ├── image_0
│       ├── ......
│       ├── image_4
│       └── velodyne
└── pano_voxel04
    ├── training
    │   ├── 000
    │   │   ├── 000_04.npz
    │   │   ├── 001_04.npz
    │   │   └── ......
    │   ├── 001
    │   └── ......
    └── validation
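To catch path mistakes before launching a long run, a small sanity check like the following can verify the layout. The expected paths are taken from the tree above; the helper name `check_layout` is our own, not part of the repo.

```python
from pathlib import Path

# Expected entries under data/TrackOcc-waymo, per the directory tree above.
EXPECTED = [
    "kitti_format/waymo_infos_train_jpg.pkl",
    "kitti_format/waymo_infos_val_jpg.pkl",
    "kitti_format/training/velodyne",
    "pano_voxel04/training",
    "pano_voxel04/validation",
] + [f"kitti_format/training/image_{i}" for i in range(5)]

def check_layout(root: str) -> list:
    """Return the expected paths that are missing under `root`."""
    base = Path(root)
    return [p for p in EXPECTED if not (base / p).exists()]

missing = check_layout("data/TrackOcc-waymo")
if missing:
    print("Missing:", *missing, sep="\n  ")
else:
    print("Data layout looks good.")
```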
The backbone is pretrained on nuImages. Download the weights to pretrain/xxx.pth before you start training, and remember to modify the corresponding params in the config file.
./dist_train.sh configs/TrackOcc/trackocc_r50_704x256_3inst_3f_8gpu.py 8
Remember that you need to modify some params in the command.
GPUS=8 ./slurm_train.sh PARTITION JOB_NAME configs/TrackOcc/trackocc_r50_704x256_3inst_3f_8gpu.py --cfg-options 'dist_params.port=29500'
Our trained weights are released on Hugging Face. Download them to pretrain/xxx.pth before you start evaluating.
./dist_test.sh configs/TrackOcc/trackocc_r50_704x256_3inst_3f_8gpu.py pretrain/trackocc_r50_704x256_3inst_3f_8gpu.pth 8
GPUS=8 ./slurm_test.sh PARTITION JOB_NAME configs/TrackOcc/trackocc_r50_704x256_3inst_3f_8gpu.py pretrain/trackocc_r50_704x256_3inst_3f_8gpu.pth --cfg-options 'dist_params.port=28506'
CUDA_VISIBLE_DEVICES=0 python timing_trackocc.py configs/TrackOcc/trackocc_r50_704x256_3inst_3f_8gpu.py pretrain/trackocc_r50_704x256_3inst_3f_8gpu.pth
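The timing script above reports inference speed on a single GPU. As a rough sketch of what such a measurement involves (a generic harness of our own, not the script's actual internals): discard warm-up iterations, then average over many timed calls.

```python
import time

def measure_fps(infer, n_warmup: int = 5, n_iters: int = 20) -> float:
    """Average throughput of a callable: warm up, then time repeated calls."""
    for _ in range(n_warmup):   # warm-up runs are excluded, e.g. to absorb
        infer()                 # CUDA kernel compilation and cache effects
    start = time.perf_counter()
    for _ in range(n_iters):
        infer()
    elapsed = time.perf_counter() - start
    return n_iters / elapsed

# Usage with a dummy workload standing in for model inference:
fps = measure_fps(lambda: sum(range(10000)))
print(f"{fps:.1f} iters/s")
```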
This project would not be possible without multiple great open-source codebases. We list some notable examples below.
If this work is helpful for your research, please consider citing it with the following BibTeX entry.
@article{chen2025trackocc,
title={TrackOcc: Camera-based 4D Panoptic Occupancy Tracking},
author={Zhuoguang Chen and Kenan Li and Xiuyu Yang and Tao Jiang and Yiming Li and Hang Zhao},
journal={arXiv preprint arXiv:2503.08471},
year={2025}
}