Yunhao Li, Yifan Jiao, Dan Meng, Heng Fan*, Libo Zhang*
International Conference on Computer Vision (ICCV), 2025. (*equal advising and co-last authors)
Despite recent progress, current Open-Vocabulary Multi-Object Tracking (OV-MOT) methods rely largely on instance-level information. Even those that introduce novel association strategies neglect trajectory information, an essential component of videos and a staple of classic MOT, which limits their ability to exploit contextual continuity across frames.
In contrast, our TRACT framework incorporates trajectory-level cues through three key strategies (sketched in code after this list):
- TCR (Trajectory Consistency Reinforcement): Enforces consistency within trajectories
- TFA (Trajectory-aware Feature Aggregation): Aggregates features across trajectory history
- TSE (Trajectory Semantic Enhancement): Enhances semantic understanding using trajectory context
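As a rough illustration of the TFA idea, the sketch below builds a single trajectory descriptor from per-frame embeddings with an exponential moving average; the function name, the EMA weighting, and the tensor layout are illustrative assumptions, not the repository's API.

```python
import torch

def aggregate_trajectory_features(frame_feats: torch.Tensor, momentum: float = 0.9) -> torch.Tensor:
    """Toy trajectory-aware feature aggregation (TFA-style sketch).

    frame_feats: (T, D) per-frame embeddings of one tracklet, oldest first.
    Returns a single (D,) trajectory descriptor built with an exponential
    moving average, so recent frames dominate but history still contributes.
    """
    traj = frame_feats[0]
    for feat in frame_feats[1:]:
        traj = momentum * traj + (1.0 - momentum) * feat
    return torch.nn.functional.normalize(traj, dim=-1)
```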
TRACT Framework Overview: Our approach uses a replaceable open-vocabulary detector to generate boxes for arbitrary categories. These detections then drive trajectory association, where TRACT leverages trajectory information in both the trajectory-enhanced association and trajectory-assisted classification steps.
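At a high level, the per-frame loop looks like the sketch below; `detect`, `associate`, and `classify` are hypothetical stand-ins for the open-vocabulary detector, trajectory-enhanced association, and trajectory-assisted classification, passed in as callables so the sketch stays self-contained.

```python
from typing import Callable, List

def track_video(frames: List, vocabulary: List[str],
                detect: Callable, associate: Callable, classify: Callable) -> List:
    """Illustrative per-frame tracking loop for the pipeline described above."""
    tracks: List = []                              # active trajectories
    for frame in frames:
        detections = detect(frame, vocabulary)     # boxes for arbitrary categories
        tracks = associate(tracks, detections)     # link detections to trajectories
        for track in tracks:
            track["label"] = classify(track)       # trajectory-level classification
    return tracks
```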
- Universal Open-Vocabulary Tracking: Tracks arbitrary object categories without category-specific training
- Trajectory-Aware Design: Incorporates temporal consistency and trajectory-level reasoning
- Strong Performance: Achieves state-of-the-art results on multiple benchmarks
This repository contains two main components:
- MASA (./masa/): a universal instance appearance model for object association, enabling zero-shot tracking across diverse domains
- TraCLIP (./TraCLIP/): trajectory-aware classification using CLIP features, providing temporal feature aggregation and trajectory semantic enhancement
- Linux or macOS
- Python >= 3.9
- PyTorch >= 2.1.2
- CUDA >= 11.8 (recommended)
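A quick way to verify the environment meets these requirements (generic Python/PyTorch calls, not a script shipped with this repository):

```python
import sys
import torch

# Expect Python >= 3.9, PyTorch >= 2.1.2, and a CUDA-visible GPU
print(sys.version.split()[0])
print(torch.__version__)
print(torch.version.cuda, torch.cuda.is_available())
```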
git clone https://github.com/your-repo/TRACT.git
cd TRACT
cd masa
conda env create -f environment.yml
conda activate masaenv
# Option 1: Automated installation
sh install_dependencies.sh
# Option 2: Manual installation
pip install -U openmim
mim install mmengine
pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu118/torch2.1/index.html
pip install git+https://github.com/open-mmlab/mmdetection.git@v3.3.0
pip install -r requirements.txt
cd ../TraCLIP
pip install -r requirements.txt
# Install additional packages
pip install clip-by-openai
pip install timm
pip install tqdm

Data preparation:
- Download the TAO dataset from the official website
- Organize the data structure as shown in TraCLIP/datasets/
- Generate tracklet datasets using the provided scripts
Follow the data format specifications in TraCLIP/readme.md for custom datasets.
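After installation, a minimal import check can confirm that the core dependencies resolve; this assumes the clip-by-openai package exposes the `clip` module and is not a script from this repository:

```python
# Confirm the core libraries installed above import cleanly.
import mmengine, mmcv, mmdet
import clip   # provided by the clip-by-openai package (assumption)
import timm

print(mmengine.__version__, mmcv.__version__, mmdet.__version__)
```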
- CUDA out of memory: Reduce the batch size or use gradient accumulation (see the sketch after this list)
- Missing dependencies: Ensure all packages in requirements.txt are installed
- Model loading errors: Check model paths and download pre-trained weights
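For reference, gradient accumulation in plain PyTorch looks roughly like this; it is a generic pattern, not code from this repository:

```python
accum_steps = 4  # effective batch size = loader batch size * accum_steps

def train_epoch(model, loader, optimizer, criterion):
    optimizer.zero_grad()
    for step, (inputs, targets) in enumerate(loader):
        loss = criterion(model(inputs), targets) / accum_steps  # scale the loss
        loss.backward()                                         # accumulate gradients
        if (step + 1) % accum_steps == 0:                       # update every N steps
            optimizer.step()
            optimizer.zero_grad()
```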
Our code repository is built upon:
- MASA - Universal instance appearance modeling
- MMDetection - Object detection framework
- CLIP - Vision-language pre-training
Thanks for their wonderful work!
If you find this project useful for your research, please use the following BibTeX entry:
@inproceedings{li2025tract,
title={Attention to Trajectory: Trajectory-Aware Open-Vocabulary Tracking},
author={Li, Yunhao and Jiao, Yifan and Meng, Dan and Fan, Heng and Zhang, Libo},
booktitle={International Conference on Computer Vision (ICCV)},
year={2025}
}

