
YOLOv12-BoT-SORT-ReID

Strong Baseline: Multi-UAV Tracking via YOLOv12 with BoT-SORT-ReID

Yu-Hsi Chen


Preface

The combination of YOLOv12 and BoT-SORT demonstrates strong object detection and tracking potential, yet it remains underexplored in the current literature and in existing implementations.

[1] Jocher, Glenn, et al. "ultralytics/yolov5: v6.0 - YOLOv5n 'Nano' models, Roboflow integration, TensorFlow export, OpenCV DNN support." Zenodo (2021).
[2] Tian, Yunjie, Qixiang Ye, and David Doermann. "YOLOv12: Attention-centric real-time object detectors." arXiv preprint arXiv:2502.12524 (2025).
[3] Zhang, Guangdong, et al. "Multi-object Tracking Based on YOLOX and DeepSORT Algorithm." International Conference on 5G for Future Wireless Networks. Cham: Springer Nature Switzerland, 2022.
[4] Aharon, Nir, Roy Orfaig, and Ben-Zion Bobrovsky. "BoT-SORT: Robust associations multi-pedestrian tracking." arXiv preprint arXiv:2206.14651 (2022).

This repository provides a strong baseline for multi-UAV tracking in thermal infrared videos by leveraging YOLOv12 and BoT-SORT with ReID. Our approach significantly outperforms the widely adopted YOLOv5 with the DeepSORT pipeline, offering a high-performance foundation for UAV swarm tracking. Importantly, the established workflow in this repository can be easily integrated with any custom-trained model, extending its applicability beyond UAV scenarios. Refer to this section for practical usage examples.

📹 Preview - Strong Baseline
strong_baseline.webm

🔗 Full video available at: Track 3

🔍 See also SOT inferences: Track 1 and Track 2

🌐 CVPR2025 | Workshops | 4th Anti-UAV Workshop | Track-1 | Track-2 | Track-3

📹 Preview - Single-Frame Enhancements
enhancements_MultiUAV-261.webm

🔗 Full video available at: Enhancements

📹 Preview - Custom Model Inference

This section showcases example videos processed using a custom-trained model. The scenes are not limited to UAV footage or single-class detection. See 🚀 Quickstart: Installation and Demonstration → Run Inference Using a Custom-Trained Model for more details.

1. Multi-Class on a Walkway Scene
palace.webm

🔗 Original video: palace.mp4

2. Common Objects Underwater
cou.webm

🔗 Full video available at: COU.mp4

3. UAVDB
uavdb.webm

🔗 Full video available at: UAVDB.mp4

๐Ÿ Beyond Strong Baseline: Multi-UAV Tracking Competition โ‚ŠหšโŠน

📹 Preview - Vision in Action: Overview of All Videos

A complete visual overview of all training and test videos.

vision_in_action.webm

🔗 Full video available at: Overview

Scenarios are categorized to evaluate tracking performance under diverse conditions:

  • Takeoff - UAV launch phase: 2 videos.
  • L - Larger UAV target: 15 videos.
  • C - Cloud background: 39 videos.
  • CF - Cloud (Fewer UAVs): 18 videos.
  • T - Tree background: 68 videos.
  • TF - Tree (Fewer UAVs): 14 videos.
  • B - Scene with buildings: 11 videos.
  • BB1 - Building Background 1: 4 videos.
  • BB2 - Building Background 2: 17 videos.
  • BB2P - Building Background 2 (UAV partially out of view): 8 videos.
  • Landing - UAV landing phase: 4 videos.

TOTAL: 200 videos (151,384 frames)

📹 Preview - Vision in Action: Training Videos

demo_train.webm

🔗 Full video available at: Training Videos

  • Takeoff - UAV launch phase: 1 video.
  • L - Larger UAV target: 8 videos.
  • C - Cloud background: 20 videos.
  • CF - Cloud (Fewer UAVs): 9 videos.
  • T - Tree background: 34 videos.
  • TF - Tree (Fewer UAVs): 7 videos.
  • B - Scene with buildings: 6 videos.
  • BB1 - Building Background 1: 2 videos.
  • BB2 - Building Background 2: 9 videos.
  • BB2P - Building Background 2 (UAV partially out of view): 4 videos.
  • Landing - UAV landing phase: 2 videos.

TOTAL: 102 videos (77,293 frames)

📹 Preview - Vision in Action: Test Videos

demo_test.webm

🔗 Full video available at: Test Videos

  • Takeoff - UAV launch phase: 1 video.
  • L - Larger UAV target: 7 videos.
  • C - Cloud background: 19 videos.
  • CF - Cloud (Fewer UAVs): 9 videos.
  • T - Tree background: 34 videos.
  • TF - Tree (Fewer UAVs): 7 videos.
  • B - Scene with buildings: 5 videos.
  • BB1 - Building Background 1: 2 videos.
  • BB2 - Building Background 2: 8 videos.
  • BB2P - Building Background 2 (UAV partially out of view): 4 videos.
  • Landing - UAV landing phase: 2 videos.

TOTAL: 98 videos (74,538 frames)

📹 Preview - Vision in Action: Beyond Strong Baseline

🔗 View the competition on Codabench

๐Ÿ—ž๏ธ News

🚀 Quickstart: Installation and Demonstration

Installation
$ conda create -n yolov12_botsort python=3.11 -y
$ conda activate yolov12_botsort
$ git clone https://github.com/wish44165/YOLOv12-BoT-SORT-ReID.git
$ cd YOLOv12-BoT-SORT-ReID/BoT-SORT/yolov12/
$ wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu11torch2.2cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
$ pip install -r requirements.txt
$ cd ../
$ pip3 install torch torchvision torchaudio
$ pip3 install -r requirements.txt
$ pip3 install ultralytics
$ pip3 install cython; pip3 install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
$ pip3 install cython_bbox
$ pip3 install faiss-cpu
$ pip3 install seaborn
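
To quickly confirm the installation succeeded, the following minimal sanity check (an illustrative snippet, not a script shipped with this repository) can be run inside the yolov12_botsort environment before downloading any weights:

# verify_env.py - hypothetical helper; checks that PyTorch, CUDA, and ultralytics are usable
import torch
import ultralytics

print("torch:", torch.__version__)
print("ultralytics:", ultralytics.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))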
Folder Structure

The following folder structure will be created upon cloning this repository.

YOLOv12-BoT-SORT-ReID/
├── data/
│   └── demo/
│       ├── MOT/
│       │   ├── MultiUAV-003.mp4
│       │   ├── Test_imgs/
│       │   │   ├── MultiUAV-003/
│       │   │   ├── MultiUAV-135/
│       │   │   ├── MultiUAV-173/
│       │   │   └── MultiUAV-261/
│       │   └── TestLabels_FirstFrameOnly/
│       │       ├── MultiUAV-003.txt
│       │       ├── MultiUAV-135.txt
│       │       ├── MultiUAV-173.txt
│       │       └── MultiUAV-261.txt
│       └── SOT/
│           ├── Track1/
│           │   ├── 20190926_111509_1_8/
│           │   ├── 41_1/
│           │   ├── new30_train-new/
│           │   └── wg2022_ir_050_split_01/
│           └── Track2/
│               ├── 02_6319_0000-1499/
│               ├── 3700000000002_110743_1/
│               ├── DJI_0057_1/
│               └── wg2022_ir_032_split_04/
└── BoT-SORT/
Demonstration

A toy example covering all three tracks, spanning both SOT and MOT.

$ cd BoT-SORT/

# Track 1
$ python3 tools/predict_track1.py --weights ./yolov12/weights/SOT_yolov12l.pt --source ../data/demo/SOT/Track1/ --img-size 640 --device "0" --conf-thres 0.01 --iou-thres 0.01 --track_high_thresh 0.1 --track_low_thresh 0.01 --fuse-score --agnostic-nms --min_box_area 4 --save_path_answer ./submit/track1/demo --hide-labels-name
# output: ./runs/detect/, ./submit/track1/demo/

# Track 2
$ python3 tools/predict_track2.py --weights ./yolov12/weights/SOT_yolov12l.pt --source ../data/demo/SOT/Track2/ --img-size 640 --device "0" --conf-thres 0.01 --iou-thres 0.01 --track_high_thresh 0.1 --track_low_thresh 0.01 --fuse-score --agnostic-nms --min_box_area 1 --save_path_answer ./submit/track2/demo --hide-labels-name
# output: ./runs/detect/, ./submit/track2/demo/

# Track 3 (without ReID)
$ python3 tools/predict_track3.py --weights ./yolov12/weights/MOT_yolov12n.pt --source ../data/demo/MOT/ --img-size 1600 --device "0" --track_buffer 60 --save_path_answer ./submit/track3/demo --hide-labels-name
# Track 3 (with ReID)
$ python3 tools/predict_track3.py --weights ./yolov12/weights/MOT_yolov12n.pt --source ../data/demo/MOT/ --img-size 1600 --device "0" --track_buffer 60 --save_path_answer ./submit/track3/demo --with-reid --fast-reid-config logs/sbs_S50/config.yaml --fast-reid-weights logs/sbs_S50/model_0016.pth --hide-labels-name
# output: ./runs/detect/, ./submit/track3/demo/

# Heatmap
$ cd yolov12/
$ python heatmap.py
# output: ./outputs/
Run Inference on Custom Data

This project supports flexible inference on image folders and video files, with or without initial object positions, specifically for the MOT task.

python3 tools/inference.py \
    --weights ./yolov12/weights/MOT_yolov12n.pt \
    --source <path to folder or video> \
    --with-initial-positions \
    --initial-position-config <path to initial positions file (optional)> \
    --img-size 1600 \
    --track_buffer 60 \
    --device "0" \
    --agnostic-nms \
    --save_path_answer ./submit/inference/ \
    --with-reid \
    --fast-reid-config logs/sbs_S50/config.yaml \
    --fast-reid-weights logs/sbs_S50/model_0016.pth \
    --hide-labels-name

Below are examples of supported inference settings:

# 1. Inference on Image Folder (without initial position)
python3 tools/inference.py \
    --weights ./yolov12/weights/MOT_yolov12n.pt \
    --source ../data/demo/MOT/Test_imgs/MultiUAV-003/ \
    --img-size 1600 \
    --track_buffer 60 \
    --device "0" \
    --agnostic-nms \
    --save_path_answer ./submit/inference/ \
    --with-reid \
    --fast-reid-config logs/sbs_S50/config.yaml \
    --fast-reid-weights logs/sbs_S50/model_0016.pth \
    --hide-labels-name

# 2. Inference on Image Folder (with initial position)
python3 tools/inference.py \
    --weights ./yolov12/weights/MOT_yolov12n.pt \
    --source ../data/demo/MOT/Test_imgs/MultiUAV-003/ \
    --with-initial-positions \
    --initial-position-config ../data/demo/MOT/TestLabels_FirstFrameOnly/MultiUAV-003.txt \
    --img-size 1600 \
    --track_buffer 60 \
    --device "0" \
    --agnostic-nms \
    --save_path_answer ./submit/inference/ \
    --with-reid \
    --fast-reid-config logs/sbs_S50/config.yaml \
    --fast-reid-weights logs/sbs_S50/model_0016.pth \
    --hide-labels-name

# 3. Inference on Video (without initial position)
python3 tools/inference.py \
    --weights ./yolov12/weights/MOT_yolov12n.pt \
    --source ../data/demo/MOT/MultiUAV-003.mp4 \
    --img-size 1600 \
    --track_buffer 60 \
    --device "0" \
    --agnostic-nms \
    --save_path_answer ./submit/inference/ \
    --with-reid \
    --fast-reid-config logs/sbs_S50/config.yaml \
    --fast-reid-weights logs/sbs_S50/model_0016.pth \
    --hide-labels-name

# 4. Inference on Video (with initial position)
python3 tools/inference.py \
    --weights ./yolov12/weights/MOT_yolov12n.pt \
    --source ../data/demo/MOT/MultiUAV-003.mp4 \
    --with-initial-positions \
    --initial-position-config ../data/demo/MOT/TestLabels_FirstFrameOnly/MultiUAV-003.txt \
    --img-size 1600 \
    --track_buffer 60 \
    --device "0" \
    --agnostic-nms \
    --save_path_answer ./submit/inference/ \
    --with-reid \
    --fast-reid-config logs/sbs_S50/config.yaml \
    --fast-reid-weights logs/sbs_S50/model_0016.pth \
    --hide-labels-name
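
For embedding the tracker in Python code rather than using the CLI, the sketch below uses the BoT-SORT tracker bundled with ultralytics as a rough stand-in; this is an illustrative assumption, not tools/inference.py, and it does not load this repository's FastReID ReID weights:

# track_sketch.py - hypothetical example using ultralytics' built-in BoT-SORT
from ultralytics import YOLO

model = YOLO("./yolov12/weights/MOT_yolov12n.pt")

# stream=True yields one Results object per frame;
# tracker="botsort.yaml" selects the bundled BoT-SORT configuration
for result in model.track(source="../data/demo/MOT/MultiUAV-003.mp4",
                          imgsz=1600, tracker="botsort.yaml", stream=True):
    if result.boxes.id is None:  # no confirmed tracks in this frame
        continue
    for box, track_id in zip(result.boxes.xywh, result.boxes.id.int().tolist()):
        x, y, w, h = box.tolist()
        print(f"id={track_id} center=({x:.1f}, {y:.1f}) size={w:.0f}x{h:.0f}")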
Run Inference Using a Custom-Trained Model

This project also supports flexible inference using a custom-trained model for any MOT task. Below are the instructions for reproducing the examples in the preview section.

$ cd BoT-SORT/

1. Multi-Class on a Walkway Scene

$ wget https://github.com/sunsmarterjie/yolov12/releases/download/v1.0/yolov12x.pt
$ wget https://github.com/FoundationVision/ByteTrack/raw/main/videos/palace.mp4
$ python3 tools/inference.py \
    --weights yolov12x.pt \
    --source palace.mp4 \
    --img-size 640 \
    --device "0" \
    --save_path_answer ./submit/palace/

2. Common Objects Underwater

for f in ./videos/COU/*.mp4; do
    python3 tools/inference.py \
        --weights ./yolov12/runs/det/train/weights/best.pt \
        --source "$f" \
        --img-size 1600 \
        --device "0" \
        --save_path_answer ./submit/COU/
done

3. UAVDB

for f in ./videos/UAVDB/*.mp4; do
    python3 tools/inference.py \
        --weights ./yolov12/runs/det/train/weights/best.pt \
        --source "$f" \
        --img-size 1600 \
        --device "0" \
        --save_path_answer ./submit/UAVDB/
done

๐Ÿ› ๏ธ Implementation Details

Hardware Information

Experiments were conducted on two platforms: (1) a local system with an Intel Core i7-12650H CPU, an NVIDIA RTX 4050 GPU, and 16 GB RAM for data processing and inference, and (2) the Spartan HPC system with NVIDIA H100 and A100 GPUs (80 GB) for model training.

Laptop

  • CPU: Intel® Core™ i7-12650H
  • GPU: NVIDIA GeForce RTX 4050 Laptop GPU (6 GB)
  • RAM: 23734 MiB

HPC

  • GPU: Spartan gpu-h100 (80 GB), gpu-a100 (80 GB)

🖻 Data Preparation

Officially Released

4th_Anti-UAV_Challenge/
├── baseline/
│   ├── Baseline_code.zip
│   └── MultiUAV_Baseline_code_and_submissi.zip
├── test/
│   ├── MultiUAV_Test.zip
│   ├── track1_test.zip
│   └── track2_test.zip
└── train/
    ├── MultiUAV_Train.zip
    └── train.zip
Strong Baseline

Datasets are also available on Hugging Face.

train/
├── MOT/
│   └── AntiUAV_train_val.zip
├── ReID/
│   ├── MOT20_subset.zip
│   └── MOT20.zip
└── SOT/
    ├── AntiUAV_train_val_test.zip
    └── AntiUAV_train_val.zip
Enhancements

enhancements/
├── MOT/
│   ├── CLAHE_train_val.zip
│   ├── Sobel-based_Edge_Sharpening_train_val.zip
│   └── Sobel-based_Image_Gradients_train_val.zip
└── ReID/
    ├── CLAHE_subset.zip
    ├── Sobel-based_Edge_Sharpening_subset.zip
    └── Sobel-based_Image_Gradients_subset.zip

📂 Folder Structure

Project Layout

Follow the folder structure below to ensure smooth execution and easy navigation.

YOLOv12-BoT-SORT-ReID/
├── BoT-SORT/
│   ├── getInfo.py
│   ├── datasets/
│   │   └── README.md
│   ├── fast_reid/
│   │   └── datasets/
│   │       ├── generate_mot_patches.py
│   │       └── README.md
│   ├── logs/
│   │   ├── sbs_S50/
│   │   │   ├── config.yaml
│   │   │   └── model_0016.pth
│   │   └── README.md
│   ├── requirements.txt
│   ├── runs/
│   │   └── README.md
│   ├── submit/
│   │   └── README.md
│   ├── tools/
│   │   ├── predict_track1.py
│   │   ├── predict_track2.py
│   │   └── predict_track3.py
│   └── yolov12/
│       ├── heatmap.py
│       ├── imgs_dir/
│       │   ├── 00096.jpg
│       │   ├── 00379.jpg
│       │   ├── 00589.jpg
│       │   └── 00643.jpg
│       ├── requirements.txt
│       └── weights/
│           ├── MOT_yolov12n.pt
│           └── SOT_yolov12l.pt
├── data/
│   ├── demo/
│   ├── MOT/
│   │   └── README.md
│   └── SOT/
│       └── README.md
├── LICENSE
└── README.md

🔨 Reproduction

Run Commands

Executing the following commands reproduces the leaderboard results.

Data Analysis
$ cd BoT-SORT/

# Table 1
$ python3 getInfo.py
Train YOLOv12

Refer to the README for more information.

$ cd BoT-SORT/yolov12/

# Run training with default settings
$ python3 train.py
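
For reference, train.py builds on the standard ultralytics training interface; a minimal sketch of the equivalent API call is shown below. The model and dataset YAML names here are placeholders, not the repository's actual configuration:

# train_sketch.py - hypothetical training call via the ultralytics API
from ultralytics import YOLO

model = YOLO("yolov12n.yaml")      # placeholder model config for YOLOv12-nano
model.train(
    data="uav.yaml",               # placeholder dataset YAML (train/val paths, class names)
    epochs=100,                    # placeholder hyperparameters
    imgsz=1600,                    # matches the MOT inference resolution used in this repo
    device=0,
)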
Train BoT-SORT-ReID

Refer to the README for more information.

$ cd BoT-SORT/

# Train with final config
$ python3 fast_reid/tools/train_net.py --config-file ./logs/sbs_S50/config.yaml MODEL.DEVICE "cuda:0"
Inference
$ cd BoT-SORT/

# Track 1
$ python3 tools/predict_track1.py --weights ./yolov12/weights/SOT_yolov12l.pt --source ../data/SOT/track1_test/ --img-size 640 --device "0" --conf-thres 0.01 --iou-thres 0.01 --track_high_thresh 0.1 --track_low_thresh 0.01 --fuse-score --agnostic-nms --min_box_area 4 --save_path_answer ./submit/track1/test --hide-labels-name
# output: ./runs/detect/, ./submit/track1/test/

# Track 2
$ python3 tools/predict_track2.py --weights ./yolov12/weights/SOT_yolov12l.pt --source ../data/SOT/track2_test/ --img-size 640 --device "0" --conf-thres 0.01 --iou-thres 0.01 --track_high_thresh 0.1 --track_low_thresh 0.01 --fuse-score --agnostic-nms --min_box_area 1 --save_path_answer ./submit/track2/test --hide-labels-name
# output: ./runs/detect/, ./submit/track2/test/

# Track 3
$ chmod +x run_track3.sh
$ ./run_track3.sh
# output: ./runs/detect/, ./submit/track3/test/

✨ Models

Model weights are available on Hugging Face.

| Model | size (pixels) | AP(val) 50-95 | params (M) | FLOPs (G) | Note |
| --- | --- | --- | --- | --- | --- |
| SOT_yolov12l.pt | 640 | 67.2 | 26.3 | 88.5 | |
| MOT_yolov12n.pt (ReID) | 1600 | 68.5 | 2.6 | 6.3 | #4 (Comment) |
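
The size column gives the input resolution associated with each checkpoint, so pass a matching imgsz when loading the weights directly. A minimal detection-only sketch (illustrative; frame.jpg is a placeholder image path):

# detect_sketch.py - hypothetical single-image check of the released weights
from ultralytics import YOLO

sot = YOLO("./BoT-SORT/yolov12/weights/SOT_yolov12l.pt")
mot = YOLO("./BoT-SORT/yolov12/weights/MOT_yolov12n.pt")

# imgsz matches the "size (pixels)" column above
sot_res = sot.predict("frame.jpg", imgsz=640, conf=0.01)
mot_res = mot.predict("frame.jpg", imgsz=1600)
print(len(sot_res[0].boxes), "SOT detections;", len(mot_res[0].boxes), "MOT detections")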

📜 Citation

If you find this project helpful for your research or applications, please consider citing the paper and giving the repository a star.

@InProceedings{Chen_2025_CVPR,
    author    = {Chen, Yu-Hsi},
    title     = {Strong Baseline: Multi-UAV Tracking via YOLOv12 with BoT-SORT-ReID},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {6573-6582}
}

๐Ÿ™ Acknowledgments

Much of the code builds upon YOLOv12, BoT-SORT, and TrackEval. We also sincerely thank the organizers of the Anti-UAV benchmark for providing the valuable dataset. We greatly appreciate their contributions!