Strong Baseline: Multi-UAV Tracking via YOLOv12 with BoT-SORT-ReID
Yu-Hsi Chen
Preface
The combination of YOLOv12 [2] and BoT-SORT [4] demonstrates strong potential for object detection and tracking, yet it remains underexplored in the current literature and in existing implementations.

[1] Jocher, Glenn, et al. "ultralytics/yolov5: v6.0 - YOLOv5n 'Nano' models, Roboflow integration, TensorFlow export, OpenCV DNN support." Zenodo (2021).
[2] Tian, Yunjie, Qixiang Ye, and David Doermann. "YOLOv12: Attention-centric real-time object detectors." arXiv preprint arXiv:2502.12524 (2025).
[3] Zhang, Guangdong, et al. "Multi-object Tracking Based on YOLOX and DeepSORT Algorithm." International Conference on 5G for Future Wireless Networks. Cham: Springer Nature Switzerland, 2022.
[4] Aharon, Nir, Roy Orfaig, and Ben-Zion Bobrovsky. "BoT-SORT: Robust associations multi-pedestrian tracking." arXiv preprint arXiv:2206.14651 (2022).
This repository provides a strong baseline for multi-UAV tracking in thermal infrared videos by leveraging YOLOv12 and BoT-SORT with ReID. Our approach significantly outperforms the widely adopted YOLOv5 [1] with DeepSORT [3] pipeline, offering a high-performance foundation for UAV swarm tracking. Importantly, the established workflow in this repository can be easily integrated with any custom-trained model, extending its applicability beyond UAV scenarios. Refer to the section Run Inference Using a Custom-Trained Model for practical usage examples.
🔹 Preview - Strong Baseline
strong_baseline.webm
Full video available at: Track 3
See also SOT inferences: Track 1 and Track 2
CVPR2025 | Workshops | 4th Anti-UAV Workshop | Track-1 | Track-2 | Track-3
🔹 Preview - Single-Frame Enhancements
enhancements_MultiUAV-261.webm
Full video available at: Enhancements
🔹 Preview - Custom Model Inference
This section showcases example videos processed using a custom-trained model. The scenes are not limited to UAV footage or single-class detection. See Quickstart: Installation and Demonstration → Run Inference Using a Custom-Trained Model for more details.
🔹 Preview - Vision in Action: Overview of All Videos
A complete visual overview of all training and test videos.
vision_in_action.webm
Full video available at: Overview
Scenarios are categorized to evaluate tracking performance under diverse conditions:
- Takeoff - UAV launch phase: 2 videos.
- L - Larger UAV target: 15 videos.
- C - Cloud background: 39 videos.
- CF - Cloud (Fewer UAVs): 18 videos.
- T - Tree background: 68 videos.
- TF - Tree (Fewer UAVs): 14 videos.
- B - Scene with buildings: 11 videos.
- BB1 - Building Background 1: 4 videos.
- BB2 - Building Background 2: 17 videos.
- BB2P - Building Background 2 (UAV partially out of view): 8 videos.
- Landing - UAV landing phase: 4 videos.
TOTAL: 200 videos (151,384 frames)
🔹 Preview - Vision in Action: Training Videos
demo_train.webm
Full video available at: Training Videos
- Takeoff - UAV launch phase: 1 video.
- L - Larger UAV target: 8 videos.
- C - Cloud background: 20 videos.
- CF - Cloud (Fewer UAVs): 9 videos.
- T - Tree background: 34 videos.
- TF - Tree (Fewer UAVs): 7 videos.
- B - Scene with buildings: 6 videos.
- BB1 - Building Background 1: 2 videos.
- BB2 - Building Background 2: 9 videos.
- BB2P - Building Background 2 (UAV partially out of view): 4 videos.
- Landing - UAV landing phase: 2 videos.
TOTAL: 102 videos (77,293 frames)
🔹 Preview - Vision in Action: Test Videos
demo_test.webm
Full video available at: Test Videos
- Takeoff - UAV launch phase: 1 video.
- L - Larger UAV target: 7 videos.
- C - Cloud background: 19 videos.
- CF - Cloud (Fewer UAVs): 9 videos.
- T - Tree background: 34 videos.
- TF - Tree (Fewer UAVs): 7 videos.
- B - Scene with buildings: 5 videos.
- BB1 - Building Background 1: 2 videos.
- BB2 - Building Background 2: 8 videos.
- BB2P - Building Background 2 (UAV partially out of view): 4 videos.
- Landing - UAV landing phase: 2 videos.
TOTAL: 98 videos (74,538 frames)
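As a quick sanity check, the per-category video counts listed above split cleanly between the training and test sets and sum to the stated totals. A minimal Python sketch (the dictionary keys are shorthand for the category labels above):

# Verify that the per-category video counts are consistent with the totals.
overall = dict(Takeoff=2, L=15, C=39, CF=18, T=68, TF=14,
               B=11, BB1=4, BB2=17, BB2P=8, Landing=4)
train = dict(Takeoff=1, L=8, C=20, CF=9, T=34, TF=7,
             B=6, BB1=2, BB2=9, BB2P=4, Landing=2)
test = dict(Takeoff=1, L=7, C=19, CF=9, T=34, TF=7,
            B=5, BB1=2, BB2=8, BB2P=4, Landing=2)

assert sum(overall.values()) == 200  # 200 videos overall
assert sum(train.values()) == 102    # 102 training videos
assert sum(test.values()) == 98      # 98 test videos
assert all(train[k] + test[k] == overall[k] for k in overall)  # clean split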
News
- August 1, 2025: Submit now and challenge the Strong Baseline.
- July 30, 2025: 🔧 Corrected test data for the BB2P_02 sequence to fix a minor defect.
- July 27, 2025: Beyond Strong Baseline is now open for registration.
- July 23, 2025: The test data for the competition is now available.
- July 13, 2025: The training data for the competition is now available.
- June 21, 2025: Training scripts for YOLOv12 and BoT-SORT-ReID are now available.
- June 12, 2025: 🥉 3rd Place Award in the 4th Anti-UAV Workshop & Challenge, Track 3.
- June 6, 2025: Corrected mistyped numbers in Table 1.
- April 25, 2025: Single-Frame Enhancement datasets are now available.
- April 23, 2025: Strong Baseline weights available: YOLOv12 | ReID.
- April 13, 2025: The datasets presented in Table 2 of the paper are now available.
- April 7, 2025: Our paper is now on arXiv.
- 🎥 Demos: Hugging Face | YouTube
- Quickstart: Colab Notebook | Kaggle Notebook
- Project Page: Link
Installation
$ conda create -n yolov12_botsort python=3.11 -y
$ conda activate yolov12_botsort
$ git clone https://github.com/wish44165/YOLOv12-BoT-SORT-ReID.git
$ cd YOLOv12-BoT-SORT-ReID/BoT-SORT/yolov12/
$ wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu11torch2.2cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
$ pip install -r requirements.txt
$ cd ../
$ pip3 install torch torchvision torchaudio
$ pip3 install -r requirements.txt
$ pip3 install ultralytics
$ pip3 install cython; pip3 install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
$ pip3 install cython_bbox
$ pip3 install faiss-cpu
$ pip3 install seaborn
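After installation, a quick sanity check confirms that PyTorch sees the GPU and that a YOLOv12 checkpoint loads. This is a minimal sketch; the weight path assumes you have placed MOT_yolov12n.pt under BoT-SORT/yolov12/weights/ as shown in the project layout below.

# Post-install sanity check (run from BoT-SORT/).
import torch
print(torch.__version__, torch.cuda.is_available())  # expect True on a GPU machine

from ultralytics import YOLO
model = YOLO("./yolov12/weights/MOT_yolov12n.pt")  # assumed local weight path
model.info()  # prints the layer/parameter/FLOPs summary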
Folder Structure
The following folder structure will be created upon cloning this repository.
YOLOv12-BoT-SORT-ReID/
├── data/
│   └── demo/
│       ├── MOT/
│       │   ├── MultiUAV-003.mp4
│       │   ├── Test_imgs/
│       │   │   ├── MultiUAV-003/
│       │   │   ├── MultiUAV-135/
│       │   │   ├── MultiUAV-173/
│       │   │   └── MultiUAV-261/
│       │   └── TestLabels_FirstFrameOnly/
│       │       ├── MultiUAV-003.txt
│       │       ├── MultiUAV-135.txt
│       │       ├── MultiUAV-173.txt
│       │       └── MultiUAV-261.txt
│       └── SOT/
│           ├── Track1/
│           │   ├── 20190926_111509_1_8/
│           │   ├── 41_1/
│           │   ├── new30_train-new/
│           │   └── wg2022_ir_050_split_01/
│           └── Track2/
│               ├── 02_6319_0000-1499/
│               ├── 3700000000002_110743_1/
│               ├── DJI_0057_1/
│               └── wg2022_ir_032_split_04/
└── BoT-SORT/
Demonstration
A toy example covering all three tracks, including both SOT and MOT.
$ cd BoT-SORT/
# Track 1
$ python3 tools/predict_track1.py --weights ./yolov12/weights/SOT_yolov12l.pt --source ../data/demo/SOT/Track1/ --img-size 640 --device "0" --conf-thres 0.01 --iou-thres 0.01 --track_high_thresh 0.1 --track_low_thresh 0.01 --fuse-score --agnostic-nms --min_box_area 4 --save_path_answer ./submit/track1/demo --hide-labels-name
# output: ./runs/detect/, ./submit/track1/demo/
# Track 2
$ python3 tools/predict_track2.py --weights ./yolov12/weights/SOT_yolov12l.pt --source ../data/demo/SOT/Track2/ --img-size 640 --device "0" --conf-thres 0.01 --iou-thres 0.01 --track_high_thresh 0.1 --track_low_thresh 0.01 --fuse-score --agnostic-nms --min_box_area 1 --save_path_answer ./submit/track2/demo --hide-labels-name
# output: ./runs/detect/, ./submit/track2/demo/
# Track 3
$ python3 tools/predict_track3.py --weights ./yolov12/weights/MOT_yolov12n.pt --source ../data/demo/MOT/ --img-size 1600 --device "0" --track_buffer 60 --save_path_answer ./submit/track3/demo --hide-labels-name
$ python3 tools/predict_track3.py --weights ./yolov12/weights/MOT_yolov12n.pt --source ../data/demo/MOT/ --img-size 1600 --device "0" --track_buffer 60 --save_path_answer ./submit/track3/demo --with-reid --fast-reid-config logs/sbs_S50/config.yaml --fast-reid-weights logs/sbs_S50/model_0016.pth --hide-labels-name
# output: ./runs/detect/, ./submit/track3/demo/
# Heatmap
$ cd yolov12/
$ python heatmap.py
# output: ./outputs/
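heatmap.py produces the visualizations under ./outputs/. For readers who want a comparable view on their own data, the sketch below shows one common way to build a detection-density heatmap. It is illustrative only, not necessarily how heatmap.py works, and the weight path, frame size, and grid scale are assumptions.

# Illustrative detection-density heatmap (not the repo's heatmap.py).
import glob, os
import numpy as np
import matplotlib.pyplot as plt
from ultralytics import YOLO

model = YOLO("weights/MOT_yolov12n.pt")          # assumed weight path
scale = 8                                        # accumulate on a coarse grid
heat = np.zeros((1080 // scale, 1920 // scale))  # assumes 1920x1080 frames

for frame in sorted(glob.glob("imgs_dir/*.jpg")):
    boxes = model(frame, verbose=False)[0].boxes.xywh.cpu().numpy()
    for cx, cy, _, _ in boxes:                   # count each detection center
        r = min(int(cy / scale), heat.shape[0] - 1)
        c = min(int(cx / scale), heat.shape[1] - 1)
        heat[r, c] += 1

os.makedirs("outputs", exist_ok=True)
plt.imshow(heat, cmap="hot")
plt.colorbar(label="detections per cell")
plt.savefig("outputs/detection_density.png")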
Run Inference on Custom Data
This project supports flexible inference on image folders and video files, with or without initial object positions, specifically for the MOT task.
python3 tools/inference.py \
--weights ./yolov12/weights/MOT_yolov12n.pt \
--source <path to folder or video> \
--with-initial-positions \
--initial-position-config <path to initial positions file (optional)> \
--img-size 1600 \
--track_buffer 60 \
--device "0" \
--agnostic-nms \
--save_path_answer ./submit/inference/ \
--with-reid \
--fast-reid-config logs/sbs_S50/config.yaml \
--fast-reid-weights logs/sbs_S50/model_0016.pth \
--hide-labels-name
Below are examples of supported inference settings:
# 1. Inference on Image Folder (without initial position)
python3 tools/inference.py \
--weights ./yolov12/weights/MOT_yolov12n.pt \
--source ../data/demo/MOT/Test_imgs/MultiUAV-003/ \
--img-size 1600 \
--track_buffer 60 \
--device "0" \
--agnostic-nms \
--save_path_answer ./submit/inference/ \
--with-reid \
--fast-reid-config logs/sbs_S50/config.yaml \
--fast-reid-weights logs/sbs_S50/model_0016.pth \
--hide-labels-name
# 2. Inference on Image Folder (with initial position)
python3 tools/inference.py \
--weights ./yolov12/weights/MOT_yolov12n.pt \
--source ../data/demo/MOT/Test_imgs/MultiUAV-003/ \
--with-initial-positions \
--initial-position-config ../data/demo/MOT/TestLabels_FirstFrameOnly/MultiUAV-003.txt \
--img-size 1600 \
--track_buffer 60 \
--device "0" \
--agnostic-nms \
--save_path_answer ./submit/inference/ \
--with-reid \
--fast-reid-config logs/sbs_S50/config.yaml \
--fast-reid-weights logs/sbs_S50/model_0016.pth \
--hide-labels-name
# 3. Inference on Video (without initial position)
python3 tools/inference.py \
--weights ./yolov12/weights/MOT_yolov12n.pt \
--source ../data/demo/MOT/MultiUAV-003.mp4 \
--img-size 1600 \
--track_buffer 60 \
--device "0" \
--agnostic-nms \
--save_path_answer ./submit/inference/ \
--with-reid \
--fast-reid-config logs/sbs_S50/config.yaml \
--fast-reid-weights logs/sbs_S50/model_0016.pth \
--hide-labels-name
# 4. Inference on Video (with initial position)
python3 tools/inference.py \
--weights ./yolov12/weights/MOT_yolov12n.pt \
--source ../data/demo/MOT/MultiUAV-003.mp4 \
--with-initial-positions \
--initial-position-config ../data/demo/MOT/TestLabels_FirstFrameOnly/MultiUAV-003.txt \
--img-size 1600 \
--track_buffer 60 \
--device "0" \
--agnostic-nms \
--save_path_answer ./submit/inference/ \
--with-reid \
--fast-reid-config logs/sbs_S50/config.yaml \
--fast-reid-weights logs/sbs_S50/model_0016.pth \
--hide-labels-name
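The --initial-position-config file supplies one starting box per track. Below is a hedged sketch of how such a file could be consumed, assuming comma-separated frame,id,x,y,w,h rows; inspect the actual TestLabels_FirstFrameOnly files before relying on this layout.

# Hypothetical parser for a first-frame label file; the column layout
# (frame,id,x,y,w,h) is an assumption, not a documented format.
def load_initial_positions(path):
    boxes = {}
    with open(path) as f:
        for line in f:
            frame, tid, x, y, w, h = (float(v) for v in line.split(",")[:6])
            boxes[int(tid)] = (x, y, w, h)  # one starting box per track ID
    return boxes

print(load_initial_positions(
    "../data/demo/MOT/TestLabels_FirstFrameOnly/MultiUAV-003.txt"))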
Run Inference Using a Custom-Trained Model
This project also supports flexible inference using a custom-trained model for any MOT task. Below are the instructions for reproducing the preview section.
$ cd BoT-SORT/
$ wget https://github.com/sunsmarterjie/yolov12/releases/download/v1.0/yolov12x.pt
$ wget https://github.com/FoundationVision/ByteTrack/raw/main/videos/palace.mp4
$ python3 tools/inference.py \
--weights yolov12x.pt \
--source palace.mp4 \
--img-size 640 \
--device "0" \
--save_path_answer ./submit/palace/
for f in ./videos/COU/*.mp4; do
python3 tools/inference.py \
--weights ./yolov12/runs/det/train/weights/best.pt \
--source "$f" \
--img-size 1600 \
--device "0" \
--save_path_answer ./submit/COU/
done
for f in ./videos/UAVDB/*.mp4; do
python3 tools/inference.py \
--weights ./yolov12/runs/det/train/weights/best.pt \
--source "$f" \
--img-size 1600 \
--device "0" \
--save_path_answer ./submit/UAVDB/
done
Hardware Information
Experiments were conducted on two platforms: (1) a local system with an Intel Core i7-12650H CPU, an NVIDIA RTX 4050 GPU, and 16 GB RAM for data processing and inference, and (2) the Spartan HPC system with NVIDIA H100/A100 GPUs (80 GB) for model training.
Local system
- CPU: Intel® Core™ i7-12650H
- GPU: NVIDIA GeForce RTX 4050 Laptop GPU (6 GB)
- RAM: 23734 MiB
HPC system (Spartan)
- GPU: gpu-h100 (80 GB), gpu-a100 (80 GB)
Officially Released
4th_Anti-UAV_Challenge/
├── baseline/
│   ├── Baseline_code.zip
│   └── MultiUAV_Baseline_code_and_submissi.zip
├── test/
│   ├── MultiUAV_Test.zip
│   ├── track1_test.zip
│   └── track2_test.zip
└── train/
    ├── MultiUAV_Train.zip
    └── train.zip
- Train
  - Track 1 & Track 2: Google Drive | Baidu
  - Track 3: Google Drive | Baidu
- Test
  - Track 1: Google Drive | Baidu
  - Track 2: Google Drive | Baidu
  - Track 3: Google Drive | Baidu
Strong Baseline
train/
├── MOT/
│   └── AntiUAV_train_val.zip
├── ReID/
│   ├── MOT20_subset.zip
│   └── MOT20.zip
└── SOT/
    ├── AntiUAV_train_val_test.zip
    └── AntiUAV_train_val.zip
Enhancements
enhancements/
├── MOT/
│   ├── CLAHE_train_val.zip
│   ├── Sobel-based_Edge_Sharpening_train_val.zip
│   └── Sobel-based_Image_Gradients_train_val.zip
└── ReID/
    ├── CLAHE_subset.zip
    ├── Sobel-based_Edge_Sharpening_subset.zip
    └── Sobel-based_Image_Gradients_subset.zip
Project Layout
Follow the folder structure below to ensure smooth execution and easy navigation.
YOLOv12-BoT-SORT-ReID/
├── BoT-SORT/
│   ├── getInfo.py
│   ├── datasets/
│   │   └── README.md
│   ├── fast_reid/
│   │   └── datasets/
│   │       ├── generate_mot_patches.py
│   │       └── README.md
│   ├── logs/
│   │   ├── sbs_S50/
│   │   │   ├── config.yaml
│   │   │   └── model_0016.pth
│   │   └── README.md
│   ├── requirements.txt
│   ├── runs/
│   │   └── README.md
│   ├── submit/
│   │   └── README.md
│   ├── tools/
│   │   ├── predict_track1.py
│   │   ├── predict_track2.py
│   │   └── predict_track3.py
│   └── yolov12/
│       ├── heatmap.py
│       ├── imgs_dir/
│       │   ├── 00096.jpg
│       │   ├── 00379.jpg
│       │   ├── 00589.jpg
│       │   └── 00643.jpg
│       ├── requirements.txt
│       └── weights/
│           ├── MOT_yolov12n.pt
│           └── SOT_yolov12l.pt
├── data/
│   └── demo/
│       ├── MOT/
│       │   └── README.md
│       └── SOT/
│           └── README.md
├── LICENSE
└── README.md
Run Commands
Executing the following commands reproduces the leaderboard results.
Data Analysis
$ cd BoT-SORT/
# Table 1
$ python3 getInfo.py
Train YOLOv12
Refer to the README for more information.
$ cd BoT-SORT/yolov12/
# Run training with default settings
$ python3 train.py
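For orientation, training with the ultralytics-style API bundled with YOLOv12 typically looks like the sketch below. The dataset YAML and hyperparameters here are placeholders, so treat train.py as the source of truth for the settings actually used.

# Minimal training sketch (placeholder dataset config and hyperparameters).
from ultralytics import YOLO

model = YOLO("yolov12n.yaml")  # build from config; pass a .pt file to fine-tune
model.train(
    data="antiuav.yaml",       # hypothetical dataset YAML
    imgsz=1600,                # matches the MOT inference size used above
    epochs=100,
    batch=16,
    device=0,
)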
Train BoT-SORT-ReID
Refer to the README for more information.
$ cd BoT-SORT/
# Train with final config
$ python3 fast_reid/tools/train_net.py --config-file ./logs/sbs_S50/config.yaml MODEL.DEVICE "cuda:0"
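FastReID trains on per-identity image patches cropped from the MOT sequences; the repository ships fast_reid/datasets/generate_mot_patches.py for this step. The sketch below illustrates the core idea under the standard MOT gt.txt layout (frame,id,x,y,w,h,...); the paths are assumptions, and the official script should be preferred.

# Core idea behind ReID patch extraction (simplified; use the repo script).
import csv
from pathlib import Path
from PIL import Image

seq = Path("datasets/MOT20/train/MOT20-01")  # assumed dataset location
out = Path("datasets/reid_patches")
out.mkdir(parents=True, exist_ok=True)

with open(seq / "gt" / "gt.txt") as f:
    for row in csv.reader(f):
        frame, tid, x, y, w, h = (int(float(v)) for v in row[:6])
        img = Image.open(seq / "img1" / f"{frame:06d}.jpg")
        img.crop((x, y, x + w, y + h)).save(out / f"{tid:04d}_{frame:06d}.jpg")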
Inference
$ cd BoT-SORT/
# Track 1
$ python3 tools/predict_track1.py --weights ./yolov12/weights/SOT_yolov12l.pt --source ../data/SOT/track1_test/ --img-size 640 --device "0" --conf-thres 0.01 --iou-thres 0.01 --track_high_thresh 0.1 --track_low_thresh 0.01 --fuse-score --agnostic-nms --min_box_area 4 --save_path_answer ./submit/track1/test --hide-labels-name
# output: ./runs/detect/, ./submit/track1/test/
# Track 2
$ python3 tools/predict_track2.py --weights ./yolov12/weights/SOT_yolov12l.pt --source ../data/SOT/track2_test/ --img-size 640 --device "0" --conf-thres 0.01 --iou-thres 0.01 --track_high_thresh 0.1 --track_low_thresh 0.01 --fuse-score --agnostic-nms --min_box_area 1 --save_path_answer ./submit/track2/test --hide-labels-name
# output: ./runs/detect/, ./submit/track2/test/
# Track 3
$ chmod +x run_track3.sh
$ ./run_track3.sh
# output: ./runs/detect/, ./submit/track3/test/
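run_track3.sh batches the Track 3 sequences; conceptually it just invokes tools/predict_track3.py once per test sequence, as in this hedged Python equivalent (the test-data path is an assumption):

# Hedged batch-runner equivalent of run_track3.sh.
import subprocess
from pathlib import Path

for seq in sorted(Path("../data/MOT/test").iterdir()):  # assumed layout
    subprocess.run([
        "python3", "tools/predict_track3.py",
        "--weights", "./yolov12/weights/MOT_yolov12n.pt",
        "--source", str(seq),
        "--img-size", "1600", "--device", "0", "--track_buffer", "60",
        "--save_path_answer", "./submit/track3/test",
        "--hide-labels-name",
    ], check=True)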
| Model | size (pixels) | AP<sup>val</sup><sub>50-95</sub> | params (M) | FLOPs (G) | Note |
| --- | --- | --- | --- | --- | --- |
| SOT_yolov12l.pt | 640 | 67.2 | 26.3 | 88.5 | |
| MOT_yolov12n.pt (ReID) | 1600 | 68.5 | 2.6 | 6.3 | #4 (Comment) |
If you find this project helpful for your research or applications, we would appreciate it if you could cite the paper and give it a star.
@InProceedings{Chen_2025_CVPR,
author = {Chen, Yu-Hsi},
title = {Strong Baseline: Multi-UAV Tracking via YOLOv12 with BoT-SORT-ReID},
booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops},
month = {June},
year = {2025},
pages = {6573-6582}
}
Much of the code builds upon YOLOv12, BoT-SORT, and TrackEval. We also sincerely thank the organizers of the Anti-UAV benchmark for providing the valuable dataset. We greatly appreciate their contributions!