
YOLOv12-BoT-SORT-ReID

Strong Baseline: Multi-UAV Tracking via YOLOv12 with BoT-SORT-ReID

Yu-Hsi Chen


Preface

The combination of YOLOv12 and BoT-SORT demonstrates strong object detection and tracking potential, yet it remains underexplored in the current literature and in existing implementations.

[1] Jocher, Glenn, et al. "ultralytics/yolov5: v6.0 - YOLOv5n 'Nano' models, Roboflow integration, TensorFlow export, OpenCV DNN support." Zenodo (2021).
[2] Tian, Yunjie, Qixiang Ye, and David Doermann. "YOLOv12: Attention-centric real-time object detectors." arXiv preprint arXiv:2502.12524 (2025).
[3] Zhang, Guangdong, et al. "Multi-object Tracking Based on YOLOX and DeepSORT Algorithm." International Conference on 5G for Future Wireless Networks. Cham: Springer Nature Switzerland, 2022.
[4] Aharon, Nir, Roy Orfaig, and Ben-Zion Bobrovsky. "BoT-SORT: Robust associations multi-pedestrian tracking." arXiv preprint arXiv:2206.14651 (2022).

This repository provides a strong baseline for multi-UAV tracking in thermal infrared videos by leveraging YOLOv12 and BoT-SORT with ReID. Our approach significantly outperforms the widely adopted YOLOv5 with the DeepSORT pipeline, offering a high-performance foundation for UAV swarm tracking. Importantly, the established workflow in this repository can be easily integrated with any custom-trained model, extending its applicability beyond UAV scenarios. Refer to this section for practical usage examples.

📹 Preview - Strong Baseline
strong_baseline.webm

🔗 Full video available at: Track 3

🔍 See also SOT inferences: Track 1 and Track 2

🌐 CVPR2025 | Workshops | 4th Anti-UAV Workshop | Track-1 | Track-2 | Track-3

📹 Preview - Single-Frame Enhancements
enhancements_MultiUAV-261.webm

🔗 Full video available at: Enhancements

📹 Preview - Custom Model Inference

This section showcases example videos processed using a custom-trained model. The scenes are not limited to UAV footage or single-class detection. See 🚀 Quickstart: Installation and Demonstration → Run Inference Using a Custom-Trained Model for more details.

1. Multi-Class on a Walkway Scene
palace.webm

🔗 Original video: palace.mp4

2. Common Objects Underwater
cou.webm

🔗 Full video available at: COU.mp4

3. UAVDB
uavdb.webm

🔗 Full video available at: UAVDB.mp4

๐Ÿ Beyond Strong Baseline: Multi-UAV Tracking Competition โ‚ŠหšโŠน

📹 Preview - Vision in Action: Overview of All Videos

A complete visual overview of all training and test videos.

vision_in_action.webm

🔗 Full video available at: Overview

Scenarios are categorized to evaluate tracking performance under diverse conditions:

  • Takeoff - UAV launch phase: 2 videos.
  • L - Larger UAV target: 15 videos.
  • C - Cloud background: 39 videos.
  • CF - Cloud (Fewer UAVs): 18 videos.
  • T - Tree background: 68 videos.
  • TF - Tree (Fewer UAVs): 14 videos.
  • B - Scene with buildings: 11 videos.
  • BB1 - Building Background 1: 4 videos.
  • BB2 - Building Background 2: 17 videos.
  • BB2P - Building Background 2 (UAV partially out of view): 8 videos.
  • Landing - UAV landing phase: 4 videos.

TOTAL: 200 videos (151,384 frames)

📹 Preview - Vision in Action: Training Videos

demo_train.webm

🔗 Full video available at: Training Videos

  • Takeoff - UAV launch phase: 1 video.
  • L - Larger UAV target: 8 videos.
  • C - Cloud background: 20 videos.
  • CF - Cloud (Fewer UAVs): 9 videos.
  • T - Tree background: 34 videos.
  • TF - Tree (Fewer UAVs): 7 videos.
  • B - Scene with buildings: 6 videos.
  • BB1 - Building Background 1: 2 videos.
  • BB2 - Building Background 2: 9 videos.
  • BB2P - Building Background 2 (UAV partially out of view): 4 videos.
  • Landing - UAV landing phase: 2 videos.

TOTAL: 102 videos (77,293 frames)

📹 Preview - Vision in Action: Test Videos

demo_test.webm

🔗 Full video available at: Test Videos

  • Takeoff - UAV launch phase: 1 video.
  • L - Larger UAV target: 7 videos.
  • C - Cloud background: 19 videos.
  • CF - Cloud (Fewer UAVs): 9 videos.
  • T - Tree background: 34 videos.
  • TF - Tree (Fewer UAVs): 7 videos.
  • B - Scene with buildings: 5 videos.
  • BB1 - Building Background 1: 2 videos.
  • BB2 - Building Background 2: 8 videos.
  • BB2P - Building Background 2 (UAV partially out of view): 4 videos.
  • Landing - UAV landing phase: 2 videos.

TOTAL: 98 videos (74,538 frames)

📹 Preview - Vision in Action: Beyond Strong Baseline

🔗 View the competition on Codabench

๐Ÿ—ž๏ธ News

🚀 Quickstart: Installation and Demonstration

Installation
$ conda create -n yolov12_botsort python=3.11 -y
$ conda activate yolov12_botsort
$ git clone https://github.com/wish44165/YOLOv12-BoT-SORT-ReID.git
$ cd YOLOv12-BoT-SORT-ReID/BoT-SORT/yolov12/
$ wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu11torch2.2cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
$ pip install -r requirements.txt
$ cd ../
$ pip3 install torch torchvision torchaudio
$ pip3 install -r requirements.txt
$ pip3 install ultralytics
$ pip3 install cython; pip3 install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
$ pip3 install cython_bbox
$ pip3 install faiss-cpu
$ pip3 install seaborn
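
To quickly confirm the installation succeeded, the following minimal sanity check (an illustrative snippet, not a script shipped with this repository) can be run inside the yolov12_botsort environment before downloading any weights:

# verify_env.py - hypothetical helper; checks that PyTorch, CUDA, and ultralytics are usable
import torch
import ultralytics

print("torch:", torch.__version__)
print("ultralytics:", ultralytics.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))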
Folder Structure

The following folder structure will be created upon cloning this repository.

YOLOv12-BoT-SORT-ReID/
├── data/
│   └── demo/
│       ├── MOT/
│       │   ├── MultiUAV-003.mp4
│       │   ├── Test_imgs/
│       │   │   ├── MultiUAV-003/
│       │   │   ├── MultiUAV-135/
│       │   │   ├── MultiUAV-173/
│       │   │   └── MultiUAV-261/
│       │   └── TestLabels_FirstFrameOnly/
│       │       ├── MultiUAV-003.txt
│       │       ├── MultiUAV-135.txt
│       │       ├── MultiUAV-173.txt
│       │       └── MultiUAV-261.txt
│       └── SOT/
│           ├── Track1/
│           │   ├── 20190926_111509_1_8/
│           │   ├── 41_1/
│           │   ├── new30_train-new/
│           │   └── wg2022_ir_050_split_01/
│           └── Track2/
│               ├── 02_6319_0000-1499/
│               ├── 3700000000002_110743_1/
│               ├── DJI_0057_1/
│               └── wg2022_ir_032_split_04/
└── BoT-SORT/
Demonstration

A toy example covering all three tracks, spanning both SOT and MOT.

$ cd BoT-SORT/

# Track 1
$ python3 tools/predict_track1.py --weights ./yolov12/weights/SOT_yolov12l.pt --source ../data/demo/SOT/Track1/ --img-size 640 --device "0" --conf-thres 0.01 --iou-thres 0.01 --track_high_thresh 0.1 --track_low_thresh 0.01 --fuse-score --agnostic-nms --min_box_area 4 --save_path_answer ./submit/track1/demo --hide-labels-name
# output: ./runs/detect/, ./submit/track1/demo/

# Track 2
$ python3 tools/predict_track2.py --weights ./yolov12/weights/SOT_yolov12l.pt --source ../data/demo/SOT/Track2/ --img-size 640 --device "0" --conf-thres 0.01 --iou-thres 0.01 --track_high_thresh 0.1 --track_low_thresh 0.01 --fuse-score --agnostic-nms --min_box_area 1 --save_path_answer ./submit/track2/demo --hide-labels-name
# output: ./runs/detect/, ./submit/track2/demo/

# Track 3 (without ReID)
$ python3 tools/predict_track3.py --weights ./yolov12/weights/MOT_yolov12n.pt --source ../data/demo/MOT/ --img-size 1600 --device "0" --track_buffer 60 --save_path_answer ./submit/track3/demo --hide-labels-name
# Track 3 (with ReID)
$ python3 tools/predict_track3.py --weights ./yolov12/weights/MOT_yolov12n.pt --source ../data/demo/MOT/ --img-size 1600 --device "0" --track_buffer 60 --save_path_answer ./submit/track3/demo --with-reid --fast-reid-config logs/sbs_S50/config.yaml --fast-reid-weights logs/sbs_S50/model_0016.pth --hide-labels-name
# output: ./runs/detect/, ./submit/track3/demo/

# Heatmap
$ cd yolov12/
$ python heatmap.py
# output: ./outputs/
Run Inference on Custom Data

This project supports flexible inference on image folders and video files, with or without initial object positions, specifically for the MOT task.

python3 tools/inference.py \
    --weights ./yolov12/weights/MOT_yolov12n.pt \
    --source <path to folder or video> \
    --with-initial-positions \
    --initial-position-config <path to initial positions file (optional)> \
    --img-size 1600 \
    --track_buffer 60 \
    --device "0" \
    --agnostic-nms \
    --save_path_answer ./submit/inference/ \
    --with-reid \
    --fast-reid-config logs/sbs_S50/config.yaml \
    --fast-reid-weights logs/sbs_S50/model_0016.pth \
    --hide-labels-name

Below are examples of supported inference settings:

# 1. Inference on Image Folder (without initial position)
python3 tools/inference.py \
    --weights ./yolov12/weights/MOT_yolov12n.pt \
    --source ../data/demo/MOT/Test_imgs/MultiUAV-003/ \
    --img-size 1600 \
    --track_buffer 60 \
    --device "0" \
    --agnostic-nms \
    --save_path_answer ./submit/inference/ \
    --with-reid \
    --fast-reid-config logs/sbs_S50/config.yaml \
    --fast-reid-weights logs/sbs_S50/model_0016.pth \
    --hide-labels-name

# 2. Inference on Image Folder (with initial position)
python3 tools/inference.py \
    --weights ./yolov12/weights/MOT_yolov12n.pt \
    --source ../data/demo/MOT/Test_imgs/MultiUAV-003/ \
    --with-initial-positions \
    --initial-position-config ../data/demo/MOT/TestLabels_FirstFrameOnly/MultiUAV-003.txt \
    --img-size 1600 \
    --track_buffer 60 \
    --device "0" \
    --agnostic-nms \
    --save_path_answer ./submit/inference/ \
    --with-reid \
    --fast-reid-config logs/sbs_S50/config.yaml \
    --fast-reid-weights logs/sbs_S50/model_0016.pth \
    --hide-labels-name

# 3. Inference on Video (without initial position)
python3 tools/inference.py \
    --weights ./yolov12/weights/MOT_yolov12n.pt \
    --source ../data/demo/MOT/MultiUAV-003.mp4 \
    --img-size 1600 \
    --track_buffer 60 \
    --device "0" \
    --agnostic-nms \
    --save_path_answer ./submit/inference/ \
    --with-reid \
    --fast-reid-config logs/sbs_S50/config.yaml \
    --fast-reid-weights logs/sbs_S50/model_0016.pth \
    --hide-labels-name

# 4. Inference on Video (with initial position)
python3 tools/inference.py \
    --weights ./yolov12/weights/MOT_yolov12n.pt \
    --source ../data/demo/MOT/MultiUAV-003.mp4 \
    --with-initial-positions \
    --initial-position-config ../data/demo/MOT/TestLabels_FirstFrameOnly/MultiUAV-003.txt \
    --img-size 1600 \
    --track_buffer 60 \
    --device "0" \
    --agnostic-nms \
    --save_path_answer ./submit/inference/ \
    --with-reid \
    --fast-reid-config logs/sbs_S50/config.yaml \
    --fast-reid-weights logs/sbs_S50/model_0016.pth \
    --hide-labels-name
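
For embedding the tracker in Python code rather than using the CLI, the sketch below uses the BoT-SORT tracker bundled with ultralytics as a rough stand-in; this is an illustrative assumption, not tools/inference.py, and it does not load this repository's FastReID ReID weights:

# track_sketch.py - hypothetical example using ultralytics' built-in BoT-SORT
from ultralytics import YOLO

model = YOLO("./yolov12/weights/MOT_yolov12n.pt")

# stream=True yields one Results object per frame;
# tracker="botsort.yaml" selects the bundled BoT-SORT configuration
for result in model.track(source="../data/demo/MOT/MultiUAV-003.mp4",
                          imgsz=1600, tracker="botsort.yaml", stream=True):
    if result.boxes.id is None:  # no confirmed tracks in this frame
        continue
    for box, track_id in zip(result.boxes.xywh, result.boxes.id.int().tolist()):
        x, y, w, h = box.tolist()
        print(f"id={track_id} center=({x:.1f}, {y:.1f}) size={w:.0f}x{h:.0f}")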
Run Inference Using a Custom-Trained Model

This project also supports flexible inference using a custom-trained model for any MOT task. Below are the instructions for reproducing the examples in the preview section.

$ cd BoT-SORT/

1. Multi-Class on a Walkway Scene

$ wget https://github.com/sunsmarterjie/yolov12/releases/download/v1.0/yolov12x.pt
$ wget https://github.com/FoundationVision/ByteTrack/raw/main/videos/palace.mp4
$ python3 tools/inference.py \
    --weights yolov12x.pt \
    --source palace.mp4 \
    --img-size 640 \
    --device "0" \
    --save_path_answer ./submit/palace/

2. Common Objects Underwater

for f in ./videos/COU/*.mp4; do
    python3 tools/inference.py \
        --weights ./yolov12/runs/det/train/weights/best.pt \
        --source "$f" \
        --img-size 1600 \
        --device "0" \
        --save_path_answer ./submit/COU/
done

3. UAVDB

for f in ./videos/UAVDB/*.mp4; do
    python3 tools/inference.py \
        --weights ./yolov12/runs/det/train/weights/best.pt \
        --source "$f" \
        --img-size 1600 \
        --device "0" \
        --save_path_answer ./submit/UAVDB/
done

๐Ÿ› ๏ธ Implementation Details

Hardware Information

Experiments were conducted on two platforms: (1) a local system with an Intel Core i7-12650H CPU, an NVIDIA RTX 4050 GPU, and 16 GB RAM for data processing and inference, and (2) the Spartan HPC system with NVIDIA H100 and A100 GPUs (80 GB) for model training.

Laptop

  • CPU: Intel® Core™ i7-12650H
  • GPU: NVIDIA GeForce RTX 4050 Laptop GPU (6 GB)
  • RAM: 23734 MiB

HPC

  • GPU: Spartan gpu-h100 (80 GB), gpu-a100 (80 GB)

🖻 Data Preparation

Officially Released

4th_Anti-UAV_Challenge/
├── baseline/
│   ├── Baseline_code.zip
│   └── MultiUAV_Baseline_code_and_submissi.zip
├── test/
│   ├── MultiUAV_Test.zip
│   ├── track1_test.zip
│   └── track2_test.zip
└── train/
    ├── MultiUAV_Train.zip
    └── train.zip
Strong Baseline

Datasets are also available on Hugging Face.

train/
├── MOT/
│   └── AntiUAV_train_val.zip
├── ReID/
│   ├── MOT20_subset.zip
│   └── MOT20.zip
└── SOT/
    ├── AntiUAV_train_val_test.zip
    └── AntiUAV_train_val.zip
Enhancements

enhancements/
├── MOT/
│   ├── CLAHE_train_val.zip
│   ├── Sobel-based_Edge_Sharpening_train_val.zip
│   └── Sobel-based_Image_Gradients_train_val.zip
└── ReID/
    ├── CLAHE_subset.zip
    ├── Sobel-based_Edge_Sharpening_subset.zip
    └── Sobel-based_Image_Gradients_subset.zip

📂 Folder Structure

Project Layout

Follow the folder structure below to ensure smooth execution and easy navigation.

YOLOv12-BoT-SORT-ReID/
├── BoT-SORT/
│   ├── getInfo.py
│   ├── datasets/
│   │   └── README.md
│   ├── fast_reid/
│   │   └── datasets/
│   │       ├── generate_mot_patches.py
│   │       └── README.md
│   ├── logs/
│   │   ├── sbs_S50/
│   │   │   ├── config.yaml
│   │   │   └── model_0016.pth
│   │   └── README.md
│   ├── requirements.txt
│   ├── runs/
│   │   └── README.md
│   ├── submit/
│   │   └── README.md
│   ├── tools/
│   │   ├── predict_track1.py
│   │   ├── predict_track2.py
│   │   └── predict_track3.py
│   └── yolov12/
│       ├── heatmap.py
│       ├── imgs_dir/
│       │   ├── 00096.jpg
│       │   ├── 00379.jpg
│       │   ├── 00589.jpg
│       │   └── 00643.jpg
│       ├── requirements.txt
│       └── weights/
│           ├── MOT_yolov12n.pt
│           └── SOT_yolov12l.pt
├── data/
│   ├── demo/
│   ├── MOT/
│   │   └── README.md
│   └── SOT/
│       └── README.md
├── LICENSE
└── README.md

🔨 Reproduction

Run Commands

Executing the following commands reproduces the leaderboard results.

Data Analysis
$ cd BoT-SORT/

# Table 1
$ python3 getInfo.py
Train YOLOv12

Refer to the README for more information.

$ cd BoT-SORT/yolov12/

# Run training with default settings
$ python3 train.py
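
For reference, train.py builds on the standard ultralytics training interface; a minimal sketch of the equivalent API call is shown below. The model and dataset YAML names here are placeholders, not the repository's actual configuration:

# train_sketch.py - hypothetical training call via the ultralytics API
from ultralytics import YOLO

model = YOLO("yolov12n.yaml")      # placeholder model config for YOLOv12-nano
model.train(
    data="uav.yaml",               # placeholder dataset YAML (train/val paths, class names)
    epochs=100,                    # placeholder hyperparameters
    imgsz=1600,                    # matches the MOT inference resolution used in this repo
    device=0,
)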
Train BoT-SORT-ReID

Refer to the README for more information.

$ cd BoT-SORT/

# Train with final config
$ python3 fast_reid/tools/train_net.py --config-file ./logs/sbs_S50/config.yaml MODEL.DEVICE "cuda:0"
Inference
$ cd BoT-SORT/

# Track 1
$ python3 tools/predict_track1.py --weights ./yolov12/weights/SOT_yolov12l.pt --source ../data/SOT/track1_test/ --img-size 640 --device "0" --conf-thres 0.01 --iou-thres 0.01 --track_high_thresh 0.1 --track_low_thresh 0.01 --fuse-score --agnostic-nms --min_box_area 4 --save_path_answer ./submit/track1/test --hide-labels-name
# output: ./runs/detect/, ./submit/track1/test/

# Track 2
$ python3 tools/predict_track2.py --weights ./yolov12/weights/SOT_yolov12l.pt --source ../data/SOT/track2_test/ --img-size 640 --device "0" --conf-thres 0.01 --iou-thres 0.01 --track_high_thresh 0.1 --track_low_thresh 0.01 --fuse-score --agnostic-nms --min_box_area 1 --save_path_answer ./submit/track2/test --hide-labels-name
# output: ./runs/detect/, ./submit/track2/test/

# Track 3
$ chmod +x run_track3.sh
$ ./run_track3.sh
# output: ./runs/detect/, ./submit/track3/test/

✨ Models

Model weights are available on Hugging Face.

| Model | size (pixels) | AP(val) 50-95 | params (M) | FLOPs (G) | Note |
| --- | --- | --- | --- | --- | --- |
| SOT_yolov12l.pt | 640 | 67.2 | 26.3 | 88.5 | |
| MOT_yolov12n.pt (ReID) | 1600 | 68.5 | 2.6 | 6.3 | #4 (Comment) |
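
The size column gives the input resolution associated with each checkpoint, so pass a matching imgsz when loading the weights directly. A minimal detection-only sketch (illustrative; frame.jpg is a placeholder image path):

# detect_sketch.py - hypothetical single-image check of the released weights
from ultralytics import YOLO

sot = YOLO("./BoT-SORT/yolov12/weights/SOT_yolov12l.pt")
mot = YOLO("./BoT-SORT/yolov12/weights/MOT_yolov12n.pt")

# imgsz matches the "size (pixels)" column above
sot_res = sot.predict("frame.jpg", imgsz=640, conf=0.01)
mot_res = mot.predict("frame.jpg", imgsz=1600)
print(len(sot_res[0].boxes), "SOT detections;", len(mot_res[0].boxes), "MOT detections")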

📜 Citation

If you find this project helpful for your research or applications, please consider citing the paper and giving the repository a star.

@InProceedings{Chen_2025_CVPR,
    author    = {Chen, Yu-Hsi},
    title     = {Strong Baseline: Multi-UAV Tracking via YOLOv12 with BoT-SORT-ReID},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {6573-6582}
}

๐Ÿ™ Acknowledgments

Much of the code builds upon YOLOv12, BoT-SORT, and TrackEval. We also sincerely thank the organizers of the Anti-UAV benchmark for providing the valuable dataset. We greatly appreciate their contributions!