Shoe Counting — YOLO Object Detection

An ML pipeline to detect and count shoes in images using YOLOv8 segmentation. The model is trained on the LEN shoe_detection dataset (1150 images, MIT license).

Project structure

shoe_counting/
├── 01_explore_data.py      # Dataset stats and sample visualisation
├── 02_train.py             # Train the YOLO model
├── 03_evaluate.py          # Evaluate with detection + counting metrics
├── 04_compare_models.py    # Compare multiple trained models side by side
├── 05_predict.py           # Run inference on new images
├── requirements.txt
└── outputs/                # Auto-created — results, weights, plots
    ├── data_fixed.yaml
    ├── data_stats.json
    ├── runs/<model>/weights/best.pt
    ├── training_summary_<model>.json
    ├── metrics_<model>.json / .csv
    └── predictions/

Not in the repo: data/train, data/valid, data/test (large dataset splits), venv/, and model weights (*.pt). The sample images in data/predict/ are included in the repo.

Setup

1. Clone and create a virtual environment

git clone https://github.com/<your-username>/shoe_counting.git
cd shoe_counting
python -m venv venv

# Activate (Windows)
venv\Scripts\activate

# Activate (Linux / macOS)
source venv/bin/activate

2. Install dependencies

CPU only:

pip install -r requirements.txt

With GPU (CUDA) — recommended:

pip install -r requirements.txt

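# Then replace torch with the CUDA build. Check your CUDA version with: nvidia-smi
# Run only ONE of the two commands below, whichever matches your CUDA version.
# If pip reports torch as already satisfied, re-run the command with --force-reinstall.
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121  # CUDA 12.1
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118  # CUDA 11.8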

Verify CUDA is available:

python -c "import torch; print(torch.cuda.is_available())"
# True  ← GPU ready

3. Add the dataset

Download the dataset from Roboflow in YOLOv11 format and place it so the structure looks like:

shoe_counting/
└── data/
    ├── data.yaml
    ├── train/
    │   ├── images/
    │   └── labels/
    ├── valid/
    │   ├── images/
    │   └── labels/
    └── test/
        ├── images/
        └── labels/
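
Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch using only the standard library; the function name missing_dataset_paths is hypothetical and is not part of the repository scripts.

```python
from pathlib import Path

# Expected layout, taken from the tree above.
SPLITS = ("train", "valid", "test")
SUBDIRS = ("images", "labels")


def missing_dataset_paths(root):
    """Return the expected paths under `root` that do not exist."""
    root = Path(root)
    expected = [root / "data.yaml"]
    expected += [root / split / sub for split in SPLITS for sub in SUBDIRS]
    return [path for path in expected if not path.exists()]


# Report anything missing under data/ (prints nothing if the layout is complete).
for path in missing_dataset_paths("data"):
    print("missing:", path)
```

An empty result means data.yaml and all six split folders were found; it does not check that the images and labels themselves are valid.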

Usage

Step 1 — Explore the dataset

python 01_explore_data.py

Writes outputs/data_stats.json, outputs/sample_annotations.png, and outputs/count_distribution.png.

Step 2 — Train the model

python 02_train.py

Options:

--model     Pretrained base model (default: yolov8n-seg.pt)
--epochs    Max training epochs   (default: 100)
--batch     Batch size            (default: 16, reduce to 8 if out of VRAM)
--imgsz     Input image size      (default: 432)
--patience  Early stopping        (default: 20 epochs without improvement)
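
For readers who want to see how such options are typically declared, here is a minimal argparse sketch that mirrors the flags and defaults listed above. It is an illustration only: the real 02_train.py may define its arguments differently, and build_parser is a hypothetical name.

```python
import argparse


def build_parser():
    """Argument parser mirroring the documented training options and defaults."""
    parser = argparse.ArgumentParser(description="Train a YOLO segmentation model")
    parser.add_argument("--model", default="yolov8n-seg.pt", help="Pretrained base model")
    parser.add_argument("--epochs", type=int, default=100, help="Max training epochs")
    parser.add_argument("--batch", type=int, default=16, help="Batch size")
    parser.add_argument("--imgsz", type=int, default=432, help="Input image size")
    parser.add_argument("--patience", type=int, default=20,
                        help="Epochs without improvement before early stopping")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--model", "yolov8s-seg.pt", "--batch", "8"])
print(vars(args))
```

Flags that are not passed keep the defaults from the list above, so the example overrides only the model and the batch size.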

Examples:

python 02_train.py --model yolov8n-seg.pt --epochs 100 --batch 16   # fast
python 02_train.py --model yolov8s-seg.pt --epochs 100 --batch 8    # more accurate
python 02_train.py --model yolo11n-seg.pt --epochs 150 --batch 16   # newest arch

Weights are saved to outputs/runs/<model>/weights/best.pt. Training time is recorded in outputs/training_summary_<model>.json.

Step 3 — Evaluate

python 03_evaluate.py

Options:

--weights   Path to best.pt (auto-detected if omitted)
--conf      Confidence threshold (default: 0.25)
--iou       NMS IoU threshold   (default: 0.45)
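
The two thresholds directly affect the final count: --conf discards weak detections, and --iou controls how much overlap is tolerated before non-maximum suppression (NMS) treats two boxes as the same shoe. The sketch below is a generic greedy NMS on plain (x1, y1, x2, y2) boxes, written to illustrate the idea; it is not the implementation used by the YOLO library or by 03_evaluate.py, and the sample boxes are made up.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, conf=0.25, iou_thres=0.45):
    """Drop low-confidence boxes, then apply greedy NMS.

    `detections` is a list of (box, score) pairs.
    """
    candidates = sorted(
        (d for d in detections if d[1] >= conf),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        # Keep a box only if it does not overlap too much with a stronger one.
        if all(iou(box, k[0]) <= iou_thres for k in kept):
            kept.append((box, score))
    return kept


detections = [
    ((0, 0, 100, 100), 0.90),
    ((5, 5, 105, 105), 0.80),      # heavy overlap with the first box
    ((200, 200, 300, 300), 0.60),
    ((400, 400, 450, 450), 0.10),  # below the confidence threshold
]
kept = filter_detections(detections)
print(len(kept))  # predicted count for this image: 2
```

Raising --conf tends to reduce over-counting at the cost of missed shoes, while raising --iou lets more overlapping boxes survive, which matters when shoes sit close together.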

Produces:

outputs/metrics_<model>.json — all metrics
outputs/metrics_<model>.csv — per-image true vs predicted count
outputs/metrics_<model>_plots.png — visualisation charts
outputs/metrics_<model>_report.txt — human-readable summary

Step 4 — Compare models (optional)

Train and evaluate a second model, then:

python 04_compare_models.py

Produces outputs/comparison_table.csv and outputs/comparison_plots.png.

Step 5 — Predict on new images

Sample images are included in data/predict/: len3.jpg, len43.jpg, len48.jpg.

# Default — runs on data/predict/len3.jpg
python 05_predict.py

# Specific sample image
python 05_predict.py --source data/predict/len43.jpg

# All sample images at once
python 05_predict.py --source data/predict/

# Custom image
python 05_predict.py --source path/to/your/image.jpg

Options:

--source    Image file or folder (default: data/predict/len3.jpg)
--weights   Path to best.pt (auto-detected if omitted)
--conf      Confidence threshold (default: 0.25)
--iou       NMS IoU threshold   (default: 0.45)
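
This README does not describe how the weights are auto-detected when --weights is omitted. One plausible approach, shown here purely as an assumption, is to pick the most recently modified best.pt under outputs/runs/. The function name find_latest_weights is hypothetical.

```python
from pathlib import Path


def find_latest_weights(runs_dir="outputs/runs"):
    """Return the most recently modified <model>/weights/best.pt, or None."""
    candidates = list(Path(runs_dir).glob("*/weights/best.pt"))
    if not candidates:
        return None
    # Assumption: "latest" means newest modification time, not best metric.
    return max(candidates, key=lambda p: p.stat().st_mtime)


print(find_latest_weights())  # None if no model has been trained yet
```

When several models have been trained, passing --weights explicitly avoids any ambiguity about which run is evaluated.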

Each result is saved to outputs/predictions/<name>_comparison.png as a side-by-side image: the original on the left and the prediction on the right, with segmentation masks, bounding boxes, and confidence scores drawn by the model.

Results — YOLOv8n-seg (baseline)

Training


Setting	Value
Model	`yolov8n-seg.pt`
Epochs	100
Image size	432×432
Device	CUDA (GPU)
Training time	20 min 01 s

Detection / Segmentation metrics (test set)

Metric	Value
mAP@50	0.8515
mAP@50-95	0.8419
Precision	0.7284
Recall	0.8571
F1 Score	0.7875
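
The F1 score in the table is the harmonic mean of the reported precision and recall. As a quick check, the value can be reproduced from the two numbers above; f1_score is an illustrative helper, not a function from the repository.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.7284, 0.8571), 4))  # 0.7875
```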

Counting metrics (test set — 61 images)

Metric	Value
MAE	0.43
RMSE	0.79
MAPE	9.22%
Exact match rate	67.2%
Within ±1 rate	90.2%
Bias	+0.36 (slight over-counting)

Prediction example — `len3.jpg`

Left: original image. Right: model prediction with segmentation masks, bounding boxes and confidence scores.

Metrics explained

Metric	Type	Better when
mAP@50	Detection	Higher
mAP@50-95	Detection	Higher
Precision	Detection	Higher
Recall	Detection	Higher
F1	Detection	Higher
MAE	Counting	Lower — avg error per image
RMSE	Counting	Lower — penalises big errors
MAPE	Counting	Lower, avg error as % of the true count
Exact match %	Counting	Higher — % images with perfect count
Within ±1 %	Counting	Higher
Bias	Counting	Near 0 — +positive = over-counting
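
The counting metrics can all be derived from the per-image true and predicted counts. The sketch below shows one way to compute them with the standard library. It assumes that images with a true count of 0 are skipped for MAPE and that bias is the mean of (predicted - true); the evaluation script may handle these details differently, and the sample counts are made up.

```python
import math


def counting_metrics(true_counts, pred_counts):
    """Compute counting metrics from paired per-image counts."""
    if len(true_counts) != len(pred_counts) or not true_counts:
        raise ValueError("need two non-empty lists of equal length")
    errors = [p - t for t, p in zip(true_counts, pred_counts)]
    n = len(errors)
    # MAPE is undefined for a true count of 0, so those images are skipped (assumption).
    pct = [abs(e) / t for e, t in zip(errors, true_counts) if t > 0]
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "mape": 100 * sum(pct) / len(pct) if pct else float("nan"),
        "exact_match": 100 * sum(e == 0 for e in errors) / n,
        "within_1": 100 * sum(abs(e) <= 1 for e in errors) / n,
        "bias": sum(errors) / n,  # positive means over-counting
    }


# Four hypothetical images: true counts vs predicted counts.
metrics = counting_metrics([2, 4, 1, 3], [2, 5, 1, 5])
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In this toy example the errors are 0, +1, 0, and +2, so MAE and bias are both 0.75: every miss is an over-count. When bias is close to 0 but MAE is not, over-counts and under-counts are cancelling out.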

Model options

Model	Params	Notes
`yolov8n-seg.pt`	~3.4 M	Nano — fastest, good baseline
`yolov8s-seg.pt`	~11.8 M	Small — good balance
`yolov8m-seg.pt`	~27 M	Medium — most accurate
`yolo11n-seg.pt`	~2.9 M	Newest architecture

Requirements

Python 3.9+
See requirements.txt
GPU strongly recommended (CUDA 11.8+ or 12.x)
