This project focuses on detecting small objects in images using deep learning techniques with Detectron2.
Small object detection remains a challenging task due to scale variation, occlusion, and limited pixel information, yet it's essential for:
- Traffic and surveillance analysis
- Aerial and satellite imaging
- Medical image interpretation
- Autonomous vehicles and robotics
The notebook walks through all key stages: dataset preparation, model configuration, training, and evaluation.
This project is dataset-agnostic: any COCO-style annotated dataset with small object instances can be used.
Example dataset structure:
dataset/
├── train/
│   ├── images/
│   └── annotations.json
├── val/
│   ├── images/
│   └── annotations.json
Each annotation file follows the COCO format, including bounding boxes, segmentation masks, and class labels.
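For reference, a minimal sketch of the fields such an annotations.json contains, written out from Python; the file name, coordinates, and category below are purely illustrative:

import json

# Illustrative COCO-style annotation file with one image, one box, and one class
coco = {
    "images": [
        {"id": 1, "file_name": "0001.jpg", "width": 1024, "height": 768}
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            "bbox": [450.0, 300.0, 24.0, 18.0],  # [x, y, width, height] in pixels
            "segmentation": [[450, 300, 474, 300, 474, 318, 450, 318]],
            "area": 432.0,
            "iscrowd": 0,
        }
    ],
    "categories": [{"id": 1, "name": "small_vehicle"}],
}

with open("dataset/train/annotations.json", "w") as f:
    json.dump(coco, f, indent=2)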
Clone the repository and install the required dependencies:
git clone https://github.com/your-username/small-object-detection.git
cd small-object-detection
pip install -r requirements.txt
Or install manually:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install 'git+https://github.com/facebookresearch/detectron2.git'
pip install opencv-python matplotlib tqdm
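To confirm that the installation succeeded and that a GPU is visible, a quick sanity check:

python -c "import torch, detectron2; print(torch.__version__, detectron2.__version__, torch.cuda.is_available())"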
To train and evaluate the model, open the notebook:
jupyter notebook small-object-detection.ipynb
Or, if converted to a Python script:
python train_detectron2.py
Before training, the input data is preprocessed to ensure quality and consistency:
- Verify image-annotation alignment
- Remove empty or corrupted samples
- Scale images while maintaining aspect ratio
- Normalize pixel intensity for Detectron2's input format
- Convert to COCO JSON format
- Verify bounding box and category consistency
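A minimal sketch of the alignment and consistency checks, assuming pycocotools (installed as a Detectron2 dependency) and the example dataset layout shown above; the paths are illustrative:

import os
from pycocotools.coco import COCO

# Load the training annotations (path follows the example layout above)
coco = COCO("dataset/train/annotations.json")

missing, bad_boxes = [], []
for img_id, img in coco.imgs.items():
    path = os.path.join("dataset/train/images", img["file_name"])
    if not os.path.isfile(path):
        missing.append(img["file_name"])  # annotation references an image that is not on disk
    for ann in coco.imgToAnns.get(img_id, []):
        x, y, w, h = ann["bbox"]
        # Flag degenerate or out-of-bounds boxes
        if w <= 0 or h <= 0 or x + w > img["width"] or y + h > img["height"]:
            bad_boxes.append(ann["id"])

print(f"missing images: {len(missing)}, invalid boxes: {len(bad_boxes)}")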
Training is performed using Detectron2, a Facebook AI Research (FAIR) framework.
- Base architecture: faster_rcnn_R_50_FPN_3x (ResNet-50 backbone with FPN)
- Pretrained weights: COCO-pretrained weights used for transfer learning
- Learning rate: 0.00025
- Batch size: 4
- Iterations: ~5000
- Augmentation: random flips and resizing
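Putting these settings together, a minimal training sketch using Detectron2's config system; the dataset names are hypothetical placeholders (they must be registered first, see the registration sketch in the troubleshooting section below), and the class count must match your data:

import os
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")  # COCO-pretrained weights
cfg.DATASETS.TRAIN = ("my_dataset_train",)   # hypothetical registered dataset names
cfg.DATASETS.TEST = ("my_dataset_val",)
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.IMS_PER_BATCH = 4
cfg.SOLVER.MAX_ITER = 5000
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1          # set to the number of classes in your dataset
cfg.OUTPUT_DIR = "./output"

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()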
- mAP (mean Average Precision) across IoU thresholds
- Separate evaluation for small, medium, and large objects
- Model checkpoints saved in /output after every epoch
- Final model stored as model_final.pth
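The size-stratified numbers come from COCO-style evaluation; a minimal sketch using Detectron2's COCOEvaluator, reusing the cfg from the training sketch above (the dataset name is again a hypothetical placeholder):

from detectron2.engine import DefaultPredictor
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.data import build_detection_test_loader

cfg.MODEL.WEIGHTS = "output/model_final.pth"   # final checkpoint produced by training
predictor = DefaultPredictor(cfg)

evaluator = COCOEvaluator("my_dataset_val", output_dir="./output")
val_loader = build_detection_test_loader(cfg, "my_dataset_val")
results = inference_on_dataset(predictor.model, val_loader, evaluator)
# APs / APm / APl are the AP values for small / medium / large objects
print(results["bbox"]["APs"], results["bbox"]["APm"], results["bbox"]["APl"])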
After training, performance is analyzed both quantitatively and visually.
- Example mAP (small objects): ~0.35
- Example mAP (medium/large): ~0.60
- Bounding boxes and labels plotted over sample images
- Model predictions compared to ground truth for validation
Example output visualization:
import matplotlib.pyplot as plt
from detectron2.utils.visualizer import Visualizer

# Visualizer expects an RGB image; image here is BGR (e.g. loaded with OpenCV)
visualizer = Visualizer(image[:, :, ::-1], metadata=metadata)
out = visualizer.draw_instance_predictions(outputs["instances"].to("cpu"))
plt.imshow(out.get_image())  # get_image() already returns RGB
plt.show()
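Here image, outputs, and metadata are assumed to come from an earlier step, roughly along these lines (the image path and dataset name are illustrative):

import cv2
from detectron2.data import MetadataCatalog
from detectron2.engine import DefaultPredictor

image = cv2.imread("dataset/val/images/example.jpg")   # BGR image from disk
outputs = DefaultPredictor(cfg)(image)                  # cfg configured as in the training/evaluation sketches
metadata = MetadataCatalog.get("my_dataset_val")        # class names used when drawing labels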
Common issues and fixes:
- Reduce batch size or image resolution
- Ensure correct paths and annotation file format in DatasetCatalog.register() (see the registration sketch after this list)
- Use Feature Pyramid Network (FPN)
- Apply higher input resolution
- Consider multi-scale training
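For the dataset-path item above, Detectron2's COCO helper performs the registration in one call; a minimal sketch using the example layout and the hypothetical dataset names from the earlier sketches:

from detectron2.data.datasets import register_coco_instances

# Register the train and val splits under names the config can reference
register_coco_instances("my_dataset_train", {}, "dataset/train/annotations.json", "dataset/train/images")
register_coco_instances("my_dataset_val", {}, "dataset/val/annotations.json", "dataset/val/images")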
You can deploy the trained model in several ways.
Run inference on an image locally:
python inference.py --image path/to/image.jpg --model output/model_final.pth
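The script itself is not shown in this README; below is a minimal sketch of what such an inference.py could look like. The argument names mirror the command above; everything else (score threshold, class count, output file name) is an assumption:

import argparse
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
from detectron2.utils.visualizer import Visualizer

parser = argparse.ArgumentParser()
parser.add_argument("--image", required=True)
parser.add_argument("--model", required=True)
args = parser.parse_args()

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = args.model
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5   # confidence threshold, adjust as needed
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1           # must match the trained model

image = cv2.imread(args.image)
outputs = DefaultPredictor(cfg)(image)

# Draw predictions and save the visualization to the working directory
vis = Visualizer(image[:, :, ::-1])
out = vis.draw_instance_predictions(outputs["instances"].to("cpu"))
cv2.imwrite("prediction.jpg", out.get_image()[:, :, ::-1])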
- Use Flask or FastAPI for REST inference
- Convert to ONNX or TorchScript for optimized runtime
This repository includes:
- Jupyter notebook: small-object-detection.ipynb
- Model checkpoints: /output/model_final.pth
- Visualization samples: /visuals/
- Config and training logs: /configs/
- Requirements file: requirements.txt
- Successfully trained a small-object detector using Detectron2
- Adaptable to any COCO-style dataset
- Visual evaluation and mAP metrics integrated
- Enhances object detection for low-visibility and small-scale targets
- Useful for applications in surveillance, remote sensing, and medical imaging
- Experiment with custom backbones (e.g., Swin Transformer, ConvNeXt)
- Explore attention-based architectures for improved small object recall
- Optimize inference speed with quantization or pruning
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Use in CVs, portfolios, or derivative works is not permitted without explicit permission from the author.