A comprehensive experimental study comparing YOLOv8 Nano model performance across different input resolutions for garbage detection using the Roboflow dataset.
- Overview
- Dataset
- Experimental Setup
- Results
- Analysis
- Recommendations
- Hardware Performance
- Future Work
- Installation
- Usage
This project evaluates how input image resolution impacts object detection performance using YOLOv8 Nano. The study compares two configurations:
- Baseline: 416×416 pixels
- Experiment: 608×608 pixels
The goal is to understand the speed-accuracy trade-offs and provide guidance for optimal input size selection based on deployment scenarios.
Garbage Detection Dataset (Roboflow)
| Property | Value |
|---|---|
| Total Images | 1,255 |
| Number of Classes | 1 |
| Class Name | garbage |
| Format | YOLOv8 (YOLO11 compatible) |
| Training Set | 1,155 images (92%) |
| Validation Set | 50 images (4%) |
| Test Set | 50 images (4%) |
| Annotation | Bounding boxes with class labels |
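Training assumes a YOLO-format `data.yaml` that points to the splits above. A minimal sketch, assuming a local dataset root and the folder layout of a typical Roboflow export (adjust paths to your own download):

```yaml
# data.yaml -- illustrative sketch; match paths to your Roboflow export
path: ./garbage-dataset   # dataset root (assumed name)
train: train/images
val: valid/images
test: test/images

nc: 1                     # single class
names: ['garbage']
```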
Model: YOLOv8 Nano (YOLOv8n)
Framework: Ultralytics YOLOv8
Pre-trained: COCO weights
Hardware: NVIDIA Tesla T4 GPU (15GB VRAM)
Batch Size: 32
Epochs: 10
Optimizer: SGD
Baseline:
- Input Size: 416×416 pixels
- Training Time: 1.99 minutes
- Device: CUDA (GPU)
Experiment:
- Input Size: 608×608 pixels
- Training Time: 3.23 minutes
- Device: CUDA (GPU)
| Metric | 416×416 | 608×608 | Change | % Change |
|---|---|---|---|---|
| mAP@0.5 | 0.3520 | 0.3630 | +0.0110 | +3.1% |
| mAP@0.5:0.95 | 0.1420 | 0.1490 | +0.0070 | +4.9% |
| Precision | 0.5550 | 0.4900 | -0.0650 | -11.7% |
| Recall | 0.3220 | 0.4000 | +0.0780 | +24.2% |
| Inference Time (ms) | 1.3 | 2.6 | +1.3 | +100% |
| Training Time (min) | 1.99 | 3.23 | +1.24 | +62.3% |
| Model Size (MB) | 6.2 | 6.2 | — | — |
416×416 Configuration:
- Precision: 0.5563
- Recall: 0.3217
- mAP@0.5: 0.3520
- mAP@0.5:0.95: 0.1427
608×608 Configuration:
- Precision: 0.4895
- Recall: 0.4000
- mAP@0.5: 0.3624
- mAP@0.5:0.95: 0.1490
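For reference, these per-configuration figures can be reproduced with Ultralytics' validation API; a minimal sketch, assuming a trained checkpoint at the default output path:

```python
from ultralytics import YOLO

# Validate a trained checkpoint at its training resolution
model = YOLO('runs/detect/train/weights/best.pt')  # path assumed
metrics = model.val(data='data.yaml', imgsz=416)   # use imgsz=608 for the other run

# DetMetrics exposes the headline numbers reported above
print(f"Precision:    {metrics.box.mp:.4f}")
print(f"Recall:       {metrics.box.mr:.4f}")
print(f"mAP@0.5:      {metrics.box.map50:.4f}")
print(f"mAP@0.5:0.95: {metrics.box.map:.4f}")
```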
Advantages of 608×608:
- +24.2% Recall Improvement - Significantly better at detecting garbage objects
- +3.1% mAP@0.5 Improvement - Higher overall accuracy
- Better Small Object Detection - Captures loose garbage and small debris
- Robust to Scale Variations - Handles objects of different sizes better
- Suitable for Accuracy-Critical Applications
Advantages of 416×416:
- Faster Inference - 1.3 ms vs 2.6 ms per image (2× faster)
- Higher Precision - Fewer false positives (55.5% vs 49%)
- Faster Training - 1.99 min vs 3.23 min (~38% less time)
- Resource Efficient - Better for edge devices and mobile deployment
- Lower Computational Cost
| Factor | 416×416 | 608×608 | Trade-off |
|---|---|---|---|
| Speed | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | Faster but less accurate |
| Accuracy | ⭐⭐⭐ | ⭐⭐⭐⭐ | More accurate but slower |
| Precision | ⭐⭐⭐⭐ | ⭐⭐⭐ | Higher precision, more false negatives |
| Recall | ⭐⭐⭐ | ⭐⭐⭐⭐ | Better detection rate |
| Efficiency | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | More resource efficient |
| Scenario | 416×416 | 608×608 | Notes |
|---|---|---|---|
| Trash bins with multiple items | Good | Better | Larger input helps detect clustering |
| Loose garbage | Misses some | Detects | Better for scattered items |
| Garbage in landfill | Similar | Similar | Both perform adequately |
| Urban litter | Limited | Better | Higher resolution captures small items |
| Mixed debris | Similar | Better | Slight advantage for 608×608 |
Use 608×608 with GPU
- Maximum accuracy (mAP@0.5: 0.3630)
- Best recall rate (40%)
- Real-time on GPU (2.6ms acceptable)
- Ideal when missing detections is costly
Use 416×416
- Lower computational requirements
- Faster inference (1.3ms per image)
- Better precision (fewer false alarms)
- Ideal when computational resources are limited
Use 416×416 with Quantization
- Deploy YOLOv8n quantized to INT8 or FP16
- Further reduce model size and inference time
- Maintain reasonable accuracy for mobile use
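As a sketch of this deployment path, Ultralytics' `export()` can produce FP16 and INT8 artifacts (checkpoint path assumed; exact export behavior depends on your install and hardware):

```python
from ultralytics import YOLO

model = YOLO('runs/detect/train/weights/best.pt')  # path assumed

# FP16: half-precision ONNX export
model.export(format='onnx', imgsz=416, half=True)

# INT8: quantized TFLite export, calibrated on the dataset
model.export(format='tflite', imgsz=416, int8=True, data='data.yaml')
```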
- Minimize False Positives: Use 416×416
- Minimize False Negatives: Use 608×608
- Balance Speed & Accuracy: Test 512×512 (future work)
Training Speed on GPU (Tesla T4):
416×416: 1.99 minutes for 10 epochs
608×608: 3.23 minutes for 10 epochs
Speedup: ~1.6× faster with 416×416
Inference Speed on GPU:
416×416: 1.3 ms/image (~769 FPS)
608×608: 2.6 ms/image (~385 FPS)
Both: Real-time capable
Training Time on CPU (Estimated):
416×416: 30-50 minutes (15-25× slower than GPU)
608×608: 50-80 minutes (15-25× slower than GPU)
Inference Speed on CPU (Estimated):
416×416: 100-200 ms/image (not real-time)
608×608: 150-300 ms/image (not real-time)
GPU acceleration is essential for practical experimentation and deployment.
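The per-image latencies above can be sanity-checked with a simple timing loop; a sketch, assuming a trained checkpoint and a sample image (this times the full predict call, including preprocessing, so results will run slightly higher than pure inference time):

```python
import time
from ultralytics import YOLO

model = YOLO('runs/detect/train/weights/best.pt')  # path assumed

# Warm up so CUDA kernels and caches are initialized before timing
for _ in range(10):
    model.predict(source='image.jpg', imgsz=416, verbose=False)

# Time repeated predictions and report mean latency / throughput
n = 100
start = time.perf_counter()
for _ in range(n):
    model.predict(source='image.jpg', imgsz=416, verbose=False)
latency_ms = (time.perf_counter() - start) / n * 1000
print(f"{latency_ms:.1f} ms/image (~{1000 / latency_ms:.0f} FPS)")
```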
- Test intermediate resolution (512×512)
- Experiment with larger models (YOLOv8s, YOLOv8m)
- Model quantization (INT8, FP16)
- Comprehensive CPU benchmark
- Collect more diverse garbage images
- Domain-specific fine-tuning
- Real-world deployment testing
- Deploy on edge devices (NVIDIA Jetson, etc.)
- Python 3.8+
- CUDA 11.0+ (for GPU support)
- pip or conda
```bash
# Clone repository
git clone https://github.com/yourusername/yolov8-garbage-detection.git
cd yolov8-garbage-detection

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

requirements.txt:

```
ultralytics>=8.0.0
torch>=1.9.0
torchvision>=0.10.0
numpy
opencv-python
matplotlib
```
```python
from ultralytics import YOLO

# Load a pretrained model
model = YOLO('yolov8n.pt')

# Train with 416×416 input size
results_416 = model.train(
    data='data.yaml',
    imgsz=416,
    epochs=10,
    batch=32,
    device=0  # GPU device ID
)

# Train with 608×608 input size
results_608 = model.train(
    data='data.yaml',
    imgsz=608,
    epochs=10,
    batch=32,
    device=0
)
```

```python
from ultralytics import YOLO
# Load trained model
model = YOLO('runs/detect/train/weights/best.pt')
# Predict on image
results = model.predict(
    source='image.jpg',
    imgsz=416,  # or 608
    conf=0.5
)
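
# (Sketch) Detections can also be read programmatically via the Results API:
for box in results[0].boxes:
    x1, y1, x2, y2 = box.xyxy[0].tolist()  # box corners in pixels
    print(f"garbage {box.conf.item():.2f} at ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")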
# Display results
# Display results
results[0].show()
```

```python
# Predict on directory
results = model.predict(
    source='path/to/images/',
    imgsz=416,
    save=True,
    save_txt=True
)
```

Model Specifications:

| Property | Value |
|---|---|
| Model Architecture | YOLOv8 Nano (YOLOv8n) |
| Model Size | 6.2 MB |
| Parameters | ~3.2M |
| Framework | PyTorch |
| Pre-trained Dataset | COCO |
| Output Format | Bounding boxes + confidence |
| Supported Tasks (YOLOv8 family) | Detection, Segmentation, Classification |
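The size and parameter figures above can be confirmed directly from the loaded model; a quick check:

```python
from ultralytics import YOLO

# Prints a layer/parameter summary for the model
model = YOLO('yolov8n.pt')
model.info()  # reports roughly 3.2M parameters for YOLOv8n
```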
This study demonstrates a clear speed-accuracy trade-off in object detection:
- Larger input (608×608) delivers 24.2% better recall, essential when missed garbage detections are costly.
- Smaller input (416×416) halves inference time (1.3 ms vs 2.6 ms), ideal for real-time constraints on resource-limited devices.
- GPU acceleration is critical for practical deployment and experimentation.
- Both configurations are viable depending on the specific requirements and deployment scenario.
| Use Case | Recommended | Reason |
|---|---|---|
| Maximum Accuracy | 608×608 | Best recall and mAP |
| Real-time Edge Device | 416×416 | Fastest inference |
| Mobile App | 416×416 | Lower resource consumption |
| Server Deployment | 608×608 | GPU-backed, accuracy-first |
| Cost-Sensitive | 416×416 | Runs on cheaper hardware |
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
For questions, issues, or the complete .ipynb notebook, please contact: amirthaganeshramesh@gmail.com
Last Updated: 2025
Author: Amirthaganesh R