Fine-grained image classification across 176 leaf species using transfer learning with ResNet-18. Trained on ~14.7K labeled leaf images, achieving ~87% validation accuracy.
Identifying plant species from leaf photographs is a challenging fine-grained recognition task — many species share similar shapes, venation patterns, and colors. This project builds an end-to-end pipeline from data exploration to a deployable inference script.
Key results:
- ~87% validation accuracy on 176 classes using ResNet-18 pretrained on ImageNet
- Clean CLI inference script for batch predictions on new images
- Full reproducible training pipeline with fixed seeds
*Sample training images from the dataset, spanning various species.*
```
├── report.ipynb       # Full training notebook: data exploration, training, evaluation
├── predict.py         # Standalone CLI inference script
├── label_map.json     # Class index → species name mapping
├── requirements.txt   # Python dependencies
└── README.md
```
- Training set: 14,682 labeled images across 176 species
- Test set: 3,671 unlabeled images
- Format: JPG images of varying sizes, resized to 224×224 for training
The dataset is not included in this repository. It consists of leaf photographs with species labels such as `quercus_alba`, `acer_rubrum`, and `pinus_echinata`.
Model: ResNet-18 with ImageNet pretrained weights, final fully connected layer replaced for 176-class output.
Training details:
- 80/20 stratified train/validation split (seed=42)
- Adam optimizer, learning rate 1e-3 with StepLR decay
- Data augmentation: random resized crop, horizontal flip
- Normalization: ImageNet mean/std
- Trained for 10 epochs on Google Colab (T4 GPU)
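The stratified split above maps directly onto scikit-learn (already in the project's stack). A minimal sketch with synthetic labels standing in for the real ones:

```python
# 80/20 stratified split with a fixed seed, as described above.
# `labels` is synthetic here: 4 dummy classes, 25 samples each.
from sklearn.model_selection import train_test_split

labels = [i % 4 for i in range(100)]
indices = list(range(len(labels)))

train_idx, val_idx = train_test_split(
    indices,
    test_size=0.2,      # 80/20 split
    stratify=labels,    # preserve per-class proportions in both splits
    random_state=42,    # fixed seed for reproducibility
)
```

With `stratify` set, each class contributes proportionally to the validation set, which matters when some of the 176 species have few examples.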
Preprocessing (train): RandomResizedCrop(224) → RandomHorizontalFlip → ImageNet normalize
Preprocessing (inference): Resize(256) → CenterCrop(224) → ImageNet normalize
| Metric | Value |
|---|---|
| Final Validation Accuracy | ~87% |
| Number of Classes | 176 |
| Model Parameters | ~11.2M |
| Training Time | ~10 min (T4 GPU) |
Common failure modes observed from misclassified samples:
- Confusion between species within the same genus (e.g., `pinus_densiflora` vs. `pinus_echinata`)
- Visually similar leaves across different genera
- Unusual poses, partial views, or inconsistent lighting
Install the dependencies:

```bash
pip install -r requirements.txt
```

Run batch inference:

```bash
python predict.py \
  --data_dir /path/to/dataset \
  --input_csv /path/to/test.csv \
  --model_path model.pth \
  --label_map label_map.json \
  --output_csv predictions.csv \
  --device cpu
```

The output CSV contains two columns: `image` (path copied from input) and `label` (predicted species name).
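The layout of that file can be illustrated with a short stdlib sketch (the paths below are made up; the species names are examples from the dataset description):

```python
# Illustration of the predictions.csv layout: two columns, image and label.
import csv
import io

rows = [
    {"image": "images/0001.jpg", "label": "quercus_alba"},
    {"image": "images/0002.jpg", "label": "acer_rubrum"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["image", "label"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```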
- Open `report.ipynb`, set `DATA_DIR` to your dataset root, and run all cells. The notebook saves `model.pth` and `label_map.json` on completion.
Model weights are not included due to file size; run `report.ipynb` to train and generate `model.pth`.
This pipeline generalizes readily to other image classification problems with minimal modification:
- Agriculture & forestry: Crop disease detection, weed identification, tree species inventory for forest management — swap the dataset and retrain.
- Biodiversity monitoring: Field biologists can use this to identify species from phone photos, feeding into ecological surveys and conservation tracking.
- Invasive species detection: Train on native vs. invasive species to flag threats in new regions — the fine-grained classification approach handles subtle visual differences well.
- Food & produce quality control: Classify fruit/vegetable varieties or grade produce quality on a sorting line — same ResNet backbone, different labels.
- Medical image classification: Skin lesion classification, cell type identification, or pathology slide screening all follow the same transfer learning pattern with domain-specific data.
- Retail & e-commerce: Product categorization from photos — clothing types, furniture styles, or any catalog with visual categories.
The core architecture (pretrained ResNet + replaced classifier head + CLI inference script) is a general-purpose template. Change the dataset, adjust the number of output classes, and retrain.
- Stronger backbone: Swap ResNet-18 for ResNet-50 or EfficientNet-B3 for better feature extraction
- Aggressive augmentation: Add color jitter, random rotation, CutMix/MixUp to improve generalization
- Learning rate scheduling: Use cosine annealing with warmup instead of StepLR
- Test-time augmentation (TTA): Average predictions over multiple crops/flips at inference
- Grad-CAM visualization: Show which leaf regions the model focuses on for each prediction
- Lightweight deployment: Export to ONNX or TorchScript for faster CPU inference
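As a sketch of the test-time augmentation idea above (a hypothetical helper, not part of `predict.py`): average the model's logits over an image and its horizontal flip, then take the argmax.

```python
# TTA sketch: average logits over the original batch and its horizontal flip.
import torch
import torch.nn as nn

def tta_predict(model, batch):
    """Return per-image argmax class, averaging over horizontal flips."""
    model.eval()
    with torch.no_grad():
        logits = model(batch) + model(torch.flip(batch, dims=[3]))  # flip width axis
    return (logits / 2).argmax(dim=1)

# Tiny stand-in model just to demonstrate the call shape.
demo_model = nn.Sequential(
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 176)
)
preds = tta_predict(demo_model, torch.randn(4, 3, 224, 224))
```

The same pattern extends to multiple crops: accumulate logits over each augmented view and average before the argmax.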
Python · PyTorch · torchvision · scikit-learn · Pandas · Matplotlib · PIL
MIT