Fine-grained image classification across 176 leaf species using transfer learning with ResNet-18. Trained on ~14.7K labeled leaf images, achieving ~87% validation accuracy.
Identifying plant species from leaf photographs is a challenging fine-grained recognition task — many species share similar shapes, venation patterns, and colors. This project builds an end-to-end pipeline from data exploration to a deployable inference script.
Key results:
- ~87% validation accuracy on 176 classes using ResNet-18 pretrained on ImageNet
- Clean CLI inference script for batch predictions on new images
- Full reproducible training pipeline with fixed seeds
*Sample training images from the dataset, spanning various species.*
```
├── report.ipynb       # Full training notebook: data exploration, training, evaluation
├── predict.py         # Standalone CLI inference script
├── label_map.json     # Class index → species name mapping
├── requirements.txt   # Python dependencies
└── README.md
```
- Training set: 14,682 labeled images across 176 species
- Test set: 3,671 unlabeled images
- Format: JPG images of varying sizes, resized to 224×224 for training
The dataset is not included in this repository. It consists of leaf photographs with species labels such as `quercus_alba`, `acer_rubrum`, and `pinus_echinata`.
Model: ResNet-18 with ImageNet pretrained weights, final fully connected layer replaced for 176-class output.
Training details:
- 80/20 stratified train/validation split (seed=42)
- Adam optimizer, learning rate 1e-3 with StepLR decay
- Data augmentation: random resized crop, horizontal flip
- Normalization: ImageNet mean/std
- Trained for 10 epochs on Google Colab (T4 GPU)
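The stratified split above maps directly onto scikit-learn (already in the project's stack). A minimal sketch with synthetic labels standing in for the real ones:

```python
# 80/20 stratified split with a fixed seed, as described above.
# `labels` is synthetic here: 4 dummy classes, 25 samples each.
from sklearn.model_selection import train_test_split

labels = [i % 4 for i in range(100)]
indices = list(range(len(labels)))

train_idx, val_idx = train_test_split(
    indices,
    test_size=0.2,      # 80/20 split
    stratify=labels,    # preserve per-class proportions in both splits
    random_state=42,    # fixed seed for reproducibility
)
```

With `stratify` set, each class contributes proportionally to the validation set, which matters when some of the 176 species have few examples.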
Preprocessing (train): RandomResizedCrop(224) → RandomHorizontalFlip → ImageNet normalize
Preprocessing (inference): Resize(256) → CenterCrop(224) → ImageNet normalize
| Metric | Value |
|---|---|
| Final Validation Accuracy | ~87% |
| Number of Classes | 176 |
| Model Parameters | ~11.2M |
| Training Time | ~10 min (T4 GPU) |
Common failure modes observed from misclassified samples:
- Confusion between species within the same genus (e.g., `pinus_densiflora` vs. `pinus_echinata`)
- Visually similar leaves across different genera
- Unusual poses, partial views, or inconsistent lighting
Install the dependencies:

```bash
pip install -r requirements.txt
```

Run batch inference:

```bash
python predict.py \
  --data_dir /path/to/dataset \
  --input_csv /path/to/test.csv \
  --model_path model.pth \
  --label_map label_map.json \
  --output_csv predictions.csv \
  --device cpu
```

The output CSV contains two columns: `image` (path copied from input) and `label` (predicted species name).
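The layout of that file can be illustrated with a short stdlib sketch (the paths below are made up; the species names are examples from the dataset description):

```python
# Illustration of the predictions.csv layout: two columns, image and label.
import csv
import io

rows = [
    {"image": "images/0001.jpg", "label": "quercus_alba"},
    {"image": "images/0002.jpg", "label": "acer_rubrum"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["image", "label"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```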
- Open `report.ipynb`, set `DATA_DIR` to your dataset root, and run all cells. The notebook saves `model.pth` and `label_map.json` on completion.
Model weights are not included due to file size; run `report.ipynb` to train and generate `model.pth`.
This pipeline generalizes readily to other image classification problems with minimal modification:
- Agriculture & forestry: Crop disease detection, weed identification, tree species inventory for forest management — swap the dataset and retrain.
- Biodiversity monitoring: Field biologists can use this to identify species from phone photos, feeding into ecological surveys and conservation tracking.
- Invasive species detection: Train on native vs. invasive species to flag threats in new regions — the fine-grained classification approach handles subtle visual differences well.
- Food & produce quality control: Classify fruit/vegetable varieties or grade produce quality on a sorting line — same ResNet backbone, different labels.
- Medical image classification: Skin lesion classification, cell type identification, or pathology slide screening all follow the same transfer learning pattern with domain-specific data.
- Retail & e-commerce: Product categorization from photos — clothing types, furniture styles, or any catalog with visual categories.
The core architecture (pretrained ResNet + replaced classifier head + CLI inference script) is a general-purpose template. Change the dataset, adjust the number of output classes, and retrain.
- Stronger backbone: Swap ResNet-18 for ResNet-50 or EfficientNet-B3 for better feature extraction
- Aggressive augmentation: Add color jitter, random rotation, CutMix/MixUp to improve generalization
- Learning rate scheduling: Use cosine annealing with warmup instead of StepLR
- Test-time augmentation (TTA): Average predictions over multiple crops/flips at inference
- Grad-CAM visualization: Show which leaf regions the model focuses on for each prediction
- Lightweight deployment: Export to ONNX or TorchScript for faster CPU inference
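As a sketch of the test-time augmentation idea above (a hypothetical helper, not part of `predict.py`): average the model's logits over an image and its horizontal flip, then take the argmax.

```python
# TTA sketch: average logits over the original batch and its horizontal flip.
import torch
import torch.nn as nn

def tta_predict(model, batch):
    """Return per-image argmax class, averaging over horizontal flips."""
    model.eval()
    with torch.no_grad():
        logits = model(batch) + model(torch.flip(batch, dims=[3]))  # flip width axis
    return (logits / 2).argmax(dim=1)

# Tiny stand-in model just to demonstrate the call shape.
demo_model = nn.Sequential(
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 176)
)
preds = tta_predict(demo_model, torch.randn(4, 3, 224, 224))
```

The same pattern extends to multiple crops: accumulate logits over each augmented view and average before the argmax.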
Python · PyTorch · torchvision · scikit-learn · Pandas · Matplotlib · PIL
MIT