Authors: Group 21 - Alessandro Da Ros, Ergi Livanaj, Sushrut Patil, Alexandru Radu, Gabriel Merle, Mateusz Lotko
Course: Capstone Data Challenge (JBG060)
Institution: Eindhoven University of Technology
Stakeholder: ReefSupport
- Project Overview
- Problem Statement & Context
- Solution Approach
- Repository Structure
- Installation & Environment Setup
- Data Pipeline
- Model Training
- Evaluation & Results
- Reproducing Results -> (FOR TEACHING STAFF): the main Jupyter Notebook for reproducing the results, analysis, and plots is documented here.
- Testing & Quality Assurance
- Documentation
- Citation & References
Coral-MTL is a hierarchical multi-task learning framework for automated coral reef health assessment from underwater imagery. The system simultaneously predicts:
- Genus identification (8 coral genera)
- Health status (alive/bleached/dead)
- Contextual information (fish, human artifacts, substrate, background, biota)
- +9.9% global mIoU over baseline SegFormer
- +32.6% boundary IoU improvement (critical for cover estimation)
- +28.9% boundary F1 enhancement (perimeter fidelity)
- Honest uncertainty reporting (ECE, NLL, Brier scores without post-hoc calibration)
- Baseline SegFormer - Single flat decoder (40 classes)
- MTL Focused - 2 primary tasks (genus+health) + 5 auxiliary
- MTL Holistic - All 7 tasks as primary with full cross-attention (best performer)
Our MTL design is based on Explicit Feature Exchange: tasks dynamically "query" each other for relevant context through a cross-attention mechanism. To keep the model computationally sustainable, full cross-attention is applied only to the primary tasks (full MLP decoders), while auxiliary features still take part in the exchange through lightweight 1x1 convolutional blocks. Concretely, each primary decoder performs cross-attention: a pooled query (Q) from the task attends to concatenated keys/values (K/V) from all other tasks, and the enriched features are gated and fused with the original ones. Auxiliary heads (fish, human_artifacts, substrate) are lightweight; they provide K/V context but do not attend themselves, acting as regularizers.
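The sketch below illustrates this pooled-query cross-attention exchange for a single primary decoder. It is a minimal, illustrative re-implementation, not the code in `src/coral_mtl/model/`: the module name `TaskFeatureExchange`, the pooling choices, and the exact gating form are assumptions.

```python
import torch
import torch.nn as nn

class TaskFeatureExchange(nn.Module):
    """Sketch: a primary task's pooled query attends to K/V pooled from all other tasks."""

    def __init__(self, channels: int, attention_dim: int = 128, num_heads: int = 4):
        super().__init__()
        self.q_proj = nn.Linear(channels, attention_dim)
        self.kv_proj = nn.Linear(channels, attention_dim)
        self.attn = nn.MultiheadAttention(attention_dim, num_heads, batch_first=True)
        self.out_proj = nn.Linear(attention_dim, channels)
        self.gate = nn.Sequential(nn.Linear(2 * channels, channels), nn.Sigmoid())

    def forward(self, task_feat: torch.Tensor, other_feats: list) -> torch.Tensor:
        # task_feat, other_feats[i]: (B, C, H, W) decoder feature maps
        b, c, h, w = task_feat.shape
        # Pooled query: one token per image summarising this task's features
        q = self.q_proj(task_feat.mean(dim=(2, 3))).unsqueeze(1)             # (B, 1, D)
        # Keys/values: pooled tokens from every other task, concatenated along the token axis
        kv = torch.stack([f.mean(dim=(2, 3)) for f in other_feats], dim=1)   # (B, T-1, C)
        kv = self.kv_proj(kv)                                                # (B, T-1, D)
        context, _ = self.attn(q, kv, kv)                                    # (B, 1, D)
        context = self.out_proj(context).view(b, c, 1, 1).expand(-1, -1, h, w)
        # Gated fusion of the enriched context with the original features
        g = self.gate(torch.cat([task_feat, context], dim=1).permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        return task_feat + g * context
```

In the full model, auxiliary features would enter such a block only as keys/values (they never form a query), which is what keeps the auxiliary heads lightweight regularizers as described above.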
ReefSupport requires automated coral reef health assessment at the pace of image collection. Manual annotation cannot keep up with data acquisition rates, delaying management decisions. Current semi-automated tools:
- Miss colony boundaries (critical for cover estimation)
- Cannot jointly answer "what coral" and "how healthy"
- Lack calibrated confidence for expert triage
Design a unified model that:
- Provides dense, pixel-wise genus and health segmentation
- Achieves high boundary quality (not just pixel accuracy)
- Reports calibrated probabilities for uncertainty-aware deployment
- Works robustly under variable field conditions (turbidity, lighting, depth)
Coralscapes: 2075 images, 35 sites, 5 Red Sea countries
- Dense pixel-level annotations (39 benthic classes)
- Live/bleached/dead labels for coral health
- Challenging conditions: variable depth, turbidity, illumination
Encoder: SegFormer-B2 backbone (25.4M parameters)
Decoders:
- Primary tasks (genus, health): Full MLP decoders with cross-attention feature exchange
- Auxiliary tasks (5 context heads): Lightweight regularizers for boundary sharpening
Innovation: Explicit feature exchange via cross-attention allows genus (morphology) and health (appearance) predictions to inform each other, while auxiliary heads prevent "coral-shaped background" false positives.
- Loss: Dice + Focal hybrid with IMGrad gradient balancing (a minimal loss sketch follows this list)
- Optimization: AdamW with polynomial decay (6e-5 LR, 50 epochs)
- Augmentation: Physics-plausible underwater transformations (haze, color cast, blur)
- Sampling: Poisson Disk Sampling (PDS) to reduce spatial redundancy
- Split: Site-level hold-out (70% train, 15% val, 15% test)
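As referenced above, here is a minimal sketch of the Dice + Focal hybrid for a single task. The repository's composite hierarchical loss and IMGrad weighting live under `src/coral_mtl/engine/`; the function name, `gamma`, and the 50/50 mixing weight below are assumptions.

```python
import torch
import torch.nn.functional as F

def dice_focal_loss(logits, targets, num_classes, gamma=2.0, dice_weight=0.5, eps=1e-6):
    """Hybrid of soft Dice and Focal loss for one segmentation task (sketch)."""
    # logits: (B, C, H, W); targets: (B, H, W) with integer class indices
    probs = logits.softmax(dim=1)
    one_hot = F.one_hot(targets, num_classes).permute(0, 3, 1, 2).float()

    # Soft Dice: overlap between predicted probabilities and one-hot targets
    inter = (probs * one_hot).sum(dim=(2, 3))
    union = probs.sum(dim=(2, 3)) + one_hot.sum(dim=(2, 3))
    dice = 1.0 - ((2 * inter + eps) / (union + eps)).mean()

    # Focal: cross-entropy down-weighted for easy (high-confidence) pixels
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)
    focal = ((1 - pt) ** gamma * ce).mean()

    return dice_weight * dice + (1 - dice_weight) * focal
```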
Primary: Global mIoU, Boundary IoU (BIoU), Boundary F1
Calibration: ECE, NLL, Brier (reported as-is, no post-hoc adjustment; a minimal ECE sketch follows this list)
Diagnostics: TIDE-inspired error decomposition (classification/background/missed)
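As referenced in the calibration item above, the following is a minimal sketch of pixel-wise ECE computed from softmax confidences. The authoritative implementations are the 3-tier metrics system under `src/coral_mtl/metrics/`; the 15-bin equal-width binning here is an assumption.

```python
import torch

def expected_calibration_error(probs, targets, n_bins=15):
    """Pixel-wise ECE sketch: mean |accuracy - confidence| weighted by bin occupancy."""
    # probs: (N, C) softmax outputs; targets: (N,) class indices (pixels flattened)
    conf, pred = probs.max(dim=1)
    correct = (pred == targets).float()
    edges = torch.linspace(0.0, 1.0, n_bins + 1)
    ece = torch.zeros(())
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            ece += in_bin.float().mean() * (correct[in_bin].mean() - conf[in_bin].mean()).abs()
    return ece.item()
```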
dbl_capstone/
├── configs/ # YAML experiment configurations
│ ├── baseline_comparisons/ # Production training configs
│ └── task_definitions.yaml # Hierarchical class definitions
│
├── dataset/ # Dataset storage
│ └── processed/
│ └── pds_patches/ # PDS-sampled patches (generated)
│
├── experiments/ # Training outputs
│ ├── baseline_comparisons/
│ │ ├── coral_baseline_b2_run/ # Baseline results
│ │ ├── coral_mtl_b2_focused_run/ # MTL Focused results
│ │ └── coral_mtl_b2_holistic_run/ # MTL Holistic results (best)
│ └── baselines_comparison/
│ └── train_val_test_script.py # HPC training orchestrator
│
├── notebooks/ # Analysis & visualization
│ ├── FINAL_NOTEBOOK.ipynb # Complete results reproduction
│ └── utils/ # Notebook helper utilities
│
├── latex/ # Report & poster source
│ ├── Methodology/ # Final report (LaTeX)
│ └── Poster_Data_shallange/ # Conference poster
│
├── pds_launcher/ # Dataset preprocessing
│ ├── pds_simple_script.py # PDS execution script
│ └── pds_config.py # Sampling parameters
│
├── src/coral_mtl/ # Core library (installable)
│ ├── ExperimentFactory.py # Central orchestrator
│ ├── data/ # Dataloaders & augmentations
│ ├── model/ # Architecture components
│ ├── engine/ # Training, losses, optimizers
│ ├── metrics/ # 3-tier metrics system
│ ├── scripts/ # Utility scripts (PDS, analysis)
│ └── utils/ # Task splitters & helpers
│
├── tests/ # Pytest suite
│ └── coral_mtl_tests/ # Mirrors src/ structure
│
├── requirements.txt # Python dependencies
├── pytest.ini # Test configuration
└── README.md # This file
- Python 3.9+
- CUDA 11.8+ (for GPU training, optional for evaluation)
- 48GB+ VRAM (for training, RTX 6000Ada or similar)
- 8GB+ RAM (sufficient for evaluation/inference)
- Clone repository

  ```bash
  git clone https://github.com/A-DaRo/dbl_capstone
  cd dbl_capstone
  ```

- Install dependencies

  ```bash
  python.exe -m pip install --upgrade pip
  pip install -r requirements.txt
  # Refer to `requirements.txt` if you want to check the specific version of each dependency
  ```

- Install project as editable package

  ```bash
  pip install -e .
  ```

  This enables package-style imports (`from coral_mtl.ExperimentFactory import ...`)

- Verify installation

  ```bash
  pytest tests/coral_mtl_tests/data/ -v
  ```

- MOCK run on CPU with small dataset

  ```bash
  python tests/trial_run_test.py
  ```
Training (HPC/Cloud):
- GPU: 48GB+ VRAM (RTX 6000Ada, A6000, A100)
- RAM: 64GB+ system memory
- Storage: 500GB+ for dataset + checkpoints
- Time: ~9 hours per model (50 epochs)
Evaluation/Inference (Laptop):
- GPU: Optional (CPU sufficient for evaluation)
- RAM: 8GB+ system memory
- Storage: 10GB for checkpoints + outputs
Dataset Source: Coralscapes on Zenodo
The Coralscapes dataset is distributed as a compressed .7z archive.
```bash
# Install required utilities (Ubuntu/Debian)
sudo apt install p7zip-full curl
# Navigate to parent directory (one level above project root)
cd ..
# Download dataset
curl -L -o coralscapes.7z "https://zenodo.org/records/15061505/files/coralscapes.7z?download=1"
# Alternative: using wget
wget -O coralscapes.7z "https://zenodo.org/records/15061505/files/coralscapes.7z?download=1"
# Extract archive (creates coralscapes/ directory)
7z x coralscapes.7z
# Verify structure
ls coralscapes/
# Expected output: leftImg8bit/ gtFine/ README.md (or similar)
# Return to project directory
cd dbl_capstone
```

Manual (Recommended for Windows)

- Download from browser: Direct Link
- Install 7-Zip if not present
- Extract `coralscapes.7z` to the parent directory of `dbl_capstone/`
- Final structure should be:

```
your_workspace/
├── dbl_capstone/    # This project
└── coralscapes/     # Dataset (extracted here)
    ├── leftImg8bit/
    └── gtFine/
```
Purpose: Extract spatially distributed patches to reduce redundancy in overlapping orthomosaics.
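A minimal sketch of the minimum-distance (dart-throwing) idea behind PDS patch-centre selection is shown below; the actual sampler is driven by `pds_launcher/pds_simple_script.py`, and the function name and candidate strategy here are assumptions.

```python
import random

def sample_patch_centers(width, height, radius, max_attempts=10_000, rng=None):
    """Greedily accept random candidate centres that keep a minimum pairwise distance (sketch)."""
    rng = rng or random.Random(0)
    centers = []
    for _ in range(max_attempts):
        x, y = rng.uniform(0, width), rng.uniform(0, height)
        # Reject the candidate if it lies within `radius` of any accepted centre
        if all((x - cx) ** 2 + (y - cy) ** 2 >= radius ** 2 for cx, cy in centers):
            centers.append((x, y))
    return centers

# Example: centres for a hypothetical 4096x4096 orthomosaic at the default 300 px spacing;
# fixed-size patches are then cropped around each accepted centre.
print(len(sample_patch_centers(4096, 4096, radius=300)))
```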
Configuration: Edit `pds_launcher/pds_config.py`

```python
DATASET_ROOT = project_root.parent / "coralscapes"  # Points to ../coralscapes by default
PATCH_SIZE = 512
PDS_RADIUS = 300  # Minimum distance between patch centers
```

Execution:

```bash
cd pds_launcher
python pds_simple_script.py
```

Outputs:

- `dataset/processed/pds_patches/` - ~15,000 patches (train/val/test)
- `experiments/pds/data_analysis/` - Distribution reports

Inspect distribution:

```bash
python src/coral_mtl/scripts/analyze_patch_distribution.py \
    --dataset_root dataset/processed/pds_patches \
    --output experiments/pds/data_analysis
```

Training is controlled via YAML configs in `configs/baseline_comparisons/`:

- `baseline_config.yaml` - Single-task SegFormer baseline
- `focused_mtl_config.yaml` - MTL with 2 primary tasks (genus, health) + 5 auxiliary
- `holistic_mtl_config.yaml` - MTL with all 7 tasks as primary (best performer)
- `mtl_config.yaml` - Generic MTL configuration template
Key parameters:
```yaml
model:
  type: "CoralMTL"  # or "SegFormerBaseline"
  params:
    backbone: "nvidia/mit-b2"
    decoder_channel: 256
    attention_dim: 128

data:
  batch_size: 4
  patch_size: 512
  pds_train_path: "./dataset/processed/pds_patches/"

loss:
  type: "CompositeHierarchical"
  weighting_strategy:
    type: "IMGrad"  # Gradient balancing

optimizer:
  params:
    lr: 6.0e-5
    weight_decay: 0.01

trainer:
  epochs: 50
  device: "cuda"
  output_dir: "experiments/baseline_comparisons/coral_mtl_b2_holistic_run"
```

Script: `experiments/baselines_comparison/train_val_test_script.py`
Run all models:
```bash
python experiments/baselines_comparison/train_val_test_script.py --mode both
```

Run specific model:

```bash
# Baseline only
python experiments/baselines_comparison/train_val_test_script.py --mode baseline

# MTL only
python experiments/baselines_comparison/train_val_test_script.py --mode mtl
```

Evaluation only (skip training):

```bash
python experiments/baselines_comparison/train_val_test_script.py --eval-only
```

Each run creates:
experiments/baseline_comparisons/<run_name>/
├── best_model.pth # Best checkpoint (model selection metric)
├── history.json # Per-epoch training/validation metrics
├── test_metrics_full_report.json # Final test evaluation
├── test_cms.jsonl # Per-image confusion matrices
└── loss_diagnostics.jsonl # Gradient norms, cosine similarity
Track progress via `history.json`:

```python
import json

history = json.load(open('experiments/.../history.json'))
print(f"Epoch 50 mIoU: {history['global.mIoU'][-1]:.4f}")
```

Automatic (part of training pipeline):

```bash
python experiments/baselines_comparison/train_val_test_script.py --mode mtl
```

Manual (specific checkpoint):
```python
from coral_mtl.ExperimentFactory import ExperimentFactory

factory = ExperimentFactory('configs/baseline_comparisons/mtl_config.yaml')
results = factory.run_evaluation(
    checkpoint_path='experiments/.../best_model.pth'
)
```

| Metric | Baseline | MTL Focused | MTL Holistic |
|---|---|---|---|
| Global mIoU ↑ | 0.3888 | 0.4039 | 0.4272 (+9.9%) |
| Global BIoU ↑ | 0.0937 | 0.1075 | 0.1243 (+32.6%) |
| Boundary F1 ↑ | 0.1714 | 0.1942 | 0.2211 (+28.9%) |
| ECE ↓ | 0.1014 | 0.1275 | 0.1423 |
| NLL ↓ | 1.2239 | 1.3995 | 1.5162 |
| Brier ↓ | 0.5016 | 0.4959 | 0.4937 |
Interpretation:
- MTL Holistic achieves best segmentation/boundary quality
- Baseline slightly better calibrated (lower ECE/NLL)
- Trade-off reflects conservative behavior in low-contrast scenes
TIDE-inspired decomposition shows:
- Classification errors: ↓ 15% (Baseline → Holistic)
- Background FPs: ↓ 28% (fewer "coral-shaped substrate" errors)
- Missed regions: ↑ 12% (conservative in low contrast)
Actionable next steps: targeted data curation for faint bleaching cases and a modest focal-loss rebalancing.
Make sure all dependencies are installed before running the Jupyter Notebook, and restart the notebook kernel after installing them if necessary.
```bash
python.exe -m pip install --upgrade pip
pip install -r requirements.txt
pip install -e .
```

Notebook: `notebooks/FINAL_NOTEBOOK.ipynb`
Execution (laptop-friendly):
```bash
jupyter notebook notebooks/FINAL_NOTEBOOK.ipynb
```

Generated artifacts:

- All report figures → `latex/Methodology/Result-figures/`
- All poster figures → `latex/Poster_Data_shallange/Result-figures/`
- Statistics → `notebooks/extra_plots_stats/`
Sections:
- Environment setup & dependency verification
- Data pipeline documentation (PDS)
- Training overview (references HPC execution)
- Experiment artifact discovery
- Training dynamics analysis
- Per-class performance analysis
- Test set evaluation & inference
- Qualitative visualizations
- Architecture diagrams (Graphviz)
- Extra plots for report/poster
If training is already complete:
```python
# Load notebook and run from Section 3 onwards
# All pre-trained checkpoints expected in experiments/baseline_comparisons/
```

To reproduce from scratch:

- Data preparation: Run PDS sampling (see Data Pipeline)
- Training: Execute HPC script (see Model Training)
- Analysis: Run complete notebook (see above)
- Report: Compile LaTeX sources in `latex/`
Total time: ~12 hours
Structure: tests/ mirrors src/ package layout
Coverage: 85%+ on core modules (data, model, engine, metrics)
Full suite with coverage:
```bash
pytest
```

Targeted tests:

```bash
# Data pipeline only
pytest tests/coral_mtl_tests/data/

# Loss functions only
pytest tests/coral_mtl_tests/engine/losses/

# Exclude integration tests
pytest -m "not integration"
```

GPU tests (if CUDA available):

```bash
set CUDA_VISIBLE_DEVICES=0
pytest -m gpu
```

Coverage reports:

```bash
# Terminal summary (automatic via pytest.ini)
pytest

# HTML report
pytest --cov-report=html
# Opens htmlcov/index.html
```

Test markers:

- `@pytest.mark.gpu` - Requires CUDA device
- `@pytest.mark.integration` - Slower end-to-end tests
- `@pytest.mark.optdeps` - Requires optional dependencies
- `@pytest.mark.slow` - Long-running tests
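For illustration, a hypothetical GPU-marked test might look like the following (the test name and body are illustrative, not taken from the actual suite):

```python
import pytest
import torch

@pytest.mark.gpu
def test_forward_pass_runs_on_cuda():
    # Selected by `pytest -m gpu`; excluded by `pytest -m "not gpu"`
    if not torch.cuda.is_available():
        pytest.skip("CUDA device not available")
    x = torch.randn(1, 3, 64, 64, device="cuda")
    assert x.is_cuda
```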
Before committing:
pytest -m "not integration" # Fast unit testsBefore major changes:
pytest # Full suite including integration-
- Technical Specification - `project_specification/technical_specification.md`
  - Complete API reference
  - Component interfaces
  - Factory orchestration
- Theoretical Specification - `project_specification/theorethical_specification.md`
  - Design rationale
  - Multi-task learning theory
  - Metric justifications
- Loss & Optimization Guide - `project_specification/loss_and_optim_specification.md`
  - Weighting strategies (Uncertainty, NashMTL, IMGrad)
  - Gradient manipulation techniques
  - Diagnostic-driven selection
- Configuration Guide - `configs/CONFIGS_README.md`
  - All YAML parameters documented
  - Example configurations
  - Validation checklist
- Final Report: `latex/Methodology/final-report.tex`
- Conference Poster: `latex/Poster_Data_shallange/poster1.tex`
- Results Notebook: `notebooks/FINAL_NOTEBOOK.ipynb`
- Docstrings: NumPy style throughout `src/`
- Type Hints: Full typing coverage in core modules
- Dataset: Sauder et al. (2025) - Coralscapes: Densely annotated coral reef dataset
- Baseline: Xie et al. (2021) - SegFormer: Simple and Efficient Design
- MTL Framework: Liu et al. (2019) - End-to-End Multi-Task Learning with Attention; Goncalves et al. (2023) - MTLSegFormer: Multi-task Learning with Transformers for Semantic Segmentation in Precision Agriculture
- Gradient Balancing: Zhou et al. (2025) - IMGrad: Balancing Gradient Magnitude
- Boundary Metrics: Cheng et al. (2021) - Boundary IoU
- Error Decomposition: Bolya et al. (2020) - TIDE: A General Toolbox
Full bibliography: See latex/Methodology/references.bib
Group 21 Members:
- Alessandro Da Ros - a.da.ros@student.tue.nl
- Ergi Livanaj - e.livanaj@student.tue.nl
- Sushrut Patil - s.patil@student.tue.nl
- Alexandru Radu - i.a.radu@student.tue.nl
- Gabriel Merle - g.merle@student.tue.nl
- Mateusz Lotko - m.lotko@student.tue.nl
Institution: Eindhoven University of Technology
Course: JBG060 - Capstone Data Challenge
Stakeholder: ReefSupport
Issues: Please use repository issue tracker for bug reports or questions.
This project is developed for academic purposes as part of the Capstone Data Challenge course at TU/e. Code is provided as-is for educational and research use.
Dataset License: Coralscapes dataset follows its original license terms (see Hugging Face Hub).
- ReefSupport for problem formulation and domain expertise
- TU/e Faculty for guidance and compute resources
- Coralscapes Authors for the high-quality annotated dataset
- Open Source Community for foundational libraries (PyTorch, Hugging Face, etc.)
Version: 1.0 (Final Submission)