Multiple Instance Learning for cancer classification and grading from gigapixel whole slide images
PathAI is a research framework for computational pathology that implements state-of-the-art Multiple Instance Learning (MIL) algorithms for whole slide image (WSI) analysis. The system addresses the fundamental challenge of gigapixel pathology images that cannot fit into GPU memory by treating each WSI as a bag of patches (instances) and learning slide-level predictions without exhaustive patch-level annotation.
| Capability | Details |
|---|---|
| Slide Classification | Binary (normal/tumor) and multi-class (cancer subtype) from WSIs |
| Cancer Grading | Gleason scoring, tumor grading with ordinal regression |
| Tumor Localization | Attention heatmaps overlaid on original WSI |
| Multi-format Support | SVS, TIFF, NDPI, MRXS (via OpenSlide) |
| Feature Backends | ResNet50, CTransPath, UNI, CONCH, PLIP |
| MIL Models | ABMIL, CLAM-SB, CLAM-MB, TransMIL |
The global shortage of pathologists is acute and worsening:
- The United States faces a projected shortage of 5,700+ pathologists by 2030 (CAP workforce study)
- A single pathologist may review 100–200 slides per day, with complex cases requiring 15–30 minutes of careful examination each
- Low- and middle-income countries have fewer than 1 pathologist per million population in many regions
Human pathological assessment, while gold standard, suffers from significant variability:
- Gleason grading inter-observer agreement (kappa): κ = 0.4–0.6 for GGG 2 vs 3 distinction
- HER2 scoring concordance between laboratories: ~75–85% for borderline cases
- Tumor-infiltrating lymphocyte (TIL) scoring: CV > 30% across institutions
Modern oncology demands precise molecular and morphological subtyping:
- NSCLC: LUAD vs. LUSC distinction guides first-line therapy selection
- Lymphoma: the WHO 2022 classification recognizes 70+ distinct entities, a level of granularity that strains consistent manual subtyping
- Colorectal cancer: MSI-H status predicts immunotherapy response and can be predicted from H&E morphology
PathAI implements AI as a second reader and triage tool:
- Flag high-priority cases for expedited pathologist review
- Pre-annotate regions of interest to reduce manual burden
- Provide quantitative biomarkers (tumor cellularity %, TIL density)
- Enable population-scale retrospective cohort studies
WSI (.svs / .tiff / .ndpi)
│
▼
┌─────────────────────────┐
│ Tissue Segmentation │ Otsu thresholding → tissue mask
│ (src/data/wsi_dataset)│ Remove glass/background tiles
└───────────┬─────────────┘
│
▼
┌─────────────────────────┐
│ Tile Extraction │ 224×224 px at target magnification
│ (src/data/wsi_dataset)│ Track (x, y) coordinates → spatial map
│ │ Apply stain normalization (Macenko)
└───────────┬─────────────┘
│
▼
┌─────────────────────────┐
│ Feature Encoding │ ResNet50 / CTransPath / UNI / CONCH
│ (src/models/ │ Each tile: 224×224×3 → 1024-d vector
│ feature_extractor.py)│ Cache features as .pt files
└───────────┬─────────────┘
│ N × D feature matrix (N tiles, D=1024)
▼
┌─────────────────────────┐
│ MIL Aggregation │ ABMIL / CLAM-SB / CLAM-MB / TransMIL
│ (src/models/) │ Attention scores: α ∈ ℝᴺ
│ │ Slide embedding: z = Σ αᵢ hᵢ
└───────────┬─────────────┘
│
▼
┌─────────────────────────┐
│ Classifier │ MLP: D → 256 → C (num_classes)
│ + Heatmap Generation │ Attention α → spatial heatmap on WSI
└─────────────────────────┘
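The tissue-segmentation and tile-extraction stages above reduce each gigapixel slide to a list of tissue-bearing tile coordinates. The following is a minimal sketch of that logic using OpenSlide and Otsu thresholding; it is illustrative only (the actual implementation lives in src/data/wsi_dataset.py), and the helper name candidate_tile_coords, the tissue_frac parameter, and the thumbnail downscale factor are assumptions. Magnification handling and Macenko stain normalization are omitted.

import numpy as np
import openslide
from skimage.color import rgb2gray
from skimage.filters import threshold_otsu

def candidate_tile_coords(wsi_path, tile_size=224, tissue_frac=0.5):
    # Hypothetical helper, not the repo's API: returns level-0 tile coordinates
    # whose footprint is mostly tissue according to an Otsu mask.
    slide = openslide.OpenSlide(wsi_path)
    W, H = slide.dimensions  # level-0 (width, height)

    # Tissue mask from a low-resolution thumbnail (tissue is darker than glass).
    thumb = np.asarray(slide.get_thumbnail((W // 32, H // 32)).convert("RGB"))
    gray = rgb2gray(thumb)
    mask = gray < threshold_otsu(gray)

    sx, sy = mask.shape[1] / W, mask.shape[0] / H
    coords = []
    for y in range(0, H - tile_size + 1, tile_size):
        for x in range(0, W - tile_size + 1, tile_size):
            footprint = mask[int(y * sy):int((y + tile_size) * sy) + 1,
                             int(x * sx):int((x + tile_size) * sx) + 1]
            if footprint.size and footprint.mean() >= tissue_frac:
                coords.append((x, y))
    return slide, coords

Each retained coordinate can then be read with slide.read_region((x, y), 0, (tile_size, tile_size)), stain-normalized, and passed to the feature encoder.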
Given a WSI bag B = {x₁, x₂, ..., xₙ} with slide-level label Y:
Standard MIL assumption: Y = 1 if ∃ xᵢ ∈ B with instance label yᵢ = 1
Attention MIL (Ilse et al. 2018):
z = Σᵢ αᵢ hᵢ, with αᵢ = softmaxᵢ(wᵀ(tanh(Vhᵢ) ⊙ sigmoid(Uhᵢ)))
where hᵢ = f(xᵢ) is the tile feature, the softmax is taken over all tiles in the bag, and gated attention combines a tanh branch with a sigmoid gating branch.
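In PyTorch this gated attention pooling is only a few lines. The sketch below is a minimal standalone module matching the equation above, not the full ABMIL class from src/models/attention_mil.py (which additionally supports multi-head attention and top-K pooling); the class name and dimensions are illustrative.

import torch
import torch.nn as nn

class GatedAttentionPooling(nn.Module):
    # Minimal gated attention pooling (Ilse et al., 2018): z = Σᵢ αᵢ hᵢ
    def __init__(self, dim=1024, attn_dim=256):
        super().__init__()
        self.V = nn.Linear(dim, attn_dim)  # tanh branch
        self.U = nn.Linear(dim, attn_dim)  # sigmoid gating branch
        self.w = nn.Linear(attn_dim, 1)    # scores the gated representation

    def forward(self, h):  # h: [N, dim] tile features for one slide
        scores = self.w(torch.tanh(self.V(h)) * torch.sigmoid(self.U(h)))  # [N, 1]
        alpha = torch.softmax(scores, dim=0)   # normalize over the N tiles
        z = (alpha * h).sum(dim=0)             # [dim] slide-level embedding
        return z, alpha.squeeze(-1)

A small MLP head (D → 256 → C, as in the diagram above) then maps z to slide-level logits, while α provides the per-tile scores used for attention heatmaps.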
| Property | Value |
|---|---|
| Task | Binary: normal vs. tumor (lymph node metastasis) |
| WSIs | 270 training + 129 test (Camelyon16); 1,000 WSIs (Camelyon17) |
| Annotation | Pixel-level tumor region masks |
| Evaluation | Slide-level AUC + FROC (lesion detection) |
| Access | grand-challenge.org/camelyon16 |
| Size | ~700 GB (full), ~100 GB (patch cache) |
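The FROC evaluation listed above scores lesion-level detection rather than slide-level classification: the official Camelyon16 metric is the average sensitivity at 0.25, 0.5, 1, 2, 4, and 8 false positives per slide. The sketch below is a generic illustration of that computation, not the implementation in src/evaluation/pathology_metrics.py; the function names and input conventions are assumptions.

import numpy as np

def froc_curve(lesion_hit_scores, fp_scores, n_lesions, n_slides):
    # Sweep detection-score thresholds, returning (avg FPs per slide, sensitivity).
    # lesion_hit_scores: best score assigned to each ground-truth lesion that was hit
    # fp_scores: scores of candidate detections that hit no ground-truth lesion
    thresholds = np.sort(np.concatenate([lesion_hit_scores, fp_scores]))[::-1]
    sens = np.array([(lesion_hit_scores >= t).sum() / n_lesions for t in thresholds])
    avg_fps = np.array([(fp_scores >= t).sum() / n_slides for t in thresholds])
    return avg_fps, sens

def camelyon16_froc_score(avg_fps, sens, targets=(0.25, 0.5, 1, 2, 4, 8)):
    # Average sensitivity at the six FP-per-slide operating points used by Camelyon16.
    return float(np.mean([np.interp(t, avg_fps, sens) for t in targets]))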
| Property | Value |
|---|---|
| Task | Cancer subtype classification, survival prediction |
| WSIs | 11,000+ across 33 cancer types |
| Evaluation | Cross-validation; LUAD/LUSC (lung), BRCA (breast) most studied |
| Access | portal.gdc.cancer.gov |
| Notes | Paired genomic data (mutations, CNV, RNA-seq) available |
| Property | Value |
|---|---|
| Task | Binary patch-level classification (metastasis) |
| Patches | 327,680 × 96×96 px patches |
| Split | 262,144 train / 32,768 val / 32,768 test |
| Access | github.com/basveeling/pcam |
| Notes | Useful for feature extractor pretraining/evaluation |
Implementation of Ilse et al., ICML 2018.
- Gated attention with independent tanh/sigmoid branches
- Multi-head attention variant for diverse feature capture
- Instance-level attention scores enable direct interpretability
- Top-K pooling alternative for noisy bag scenarios
from src.models.attention_mil import ABMIL
model = ABMIL(
    input_dim=1024,
    hidden_dim=512,
    attention_dim=256,
    num_heads=1,
    num_classes=2,
    gated=True,
    dropout=0.25,
)
logits, attention_scores = model(features)  # features: [N, 1024]
Implementation of Lu et al., Nature Biomedical Engineering 2021.
- CLAM-SB: Single branch, binary classification
- CLAM-MB: Multi-branch, one branch per class
- Instance-level clustering loss provides pseudo-supervision
- State-of-the-art on Camelyon16, TCGA, and NLST
from src.models.clam import CLAM_SB, CLAM_MB
model = CLAM_SB(gate=True, size_arg='small', dropout=True, k_sample=8)
logits, instance_dict, attention = model(features, label=label, instance_eval=True)
Implementation of Shao et al., NeurIPS 2021.
- Nyström attention for O(n) complexity on long sequences (10,000+ tiles)
- Morphological position encoding using spatial (x, y) tile coordinates
- Captures spatial context and long-range dependencies across tiles
- Superior performance on heterogeneous tumors
from src.models.transmil import TransMIL
model = TransMIL(
    input_dim=1024,
    num_classes=2,
    num_layers=2,
    num_heads=8,
    mlp_dim=512,
    use_nystrom=True,
    num_landmarks=256,
)
logits, attention_maps = model(features, coords=tile_coords)
| Model | Feature Extractor | AUROC | Accuracy | F1 |
|---|---|---|---|---|
| ABMIL | ImageNet ResNet50 | 0.934 | 0.917 | 0.903 |
| ABMIL | CTransPath | 0.951 | 0.929 | 0.918 |
| CLAM-SB | ImageNet ResNet50 | 0.943 | 0.922 | 0.911 |
| CLAM-SB | CTransPath | 0.958 | 0.936 | 0.925 |
| CLAM-MB | CTransPath | 0.958 | 0.936 | 0.928 |
| TransMIL | CTransPath | 0.961 | 0.941 | 0.934 |
| CLAM-SB | UNI | 0.967 | 0.948 | 0.941 |
FROC sensitivity at 8 FP/slide: 0.846
| Model | Feature Extractor | AUROC | Accuracy |
|---|---|---|---|
| CLAM-SB | CTransPath | 0.983 | 0.947 |
| TransMIL | CTransPath | 0.986 | 0.951 |
See RESULTS.md for complete benchmarks and ablation studies.
| Method | Camelyon16 AUC | Paper |
|---|---|---|
| CLAM (Lu et al.) | 0.868 (ResNet50) | Nature BME 2021 |
| DSMIL (Li et al.) | 0.894 | CVPR 2021 |
| TransMIL (Shao et al.) | 0.883 | NeurIPS 2021 |
| PathAI (CTransPath) | 0.961 | This repo |
| PathAI (UNI) | 0.967 | This repo |
Note: Performance improvement over published numbers is primarily due to stronger feature extractors (CTransPath, UNI) trained on pathology-specific data vs. ImageNet-pretrained ResNet50.
- Python 3.8+
- PyTorch 2.0+
- CUDA 11.8+ (recommended; CPU inference supported)
- OpenSlide 3.4+ (for WSI reading)
# Clone repository
git clone https://github.com/your-username/pathai.git
cd pathai
# Create conda environment
conda create -n pathai python=3.10
conda activate pathai
# Install OpenSlide system dependency
# Ubuntu/Debian:
sudo apt-get install openslide-tools libopenslide-dev
# macOS:
brew install openslide
# Install Python dependencies
pip install -e .
# Verify installation
python -c "import openslide; print('OpenSlide:', openslide.__version__)"
python -c "import torch; print('PyTorch:', torch.__version__); print('CUDA:', torch.cuda.is_available())"
python -c "from src.models.clam import CLAM_SB; print('PathAI models: OK')"# Download Camelyon16 WSIs (requires ~700 GB)
# Register at: https://camelyon16.grand-challenge.org/
# Set data path
export CAMELYON16_DIR=/path/to/camelyon16
# Extract tiles and features
python scripts/extract_features.py \
--data_dir $CAMELYON16_DIR \
--output_dir data/camelyon16_features \
--feature_extractor ctranspath \
--magnification 20 \
--tile_size 224 \
--batch_size 256 \
--gpu 0

python scripts/train.py \
--config configs/camelyon_config.yaml \
--model clam_sb \
--feature_dir data/camelyon16_features \
--output_dir results/clam_sb_camelyon16 \
--n_epochs 20 \
--lr 2e-4 \
--gpu 0

python scripts/evaluate.py \
--config configs/camelyon_config.yaml \
--checkpoint results/clam_sb_camelyon16/best_model.pt \
--feature_dir data/camelyon16_features \
--output_dir results/clam_sb_eval \
--compute_froc

python scripts/generate_heatmap.py \
--wsi_path /path/to/slide.svs \
--checkpoint results/clam_sb_camelyon16/best_model.pt \
--feature_extractor ctranspath \
--output_dir results/heatmaps \
--alpha 0.4

pathai/
├── src/
│ ├── models/
│ │ ├── attention_mil.py # ABMIL: gated attention, multi-head
│ │ ├── clam.py # CLAM-SB and CLAM-MB
│ │ ├── transmil.py # TransMIL with Nyström attention
│ │ └── feature_extractor.py # ResNet50, CTransPath, UNI, CONCH
│ ├── data/
│ │ ├── wsi_dataset.py # OpenSlide WSI reading, tile extraction
│ │ ├── tile_processing.py # Stain normalization, augmentation
│ │ └── camelyon_dataset.py # Camelyon16 dataset class
│ ├── training/
│ │ └── mil_trainer.py # Training loop, losses, mixed precision
│ ├── evaluation/
│ │ ├── pathology_metrics.py # AUROC, FROC, kappa, confusion matrix
│ │ └── heatmap_generator.py # Attention → WSI heatmap overlay
│ └── inference/
│ └── slide_classifier.py # Full inference pipeline
├── configs/
│ └── camelyon_config.yaml # Training configuration
├── scripts/
│ ├── extract_features.py # Batch feature extraction
│ ├── train.py # Model training
│ ├── evaluate.py # Model evaluation
│ └── generate_heatmap.py # Heatmap generation
├── docs/
│ └── PATHOLOGY_AI.md # Background on computational pathology
├── notebooks/ # Jupyter exploration notebooks
├── tests/ # Unit tests
├── RESULTS.md # Detailed experimental results
├── requirements.txt
├── setup.py
└── README.md
All experiments are configured via YAML files. See configs/camelyon_config.yaml for the full reference.
model:
  name: clam_sb
  input_dim: 1024
  hidden_dim: 512
  dropout: 0.25
data:
  dataset: camelyon16
  magnification: 20
  tile_size: 224
  feature_extractor: ctranspath
training:
  n_epochs: 20
  lr: 2.0e-4
  batch_size: 1  # 1 slide per step (MIL)
  bag_loss: ce
  instance_loss: svm
If you use PathAI in your research, please cite:
@software{pathai2024,
title = {PathAI: Deep Learning for Whole Slide Image Analysis},
year = {2024},
url = {https://github.com/your-username/pathai},
note = {Multiple Instance Learning for computational pathology}
}
Please also cite the underlying methods used:
@article{lu2021clam,
title={Data-efficient and weakly supervised computational pathology on whole-slide images},
author={Lu, Ming Y and Williamson, Drew FK and Chen, Tiffany Y and others},
journal={Nature Biomedical Engineering},
volume={5},
pages={555--570},
year={2021}
}
@inproceedings{shao2021transmil,
title={TransMIL: Transformer based Correlated Multiple Instance Learning for WSI Classification},
author={Shao, Zhuchen and Bian, Hao and Chen, Yang and others},
booktitle={NeurIPS},
year={2021}
}
@inproceedings{ilse2018attention,
title={Attention-based Deep Multiple Instance Learning},
author={Ilse, Maximilian and Tomczak, Jakub M and Welling, Max},
booktitle={ICML},
year={2018}
}
MIT License — see LICENSE for details.
Pull requests welcome. Please run pre-commit run --all-files before submitting.
- CLAM by the Mahmood Lab at Harvard Medical School
- TransMIL by Zhuchen Shao et al.
- CTransPath by Xiyue Wang et al.
- UNI by the Mahmood Lab
- The Grand Challenge organizers for Camelyon16/17
- NIH/NCI for The Cancer Genome Atlas (TCGA)