Merged
6 changes: 5 additions & 1 deletion README.md
@@ -20,7 +20,11 @@ A Protocol for End-to-End Deep Learning in Computational Pathology".

We recommend installing STAMP with [uv](https://docs.astral.sh/uv/):
```bash
-uv sync --all extras
+git clone https://github.com/KatherLab/STAMP.git
+
+cd STAMP/
+
+uv sync --all-extras

source .venv/bin/activate
```
117 changes: 116 additions & 1 deletion getting-started.md
@@ -151,6 +151,11 @@ meaning ignored that it was ignored during feature extraction.
[mstar]: https://huggingface.co/Wangyh/mSTAR
[musk]: https://huggingface.co/xiangjx/musk
[plip]: https://github.com/PathologyFoundation/plip
[TITAN]: https://huggingface.co/MahmoodLab/TITAN
[COBRA2]: https://huggingface.co/KatherLab/COBRA
[EAGLE]: https://github.com/KatherLab/EAGLE
[MADELEINE]: https://huggingface.co/MahmoodLab/madeleine



## Doing Cross-Validation on the Data Set
@@ -252,4 +257,114 @@ Afterwards, the `output_dir` should contain the following files:
as well as the 95% confidence interval for the statistical scores
for the splits.
- `roc-curve_isMSIH=yes.svg` and `pr-curve_isMSIH=yes.svg`
contain the ROC and precision recall curves of the splits.

## Slide-Level Encoding
Tile-level features can be encoded into a single feature vector per slide, which is
useful for capturing global patterns across whole slides.

STAMP currently supports the following encoders:
- [CHIEF][CHIEF_CTRANSPATH]
- [TITAN]
- [GIGAPATH]
- [COBRA2]
- [EAGLE]
- [MADELEINE]

Slide encoders take as input the tile-level features already extracted in the
preprocessing step. Each encoder accepts only certain extractors, and most
work only on CUDA devices:

| Encoder | Required Extractor | Compatible Devices |
|--|--|--|
| CHIEF | CTRANSPATH, CHIEF-CTRANSPATH | CUDA only |
| TITAN | CONCH1.5 | CUDA, CPU, MPS |
| GIGAPATH | GIGAPATH | CUDA only |
| COBRA2 | CONCH, UNI, VIRCHOW2 or H-OPTIMUS-0 | CUDA only |
| EAGLE | CTRANSPATH, CHIEF-CTRANSPATH | CUDA only |
| MADELEINE | CONCH | CUDA only |

Contributor


Crazy, I didn't know so many are CUDA-only. I thought only Mamba models were affected Oo

Contributor Author


There are some, like CHIEF, that I think should be easy to make work on non-CUDA devices, but once I read in their GitHub installation guides that they were tested on CUDA, and they did not work in the pipeline, I just left it like that. Maybe we should open an issue to debug each model on a Mac and on Linux without CUDA to see if we can make them compatible.
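
The compatibility table above can also be written as a small lookup, which makes it easy to check a planned encoder, extractor and device combination before starting a run. This is only a sketch: `COMPATIBILITY` and `check_combination` are hypothetical names that are not part of STAMP, the encoder keys follow the option names listed in the config comments (so COBRA2 appears as `"cobra"`), and the lowercase extractor identifiers are assumed spellings of the table entries.

```python
# Hypothetical lookup mirroring the compatibility table; not part of STAMP's API.
COMPATIBILITY = {
    "chief": {"extractors": {"ctranspath", "chief-ctranspath"}, "devices": {"cuda"}},
    "titan": {"extractors": {"conch1.5"}, "devices": {"cuda", "cpu", "mps"}},
    "gigapath": {"extractors": {"gigapath"}, "devices": {"cuda"}},
    "cobra": {
        "extractors": {"conch", "uni", "virchow2", "h-optimus-0"},
        "devices": {"cuda"},
    },
    "eagle": {"extractors": {"ctranspath", "chief-ctranspath"}, "devices": {"cuda"}},
    "madeleine": {"extractors": {"conch"}, "devices": {"cuda"}},
}


def check_combination(encoder, extractor, device):
    """Return a list of problems; an empty list means the table allows the combination."""
    entry = COMPATIBILITY.get(encoder.lower())
    if entry is None:
        return [f"unknown encoder: {encoder!r}"]

    problems = []
    if extractor.lower() not in entry["extractors"]:
        problems.append(f"{encoder} does not accept {extractor} features")

    # Device strings such as "cuda:0" are treated as plain "cuda".
    device_type = device.lower().split(":", 1)[0]
    if device_type not in entry["devices"]:
        problems.append(f"{encoder} is not listed as working on {device_type}")
    return problems


print(check_combination("chief", "ctranspath", "cuda:0"))  # no problems expected
print(check_combination("madeleine", "uni", "cpu"))        # two problems expected
```

If the table changes (for example after debugging the encoders on non-CUDA devices), only the dictionary needs updating.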


As with feature extractors, most of these models require you to request
access. The following example uses CHIEF, which is available if you installed
STAMP with `uv sync --all-extras`. The configuration should look like this:

```yaml
# stamp-test-experiment/config.yaml

slide_encoding:
  # Encoder to use for slide encoding. Possible options are "cobra",
  # "eagle", "titan", "gigapath", "chief", "prism", "madeleine".
  encoder: "chief"

  # Directory to save the output files.
  output_dir: "/path/to/save/files/to"

  # Directory where the extracted features are stored.
  feat_dir: "/path/your/extracted/features/are/stored/in"

  # Device to run slide encoding on ("cpu", "cuda", "cuda:0", etc.)
  device: "cuda"

  # Optional settings:
  # Directory where the aggregated features are stored. Needed for
  # some encoders such as eagle (it requires virchow2 features).
  #agg_feat_dir: "/path/your/aggregated/features/are/stored/in"

  # Add a hash of the entire preprocessing codebase in the feature folder name.
  #generate_hash: True
```

Make sure `feat_dir` points to a directory containing, in this case, `ctranspath` or
`chief-ctranspath` tile-level features. Once everything is set, run:

```sh
stamp --config stamp-test-experiment/config.yaml encode_slides
```
The output will be one `.h5` file per slide.
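
Before running the command, it can help to confirm that `feat_dir` actually contains feature files. The sketch below is an illustration only: `list_feature_files` is a hypothetical helper, not part of STAMP, and it assumes that tile-level features are stored as `.h5` files directly inside `feat_dir`.

```python
from pathlib import Path


def list_feature_files(feat_dir: str, suffix: str = ".h5") -> list[Path]:
    """Return the feature files found directly inside `feat_dir`, sorted by name.

    Raises FileNotFoundError if the directory is missing or holds no matching files.
    The `.h5` suffix and the flat directory layout are assumptions of this sketch.
    """
    root = Path(feat_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"feat_dir does not exist: {root}")

    files = sorted(p for p in root.iterdir() if p.is_file() and p.suffix == suffix)
    if not files:
        raise FileNotFoundError(f"no {suffix} feature files found in {root}")
    return files
```

Since slide encoding writes one `.h5` file per slide, the number of files returned here is a rough indication of how many outputs to expect.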

## Patient-Level Encoding
Even though the available encoders are designed for slide-level use, this
option concatenates the slides of a patient along the x-axis, creating a single
"virtual" slide in which each original slide forms its own block of tissue. The
configuration is the same as for slide-level encoding, except for `slide_table`,
which is required to link slides with patients.
```yaml
# stamp-test-experiment/config.yaml

patient_encoding:
  # Encoder to use for patient encoding. Possible options are "cobra",
  # "eagle", "titan", "gigapath", "chief", "prism", "madeleine".
  encoder: "eagle"

  # Directory to save the output files.
  output_dir: "/path/to/save/files/to"

  # Directory where the extracted features are stored.
  feat_dir: "/path/your/extracted/features/are/stored/in"

  # A table (.xlsx or .csv) relating every slide to their feature files.
  # The table must contain at least two columns, one titled "SLIDE",
  # containing the slide ID, and one called "FILENAME", containing the
  # feature file path relative to `feat_dir`.
  slide_table: "/path/of/slide.csv"

  # Device to run slide encoding on ("cpu", "cuda", "cuda:0", etc.)
  device: "cuda"

  # Optional settings:
  patient_label: "PATIENT"
  filename_label: "FILENAME"

  # Directory where the aggregated features are stored. Needed for
  # some encoders such as eagle (it requires virchow2 features).
  #agg_feat_dir: "/path/your/aggregated/features/are/stored/in"

  # Add a hash of the entire preprocessing codebase in the feature folder name.
  #generate_hash: True
```
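
To see how the `slide_table` links slides to patients, the following sketch groups feature files by patient. It is a minimal illustration, not STAMP's own loader: `group_slides_by_patient` is a hypothetical helper, the column names are assumed to be the `patient_label` and `filename_label` values from the optional settings, only `.csv` input is handled, and the example rows are made up.

```python
import csv
import io
from collections import defaultdict


def group_slides_by_patient(table_file, patient_label="PATIENT", filename_label="FILENAME"):
    """Map each patient ID to the list of feature files listed for that patient."""
    slides = defaultdict(list)
    for row in csv.DictReader(table_file):
        slides[row[patient_label]].append(row[filename_label])
    return dict(slides)


# Made-up example table: patient P1 has two slides, patient P2 has one.
example_table = io.StringIO(
    "PATIENT,FILENAME\n"
    "P1,slide_a.h5\n"
    "P1,slide_b.h5\n"
    "P2,slide_c.h5\n"
)
groups = group_slides_by_patient(example_table)
print(groups)  # {'P1': ['slide_a.h5', 'slide_b.h5'], 'P2': ['slide_c.h5']}
```

Patients with more than one entry are the ones whose slides would be combined into a single "virtual" slide.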

Then run:
```sh
stamp --config stamp-test-experiment/config.yaml encode_patients
```

Each output `.h5` file is named after the patient's ID.
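
The x-axis concatenation described at the start of this section can be pictured with a small sketch. This is one plausible way to lay slides side by side, written under stated assumptions rather than taken from STAMP's implementation: `concatenate_along_x` is a hypothetical function, tiles are represented as `(x, y, feature)` tuples, and `tile_size` (the spacing kept between consecutive slides) is an assumed parameter.

```python
def concatenate_along_x(slides, tile_size=224):
    """Merge per-slide tiles into one "virtual" slide laid out left to right.

    `slides` is a list of slides; each slide is a list of (x, y, feature) tuples.
    Every slide is shifted so that its leftmost tile starts one tile width to the
    right of the previous slide's rightmost tile. The y coordinates are untouched.
    """
    merged = []
    offset = 0
    for tiles in slides:
        if not tiles:
            continue  # nothing to place for an empty slide
        min_x = min(x for x, _, _ in tiles)
        max_x = max(x for x, _, _ in tiles)
        for x, y, feature in tiles:
            merged.append((x - min_x + offset, y, feature))
        # Advance past this slide's extent plus one tile so blocks do not overlap.
        offset += (max_x - min_x) + tile_size
    return merged


# Placeholder features ("a0", "b0", ...) stand in for real feature vectors.
slide_a = [(0, 0, "a0"), (224, 0, "a1")]
slide_b = [(100, 0, "b0"), (324, 0, "b1")]
virtual_slide = concatenate_along_x([slide_a, slide_b])
print([x for x, _, _ in virtual_slide])  # [0, 224, 448, 672]
```

The two input slides end up as two adjacent blocks in one coordinate system, which is the shape a slide-level encoder expects as input.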
4 changes: 2 additions & 2 deletions src/stamp/config.yaml
@@ -223,7 +223,7 @@ heatmaps:
 slide_encoding:
   # Encoder to use for slide encoding. Possible options are "cobra",
   # "eagle", "titan", "gigapath", "chief", "prism", "madeleine".
-  encoder: "eagle"
+  encoder: "chief"

   # Directory to save the output files.
   output_dir: "/path/to/save/files/to"
@@ -246,7 +246,7 @@ slide_encoding:
 patient_encoding:
   # Encoder to use for patient encoding. Possible options are "cobra",
   # "eagle", "titan", "gigapath", "chief", "prism", "madeleine".
-  encoder: "eagle"
+  encoder: "chief"

   # Directory to save the output files.
   output_dir: "/path/to/save/files/to"