Code repository for the Graph nEural Networks for pulmonary EmboliSm rIsk Stratification (GENESIS) project.
A project applying tabular models and graph neural networks to the task of pulmonary embolism risk stratification.
For tabular models, it is assumed that a CSV of global features (i.e., medical records, cardiac biomarkers, vascular biomarkers) for all patients is available. This code repository can then fit tabular models to predict the risk of pulmonary embolism from these features.
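As a rough, self-contained illustration of this setup (not the project's actual code), the sketch below fits a toy logistic-regression risk model to a hand-made CSV of global features. The column names and the plain gradient-descent model are hypothetical stand-ins for the real PERSEVERE features and the TabPFN/XGBoost baselines used in the project.

```python
import csv
import io
import math

# Toy stand-in for a CSV of global features, one row per patient
# (hypothetical column names; the real PERSEVERE features are not public).
RAW = """age,troponin,vessel_density,high_risk
71,0.90,0.42,1
45,0.10,0.88,0
68,0.70,0.35,1
52,0.20,0.79,0
77,1.10,0.30,1
39,0.05,0.91,0
"""

rows = list(csv.DictReader(io.StringIO(RAW)))
features = ["age", "troponin", "vessel_density"]
X = [[float(r[c]) for c in features] for r in rows]
y = [int(r["high_risk"]) for r in rows]

# Standardize each feature column (zero mean, unit variance).
for j in range(len(features)):
    col = [x[j] for x in X]
    mean = sum(col) / len(col)
    std = (sum((v - mean) ** 2 for v in col) / len(col)) ** 0.5
    for x in X:
        x[j] = (x[j] - mean) / std

# Fit a logistic-regression risk model by batch gradient descent,
# as a minimal stand-in for the tabular baselines.
w, b = [0.0] * len(features), 0.0
for _ in range(500):
    probs = [1 / (1 + math.exp(-(sum(wj * xj for wj, xj in zip(w, x)) + b))) for x in X]
    for j in range(len(features)):
        w[j] -= 0.5 * sum((p - t) * x[j] for p, t, x in zip(probs, y, X)) / len(y)
    b -= 0.5 * sum(p - t for p, t in zip(probs, y)) / len(y)

# Predicted risk of pulmonary embolism per patient.
risk = [1 / (1 + math.exp(-(sum(wj * xj for wj, xj in zip(w, x)) + b))) for x in X]
preds = [int(p >= 0.5) for p in risk]
accuracy = sum(int(p == t) for p, t in zip(preds, y)) / len(y)
print(f"train accuracy: {accuracy:.2f}")
```

In the repository itself, this role is filled by the `tabular_baseline.py` entry point and the actual baseline models.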
For graph neural networks (GNNs), it is assumed that an image processing pipeline previously segmented and extracted the graph of the vascular tree from 3D CTPA images. This code repository then takes these graphs as input and trains GNNs to predict the risk of pulmonary embolism from the vascular graphs and global features.
> [!IMPORTANT]
> Using this project requires a basic understanding of PyTorch Lightning and Hydra. If you do not know at least what these libraries do and how they work at a high level, you should familiarize yourself with them. We refer you to the PyTorch Lightning documentation and the Hydra documentation.
> [!NOTE]
> uv is a Python package and project manager. It allows you to manage Python interpreters, dependencies, and project configuration in a single tool. If you don't have it installed already, you can install it (on Linux and macOS) by running:
>
> ```bash
> curl -LsSf https://astral.sh/uv/install.sh | sh
> ```

- Download the repository.
  ```bash
  git clone https://github.com/creatis-myriad/GENESIS
  cd GENESIS
  ```

- Create a virtual environment and install the project and its dependencies. You must specify as an extra the desired compute platform for PyTorch (i.e. CPU/CUDA). Supported values are: `cpu`, `cu129`, `cu128`, `cu126`.

  ```bash
  # e.g. to install the project with the PyTorch version built for CPU
  uv sync --extra cpu

  # e.g. to install the project with the PyTorch version built for CUDA 12.8
  uv sync --extra cu128
  ```

  [OPTIONAL] You can also specify other extras for additional functionalities:

  ```bash
  # e.g. to install the `wandb` extra for W&B integration
  uv sync --extra cpu --extra wandb

  # e.g. to install all extra functionalities at once
  uv sync --extra cpu --extra all
  ```
- Activate the virtual environment created by `uv`.

  ```bash
  source .venv/bin/activate
  ```
Alternatively, to install the project with `pip` instead of `uv`:

- Download the repository.
  ```bash
  git clone https://github.com/creatis-myriad/GENESIS
  cd GENESIS
  ```

- Create a virtual environment and activate it.

  ```bash
  python -m venv .venv
  source .venv/bin/activate
  ```

- Install PyTorch according to the official instructions.
  Follow the instructions for `pip` and the compute platform compatible with your system.

  ```bash
  # e.g. to install the PyTorch version built for CPU
  pip install torch --index-url https://download.pytorch.org/whl/cpu

  # e.g. to install the PyTorch version built for CUDA 12.8
  pip install torch --index-url https://download.pytorch.org/whl/cu128
  ```
- Install PyG and its `torch_scatter` dependency according to the official instructions. Follow the instructions for `pip` and the compute platform compatible with your system.

  ```bash
  # install PyG
  pip install torch_geometric

  # install `torch_scatter` optional dependency (e.g. for CUDA 12.8)
  pip install torch_scatter -f https://data.pyg.org/whl/torch-2.8.0+cu128.html
  ```
- Install the project in editable mode.

  ```bash
  pip install -e .
  ```

  [OPTIONAL] You can also specify other extras for additional functionalities:

  ```bash
  # e.g. to install the `wandb` extra for W&B integration
  pip install -e .[wandb]

  # e.g. to install all extra functionalities at once
  pip install -e .[all]
  ```
- [`cpu`|`cu129`|`cu128`|`cu126`]: Required mutually exclusive extras to install the project with a PyTorch version built for CPU or a specific CUDA version (only available when using `uv`, not `pip`).
- `all`: Install all (non-mutually exclusive) extras at once.
- `baselines`: Extra dependencies required to run the baselines.
- `totalsegmentator`: For using the pretrained `TotalSegmentator` model for segmenting heart ventricles to preprocess images.
- `wandb`: For experiment tracking with Weights & Biases.
Follow the instructions on the Weights & Biases website to create an account.
Make sure that you install the `wandb` extra when installing the project, as shown in the installation instructions.
The recommended way to configure your W&B credentials is to expose them as environment variables (see W&B's documentation on this). You can do this by copying `configs/local/example.yaml` to a new `default.yaml` (which will be ignored by Git) and filling in your W&B credentials.
You don't have to do anything more than that, as the project is configured to automatically load keys under `hydra.job.env_set` as environment variables when executing the scripts.
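For instance, assuming the standard `WANDB_API_KEY` and `WANDB_ENTITY` variables (check W&B's documentation for the exact variable names you need), `configs/local/default.yaml` might look like this hypothetical sketch:

```yaml
# configs/local/default.yaml (ignored by Git; hypothetical sketch)
hydra:
  job:
    env_set:
      WANDB_API_KEY: <your-api-key>
      WANDB_ENTITY: <your-entity-or-username>
```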
Follow the instructions provided in the How to run section to enable experiment tracking via W&B.
The commands below are meant to reproduce the experiments described in the paper. They will run different combinations of models and data configurations in a 10-fold cross-validation setting.
Results are logged both locally and online on W&B (see previous section for instructions to set up W&B). Each experiment corresponds to a configuration run on a specific cross-validation fold. To facilitate analysis, groups in W&B correspond to the same configuration run on the cross-validation folds.
Depending on the type of model (tabular or GNN), different Python entry point scripts are called:

- Tabular models use the `tabular_baseline.py` script;
- GNNs use the `train.py` script.
> [!WARNING]
> Running the experiments below requires access to the PERSEVERE dataset, which is not publicly available. Thus, the scripts should not be expected to run as-is without the dataset. Rather, the scripts and code are provided for reference.
> [!TIP]
> Since models are implemented in a dataset-agnostic way, implementing PyG datasets and providing corresponding configs should be all that is needed to test the models on other datasets.
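A new dataset config could follow Hydra's `_target_` instantiation convention; everything in the sketch below (module path, class name, fields) is hypothetical and should be adapted to your own PyG dataset class:

```yaml
# configs/data/dataset/my_dataset.yaml (hypothetical sketch)
_target_: my_package.datasets.MyVascularGraphDataset  # your PyG dataset class
root: data/my_dataset  # where the raw/processed graphs live
```

It could then be selected from the command line with `data/dataset=my_dataset`.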
To run tabular models (TabPFN, XGBoost) on combinations of global features (medical records, cardiac biomarkers, vascular biomarkers):

```bash
scripts/train-persevere-tabular.sh
```

To run GNN backbones (GCN, GAT, GIN, GPS), with and without Virtual Nodes (VN) for MPNN backbones, and with different strategies to combine global features (early fusion (EF), late fusion (LF), virtual node (VN), Feature Tokenizer with cross-attention (FTxA)):

```bash
scripts/train-persevere-gnn.sh
```

To compare the best tabular and GNN backbones for the prediction of vascular biomarkers that are derived from local graph features:

- Runs TabPFN on global features (medical records, cardiac biomarkers);
- Runs GIN and GPS on the vascular graphs.

```bash
scripts/run-persevere-sanity-check-targets.sh
```

To run the best GNN configuration with alternative graph and global features representations:
```bash
# Test the primal graph representation.
# The default config uses the dual (i.e. line graph) representation.
scripts/run-persevere-gnn-ablation.sh graph_representation

# Test linear and TabPFN embedding of global features.
# The default config uses the Feature Tokenizer embedding.
scripts/run-persevere-gnn-ablation.sh global_features_embedding

# Test using a CLS token on global features as readout, i.e. graph-level representation.
# The default configuration uses global graph pooling (i.e., mean or sum depending on the config).
scripts/run-persevere-gnn-ablation.sh readout
```

> [!IMPORTANT]
> The results of these runs are meant to be compared to runs launched with the best GNN configuration, GPS + Feature Tokenizer with Cross-Attention (`gps+ftxa`), run as part of the GNN benchmark.
> [!TIP]
> Calling the `run-persevere-gnn-ablation.sh` script with the name of one of the folders in `configs/experiment/ablation` will run all the experiment configs in that folder using the `train.py` script.
This section describes how to configure individual experiments, e.g., to change hyperparameters, models, datasets, etc., if you want more control over the configuration than the predefined batch of experiments described in the previous section.
Train a model with the default configuration (on the small MUTAG dataset).
```bash
# train on CPU
gnn-train trainer=cpu

# train on GPU
gnn-train trainer=gpu
```

Override any individual parameter in the config files from the command line like this:

```bash
# override the number of epochs and batch size
gnn-train trainer.max_epochs=20 data.batch_size=64 ...

# train default model on your dataset
gnn-train data/dataset=<YOUR_DATASET_CONFIG> ...

# train your model on the default dataset
gnn-train model=<YOUR_MODEL_CONFIG> ...
```

To evaluate a trained model, use the `gnn-eval` script.
```bash
# evaluate a trained model on your dataset's test set
gnn-eval data=<DATAMODULE_CONFIG> data/dataset=<DATASET_CONFIG> model=<MODEL_CONFIG> ckpt_path=<PATH_TO_CHECKPOINT>
```

Train a model with a chosen experiment configuration from `configs/experiment/`.
> [!TIP]
> This allows you to provide (complete) presets on top of the default configuration, typically for experiments you want to run regularly.
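Such a preset could look like the following hypothetical sketch (the overridden groups and values are made up; see Hydra's documentation on the experiment-config pattern):

```yaml
# @package _global_
# configs/experiment/my_experiment.yaml (hypothetical sketch)
defaults:
  - override /model: gin  # hypothetical model config name
  - override /data/split: k_fold

trainer:
  max_epochs: 100
data:
  batch_size: 32
```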
```bash
gnn-train experiment=<YOUR_EXPERIMENT_CONFIG>
```

The implemented tool to track experiments is Weights & Biases, using W&B's integration in PyTorch Lightning.
> [!WARNING]
> You must have followed the W&B setup instructions to use this feature.
```bash
# track experiment online w/ W&B
gnn-train logger=wandb

# track experiment offline w/ W&B
gnn-train logger=wandb logger.wandb.offline=True
```

Launch multiple experiments at once using the multirun (`-m`) option.

```bash
# run multiple experiments sequentially, here w/ 5 different seeds
gnn-train -m seed=0,1,2,3,4
```

Launch multiple experiments at once in parallel using the Joblib launcher for Hydra.
> [!NOTE]
> The `hydra-joblib-launcher` plugin required to use this feature is installed by default with the project, so there is no need to install it yourself.
```bash
# run multiple experiments in parallel, here w/ 5 different seeds
gnn-train -m hydra/launcher=joblib seed=0,1,2,3,4
```

Launch an automatic hyperparameter search using the Optuna sweeper for Hydra.
> [!WARNING]
> You have to make sure that the `hparams_search` config you use is compatible with the model, since `hparams_search` defines how to sweep over model-dependent config options.
```bash
# Example of a predefined Optuna config for graph-level models compatible with the default experiment
gnn-train hparams_search=graph_classification_optuna
```

> [!TIP]
> Optuna can be used in a cross-validation setting, by evaluating each sampling of hyperparameters on the different dataset folds and reporting the average performance. However, this approach is not compatible with the default Optuna sweeper plugin, where each trial corresponds to one Hydra run, i.e. one model trained/evaluated on a specific partition of the dataset.
>
> To support this feature, we rely on our custom `serial_sweeper`, designed to run multiple jobs in sequence within the same Hydra run and then aggregate the results of these jobs. By sweeping over the different folds with this sweeper, we support cross-validation with Optuna.
>
> This is all handled already in the predefined Optuna config `graph_classification_optuna` for graph-level models. If you want to support this in your own Optuna config, all you have to do is use the predefined `splits` config for `serial_sweeper`, and make sure that `data/split=k_fold` is used to split the data into multiple folds.
```bash
gnn-train [...] hparams_search=<YOUR_OPTUNA_CONFIG> data/split=k_fold serial_sweeper=splits
```

Run the tests using Pytest.
```bash
# run all tests
pytest

# run a test package
pytest tests/integration

# run tests from a specific file
pytest tests/integration/test_train.py

# run all tests except the ones marked as slow
pytest -m "not slow"
```