ai4nucleome/SPAMO

SPAMO: Spatial Multi-Omics Integration via Dual-Graph Encoding and Cross-Modal Interaction

A spatial multi-omics integration framework that combines dual-graph encoding, cross-modal interaction-guided fusion, feature-graph refinement, and self-supervised structural regularization for clustering-oriented representation learning.

Abstract

Spatial multi-omics technologies enable the joint measurement of complementary molecular modalities within intact tissues, providing new opportunities to characterize cellular heterogeneity and spatial organization in situ. However, integrating heterogeneous modalities for unsupervised clustering and spatial domain identification remains challenging because of modality-specific noise, heterogeneous feature structures, and the need to preserve spatial context. Existing methods often rely on coarse fusion strategies or insufficient structural constraints, limiting their ability to capture cross-modal dependencies and maintain coherent latent organization.

SPAMO addresses these challenges through:

  • Dual-Graph Encoding — Per-modality 2-layer GCN encoders with adaptive spatial-feature graph blending (AdaptiveAdjFusion)
  • Cross-Modal Interaction — Lightweight cross-modal attention enabling inter-modality information exchange, followed by gated MLP fusion
  • Feature-Graph Refinement — Learnable parameterized adjacency matrices updated via EMA, allowing iterative refinement of feature graphs during training
  • Self-Supervised Structural Regularization — Deep Graph Infomax (DGI) loss and spatial smoothness regularization to preserve graph topology without external labels
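
The dual-graph encoding idea can be sketched in a few lines of NumPy. This is only an illustration, not the repository's `AdaptiveAdjFusion` or `RobustEncoder`: the function names, the single scalar blend weight, and the toy graphs are assumptions, and dropout and LayerNorm are omitted.

```python
import numpy as np

def normalize_adj(adj):
    """Symmetric GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    adj = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def blend_graphs(adj_spatial, adj_feature, logit):
    """Convex blend of two graphs; `logit` stands in for a learned scalar (assumption)."""
    alpha = 1.0 / (1.0 + np.exp(-logit))      # sigmoid keeps alpha in (0, 1)
    return alpha * adj_spatial + (1.0 - alpha) * adj_feature

def gcn_encoder(x, adj, w1, w2):
    """Two GCN layers: ReLU(A X W1), then a linear A H W2."""
    a = normalize_adj(adj)
    h = np.maximum(a @ x @ w1, 0.0)
    return a @ h @ w2

rng = np.random.default_rng(0)
n_spots, n_genes, hidden, latent = 6, 8, 5, 3
x = rng.normal(size=(n_spots, n_genes))

# Toy graphs: a ring over spot positions and a random symmetric feature graph.
adj_spatial = np.roll(np.eye(n_spots), 1, axis=1)
adj_spatial = np.maximum(adj_spatial, adj_spatial.T)
upper = np.triu(rng.random((n_spots, n_spots)) < 0.4, k=1).astype(float)
adj_feature = upper + upper.T

adj = blend_graphs(adj_spatial, adj_feature, logit=0.0)   # alpha = 0.5
w1 = rng.normal(size=(n_genes, hidden))
w2 = rng.normal(size=(hidden, latent))
z = gcn_encoder(x, adj, w1, w2)
print(z.shape)  # (6, 3)
```

In a trainable version the blend logit and the layer weights would be parameters updated by backpropagation; here they are fixed values so the data flow is easy to follow.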

We evaluate SPAMO on benchmark datasets including Human Lymph Node, Mouse Brain, and simulated multi-modality settings. Across these datasets, SPAMO shows improved performance over strong baselines on the main clustering metrics and achieves competitive results across diverse settings.

Architecture

Input (RNA + ADT/ATAC, or RNA + ADT + ATAC for 3-modal)
    │
    ├── RobustEncoder (per modality)
    │     ├── 2-layer GCN with dropout
    │     ├── AdaptiveAdjFusion (learned spatial–feature graph blend)
    │     └── LayerNorm
    │
    ├── RobustFusionModule
    │     ├── LightCrossModalAttention (bidirectional)
    │     ├── Gated MLP fusion
    │     └── Residual connection with average pooling
    │
    ├── RobustDecoder (per modality)
    │     └── Single-layer GCN reconstruction
    │
    └── Loss Functions
          ├── Weighted MSE reconstruction loss
          ├── Feature-graph Frobenius norm regularization (EMA target)
          ├── DGI self-supervised loss (node–global MI maximization)
          └── Spatial smoothness regularization
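
The loss terms listed in the diagram can be written out as a minimal NumPy sketch. The exact formulas used in `spamo/model.py` are not given here, so everything below is an assumption: the squared Frobenius penalty, the edge-averaged smoothness term, the textbook DGI readout and bilinear discriminator, the EMA momentum, and the unit weight on the feature-graph term. The weights 5, 5, 0.1, and 0.01 are the documented defaults for `--RNA_weight`, `--ADT_weight`, `--dgi_weight`, and `--spatial_weight`.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def weighted_mse(x, x_hat, weight):
    """Reconstruction error of one modality, scaled by its weight."""
    return weight * np.mean((x - x_hat) ** 2)

def ema_update(target, current, momentum=0.99):
    """Exponential moving average of the learned feature graph (momentum is assumed)."""
    return momentum * target + (1.0 - momentum) * current

def frobenius_reg(adj_learned, adj_target):
    """Squared Frobenius distance between the learned graph and its EMA target."""
    return np.sum((adj_learned - adj_target) ** 2)

def spatial_smoothness(z, adj_spatial):
    """Edge-weighted mean of squared embedding distances between neighbors."""
    sq_dist = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    return np.sum(adj_spatial * sq_dist) / max(adj_spatial.sum(), 1e-12)

def dgi_loss(z, z_corrupt, w, eps=1e-8):
    """DGI objective: score real nodes high and corrupted nodes low against a global summary."""
    summary = sigmoid(z.mean(axis=0))        # global readout
    pos = sigmoid(z @ w @ summary)           # bilinear discriminator scores
    neg = sigmoid(z_corrupt @ w @ summary)
    return -0.5 * (np.mean(np.log(pos + eps)) + np.mean(np.log(1.0 - neg + eps)))

rng = np.random.default_rng(0)
n, d = 5, 4
z = rng.normal(size=(n, d))
z_corrupt = rng.normal(size=(n, d))          # stand-in for embeddings of shuffled inputs
w = 0.1 * rng.normal(size=(d, d))
x_rna, x_adt = rng.normal(size=(n, 10)), rng.normal(size=(n, 6))
rec_rna, rec_adt = x_rna + 0.1, x_adt + 0.1  # fake reconstructions
adj_spatial = np.ones((n, n)) - np.eye(n)
adj_learned = rng.random((n, n))
adj_target = ema_update(np.zeros((n, n)), adj_learned)

total = (weighted_mse(x_rna, rec_rna, 5.0) + weighted_mse(x_adt, rec_adt, 5.0)
         + frobenius_reg(adj_learned, adj_target)
         + 0.1 * dgi_loss(z, z_corrupt, w)
         + 0.01 * spatial_smoothness(z, adj_spatial))
print(float(total))
```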

License

Part of the preprocessing code is derived from SpatialGlue (Long et al. 2024).

Dependencies

  • python >= 3.8
  • torch >= 2.0
  • anndata == 0.8.0
  • numpy == 1.22.3
  • pandas == 1.4.2
  • rpy2 == 3.4.1
  • scanpy == 1.9.1
  • scikit-learn == 1.1.1
  • scikit-misc == 0.2.0
  • scipy == 1.8.1
  • scvi == 0.6.8

The versions above are the ones used for the experiments. Most PyTorch 2.0+ environments can run the code directly.

Data

Download the Human Lymph Node dataset (Long et al. 2024) and the spatial epigenome–transcriptome mouse brain dataset (Zhang et al. 2023) from https://zenodo.org/records/14591305, and unzip them into ./Data/.

The expected data directory layout:

Data/
├── HLN/
│   ├── adata_RNA.h5ad
│   └── adata_ADT.h5ad
├── Mouse_Brain/
│   ├── adata_RNA.h5ad
│   └── adata_peaks_normalized.h5ad
└── Simulation/
    ├── adata_RNA.h5ad
    ├── adata_ADT.h5ad
    └── adata_ATAC.h5ad
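
Before training, it can help to confirm that the files are where the layout above expects them. The helper below is a hypothetical convenience script, not part of the repository; it uses only the standard library and the file names listed above.

```python
from pathlib import Path

# File names copied from the expected layout.
EXPECTED_FILES = {
    "HLN": ["adata_RNA.h5ad", "adata_ADT.h5ad"],
    "Mouse_Brain": ["adata_RNA.h5ad", "adata_peaks_normalized.h5ad"],
    "Simulation": ["adata_RNA.h5ad", "adata_ADT.h5ad", "adata_ATAC.h5ad"],
}

def missing_files(data_root="./Data"):
    """Return the expected files (as 'dataset/file') that are absent under data_root."""
    root = Path(data_root)
    return [
        f"{dataset}/{name}"
        for dataset, names in EXPECTED_FILES.items()
        for name in names
        if not (root / dataset / name).is_file()
    ]

missing = missing_files("./Data")
if missing:
    print("Missing files:")
    for entry in missing:
        print("  " + entry)
else:
    print("All expected files are present.")
```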

Quick Start

Create and activate a Python virtual environment with Anaconda:

conda create -n spamo python=3.8
conda activate spamo

Install packages:

pip install torch scanpy scikit-learn numpy anndata rpy2 scikit-misc scipy scvi-tools

To reproduce all benchmark results (HLN, Mouse Brain, Simulation):

sh run.sh

Quantitative results and visualizations are saved in the ./results/ directory.

Running Individual Datasets

Human Lymph Node (RNA + ADT):

python main.py \
  --file_fold ./Data/HLN --data_type 10x \
  --n_clusters 10 --init_k 10 --KNN_k 20 \
  --RNA_weight 5 --ADT_weight 5 \
  --dgi_weight 0.1 --spatial_weight 0.01 \
  --epochs_override 200 --optimizer_type sgd \
  --random_seed 2025 \
  --vis_out_path results/HLN.png \
  --txt_out_path results/HLN.txt

Mouse Brain (RNA + ATAC):

python main.py \
  --file_fold ./Data/Mouse_Brain --data_type Spatial-epigenome-transcriptome \
  --n_clusters 14 --init_k 14 --KNN_k 20 \
  --RNA_weight 1 --ADT_weight 10 \
  --dgi_weight 0.1 --spatial_weight 0.01 \
  --epochs_override 300 --optimizer_type adamw --lr_scheduler_type cosine \
  --random_seed 2025 \
  --vis_out_path results/MB.png \
  --txt_out_path results/MB.txt

Simulation (RNA + ADT + ATAC, triple modality):

python main.py \
  --file_fold ./Data/Simulation --data_type Simulation \
  --n_clusters 5 --init_k 5 --KNN_k 20 \
  --RNA_weight 5 --ADT_weight 5 \
  --random_seed 2025 \
  --vis_out_path results/Sim.png \
  --txt_out_path results/Sim.txt

Applying SPAMO to Custom Datasets

To apply SPAMO to your own dataset, store the count matrix of each omics layer as an anndata.AnnData file (.h5ad), and make sure all layers share the same number of spots/cells and the same spatial coordinates. Then run:

python main.py \
  --file_fold <Path to AnnData directory> \
  --data_type <10x | Spatial-epigenome-transcriptome | SPOTS | Stereo-CITE-seq | Simulation> \
  --n_clusters <Number of clusters> \
  --init_k <Estimated number of clusters> \
  --KNN_k 20 \
  --RNA_weight <Reconstruction weight for modality 1> \
  --ADT_weight <Reconstruction weight for modality 2> \
  --dgi_weight <DGI self-supervised loss weight, default 0.1> \
  --spatial_weight <Spatial smoothness weight, default 0.01> \
  --vis_out_path <Output visualization path, e.g., results/XXX.png> \
  --txt_out_path <Output cluster labels path, e.g., results/XXX.txt>
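
The requirement that all modalities share spots and coordinates can be checked before running `main.py`. The function below is a hypothetical helper, not part of SPAMO. It assumes the coordinates are stored under `obsm["spatial"]`, which is a common convention but is not confirmed here, and it works on any object exposing `n_obs` and `obsm`, as AnnData objects do.

```python
from types import SimpleNamespace

import numpy as np

def check_shared_spots(adatas, spatial_key="spatial"):
    """Raise ValueError if the modalities differ in spot count or coordinates."""
    ref = adatas[0]
    ref_xy = np.asarray(ref.obsm[spatial_key])
    for i, adata in enumerate(adatas[1:], start=1):
        if adata.n_obs != ref.n_obs:
            raise ValueError(
                f"modality {i} has {adata.n_obs} spots, expected {ref.n_obs}"
            )
        xy = np.asarray(adata.obsm[spatial_key])
        if xy.shape != ref_xy.shape or not np.allclose(xy, ref_xy):
            raise ValueError(f"modality {i} has different spatial coordinates")

# Stand-in objects so the example runs without data files.
coords = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
rna = SimpleNamespace(n_obs=3, obsm={"spatial": coords})
adt = SimpleNamespace(n_obs=3, obsm={"spatial": coords.copy()})
shifted = SimpleNamespace(n_obs=3, obsm={"spatial": coords + 5.0})

check_shared_spots([rna, adt])          # passes silently
try:
    check_shared_spots([rna, shifted])
except ValueError as err:
    print("rejected:", err)
```

With real data, the stand-ins would be replaced by objects loaded with `anndata.read_h5ad(...)` from the directory passed as `--file_fold`.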

Key Hyperparameters

Parameter            Default   Description
--n_clusters         -         Number of spatial domains for clustering
--KNN_k              20        Number of neighbors for feature graph construction
--RNA_weight         5         Reconstruction weight for RNA modality
--ADT_weight         5         Reconstruction weight for ADT/ATAC modality
--dgi_weight         0.1       Weight of DGI self-supervised loss
--spatial_weight     0.01      Weight of spatial smoothness regularization
--dim_output         64        Latent embedding dimension
--dropout            0.1       Dropout rate
--epochs_override    0         Override training epochs (0 = use dataset default)
--optimizer_type     adamw     Optimizer: sgd, adam, or adamw
--lr_scheduler_type  none      LR scheduler: none, cosine, or plateau
--use_cross_attn     True      Enable cross-modal attention in fusion
--random_seed        2025      Random seed for reproducibility
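
For scripted sweeps, the documented flags can be assembled into a command line from Python. The helper below is a hypothetical sketch: `build_command` and `DOCUMENTED_DEFAULTS` are illustrative names, the defaults are copied from the table above, and `--use_cross_attn` is left out because how `main.py` parses that flag is not shown here.

```python
import shlex

# Defaults copied from the hyperparameter table.
DOCUMENTED_DEFAULTS = {
    "KNN_k": 20,
    "RNA_weight": 5,
    "ADT_weight": 5,
    "dgi_weight": 0.1,
    "spatial_weight": 0.01,
    "dim_output": 64,
    "dropout": 0.1,
    "epochs_override": 0,
    "optimizer_type": "adamw",
    "lr_scheduler_type": "none",
    "random_seed": 2025,
}

def build_command(file_fold, data_type, n_clusters, init_k, **overrides):
    """Build an argument list for main.py, emitting only non-default overrides."""
    unknown = sorted(set(overrides) - set(DOCUMENTED_DEFAULTS))
    if unknown:
        raise ValueError(f"undocumented flags: {unknown}")
    args = [
        "python", "main.py",
        "--file_fold", str(file_fold),
        "--data_type", str(data_type),
        "--n_clusters", str(n_clusters),
        "--init_k", str(init_k),
    ]
    for name, value in overrides.items():
        if value != DOCUMENTED_DEFAULTS[name]:
            args += [f"--{name}", str(value)]
    return args

cmd = build_command(
    "./Data/HLN", "10x", n_clusters=10, init_k=10,
    KNN_k=20, epochs_override=200, optimizer_type="sgd",
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Output paths such as `--vis_out_path` would need to be appended separately.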

Project Structure

SPAMO/
├── main.py                    # Main entry: data loading, training, clustering, visualization
├── run.sh                     # Shell script to reproduce all benchmark results
├── bio_analysis.py            # Downstream biological analyses (DEG, PAGA, GO enrichment, etc.)
├── cal_matrics.py             # Metric evaluation script
├── clustering_utils.py        # Split-and-merge clustering utilities
├── metric.py                  # Evaluation metrics (ARI, NMI, ASW, MAP, etc.)
├── spamo/                     # Core package
│   ├── __init__.py
│   ├── model.py               # SpaMO model (2-modality): Encoder, Decoder, Fusion, DGI, Spatial Reg.
│   ├── model_3m.py            # SpaMO-3M model (3-modality extension)
│   ├── trainer.py             # Training loop for 2-modality
│   ├── trainer_3m.py          # Training loop for 3-modality
│   ├── preprocess.py          # Data preprocessing & graph construction (2-modality)
│   ├── preprocess_3m.py       # Data preprocessing & graph construction (3-modality)
│   ├── optimal_clustering.py  # Optimal clustering utilities
│   └── utils.py               # Clustering (mclust/leiden/louvain) & spatial smoothing
├── results/                   # Output directory for results
└── Data/                      # Dataset directory

Reference

Houcheng Su, Juning Feng, Weicai Long, Yusen Hou, Yanlin Zhang. SPAMO: Spatial Multi-Omics Integration via Dual-Graph Encoding and Cross-Modal Interaction. Information Hub, Hong Kong University of Science and Technology (Guangzhou).

[1] Long, Y.; Ang, K. S.; Sethi, R.; Liao, S.; Heng, Y.; van Olst, L.; Ye, S.; Zhong, C.; Xu, H.; Zhang, D.; et al. 2024. Deciphering spatial domains from spatial multi-omics with SpatialGlue. Nature Methods, 1–10.

[2] Zhang, D.; Deng, Y.; Kukanja, P.; Agirre, E.; Bartosovic, M.; Dong, M.; Ma, C.; Ma, S.; Su, G.; Bao, S.; et al. 2023. Spatial epigenome–transcriptome co-profiling of mammalian tissues. Nature, 616(7955): 113–122.
