This repository contains scripts and notebooks used to analyze single-nucleus RNA-sequencing (snRNA-seq) data from the AMP-PD cohort, supporting our accompanying manuscript. The analyses include data preprocessing, clustering, cell-type composition modeling, GWAS scoring, differential expression, variance partitioning, Hotspot module detection, transcription factor (TF) activity inference, and cell-cell interaction inference.
- Project Overview
- Preprocessing & Clustering
- GWAS Scoring with scDRS
- Cell Type Compositional Analysis
- Differential Expression with dreamlet
- Variance Partitioning
- Hotspot Module Detection
- TF Activity Inference
- Cell-Cell Interaction Inference
- Installation & Requirements
This pipeline covers both single-cell and pseudobulk-level analyses of the AMP-PD snRNA-seq dataset. Modules include:
- Multistage data preprocessing, QC, and integration (Scanpy + Harmony)
- Cell-type proportion modeling across clinical groups (crumblr + dream)
- GWAS signal scoring with scDRS
- Differential expression via dreamlet with hierarchical covariates
- Variance partitioning of gene expression across covariates (dreamlet)
- Hotspot-based gene module discovery
- TF activity inference and regulatory network visualization (decoupler)
- Cell-cell interaction prediction (LIANA)
Script: prepare_and_integrate_AMPPD.py
Performs highly variable gene (HVG) selection, PCA, Harmony batch correction, and Leiden clustering using Scanpy and Pegasus.
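As a rough illustration of these steps, the NumPy sketch below normalizes each cell to a fixed total, log-transforms, keeps the highest-variance genes, and computes PC scores by SVD. It is not code from `prepare_and_integrate_AMPPD.py`: the helper name `preprocess_and_pca` is hypothetical, plain variance ranking stands in for Scanpy's HVG flavors, and Harmony correction and Leiden clustering are left out. In Scanpy itself these steps correspond to `sc.pp.normalize_total`, `sc.pp.log1p`, `sc.pp.highly_variable_genes`, and `sc.tl.pca`, followed by `sc.pp.neighbors` and `sc.tl.leiden` on the batch-corrected embedding.

```python
import numpy as np


def preprocess_and_pca(counts, n_top_genes=2000, n_pcs=50, target_sum=1e4):
    """Hypothetical helper: normalize -> log1p -> top-variance genes -> PCA.

    counts: dense cells x genes array of raw counts (assumed layout; real
    snRNA-seq matrices are sparse and much larger). Every cell is assumed
    to have at least one count.
    Returns (indices of the selected genes, cells x n_pcs PC score matrix).
    """
    counts = np.asarray(counts, dtype=float)
    # Library-size normalization: rescale each cell to target_sum counts.
    totals = counts.sum(axis=1, keepdims=True)
    norm = counts / totals * target_sum
    # log(1 + x) transform, as in sc.pp.log1p.
    logx = np.log1p(norm)
    # Simplified HVG step: rank genes by variance of log-normalized values.
    # Scanpy's flavors rank by (binned) normalized dispersion instead.
    n_top = min(n_top_genes, logx.shape[1])
    hvg_idx = np.argsort(logx.var(axis=0))[::-1][:n_top]
    x = logx[:, hvg_idx]
    # PCA via SVD of the gene-centered matrix; PC scores are U * S.
    x_centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(x_centered, full_matrices=False)
    k = min(n_pcs, s.size)
    return hvg_idx, u[:, :k] * s[:k]
```

For a 4-cell toy matrix, `preprocess_and_pca(counts, n_top_genes=2, n_pcs=2)` returns the indices of the two most variable genes and a 4 × 2 score matrix whose first column carries the most variance.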
python src/prepare_and_integrate_AMPPD.py
Script: score_gwas_scdrs.py
Scores single cells using scDRS for prioritized Parkinson’s disease GWAS gene sets.
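scDRS asks whether a cell expresses a disease gene set more strongly than random control gene sets do. The sketch below is a simplified, hypothetical version of that Monte Carlo idea in plain Python, not the `scdrs` package: it assumes expression is already standardized per gene, scores each cell by the unweighted mean of the disease genes, draws control sets uniformly from all other genes, and computes a per-cell empirical p-value. scDRS itself (for example via `scdrs.score_cell`) also weights genes, matches control genes on mean and variance, and pools normalized control scores across cells before computing p-values.

```python
import random
from statistics import mean


def toy_scdrs_pvalues(expr, disease_genes, n_ctrl=1000, seed=0):
    """Simplified, hypothetical scDRS-like test (not the scdrs package).

    expr: dict gene -> list of standardized expression values, one per cell.
    disease_genes: list of putative disease genes (all must be keys of expr).
    Returns one Monte Carlo p-value per cell.
    """
    rng = random.Random(seed)
    disease = set(disease_genes)
    # Control genes drawn uniformly from the rest; scDRS instead matches
    # control genes to the disease genes on mean expression and variance.
    background = [g for g in expr if g not in disease]
    k = len(disease_genes)
    n_cells = len(expr[disease_genes[0]])

    def score(gene_set, cell):
        # Unweighted mean expression; scDRS uses per-gene weights.
        return mean(expr[g][cell] for g in gene_set)

    ctrl_sets = [rng.sample(background, k) for _ in range(n_ctrl)]
    pvals = []
    for cell in range(n_cells):
        observed = score(disease_genes, cell)
        n_ge = sum(score(cs, cell) >= observed for cs in ctrl_sets)
        # Add-one correction keeps p-values strictly positive.
        pvals.append((1 + n_ge) / (1 + n_ctrl))
    return pvals
```

With `n_ctrl` control sets, the smallest attainable p-value is 1 / (n_ctrl + 1), which is one reason scDRS pools control scores across cells.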
python src/score_gwas_scdrs.py
Script: cell_type_composition.R
Performs compositional modeling across subclasses using crumblr and dream, followed by meta-analysis with metafor and visualization via ggtree.
Outputs:
- Meta-analysis of compositional shifts
- Coefficient plots annotated on the hierarchical cell-type tree
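crumblr analyzes cell-type counts on the centered log-ratio (CLR) scale, and per-analysis coefficients are then combined with metafor. The plain-Python sketch below shows the two core calculations under stated assumptions: a CLR transform with an assumed pseudocount of 0.5, and an inverse-variance fixed-effect meta-analysis. It omits crumblr's precision weights and the dream regression, and whether the script uses a fixed- or random-effects model is not stated here; the fixed-effect case corresponds to `metafor::rma(yi, sei = se, method = "FE")`. The function names are hypothetical.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's cell-type counts.

    The pseudocount (assumed 0.5 here) avoids log(0) for absent cell types.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    center = sum(logs) / len(logs)  # log of the geometric mean
    return [v - center for v in logs]


def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance weighted fixed-effect meta-analysis.

    Each estimate is weighted by 1 / SE^2; returns (pooled estimate, pooled SE).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_w
    return pooled, math.sqrt(1.0 / total_w)
```

For example, `fixed_effect_meta([0.0, 1.0], [1.0, 0.5])` weights the second estimate four times as heavily as the first and returns a pooled estimate of 0.8. CLR values always sum to zero within a sample, which is why they are read as shifts relative to the sample's overall composition.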
Script: dreamlet_differential_expression_PD.R
Runs subclass-level pseudobulk differential expression (DE) analysis using dreamlet, with models both with and without ethnicity and participant covariates.
Outputs:
- DE results in `.csv` and `.RData` formats
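dreamlet first sums counts over the cells of each sample and cell type (its `aggregateToPseudoBulk()` step) and then normalizes the pseudobulk counts with voom-style precision weights before fitting per-gene models. The plain-Python sketch below covers only the aggregation and the voom log-CPM formula, log2((y + 0.5) / (N + 1) × 10^6); precision weights and model fitting are omitted, and the input layout and function names are assumptions for illustration.

```python
import math


def pseudobulk(cells):
    """Sum raw counts per (sample, cell type).

    cells: iterable of (sample_id, cell_type, counts), where counts is a list
    of per-gene integer counts in a fixed gene order (assumed input layout).
    Returns a dict mapping (sample_id, cell_type) -> summed count list.
    """
    totals = {}
    for sample, cell_type, counts in cells:
        acc = totals.setdefault((sample, cell_type), [0] * len(counts))
        for i, c in enumerate(counts):
            acc[i] += c
    return totals


def log2_cpm(counts):
    """voom-style log2 counts per million: log2((y + 0.5) / (N + 1) * 1e6)."""
    lib_size = sum(counts)
    return [math.log2((c + 0.5) / (lib_size + 1) * 1e6) for c in counts]
```

Summing before testing treats the sample, not the cell, as the unit of replication, which avoids the inflated significance that comes from treating cells from the same donor as independent.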
Script: variance_partition_braakLB_dreamlet.R
Uses dreamlet::fitVarPart() to quantify the fraction of gene expression variance explained by each covariate, including Braak Lewy body (LB) stage.
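fitVarPart() fits a linear (mixed) model for each gene and reports the share of expression variance attributable to each term. For intuition, the sketch below handles the simplest case: a single categorical covariate fit as a fixed effect by ordinary least squares, where that share reduces to the between-group sum of squares divided by the total sum of squares. This is a hypothetical helper, not the dreamlet implementation; variancePartition typically models categorical variables as random effects, whose REML-based variance fractions generally differ from this ratio.

```python
def variance_fraction(values, groups):
    """Fraction of one gene's variance explained by one categorical covariate.

    Simplified fixed-effect OLS case: SS_between / SS_total.
    values: expression of the gene per sample; groups: covariate level per sample.
    """
    n = len(values)
    grand = sum(values) / n
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    ss_total = sum((v - grand) ** 2 for v in values)
    # Between-group SS: group size times squared deviation of the group mean.
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand) ** 2 for vs in by_group.values()
    )
    return ss_between / ss_total if ss_total > 0 else 0.0
```

For example, values `[0, 2, 4, 6]` in groups `["a", "a", "b", "b"]` give 16 / 20 = 0.8, so 80% of the variance lies between the two groups.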
Rscript scripts/variance_partition_braakLB_dreamlet.R
Script: hotspot_example.py
Runs the Hotspot algorithm to detect modules of locally autocorrelated genes, focusing on myeloid cells.
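Hotspot measures whether a gene's expression is similar among neighboring cells in a KNN graph, using a statistic of the form H = Σ_i Σ_{j in N(i)} w_ij x_i x_j on standardized expression x. The plain-Python sketch below computes that statistic for one gene under stated simplifications: unit edge weights, ordinary z-scoring in place of Hotspot's model-based standardization (e.g. the DANB model), and a permutation null in place of Hotspot's analytic Z-score. It is a hypothetical helper and does not cover the pairwise local correlations and clustering Hotspot uses to form modules.

```python
import random
from statistics import mean, pstdev


def local_autocorrelation(values, neighbors, n_perm=1000, seed=0):
    """Hotspot-style local autocorrelation for one gene (simplified sketch).

    values: expression of the gene in each cell.
    neighbors: neighbors[i] lists the cells adjacent to cell i; each listed
        edge gets weight 1 (assumption: unweighted KNN graph).
    Returns (H, z), where z is a permutation-based z-score.
    """
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return 0.0, 0.0
    # Plain z-scoring stands in for Hotspot's model-based standardization.
    x = [(v - mu) / sd for v in values]

    def h_stat(xs):
        return sum(xs[i] * xs[j] for i, nbrs in enumerate(neighbors) for j in nbrs)

    h_obs = h_stat(x)
    # Null distribution: shuffle expression across cells, keep the graph fixed.
    rng = random.Random(seed)
    null = []
    perm = x[:]
    for _ in range(n_perm):
        rng.shuffle(perm)
        null.append(h_stat(perm))
    null_sd = pstdev(null)
    z = (h_obs - mean(null)) / null_sd if null_sd > 0 else 0.0
    return h_obs, z
```

On a 50-cell chain graph, a smoothly increasing gene gives a large positive z-score, while a gene that alternates between neighboring cells gives a large negative one.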
python src/hotspot_example.py
Script: tf_inference.py
Infers transcription factor (TF) activity per cell type using decoupler and the CollecTRI regulatory network. Includes specificity scoring, normalized activity scores, and heatmap visualization.
Outputs:
- Normalized TF activity scores
- TF specificity scores
- Ranked TFs per subclass with heatmap
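decoupler provides several enrichment methods; the univariate linear model (ULM) is a common choice with CollecTRI, although which method `tf_inference.py` uses is not stated here. In ULM, the expression of all genes in one sample is regressed on the TF's signed target weights (non-targets get weight 0), and the t-statistic of the slope is taken as the TF's activity. The plain-Python sketch below reimplements that idea for a single sample and a single TF; the function name and toy network are hypothetical.

```python
import math


def ulm_activity(expression, targets):
    """ULM-style activity of one TF in one sample (simplified sketch).

    expression: dict gene -> expression value in the sample.
    targets: dict gene -> signed weight (+1 activation, -1 repression);
        genes missing from targets get weight 0.
    Returns the t-statistic of the slope in expression ~ weight.
    """
    genes = list(expression)
    x = [targets.get(g, 0.0) for g in genes]
    y = [expression[g] for g in genes]
    n = len(genes)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if n < 3 or sxx == 0:
        raise ValueError("need >= 3 genes and non-constant target weights")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual variance with n - 2 degrees of freedom gives the slope's SE.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope / se
```

A positive value means activated targets are higher (and repressed targets lower) than the rest of the transcriptome in that sample; flipping every target sign flips the activity's sign.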
Script: run_cci_liana_example.py
Uses the LIANA Python package to infer cell-cell interactions from integrated single-cell data.
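LIANA runs several ligand-receptor scoring methods (such as CellPhoneDB, NATMI, and Connectome) and can combine them into a consensus ranking with its `rank_aggregate` method. As an illustration of one of these methods, the sketch below reimplements a simplified CellPhoneDB-style test in plain Python: the score is the mean of the ligand's average expression in the sender group and the receptor's average expression in the receiver group, and significance comes from shuffling cell-type labels. It handles a single ligand-receptor pair with single-subunit genes and skips CellPhoneDB's minimum-expression filter; the function names are hypothetical, and the LIANA configuration used by the script is not described here.

```python
import random
from statistics import mean


def lr_score(ligand, receptor, labels, sender, receiver):
    """CellPhoneDB-style score: mean of sender ligand mean and receiver receptor mean."""
    lig = [v for v, lab in zip(ligand, labels) if lab == sender]
    rec = [v for v, lab in zip(receptor, labels) if lab == receiver]
    return (mean(lig) + mean(rec)) / 2


def lr_permutation_pvalue(ligand, receptor, labels, sender, receiver,
                          n_perm=1000, seed=0):
    """Returns (observed score, permutation p-value) for one sender -> receiver pair.

    ligand, receptor: per-cell expression of the two genes; labels: per-cell
    cell-type labels. Shuffling labels preserves group sizes.
    """
    observed = lr_score(ligand, receptor, labels, sender, receiver)
    rng = random.Random(seed)
    shuffled = list(labels)
    n_ge = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if lr_score(ligand, receptor, shuffled, sender, receiver) >= observed:
            n_ge += 1
    return observed, (1 + n_ge) / (1 + n_perm)
```

Because the score is directional, testing Micro → Astro and Astro → Micro for the same pair can give very different results.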
python src/run_cci_liana_example.py
The following Python packages are required and can be installed via pip:
pip install scanpy pegasuspy pegasusio anndata scdrs matplotlib seaborn pandas numpy \
decoupler liana igraph scikit-learn harmony-pytorch
Note:
- `pegasuspy` and `pegasusio` are required for HDF5/AnnData processing
- `harmony-pytorch` is used for batch correction
- `decoupler` is used for TF activity inference
- `liana` is used for CCI prediction
Use the following commands to install the required R packages:
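# Bioconductor (install BiocManager first if it is missing)
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")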
BiocManager::install(c(
"zellkonverter", "SingleCellExperiment", "dreamlet", "variancePartition",
"crumblr", "ggtree", "qvalue", "GSEABase", "BiocParallel"
))
# CRAN
install.packages(c(
"ggplot2", "tidyverse", "aplot", "broom", "cowplot", "metafor", "reticulate"
))