Pseudobulk DGE Pipeline (Snakemake)

This repository contains the Snakemake workflow and scripts for running the ODAP pseudobulk differential gene expression (DGE) pipeline.

Overview

This pipeline performs pseudobulk differential gene expression analysis from single-cell or single-nucleus RNA-seq data.

It aggregates counts across cells to generate sample-level expression profiles for defined cell populations, then performs differential expression testing between conditions while accounting for donor effects and optional covariates.

The workflow is implemented in Snakemake for reproducibility and scalability on HPC systems.

Pipeline Summary

The pipeline includes:

Data ingestion from multiple formats
Metadata harmonisation and mapping
Cell set definition (metadata-driven)
Pseudobulk aggregation (per sample/donor)
Differential expression analysis (configurable model)
Reporting (QC metrics, PCA, DE results)

Input Support

Supported input formats include:

Seurat (.rds)
SingleCellExperiment (.rds)
Parse Splitpipe outputs
Standard Matrix Market (.mtx, .tsv)
(Planned) AnnData (.h5ad)

SEE FULL DOCUMENTATION FOR INPUT FORMAT REQUIREMENTS Multiple datasets and formats can be mixed within a single analysis.

Key Concepts

Cell Sets

Cells can be grouped into user-defined populations using:

Logical expressions (e.g. metadata filters)
Set algebra (e.g. intersections, unions)

Pseudobulk

Counts are aggregated per:

sample (default)
optionally donor or other grouping variables

This enables robust statistical testing using bulk RNA-seq methods.

Differential Expression

Flexible model design allows:

specification of reference condition
inclusion of covariates (e.g. donor, batch)
multiple contrasts and tests

Usage

This pipeline is designed for HPC execution using Slurm.

Basic steps:

Edit configuration:

nano config.yaml

Run pipeline:

sbatch run_pipeline.slurm

Configuration

All parameters are defined in config.yaml, including:

input data and metadata mapping
cell set definitions
pseudobulk settings
differential expression model and contrasts

The pipeline is highly flexible and supports heterogeneous input datasets.

Outputs

Intermediate files (outputs/)
Differential expression results (outputs/dge/)
- per cell set and contrast
- results.tsv.gz for downstream analysis
HTML reports (results/)
- per cell set
- global summaries
Top gene summaries

Documentation

Full user guide (including setup, metadata structure, and configuration details) is available in the ODAP documentation:

Full documentation and usage instructions

Notes

Designed for multi-sample, multi-condition studies
Supports complex experimental designs
Combines single-cell resolution with bulk statistical robustness

Requirements

These are all available in the ODAP DGE Pipeline container.

Snakemake
Python
Slurm (for HPC execution)
R
- Matrix
- data.table
- arrow
- SingleCellExperiment
- zellkonverter
- yaml
- rlang
- jsonlite
- Seurat
- fs
- SeuratObject
- scuttle
- edgeR
- r.utils
- ggplot2
- rmarkdown
- dplyr
- UpsetR
- tibble
- readr
- knitr
- scales
- ggrepel

Author

Kathryn Campbell on behalf of The ODAP Team

30th March 2026

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
scripts		scripts
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
Snakefile		Snakefile
config.yaml		config.yaml
run_pipeline.slurm		run_pipeline.slurm

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Pseudobulk DGE Pipeline (Snakemake)

Overview

Pipeline Summary

Input Support

Key Concepts

Cell Sets

Pseudobulk

Differential Expression

Usage

Configuration

Outputs

Documentation

Notes

Requirements

Author

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Pseudobulk DGE Pipeline (Snakemake)

Overview

Pipeline Summary

Input Support

Key Concepts

Cell Sets

Pseudobulk

Differential Expression

Usage

Configuration

Outputs

Documentation

Notes

Requirements

Author

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages