A comprehensive Snakemake workflow for running SCENIC analysis on single-cell RNA-seq data.
This workflow implements the complete SCENIC pipeline using Snakemake, including downstream analysis.
- Data preprocessing: Quality control and filtering of single-cell data
- Network inference: Gene regulatory network inference using GENIE3
- Regulon discovery: Identification of regulons using cisTarget
- Activity scoring: Calculation of regulon activity scores (AUC)
- Visualization: Comprehensive plots and reports
- Scalable: Parallel processing support
- Reproducible: Conda environments and version control

Expression Matrix Preparation
- Normalize expression data
- Log-transform values
- Format for SCENIC input
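The normalization and log-transform steps above can be sketched in a few lines. This is a dependency-free illustration of the arithmetic only, not the workflow's own code: the function name `normalize_and_log`, the target of 10,000 counts per cell, and the use of the natural logarithm are all assumptions made for the example.

```python
import math

def normalize_and_log(counts, target_sum=10_000.0):
    """Scale each cell (row) to `target_sum` total counts, then apply log(1 + x).

    `target_sum` is an assumed default; check the workflow config for the real value.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        # All-zero cells stay at zero instead of triggering a division by zero.
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([math.log1p(value * scale) for value in cell])
    return normalized

# Toy matrix: 2 cells (rows) x 3 genes (columns) of raw counts.
raw_counts = [[10, 0, 90], [5, 5, 0]]
log_norm = normalize_and_log(raw_counts)
```

After scaling, both cells have the same total depth, so differences in sequencing depth no longer dominate the comparison between cells.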

Gene Regulatory Network Inference
- Run GENIE3 algorithm
- Generate gene-gene adjacency matrix
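To make the adjacency output more concrete, the sketch below parses a small tab-separated edge list and keeps the strongest targets per transcription factor. The column names `TF`, `target`, and `importance` are an assumption about the file layout, the helper `top_targets_per_tf` is hypothetical, and the gene pairs and scores are toy values chosen only for illustration.

```python
import csv
import io
from collections import defaultdict

def top_targets_per_tf(tsv_text, n=2):
    """Return the `n` highest-importance targets for each transcription factor."""
    # Assumed header: TF, target, importance (tab-separated).
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    edges = defaultdict(list)
    for row in reader:
        edges[row["TF"]].append((row["target"], float(row["importance"])))
    return {
        tf: [target for target, _ in sorted(links, key=lambda e: e[1], reverse=True)[:n]]
        for tf, links in edges.items()
    }

# Toy edge list; these values are illustrative, not real results.
example = (
    "TF\ttarget\timportance\n"
    "SOX2\tNES\t12.5\n"
    "SOX2\tPAX6\t30.1\n"
    "SOX2\tGFAP\t2.0\n"
    "GATA1\tKLF1\t8.4\n"
)
top = top_targets_per_tf(example, n=2)
```

For a real run, the same function could be fed the contents of the adjacency file, provided its header matches the assumed column names.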

Regulon Creation
- Use cisTarget for motif enrichment
- Create transcription factor regulons

Activity Scoring
- Calculate AUC scores for each regulon
- Generate binary activity matrix
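The idea behind the AUC score can be illustrated with a simplified recovery-curve calculation for a single cell: rank genes by expression, walk down the top of the ranking, and measure how early the regulon's genes appear. This sketch is not the AUCell implementation. The normalization (dividing by the best achievable area), the `top_fraction` parameter, and the fixed binarization threshold are simplifying assumptions, and both function names are hypothetical.

```python
def regulon_auc(expression, regulon_genes, top_fraction=0.05):
    """Simplified recovery-curve AUC for one cell.

    `expression` maps gene name -> expression value for a single cell.
    The result lies in [0, 1]; 1 means the regulon genes fill the top ranks.
    """
    # Rank genes from highest to lowest expression within this cell.
    ranked = sorted(expression, key=expression.get, reverse=True)
    cutoff = max(1, round(top_fraction * len(ranked)))
    targets = set(regulon_genes) & set(ranked)
    if not targets:
        return 0.0
    hits = 0
    area = 0
    for gene in ranked[:cutoff]:
        if gene in targets:
            hits += 1
        area += hits  # height of the recovery curve at this rank
    # Best case: every target sits at the very top of the ranking.
    best = sum(min(rank, len(targets)) for rank in range(1, cutoff + 1))
    return area / best

def binarize(auc_scores, threshold):
    """Mark a regulon as active (1) when its AUC reaches `threshold`."""
    return [1 if score >= threshold else 0 for score in auc_scores]

# Toy cell with 10 genes: G1 is the most expressed, G10 the least.
cell = {f"G{i}": float(10 - i) for i in range(1, 11)}
auc = regulon_auc(cell, ["G1", "G3", "G9"], top_fraction=0.5)
active = binarize([auc], threshold=0.5)
```

In practice the binarization threshold is usually chosen per regulon from the distribution of AUC values across cells rather than fixed by hand.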

Visualization & Analysis
- Regulon activity heatmaps
- UMAP plots colored by regulon activity
- Regulon Specificity Score (RSS) analysis
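The Regulon Specificity Score compares a regulon's activity profile across cells with the membership profile of one cell group, using the Jensen-Shannon divergence (JSD), and reports 1 - sqrt(JSD). The sketch below is a minimal pure-Python version of that calculation. The function name is hypothetical, and the base-2 logarithm (which keeps the score between 0 and 1) is an assumption; other implementations may use a different base.

```python
import math

def _kl(p, q):
    """Kullback-Leibler divergence in bits, skipping zero-probability terms."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def regulon_specificity_score(auc_values, in_group):
    """Return 1 - sqrt(JSD) between regulon activity and group membership.

    `auc_values` holds one AUC score per cell; `in_group` holds one boolean
    per cell marking membership in the cell type or cluster of interest.
    """
    total_auc = sum(auc_values)
    n_in = sum(in_group)
    if total_auc == 0 or n_in == 0:
        return 0.0
    # Turn both vectors into probability distributions over cells.
    p = [a / total_auc for a in auc_values]
    q = [1.0 / n_in if flag else 0.0 for flag in in_group]
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * _kl(p, m) + 0.5 * _kl(q, m)
    return 1.0 - math.sqrt(max(jsd, 0.0))

group = [True, True, False, False]
specific = regulon_specificity_score([1.0, 1.0, 0.0, 0.0], group)   # active only in the group
unrelated = regulon_specificity_score([0.0, 0.0, 1.0, 1.0], group)  # active only outside it
uniform = regulon_specificity_score([1.0, 1.0, 1.0, 1.0], group)    # active everywhere
```

A regulon active only inside the group scores highest, one active only outside it scores lowest, and a uniformly active regulon falls in between.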
git clone <repository-url>
cd scenicSnake

Create the conda environment:
conda env create -f envs/scenic.yaml
conda activate scenic

Edit the configuration files:
- config/config.yaml: Main workflow parameters
Create an AnnData object (.h5ad) of the preprocessed data.
# Dry run to check the workflow
snakemake -n
# Run the complete workflow
snakemake --cores 8 --use-conda
# Run specific steps
snakemake results/scenic/regulons.json --cores 4 --use-conda

Make sure the config.yaml file is updated before running.
Files will be saved for each split condition if applicable.
- results/scenic/adjacencies.tsv: Gene-gene adjacency matrix
- results/scenic/regulons.json: Discovered regulons
- results/scenic/auc_matrix.csv: Regulon activity scores
- results/scenic/binary_regulon_activity.csv: Binary regulon activity
- results/plots/regulon_heatmap.pdf: Regulon activity heatmap
- results/plots/umap_regulon_activity.pdf: UMAP with regulon overlay
- results/plots/rss_plot.pdf: Regulon specificity scores
- results/reports/scenic_report.html: Comprehensive analysis report
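A quick way to confirm that a run produced everything listed above is to check for the files directly. The helper below is a hypothetical convenience script, not part of the workflow. It assumes the output paths are exactly as listed; if files are written per split condition, the paths may differ and the list would need adjusting.

```python
from pathlib import Path

# Output paths as listed in this README, relative to the workflow directory.
EXPECTED_OUTPUTS = [
    "results/scenic/adjacencies.tsv",
    "results/scenic/regulons.json",
    "results/scenic/auc_matrix.csv",
    "results/scenic/binary_regulon_activity.csv",
    "results/plots/regulon_heatmap.pdf",
    "results/plots/umap_regulon_activity.pdf",
    "results/plots/rss_plot.pdf",
    "results/reports/scenic_report.html",
]

def missing_outputs(root="."):
    """Return the expected output paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_OUTPUTS if not (root / rel).is_file()]

if __name__ == "__main__":
    missing = missing_outputs()
    if missing:
        print("Missing outputs:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected outputs are present.")
```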
- Use cluster execution:
  snakemake --cluster "sbatch --time={resources.time} --mem={resources.mem}" --cores 32
- Adjust resource allocation in config/config.yaml
If you use this workflow, please cite:
- SCENIC: Aibar et al. Nature Methods (2017)
- Snakemake: Köster & Rahmann, Bioinformatics (2012)
- scanpy: Wolf et al. Genome Biology (2018)
This workflow is released under the MIT License.