From c8b9ca1a2444e1d8cadd98d72cc798d96f43c251 Mon Sep 17 00:00:00 2001 From: Jenny Date: Fri, 17 Apr 2026 12:04:43 -0400 Subject: [PATCH 1/7] Rewrite getting-started: SoS conda env + pixi install, updated analysis modules --- code/xqtl_protocol_demo.ipynb | 137 ++++++++++++++++------------------ 1 file changed, 65 insertions(+), 72 deletions(-) diff --git a/code/xqtl_protocol_demo.ipynb b/code/xqtl_protocol_demo.ipynb index 35dbb303..305047d9 100644 --- a/code/xqtl_protocol_demo.ipynb +++ b/code/xqtl_protocol_demo.ipynb @@ -10,11 +10,6 @@ "\n", "This page is a guided on-ramp. A minimal toy dataset of **49 de-identified samples** is used throughout the examples so you can try every pipeline end-to-end before running on real data. In about an hour you'll install the environment, clone the repo, download the demo dataset, and run your first cis-QTL scan.\n", "\n", - "```{image} images/complete_workflow.png\n", - ":alt: FunGen-xQTL analysis workflow\n", - ":align: center\n", - ":width: 90%\n", - "```\n", "\n", ":::{seealso}\n", "**New to the consortium?** Start with [How to use the resource](https://statfungen.github.io/xqtl-protocol/README.html#how-to-use-the-resource) on the homepage for the big-picture background, then come back here to set up.\n", @@ -63,17 +58,45 @@ "\n", "If you don't have conda yet, install [Miniforge](https://github.com/conda-forge/miniforge) (recommended) or [Anaconda](https://www.anaconda.com/download).\n", "\n", + "**Create and activate a new environment:**\n", + "\n", "```bash\n", - "# Create and activate a new environment for SoS\n", "conda create -n sos python=3.12 -y\n", "conda activate sos\n", + "```\n", + "\n", + "**Install the SoS Workflow engine and cluster support:**\n", + "\n", + "```bash\n", + "conda install -c conda-forge sos sos-pbs\n", + "```\n", + "\n", + "- `sos` \u2014 the core SoS workflow engine for running pipelines from the command line\n", + "- `sos-pbs` \u2014 task queue support for submitting jobs to HPC schedulers 
(SLURM, PBS, LSF, SGE)\n", + "\n", + "**Install SoS Notebook and extensions:**\n", + "\n", + "```bash\n", + "conda install -c conda-forge sos-notebook jupyterlab-sos sos-papermill\n", + "```\n", "\n", - "# Install the full SoS suite\n", - "conda install -c conda-forge \\\n", - " sos sos-pbs sos-notebook jupyterlab-sos sos-papermill \\\n", - " sos-bash sos-python sos-r\n", + "- `sos-notebook` \u2014 the SoS kernel for Jupyter, enabling multi-language notebooks\n", + "- `jupyterlab-sos` \u2014 JupyterLab extension for the SoS Notebook interface\n", + "- `sos-papermill` \u2014 Papermill extension for running SoS notebooks non-interactively from the command line\n", "\n", - "# Register the SoS kernel with Jupyter\n", + "**Install language modules:**\n", + "\n", + "```bash\n", + "conda install -c conda-forge sos-r sos-python sos-bash\n", + "```\n", + "\n", + "- `sos-r` \u2014 R language module; this will also install `r-base`, `r-irkernel`, and required R libraries automatically\n", + "- `sos-python` \u2014 Python language module for cross-kernel data exchange\n", + "- `sos-bash` \u2014 Bash language module for shell commands within SoS notebooks\n", + "\n", + "**Register the SoS kernel with Jupyter:**\n", + "\n", + "```bash\n", "python -m sos_notebook.install\n", "```\n", "\n", @@ -246,95 +269,65 @@ "source": [ "## Analysis\n", "\n", - "With the environment set up, here's the full protocol in order. Each link is a self-contained mini-protocol; all commands in them should be executed from the command line with `sos run pipeline/.ipynb ...`.\n", + "Please visit [the homepage of the protocol website](https://statfungen.github.io/xqtl-protocol/) for the general background on this resource, in particular the [How to use the resource](https://statfungen.github.io/xqtl-protocol/README.html#how-to-use-the-resource) section. 
To perform a complete analysis from molecular phenotype quantification to xQTL discovery, please conduct your analysis in the order listed below; each link contains a mini-protocol for a specific task. All commands documented in each mini-protocol should be executed in the command line environment.\n", "\n", ":::{important}\n", - "**Minimum Working Example (MWE) \u2014 new users, start here.**\n", + "**Minimum Working Example \u2014 new users, start here.**\n", "\n", - "Every module in the repo ships a minimal `MWE`-prefixed test dataset under [Synapse `syn36416559`](https://www.synapse.org/#!Synapse:syn36416559/files/). To go end-to-end on the demo data, run these **five** pipelines in order and skip everything else on the first pass:\n", + "Every module ships a minimal test dataset (prefixed with `MWE`) under [Synapse `syn36416559`](https://www.synapse.org/#!Synapse:syn36416559/files/). To go end-to-end on the demo data, run these five pipelines in order and skip everything else on the first pass:\n", "\n", - "1. [`reference_data.ipynb`](https://statfungen.github.io/xqtl-protocol/code/reference_data/reference_data.html) \u2014 pull the standardized reference files\n", - "2. [`bulk_expression.ipynb`](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/bulk_expression.html) \u2014 quantify gene expression (MWE default)\n", - "3. [`genotype_preprocessing.ipynb`](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype_preprocessing.html) \u2192 [`phenotype_preprocessing.ipynb`](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype_preprocessing.html) \u2192 [`covariate_preprocessing.ipynb`](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate_preprocessing.html) \u2014 QC + normalization\n", - "4. [`qtl_association_testing.ipynb`](https://statfungen.github.io/xqtl-protocol/code/association_scan/qtl_association_testing.html) \u2014 run cis-QTL with TensorQTL\n", - "5. 
[`mnm_miniprotocol.ipynb`](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_miniprotocol.html) \u2014 single-trait fine-mapping + TWAS with SuSiE\n", + "1. [`reference_data.ipynb`](https://statfungen.github.io/xqtl-protocol/code/reference_data/reference_data.html) \u2014 prepare standardized reference files\n", + "2. [`bulk_expression.ipynb`](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/bulk_expression.html) \u2014 quantify gene expression\n", + "3. [`genotype_preprocessing.ipynb`](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype_preprocessing.html) \u2192 [`phenotype_preprocessing.ipynb`](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype_preprocessing.html) \u2192 [`covariate_preprocessing.ipynb`](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate_preprocessing.html) \u2014 QC and normalization\n", + "4. [`qtl_association_testing.ipynb`](https://statfungen.github.io/xqtl-protocol/code/association_scan/qtl_association_testing.html) \u2014 cis-QTL with TensorQTL\n", + "5. [`mnm_miniprotocol.ipynb`](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_miniprotocol.html) \u2014 fine-mapping + TWAS with SuSiE\n", "\n", - "Once this pass completes cleanly, branch out to the additional modules below (methylation, splicing, multivariate mixture, GWAS integration, enrichment, EMS) based on what your project needs.\n", + "Once this pass completes, branch out to the additional modules below based on what your project needs.\n", ":::\n", "\n", - "### 1. 
Reference Data\n", - "\n", - "Before quantifying phenotypes, set up the standardized reference files \u2014 genomes, gene annotations, variant annotations, LD maps, and topologically associated domains.\n", - "\n", - "- [Reference data setup](https://statfungen.github.io/xqtl-protocol/code/reference_data/reference_data.html) \u2014 main entry point \u2b50 *MWE*\n", - "- [Reference data preparation](https://statfungen.github.io/xqtl-protocol/code/reference_data/reference_data_preparation.html) \u2014 detailed preparation steps\n", - "- [Generalized TAD-B](https://statfungen.github.io/xqtl-protocol/code/reference_data/generalized_TADB.html) \u2014 TAD boundaries for analysis windows\n", - "- [LD reference pruning](https://statfungen.github.io/xqtl-protocol/code/reference_data/ld_prune_reference.html) and [RSS LD sketching](https://statfungen.github.io/xqtl-protocol/code/reference_data/rss_ld_sketch.html) \u2014 advanced LD utilities\n", - "\n", - "### 2. Molecular Phenotypes\n", - "\n", - "We support bulk RNA-seq, DNA methylation, and alternative splicing phenotypes. 
Each path has its own calling, QC, and normalization steps.\n", + "### Reference Data\n", "\n", - "- **Bulk RNA-seq** \u2014 [bulk_expression mini-protocol](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/bulk_expression.html) \u2b50 *MWE*, with sub-modules for [RNA calling](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/RNA_calling.html), [QC](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/bulk_expression_QC.html), and [normalization](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/bulk_expression_normalization.html)\n", - "- **DNA methylation** \u2014 [methylation mini-protocol](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/methylation.html) with [methylation calling via SeSAMe](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/methylation_calling.html)\n", - "- **Alternative splicing** \u2014 [splicing mini-protocol](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/splicing.html) with [splicing calling via leafcutter2](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/splicing_calling.html) and [normalization](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/splicing_normalization.html)\n", + "Multiple [reference data](https://statfungen.github.io/xqtl-protocol/code/reference_data/reference_data.html) files are required before molecular phenotypes are quantified in samples. These include, but are not limited to, reference genomes, gene annotations, variant annotations, linkage disequilibrium data and topologically associated domains. The [reference data preparation](https://statfungen.github.io/xqtl-protocol/code/reference_data/reference_data_preparation.html) module handles downloading and standardizing these files. 
Additional utilities are available for [generalized TAD boundaries](https://statfungen.github.io/xqtl-protocol/code/reference_data/generalized_TADB.html), [LD reference pruning](https://statfungen.github.io/xqtl-protocol/code/reference_data/ld_prune_reference.html), and [RSS LD sketching](https://statfungen.github.io/xqtl-protocol/code/reference_data/rss_ld_sketch.html).\n", "\n", - "### 3. Data Pre-processing\n", + "### Molecular Phenotype Quantification\n", "\n", - "- [Genotype preprocessing](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype_preprocessing.html) \u2b50 *MWE* \u2014 VCF QC, GWAS QC, PCA, GRM, plink formatting\n", - "- [Phenotype preprocessing](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype_preprocessing.html) \u2b50 *MWE* \u2014 gene annotation, imputation, formatting\n", - "- [Covariate preprocessing](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate_preprocessing.html) \u2b50 *MWE* \u2014 merge genetic PCs with phenotypes, compute hidden factors\n", + "Molecular phenotypic data is required for the generation of QTLs. We support bulk RNA-Seq, methylation and splicing phenotypes in our pipeline. [Quantification of gene expression](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/bulk_expression.html) is conducted with either RNA-SeQC for gene-level counts, or RSEM for transcript-level counts, with sub-modules for [RNA calling](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/RNA_calling.html), [QC](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/bulk_expression_QC.html), and [normalization](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/bulk_expression_normalization.html). 
[Quantification of alternative splicing events](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/splicing.html) is conducted with leafcutter2 to identify alternatively excised introns, with [splicing calling](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/splicing_calling.html) and [normalization](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/splicing_normalization.html) sub-modules. [Quantification of DNA methylation](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/methylation.html) is done using SeSAMe via the [methylation calling](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/methylation_calling.html) module. Each of these molecular phenotypes then undergo phenotype specific quality control and normalization.\n", "\n", - "### 4. QTL Association Testing\n", + "### Data Pre-Processing\n", "\n", - "- [QTL association testing](https://statfungen.github.io/xqtl-protocol/code/association_scan/qtl_association_testing.html) \u2b50 *MWE* \u2014 [TensorQTL](https://statfungen.github.io/xqtl-protocol/code/association_scan/TensorQTL/TensorQTL.html) scans (cis, trans, interaction) and [quantile regression QTL](https://statfungen.github.io/xqtl-protocol/code/association_scan/quantile_models/qr_and_twas.html)\n", - "- [Association postprocessing](https://statfungen.github.io/xqtl-protocol/code/association_scan/qtl_association_postprocessing.html) \u2014 hierarchical multiple testing and p-value adjustment\n", + "[Preprocessing of genotype data](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype_preprocessing.html) begins with the application of variant filters using bcftools ([VCF QC](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/VCF_QC.html), [GWAS QC](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/GWAS_QC.html)). 
VCF files are then converted to plink format so that kinship analyses may be performed to identify unrelated individuals. Genetic principal components ([PCA](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/PCA.html)) and genetic relationship matrices ([GRM](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/GRM.html)) are then generated for unrelated samples and genotype files are [formatted](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/genotype_formatting.html) for later generation of quantitative trait loci.\n", "\n", - "### 5. Multivariate Mixture Model\n", + "[Preprocessing of phenotypic data](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype_preprocessing.html) begins with [annotation of features](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype/gene_annotation.html), if required. Missing entries may then be [imputed](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype/phenotype_imputation.html) using a variety of methods included in the pipeline. Last, the phenotypes are [formatted](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype/phenotype_formatting.html) for later generation of quantitative trait loci.\n", "\n", - "Learn a data-driven mixture prior across contexts/tissues for multivariate fine-mapping.\n", + "[Preprocessing of covariates](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate_preprocessing.html) begins with the [merging](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate/covariate_formatting.html) of phenotypic data with previously generated genetic principal components. 
The merged data is then used to [calculate hidden factors](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate/covariate_hidden_factor.html) which will later be used as additional covariates.\n", "\n", - "- [Multivariate mixture vignette](https://statfungen.github.io/xqtl-protocol/code/multivariate_genome/multivariate_mixture_vignette.html) \u2014 overview\n", - "- [Mixture prior with MASH](https://statfungen.github.io/xqtl-protocol/code/multivariate_genome/MASH/mixture_prior.html) and [MASH fit](https://statfungen.github.io/xqtl-protocol/code/multivariate_genome/MASH/mash_fit.html) \u2014 data-driven prior estimation\n", + "### QTL Association Testing\n", "\n", - "### 6. Multiomics Regression Models\n", + "[QTL association analysis](https://statfungen.github.io/xqtl-protocol/code/association_scan/qtl_association_testing.html) is conducted with [TensorQTL](https://statfungen.github.io/xqtl-protocol/code/association_scan/TensorQTL/TensorQTL.html). We include options for cis or trans analysis, with options to include interaction terms. We also support [quantile regression-based QTL analysis](https://statfungen.github.io/xqtl-protocol/code/association_scan/quantile_models/qr_and_twas.html) for detecting non-linear effects. 
[Hierarchical multiple testing](https://statfungen.github.io/xqtl-protocol/code/association_scan/qtl_association_postprocessing.html) may then be applied to the results to adjust p-values.\n", "\n", - "Fine-mapping and multi-context regression \u2014 the core of the post-discovery analysis.\n", + "### Multivariate Mixture Model\n", "\n", - "- [Multi-omic regression mini-protocol](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_miniprotocol.html) \u2b50 *MWE* \u2014 start here\n", - "- [Univariate fine-mapping + TWAS with SuSiE](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/univariate_fine_mapping_twas_vignette.html)\n", - "- [Multivariate multi-gene fine-mapping](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/multivariate_multigene_fine_mapping_vignette.html)\n", - "- [Univariate fine-mapping with fSuSiE](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/univariate_fine_mapping_fsusie_vignette.html) \u2014 functional / epigenomic data\n", - "- [Multivariate fine-mapping vignette](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/multivariate_fine_mapping_vignette.html)\n", - "- [Summary-statistics fine-mapping](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/summary_stats_finemapping_vignette.html)\n", - "- [Multi-omic multi-trait regression](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/mnm_regression.html) and [RSS analysis](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/rss_analysis.html)\n", - "- [MNM postprocessing](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_postprocessing.html)\n", + "For multi-context or multi-tissue analyses, we provide a [multivariate mixture model](https://statfungen.github.io/xqtl-protocol/code/multivariate_genome/multivariate_mixture_vignette.html) framework based on MASH. 
This learns a data-driven [mixture prior](https://statfungen.github.io/xqtl-protocol/code/multivariate_genome/MASH/mixture_prior.html) across contexts and then [fits the model](https://statfungen.github.io/xqtl-protocol/code/multivariate_genome/MASH/mash_fit.html) to estimate effect sizes and posterior probabilities for sharing of eQTLs across tissues.\n", "\n", - "### 7. GWAS Integration\n", + "### Multiomics Regression Models\n", "\n", - "Link xQTL signals to disease-associated loci.\n", + "Our pipeline includes multiple methods for fine-mapping of QTLs. The [multi-omic regression mini-protocol](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_miniprotocol.html) is the recommended starting point for new users. [Univariate fine-mapping and TWAS with SuSiE](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/univariate_fine_mapping_twas_vignette.html) generates TWAS weights and credible sets using SuSiE. [Multivariate multi-gene fine-mapping](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/multivariate_multigene_fine_mapping_vignette.html) extends this to joint analyses across genes. [Regression with summary statistics](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/summary_stats_finemapping_vignette.html) allows for the inclusion of summary statistics from GWAS in SuSiE finemapping. [Univariate fine-mapping of functional data](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/univariate_fine_mapping_fsusie_vignette.html) uses epigenomic data to fine-map with fSuSiE. [Multivariate fine-mapping](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/multivariate_fine_mapping_vignette.html) handles multi-context credible sets. 
Additional modules include [multi-omic multi-trait regression](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/mnm_regression.html), [RSS analysis](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/rss_analysis.html), and [MNM postprocessing](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_postprocessing.html).\n", "\n", - "- [SuSiE-enloc colocalization](https://statfungen.github.io/xqtl-protocol/code/pecotmr_integration/SuSiE_enloc.html) \u2014 pairwise colocalization of xQTL and GWAS fine-mapping\n", - "- [TWAS / cTWAS](https://statfungen.github.io/xqtl-protocol/code/pecotmr_integration/twas_ctwas.html) \u2014 causal TWAS for complex traits\n", - "- [Colocboost](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/colocboost.html) \u2014 shared-variant discovery across multiple molecular traits\n", + "### GWAS Integration\n", "\n", - "### 8. Enrichment and Validation\n", + "We include methods for [colocalization analysis](https://statfungen.github.io/xqtl-protocol/code/pecotmr_integration/SuSiE_enloc.html). This starts with the generation of prior probabilities followed by pairwise colocalization analysis of xQTL and GWAS fine-mapping results to identify shared causal variants. We also include [TWAS and cTWAS](https://statfungen.github.io/xqtl-protocol/code/pecotmr_integration/twas_ctwas.html) in our pipeline to identify genes associated with complex traits. 
An alternative method, [colocboost](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/colocboost.html), identifies shared genetic variants influencing multiple molecular traits.\n", "\n", - "- [Excess-of-overlap enrichment](https://statfungen.github.io/xqtl-protocol/code/enrichment/eoo_enrichment.html) \u2014 significance of variants in annotation sets\n", - "- [Pathway enrichment (GSEA)](https://statfungen.github.io/xqtl-protocol/code/enrichment/gsea.html)\n", - "- [GREGOR](https://statfungen.github.io/xqtl-protocol/code/enrichment/gregor.html) \u2014 annotation-based enrichment for significant variants\n", - "- [Stratified LD Score Regression](https://statfungen.github.io/xqtl-protocol/code/enrichment/sldsc_enrichment.html) \u2014 heritability partitioning by annotation\n", + "### Enrichment and Validation\n", "\n", - "### 9. xQTL Modifier Score (EMS)\n", + "We utilize an [excess of overlap](https://statfungen.github.io/xqtl-protocol/code/enrichment/eoo_enrichment.html) method to evaluate the enrichment of significant variants within specific genomic annotations. [Pathway enrichment analysis (GSEA)](https://statfungen.github.io/xqtl-protocol/code/enrichment/gsea.html) identifies biological pathways that are statistically overrepresented in a given gene set, giving information on potential biological functions, disease relevance, or regulatory mechanisms associated with the gene set. [GREGOR](https://statfungen.github.io/xqtl-protocol/code/enrichment/gregor.html) provides annotation-based enrichment testing for regulatory variants. [Stratified LD Score Regression](https://statfungen.github.io/xqtl-protocol/code/enrichment/sldsc_enrichment.html) (S-LDSC) is used to quantify the contribution of different genomic functional annotations to the heritability of complex traits and assess their statistical significance. 
By integrating GWAS summary statistics with genome annotations, S-LDSC distinguishes true polygenic signals from confounding effects.\n", "\n", - "Train and apply a per-variant score for prioritizing regulatory variants.\n", + "### xQTL Modifier Score (EMS)\n", "\n", - "- [EMS training](https://statfungen.github.io/xqtl-protocol/code/xqtl_modifier_score/ems_training.html)\n", - "- [EMS prediction](https://statfungen.github.io/xqtl-protocol/code/xqtl_modifier_score/ems_prediction.html)\n", + "The xQTL modifier score framework trains a per-variant model for prioritizing regulatory variants. The [EMS training](https://statfungen.github.io/xqtl-protocol/code/xqtl_modifier_score/ems_training.html) module fits the model using functional annotation features, and [EMS prediction](https://statfungen.github.io/xqtl-protocol/code/xqtl_modifier_score/ems_prediction.html) applies the trained model to score new variants.\n", "\n", - "### Command Generator (shortcut)\n", + "### Command Generator\n", "\n", - "Want to skip writing SoS commands by hand? 
The [eQTL analysis command generator](https://statfungen.github.io/xqtl-protocol/code/commands_generator/eQTL_analysis_commands.html) produces the full pipeline from a single configuration file \u2014 great for reproducing a run or sharing a recipe.\n", + "The [eQTL analysis command generator](https://statfungen.github.io/xqtl-protocol/code/commands_generator/eQTL_analysis_commands.html) produces the full pipeline commands from a single configuration file \u2014 useful for reproducing a complete run or sharing an analysis recipe without writing SoS commands by hand.\n", "\n", "\n", "---\n", From 140d4675d5256f18e354f7319a2436fa3c0302b6 Mon Sep 17 00:00:00 2001 From: Jenny Date: Fri, 17 Apr 2026 12:09:07 -0400 Subject: [PATCH 2/7] Rewrite getting-started: SoS conda env + pixi install, updated analysis modules --- code/xqtl_protocol_demo.ipynb | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/code/xqtl_protocol_demo.ipynb b/code/xqtl_protocol_demo.ipynb index 305047d9..6ba90265 100644 --- a/code/xqtl_protocol_demo.ipynb +++ b/code/xqtl_protocol_demo.ipynb @@ -285,15 +285,15 @@ "Once this pass completes, branch out to the additional modules below based on what your project needs.\n", ":::\n", "\n", - "### Reference Data\n", + "### 1. Reference Data\n", "\n", "Multiple [reference data](https://statfungen.github.io/xqtl-protocol/code/reference_data/reference_data.html) files are required before molecular phenotypes are quantified in samples. These include, but are not limited to, reference genomes, gene annotations, variant annotations, linkage disequilibrium data and topologically associated domains. The [reference data preparation](https://statfungen.github.io/xqtl-protocol/code/reference_data/reference_data_preparation.html) module handles downloading and standardizing these files. 
Additional utilities are available for [generalized TAD boundaries](https://statfungen.github.io/xqtl-protocol/code/reference_data/generalized_TADB.html), [LD reference pruning](https://statfungen.github.io/xqtl-protocol/code/reference_data/ld_prune_reference.html), and [RSS LD sketching](https://statfungen.github.io/xqtl-protocol/code/reference_data/rss_ld_sketch.html).\n", "\n", - "### Molecular Phenotype Quantification\n", + "### 2. Molecular Phenotype Quantification\n", "\n", "Molecular phenotypic data is required for the generation of QTLs. We support bulk RNA-Seq, methylation and splicing phenotypes in our pipeline. [Quantification of gene expression](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/bulk_expression.html) is conducted with either RNA-SeQC for gene-level counts, or RSEM for transcript-level counts, with sub-modules for [RNA calling](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/RNA_calling.html), [QC](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/bulk_expression_QC.html), and [normalization](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/bulk_expression_normalization.html). [Quantification of alternative splicing events](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/splicing.html) is conducted with leafcutter2 to identify alternatively excised introns, with [splicing calling](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/splicing_calling.html) and [normalization](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/splicing_normalization.html) sub-modules. [Quantification of DNA methylation](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/methylation.html) is done using SeSAMe via the [methylation calling](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/methylation_calling.html) module. 
Each of these molecular phenotypes then undergo phenotype specific quality control and normalization.\n", "\n", - "### Data Pre-Processing\n", + "### 3. Data Pre-Processing\n", "\n", "[Preprocessing of genotype data](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype_preprocessing.html) begins with the application of variant filters using bcftools ([VCF QC](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/VCF_QC.html), [GWAS QC](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/GWAS_QC.html)). VCF files are then converted to plink format so that kinship analyses may be performed to identify unrelated individuals. Genetic principal components ([PCA](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/PCA.html)) and genetic relationship matrices ([GRM](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/GRM.html)) are then generated for unrelated samples and genotype files are [formatted](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/genotype_formatting.html) for later generation of quantitative trait loci.\n", "\n", @@ -301,31 +301,31 @@ "\n", "[Preprocessing of covariates](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate_preprocessing.html) begins with the [merging](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate/covariate_formatting.html) of phenotypic data with previously generated genetic principal components. The merged data is then used to [calculate hidden factors](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate/covariate_hidden_factor.html) which will later be used as additional covariates.\n", "\n", - "### QTL Association Testing\n", + "### 4. 
QTL Association Testing\n", "\n", "[QTL association analysis](https://statfungen.github.io/xqtl-protocol/code/association_scan/qtl_association_testing.html) is conducted with [TensorQTL](https://statfungen.github.io/xqtl-protocol/code/association_scan/TensorQTL/TensorQTL.html). We include options for cis or trans analysis, with options to include interaction terms. We also support [quantile regression-based QTL analysis](https://statfungen.github.io/xqtl-protocol/code/association_scan/quantile_models/qr_and_twas.html) for detecting non-linear effects. [Hierarchical multiple testing](https://statfungen.github.io/xqtl-protocol/code/association_scan/qtl_association_postprocessing.html) may then be applied to the results to adjust p-values.\n", "\n", - "### Multivariate Mixture Model\n", + "### 5. Multivariate Mixture Model\n", "\n", "For multi-context or multi-tissue analyses, we provide a [multivariate mixture model](https://statfungen.github.io/xqtl-protocol/code/multivariate_genome/multivariate_mixture_vignette.html) framework based on MASH. This learns a data-driven [mixture prior](https://statfungen.github.io/xqtl-protocol/code/multivariate_genome/MASH/mixture_prior.html) across contexts and then [fits the model](https://statfungen.github.io/xqtl-protocol/code/multivariate_genome/MASH/mash_fit.html) to estimate effect sizes and posterior probabilities for sharing of eQTLs across tissues.\n", "\n", - "### Multiomics Regression Models\n", + "### 6. Multiomics Regression Models (Fine-mapping)\n", "\n", "Our pipeline includes multiple methods for fine-mapping of QTLs. The [multi-omic regression mini-protocol](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_miniprotocol.html) is the recommended starting point for new users. [Univariate fine-mapping and TWAS with SuSiE](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/univariate_fine_mapping_twas_vignette.html) generates TWAS weights and credible sets using SuSiE. 
[Multivariate multi-gene fine-mapping](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/multivariate_multigene_fine_mapping_vignette.html) extends this to joint analyses across genes. [Regression with summary statistics](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/summary_stats_finemapping_vignette.html) allows for the inclusion of summary statistics from GWAS in SuSiE finemapping. [Univariate fine-mapping of functional data](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/univariate_fine_mapping_fsusie_vignette.html) uses epigenomic data to fine-map with fSuSiE. [Multivariate fine-mapping](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/multivariate_fine_mapping_vignette.html) handles multi-context credible sets. Additional modules include [multi-omic multi-trait regression](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/mnm_regression.html), [RSS analysis](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/rss_analysis.html), and [MNM postprocessing](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_postprocessing.html).\n", "\n", - "### GWAS Integration\n", + "### 7. GWAS Integration\n", "\n", "We include methods for [colocalization analysis](https://statfungen.github.io/xqtl-protocol/code/pecotmr_integration/SuSiE_enloc.html). This starts with the generation of prior probabilities followed by pairwise colocalization analysis of xQTL and GWAS fine-mapping results to identify shared causal variants. We also include [TWAS and cTWAS](https://statfungen.github.io/xqtl-protocol/code/pecotmr_integration/twas_ctwas.html) in our pipeline to identify genes associated with complex traits. An alternative method, [colocboost](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/colocboost.html), identifies shared genetic variants influencing multiple molecular traits.\n", "\n", - "### Enrichment and Validation\n", + "### 8. 
Enrichment and Validation\n", "\n", "We utilize an [excess of overlap](https://statfungen.github.io/xqtl-protocol/code/enrichment/eoo_enrichment.html) method to evaluate the enrichment of significant variants within specific genomic annotations. [Pathway enrichment analysis (GSEA)](https://statfungen.github.io/xqtl-protocol/code/enrichment/gsea.html) identifies biological pathways that are statistically overrepresented in a given gene set, giving information on potential biological functions, disease relevance, or regulatory mechanisms associated with the gene set. [GREGOR](https://statfungen.github.io/xqtl-protocol/code/enrichment/gregor.html) provides annotation-based enrichment testing for regulatory variants. [Stratified LD Score Regression](https://statfungen.github.io/xqtl-protocol/code/enrichment/sldsc_enrichment.html) (S-LDSC) is used to quantify the contribution of different genomic functional annotations to the heritability of complex traits and assess their statistical significance. By integrating GWAS summary statistics with genome annotations, S-LDSC distinguishes true polygenic signals from confounding effects.\n", "\n", - "### xQTL Modifier Score (EMS)\n", + "### 9. xQTL Modifier Score (EMS)\n", "\n", "The xQTL modifier score framework trains a per-variant model for prioritizing regulatory variants. The [EMS training](https://statfungen.github.io/xqtl-protocol/code/xqtl_modifier_score/ems_training.html) module fits the model using functional annotation features, and [EMS prediction](https://statfungen.github.io/xqtl-protocol/code/xqtl_modifier_score/ems_prediction.html) applies the trained model to score new variants.\n", "\n", - "### Command Generator\n", + "### 10. 
Command Generator\n", "\n", "The [eQTL analysis command generator](https://statfungen.github.io/xqtl-protocol/code/commands_generator/eQTL_analysis_commands.html) produces the full pipeline commands from a single configuration file \u2014 useful for reproducing a complete run or sharing an analysis recipe without writing SoS commands by hand.\n", "\n", From 6405811adbda28c3d8ed2d31ea6c616cf03b80d6 Mon Sep 17 00:00:00 2001 From: Jenny Date: Fri, 17 Apr 2026 12:11:52 -0400 Subject: [PATCH 3/7] Rewrite getting-started: SoS conda env + pixi install, updated analysis modules --- code/xqtl_protocol_demo.ipynb | 86 +++++++++++++++++++++++++++++------ 1 file changed, 71 insertions(+), 15 deletions(-) diff --git a/code/xqtl_protocol_demo.ipynb b/code/xqtl_protocol_demo.ipynb index 6ba90265..c62f0c37 100644 --- a/code/xqtl_protocol_demo.ipynb +++ b/code/xqtl_protocol_demo.ipynb @@ -287,48 +287,104 @@ "\n", "### 1. Reference Data\n", "\n", - "Multiple [reference data](https://statfungen.github.io/xqtl-protocol/code/reference_data/reference_data.html) files are required before molecular phenotypes are quantified in samples. These include, but are not limited to, reference genomes, gene annotations, variant annotations, linkage disequilibrium data and topologically associated domains. The [reference data preparation](https://statfungen.github.io/xqtl-protocol/code/reference_data/reference_data_preparation.html) module handles downloading and standardizing these files. Additional utilities are available for [generalized TAD boundaries](https://statfungen.github.io/xqtl-protocol/code/reference_data/generalized_TADB.html), [LD reference pruning](https://statfungen.github.io/xqtl-protocol/code/reference_data/ld_prune_reference.html), and [RSS LD sketching](https://statfungen.github.io/xqtl-protocol/code/reference_data/rss_ld_sketch.html).\n", + "Multiple reference data files are required before molecular phenotypes are quantified in samples. 
These include, but are not limited to, reference genomes, gene annotations, variant annotations, linkage disequilibrium data and topologically associated domains.\n", "\n", - "### 2. Molecular Phenotype Quantification\n", + "- [Reference data](https://statfungen.github.io/xqtl-protocol/code/reference_data/reference_data.html) \u2014 overview and required input files\n", + "- [Reference data preparation](https://statfungen.github.io/xqtl-protocol/code/reference_data/reference_data_preparation.html) \u2014 downloading and standardizing reference files\n", + "- [Generalized TAD boundaries](https://statfungen.github.io/xqtl-protocol/code/reference_data/generalized_TADB.html) \u2014 topologically associating domain annotations\n", + "- [LD reference pruning](https://statfungen.github.io/xqtl-protocol/code/reference_data/ld_prune_reference.html) \u2014 pruned LD reference panels\n", + "- [RSS LD sketching](https://statfungen.github.io/xqtl-protocol/code/reference_data/rss_ld_sketch.html) \u2014 LD matrix sketches for summary-statistics methods\n", "\n", - "Molecular phenotypic data is required for the generation of QTLs. We support bulk RNA-Seq, methylation and splicing phenotypes in our pipeline. [Quantification of gene expression](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/bulk_expression.html) is conducted with either RNA-SeQC for gene-level counts, or RSEM for transcript-level counts, with sub-modules for [RNA calling](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/RNA_calling.html), [QC](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/bulk_expression_QC.html), and [normalization](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/bulk_expression_normalization.html). 
[Quantification of alternative splicing events](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/splicing.html) is conducted with leafcutter2 to identify alternatively excised introns, with [splicing calling](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/splicing_calling.html) and [normalization](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/splicing_normalization.html) sub-modules. [Quantification of DNA methylation](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/methylation.html) is done using SeSAMe via the [methylation calling](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/methylation_calling.html) module. Each of these molecular phenotypes then undergo phenotype specific quality control and normalization.\n", + "### 2. Molecular Phenotype Quantification\n", "\n", - "### 3. Data Pre-Processing\n", + "Molecular phenotypic data is required for the generation of QTLs. We support bulk RNA-Seq, methylation and splicing phenotypes in our pipeline. Each of these molecular phenotypes then undergoes phenotype-specific quality control and normalization.\n", "\n", - "[Preprocessing of genotype data](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype_preprocessing.html) begins with the application of variant filters using bcftools ([VCF QC](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/VCF_QC.html), [GWAS QC](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/GWAS_QC.html)). VCF files are then converted to plink format so that kinship analyses may be performed to identify unrelated individuals. 
Genetic principal components ([PCA](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/PCA.html)) and genetic relationship matrices ([GRM](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/GRM.html)) are then generated for unrelated samples and genotype files are [formatted](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/genotype_formatting.html) for later generation of quantitative trait loci.\n", + "- [Gene expression quantification](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/bulk_expression.html) \u2014 RNA-SeQC for gene-level counts, or RSEM for transcript-level counts\n", + " - [RNA calling](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/RNA_calling.html) \u2014 read alignment and quantification\n", + " - [Expression QC](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/bulk_expression_QC.html) \u2014 sample-level and gene-level quality filters\n", + " - [Expression normalization](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/bulk_expression_normalization.html) \u2014 TMM, quantile normalization, inverse-normal transform\n", + "- [Alternative splicing quantification](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/splicing.html) \u2014 leafcutter2 to identify alternatively excised introns\n", + " - [Splicing calling](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/splicing_calling.html) \u2014 junction extraction and intron clustering\n", + " - [Splicing normalization](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/splicing_normalization.html) \u2014 ratio normalization across samples\n", + "- [DNA methylation quantification](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/methylation.html) \u2014 SeSAMe\n", + " - [Methylation 
calling](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/methylation_calling.html) \u2014 probe-level beta values\n", "\n", - "[Preprocessing of phenotypic data](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype_preprocessing.html) begins with [annotation of features](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype/gene_annotation.html), if required. Missing entries may then be [imputed](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype/phenotype_imputation.html) using a variety of methods included in the pipeline. Last, the phenotypes are [formatted](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype/phenotype_formatting.html) for later generation of quantitative trait loci.\n", + "### 3. Data Pre-Processing\n", "\n", - "[Preprocessing of covariates](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate_preprocessing.html) begins with the [merging](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate/covariate_formatting.html) of phenotypic data with previously generated genetic principal components. 
The merged data is then used to [calculate hidden factors](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate/covariate_hidden_factor.html) which will later be used as additional covariates.\n", + "Pre-processing prepares genotype, phenotype, and covariate data for QTL analysis.\n", + "\n", + "- [Genotype preprocessing](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype_preprocessing.html) \u2014 variant filters, format conversion, kinship, and PCA\n", + " - [VCF QC](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/VCF_QC.html) \u2014 variant-level quality filters with bcftools\n", + " - [GWAS QC](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/GWAS_QC.html) \u2014 additional filters for GWAS-ready variants\n", + " - [PCA](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/PCA.html) \u2014 genetic principal components for unrelated samples\n", + " - [GRM](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/GRM.html) \u2014 genetic relationship matrix\n", + " - [Genotype formatting](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/genotype_formatting.html) \u2014 convert to analysis-ready format\n", + "- [Phenotype preprocessing](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype_preprocessing.html) \u2014 feature annotation, imputation, and formatting\n", + " - [Gene annotation](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype/gene_annotation.html) \u2014 annotate features with genomic coordinates\n", + " - [Phenotype imputation](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype/phenotype_imputation.html) \u2014 impute missing expression values\n", + " - [Phenotype formatting](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype/phenotype_formatting.html) \u2014 format for QTL 
analysis\n", + "- [Covariate preprocessing](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate_preprocessing.html) \u2014 merge phenotypic data with genetic PCs and compute hidden factors\n", + " - [Covariate formatting](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate/covariate_formatting.html) \u2014 merge and format covariate matrices\n", + " - [Hidden factor estimation](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate/covariate_hidden_factor.html) \u2014 compute latent factors (e.g. PEER) as additional covariates\n", "\n", "### 4. QTL Association Testing\n", "\n", - "[QTL association analysis](https://statfungen.github.io/xqtl-protocol/code/association_scan/qtl_association_testing.html) is conducted with [TensorQTL](https://statfungen.github.io/xqtl-protocol/code/association_scan/TensorQTL/TensorQTL.html). We include options for cis or trans analysis, with options to include interaction terms. We also support [quantile regression-based QTL analysis](https://statfungen.github.io/xqtl-protocol/code/association_scan/quantile_models/qr_and_twas.html) for detecting non-linear effects. 
[Hierarchical multiple testing](https://statfungen.github.io/xqtl-protocol/code/association_scan/qtl_association_postprocessing.html) may then be applied to the results to adjust p-values.\n", + "QTL association analysis identifies genetic variants associated with molecular phenotypes.\n", + "\n", + "- [QTL association testing](https://statfungen.github.io/xqtl-protocol/code/association_scan/qtl_association_testing.html) \u2014 overview and configuration\n", + "- [TensorQTL](https://statfungen.github.io/xqtl-protocol/code/association_scan/TensorQTL/TensorQTL.html) \u2014 GPU-accelerated cis/trans scans with optional interaction terms\n", + "- [Quantile regression QTL & TWAS](https://statfungen.github.io/xqtl-protocol/code/association_scan/quantile_models/qr_and_twas.html) \u2014 detect non-linear genotype\u2013phenotype effects\n", + "- [Association post-processing](https://statfungen.github.io/xqtl-protocol/code/association_scan/qtl_association_postprocessing.html) \u2014 hierarchical multiple testing correction\n", "\n", "### 5. Multivariate Mixture Model\n", "\n", - "For multi-context or multi-tissue analyses, we provide a [multivariate mixture model](https://statfungen.github.io/xqtl-protocol/code/multivariate_genome/multivariate_mixture_vignette.html) framework based on MASH. This learns a data-driven [mixture prior](https://statfungen.github.io/xqtl-protocol/code/multivariate_genome/MASH/mixture_prior.html) across contexts and then [fits the model](https://statfungen.github.io/xqtl-protocol/code/multivariate_genome/MASH/mash_fit.html) to estimate effect sizes and posterior probabilities for sharing of eQTLs across tissues.\n", + "For multi-context or multi-tissue analyses, we provide a multivariate mixture model framework based on MASH. 
This learns a data-driven mixture prior across contexts and estimates effect sizes and posterior probabilities for sharing of eQTLs across tissues.\n", + "\n", + "- [Multivariate mixture vignette](https://statfungen.github.io/xqtl-protocol/code/multivariate_genome/multivariate_mixture_vignette.html) \u2014 overview and walkthrough\n", + "- [Mixture prior estimation](https://statfungen.github.io/xqtl-protocol/code/multivariate_genome/MASH/mixture_prior.html) \u2014 learn data-driven covariance matrices\n", + "- [MASH model fitting](https://statfungen.github.io/xqtl-protocol/code/multivariate_genome/MASH/mash_fit.html) \u2014 fit the model and compute posterior summaries\n", "\n", "### 6. Multiomics Regression Models (Fine-mapping)\n", "\n", - "Our pipeline includes multiple methods for fine-mapping of QTLs. The [multi-omic regression mini-protocol](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_miniprotocol.html) is the recommended starting point for new users. [Univariate fine-mapping and TWAS with SuSiE](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/univariate_fine_mapping_twas_vignette.html) generates TWAS weights and credible sets using SuSiE. [Multivariate multi-gene fine-mapping](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/multivariate_multigene_fine_mapping_vignette.html) extends this to joint analyses across genes. [Regression with summary statistics](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/summary_stats_finemapping_vignette.html) allows for the inclusion of summary statistics from GWAS in SuSiE finemapping. [Univariate fine-mapping of functional data](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/univariate_fine_mapping_fsusie_vignette.html) uses epigenomic data to fine-map with fSuSiE. [Multivariate fine-mapping](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/multivariate_fine_mapping_vignette.html) handles multi-context credible sets. 
Additional modules include [multi-omic multi-trait regression](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/mnm_regression.html), [RSS analysis](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/rss_analysis.html), and [MNM postprocessing](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_postprocessing.html).\n", + "Our pipeline includes multiple methods for fine-mapping of QTLs. The mini-protocol is the recommended starting point for new users.\n", + "\n", + "- [Fine-mapping mini-protocol](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_miniprotocol.html) \u2014 recommended starting point\n", + "- [Univariate fine-mapping & TWAS (SuSiE)](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/univariate_fine_mapping_twas_vignette.html) \u2014 TWAS weights and credible sets\n", + "- [Multivariate multi-gene fine-mapping](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/multivariate_multigene_fine_mapping_vignette.html) \u2014 joint analysis across genes\n", + "- [Summary statistics fine-mapping](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/summary_stats_finemapping_vignette.html) \u2014 include GWAS summary stats in SuSiE\n", + "- [Functional fine-mapping (fSuSiE)](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/univariate_fine_mapping_fsusie_vignette.html) \u2014 incorporate epigenomic annotations\n", + "- [Multivariate fine-mapping (mvSuSiE)](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/multivariate_fine_mapping_vignette.html) \u2014 multi-context credible sets\n", + "- [Multiomics regression](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/mnm_regression.html) \u2014 multi-omic multi-trait regression\n", + "- [RSS analysis](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/rss_analysis.html) \u2014 regression with summary statistics\n", + "- [MNM 
post-processing](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_postprocessing.html) \u2014 aggregate and format fine-mapping results\n", "\n", "### 7. GWAS Integration\n", "\n", - "We include methods for [colocalization analysis](https://statfungen.github.io/xqtl-protocol/code/pecotmr_integration/SuSiE_enloc.html). This starts with the generation of prior probabilities followed by pairwise colocalization analysis of xQTL and GWAS fine-mapping results to identify shared causal variants. We also include [TWAS and cTWAS](https://statfungen.github.io/xqtl-protocol/code/pecotmr_integration/twas_ctwas.html) in our pipeline to identify genes associated with complex traits. An alternative method, [colocboost](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/colocboost.html), identifies shared genetic variants influencing multiple molecular traits.\n", + "Methods for integrating xQTL results with GWAS to identify shared causal variants and genes associated with complex traits.\n", + "\n", + "- [Colocalization (SuSiE-enloc)](https://statfungen.github.io/xqtl-protocol/code/pecotmr_integration/SuSiE_enloc.html) \u2014 pairwise colocalization of xQTL and GWAS fine-mapping results\n", + "- [TWAS & cTWAS](https://statfungen.github.io/xqtl-protocol/code/pecotmr_integration/twas_ctwas.html) \u2014 identify genes associated with complex traits\n", + "- [ColocBoost](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/colocboost.html) \u2014 shared-variant discovery across multiple molecular traits\n", "\n", "### 8. Enrichment and Validation\n", "\n", - "We utilize an [excess of overlap](https://statfungen.github.io/xqtl-protocol/code/enrichment/eoo_enrichment.html) method to evaluate the enrichment of significant variants within specific genomic annotations. 
[Pathway enrichment analysis (GSEA)](https://statfungen.github.io/xqtl-protocol/code/enrichment/gsea.html) identifies biological pathways that are statistically overrepresented in a given gene set, giving information on potential biological functions, disease relevance, or regulatory mechanisms associated with the gene set. [GREGOR](https://statfungen.github.io/xqtl-protocol/code/enrichment/gregor.html) provides annotation-based enrichment testing for regulatory variants. [Stratified LD Score Regression](https://statfungen.github.io/xqtl-protocol/code/enrichment/sldsc_enrichment.html) (S-LDSC) is used to quantify the contribution of different genomic functional annotations to the heritability of complex traits and assess their statistical significance. By integrating GWAS summary statistics with genome annotations, S-LDSC distinguishes true polygenic signals from confounding effects.\n", + "Enrichment analyses evaluate whether significant variants or genes are concentrated in specific biological annotations or pathways.\n", + "\n", + "- [Excess-of-overlap enrichment](https://statfungen.github.io/xqtl-protocol/code/enrichment/eoo_enrichment.html) \u2014 enrichment of significant variants within genomic annotations\n", + "- [Gene set enrichment (GSEA)](https://statfungen.github.io/xqtl-protocol/code/enrichment/gsea.html) \u2014 overrepresented biological pathways in a gene set\n", + "- [GREGOR](https://statfungen.github.io/xqtl-protocol/code/enrichment/gregor.html) \u2014 annotation-based enrichment testing for regulatory variants\n", + "- [Stratified LD Score Regression](https://statfungen.github.io/xqtl-protocol/code/enrichment/sldsc_enrichment.html) \u2014 heritability partitioning by functional annotation\n", "\n", "### 9. xQTL Modifier Score (EMS)\n", "\n", - "The xQTL modifier score framework trains a per-variant model for prioritizing regulatory variants. 
The [EMS training](https://statfungen.github.io/xqtl-protocol/code/xqtl_modifier_score/ems_training.html) module fits the model using functional annotation features, and [EMS prediction](https://statfungen.github.io/xqtl-protocol/code/xqtl_modifier_score/ems_prediction.html) applies the trained model to score new variants.\n", + "The xQTL modifier score framework trains a per-variant model for prioritizing regulatory variants.\n", "\n", - "### 10. Command Generator\n", + "- [EMS training](https://statfungen.github.io/xqtl-protocol/code/xqtl_modifier_score/ems_training.html) \u2014 fit the model using functional annotation features\n", + "- [EMS prediction](https://statfungen.github.io/xqtl-protocol/code/xqtl_modifier_score/ems_prediction.html) \u2014 apply the trained model to score new variants\n", "\n", - "The [eQTL analysis command generator](https://statfungen.github.io/xqtl-protocol/code/commands_generator/eQTL_analysis_commands.html) produces the full pipeline commands from a single configuration file \u2014 useful for reproducing a complete run or sharing an analysis recipe without writing SoS commands by hand.\n", + "### 10. Command Generator\n", "\n", + "- [eQTL analysis command generator](https://statfungen.github.io/xqtl-protocol/code/commands_generator/eQTL_analysis_commands.html) \u2014 produce full pipeline commands from a single configuration file\n", "\n", "---\n", "\n", From 68e0b3e115c3e74f62b5f9fb4067e8d9b548319a Mon Sep 17 00:00:00 2001 From: Jenny Date: Fri, 17 Apr 2026 12:15:01 -0400 Subject: [PATCH 4/7] Rewrite getting-started: SoS conda env + pixi install, updated analysis modules --- code/xqtl_protocol_demo.ipynb | 65 ++++++++++++++++------------------- 1 file changed, 30 insertions(+), 35 deletions(-) diff --git a/code/xqtl_protocol_demo.ipynb b/code/xqtl_protocol_demo.ipynb index c62f0c37..12c2423f 100644 --- a/code/xqtl_protocol_demo.ipynb +++ b/code/xqtl_protocol_demo.ipynb @@ -287,9 +287,8 @@ "\n", "### 1. 
Reference Data\n", "\n", - "Multiple reference data files are required before molecular phenotypes are quantified in samples. These include, but are not limited to, reference genomes, gene annotations, variant annotations, linkage disequilibrium data and topologically associated domains.\n", + "Multiple [reference data](https://statfungen.github.io/xqtl-protocol/code/reference_data/reference_data.html) files are required before molecular phenotypes are quantified in samples. These include, but are not limited to, reference genomes, gene annotations, variant annotations, linkage disequilibrium data and topologically associating domains.\n", "\n", - "- [Reference data](https://statfungen.github.io/xqtl-protocol/code/reference_data/reference_data.html) \u2014 overview and required input files\n", "- [Reference data preparation](https://statfungen.github.io/xqtl-protocol/code/reference_data/reference_data_preparation.html) \u2014 downloading and standardizing reference files\n", "- [Generalized TAD boundaries](https://statfungen.github.io/xqtl-protocol/code/reference_data/generalized_TADB.html) \u2014 topologically associating domain annotations\n", "- [LD reference pruning](https://statfungen.github.io/xqtl-protocol/code/reference_data/ld_prune_reference.html) \u2014 pruned LD reference panels\n", @@ -297,58 +296,54 @@ "\n", "### 2. Molecular Phenotype Quantification\n", "\n", - "Molecular phenotypic data is required for the generation of QTLs. We support bulk RNA-Seq, methylation and splicing phenotypes in our pipeline. Each of these molecular phenotypes then undergoes phenotype-specific quality control and normalization.\n", + "Molecular phenotypic data is required for the generation of QTLs. We support bulk RNA-Seq, methylation and splicing phenotypes in our pipeline. Multiple reference data files are required before molecular phenotypes are quantified in samples. 
[Quantification of gene expression](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/bulk_expression.html) is conducted with either RNA-SeQC for gene-level counts, or RSEM for transcript-level counts. [Quantification of alternative splicing events](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/splicing.html) is conducted with leafcutter2 to identify alternatively excised introns. [Quantification of DNA methylation](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/methylation.html) is done using SeSAMe. Each of these molecular phenotypes then undergoes phenotype-specific quality control and normalization.\n", "\n", - "- [Gene expression quantification](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/bulk_expression.html) \u2014 RNA-SeQC for gene-level counts, or RSEM for transcript-level counts\n", - " - [RNA calling](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/RNA_calling.html) \u2014 read alignment and quantification\n", - " - [Expression QC](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/bulk_expression_QC.html) \u2014 sample-level and gene-level quality filters\n", - " - [Expression normalization](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/bulk_expression_normalization.html) \u2014 TMM, quantile normalization, inverse-normal transform\n", - "- [Alternative splicing quantification](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/splicing.html) \u2014 leafcutter2 to identify alternatively excised introns\n", - " - [Splicing calling](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/splicing_calling.html) \u2014 junction extraction and intron clustering\n", - " - [Splicing normalization](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/splicing_normalization.html) \u2014 ratio normalization across samples\n", - "- [DNA 
methylation quantification](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/methylation.html) \u2014 SeSAMe\n", - " - [Methylation calling](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/methylation_calling.html) \u2014 probe-level beta values\n", + "- [RNA calling](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/RNA_calling.html) \u2014 read alignment and quantification\n", + "- [Expression QC](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/bulk_expression_QC.html) \u2014 sample-level and gene-level quality filters\n", + "- [Expression normalization](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/bulk_expression_normalization.html) \u2014 TMM, quantile normalization, inverse-normal transform\n", + "- [Splicing calling](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/splicing_calling.html) \u2014 junction extraction and intron clustering\n", + "- [Splicing normalization](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/splicing_normalization.html) \u2014 ratio normalization across samples\n", + "- [Methylation calling](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/methylation_calling.html) \u2014 probe-level beta values\n", "\n", "### 3. 
Data Pre-Processing\n", "\n", - "Pre-processing prepares genotype, phenotype, and covariate data for QTL analysis.\n", - "\n", - "- [Genotype preprocessing](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype_preprocessing.html) \u2014 variant filters, format conversion, kinship, and PCA\n", - " - [VCF QC](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/VCF_QC.html) \u2014 variant-level quality filters with bcftools\n", - " - [GWAS QC](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/GWAS_QC.html) \u2014 additional filters for GWAS-ready variants\n", - " - [PCA](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/PCA.html) \u2014 genetic principal components for unrelated samples\n", - " - [GRM](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/GRM.html) \u2014 genetic relationship matrix\n", - " - [Genotype formatting](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/genotype_formatting.html) \u2014 convert to analysis-ready format\n", - "- [Phenotype preprocessing](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype_preprocessing.html) \u2014 feature annotation, imputation, and formatting\n", - " - [Gene annotation](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype/gene_annotation.html) \u2014 annotate features with genomic coordinates\n", - " - [Phenotype imputation](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype/phenotype_imputation.html) \u2014 impute missing expression values\n", - " - [Phenotype formatting](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype/phenotype_formatting.html) \u2014 format for QTL analysis\n", - "- [Covariate preprocessing](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate_preprocessing.html) \u2014 merge phenotypic data with genetic PCs and 
compute hidden factors\n", - " - [Covariate formatting](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate/covariate_formatting.html) \u2014 merge and format covariate matrices\n", - " - [Hidden factor estimation](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate/covariate_hidden_factor.html) \u2014 compute latent factors (e.g. PEER) as additional covariates\n", + "[Preprocessing of genotype data](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype_preprocessing.html) begins with the application of variant filters using bcftools. VCF files are then converted to plink format so that kinship analyses may be performed to identify unrelated individuals. Genetic principal components are then generated for unrelated samples and genotype files are formatted for later generation of quantitative trait loci.\n", + "\n", + "[Preprocessing of phenotypic data](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype_preprocessing.html) begins with annotation of features, if required. Missing entries may then be imputed using a variety of methods included in the pipeline. Last, the phenotypes are formatted for later generation of quantitative trait loci.\n", + "\n", + "[Preprocessing of covariates](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate_preprocessing.html) begins with the merging of phenotypic data with previously generated genetic principal components. 
The merged data is then used to calculate hidden factors which will later be used as additional covariates.\n", + "\n", + "- [VCF QC](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/VCF_QC.html) \u2014 variant-level quality filters with bcftools\n", + "- [GWAS QC](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/GWAS_QC.html) \u2014 additional filters for GWAS-ready variants\n", + "- [PCA](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/PCA.html) \u2014 genetic principal components for unrelated samples\n", + "- [GRM](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/GRM.html) \u2014 genetic relationship matrix\n", + "- [Genotype formatting](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/genotype_formatting.html) \u2014 convert to analysis-ready format\n", + "- [Gene annotation](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype/gene_annotation.html) \u2014 annotate features with genomic coordinates\n", + "- [Phenotype imputation](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype/phenotype_imputation.html) \u2014 impute missing expression values\n", + "- [Phenotype formatting](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype/phenotype_formatting.html) \u2014 format for QTL analysis\n", + "- [Covariate formatting](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate/covariate_formatting.html) \u2014 merge and format covariate matrices\n", + "- [Hidden factor estimation](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate/covariate_hidden_factor.html) \u2014 compute latent factors as additional covariates\n", "\n", "### 4. 
QTL Association Testing\n", "\n", - "QTL association analysis identifies genetic variants associated with molecular phenotypes.\n", + "[QTL association analysis](https://statfungen.github.io/xqtl-protocol/code/association_scan/qtl_association_testing.html) is conducted with TensorQTL. We include options for cis or trans analysis, with options to include interaction terms. [Hierarchical multiple testing](https://statfungen.github.io/xqtl-protocol/code/association_scan/qtl_association_postprocessing.html) may then be applied to the results to adjust p-values.\n", "\n", - "- [QTL association testing](https://statfungen.github.io/xqtl-protocol/code/association_scan/qtl_association_testing.html) \u2014 overview and configuration\n", "- [TensorQTL](https://statfungen.github.io/xqtl-protocol/code/association_scan/TensorQTL/TensorQTL.html) \u2014 GPU-accelerated cis/trans scans with optional interaction terms\n", "- [Quantile regression QTL & TWAS](https://statfungen.github.io/xqtl-protocol/code/association_scan/quantile_models/qr_and_twas.html) \u2014 detect non-linear genotype\u2013phenotype effects\n", "- [Association post-processing](https://statfungen.github.io/xqtl-protocol/code/association_scan/qtl_association_postprocessing.html) \u2014 hierarchical multiple testing correction\n", "\n", "### 5. Multivariate Mixture Model\n", "\n", - "For multi-context or multi-tissue analyses, we provide a multivariate mixture model framework based on MASH. This learns a data-driven mixture prior across contexts and estimates effect sizes and posterior probabilities for sharing of eQTLs across tissues.\n", + "For multi-context or multi-tissue analyses, we provide a [multivariate mixture model](https://statfungen.github.io/xqtl-protocol/code/multivariate_genome/multivariate_mixture_vignette.html) framework based on MASH. 
This learns a data-driven mixture prior across contexts and estimates effect sizes and posterior probabilities for sharing of eQTLs across tissues.\n", "\n", - "- [Multivariate mixture vignette](https://statfungen.github.io/xqtl-protocol/code/multivariate_genome/multivariate_mixture_vignette.html) \u2014 overview and walkthrough\n", "- [Mixture prior estimation](https://statfungen.github.io/xqtl-protocol/code/multivariate_genome/MASH/mixture_prior.html) \u2014 learn data-driven covariance matrices\n", "- [MASH model fitting](https://statfungen.github.io/xqtl-protocol/code/multivariate_genome/MASH/mash_fit.html) \u2014 fit the model and compute posterior summaries\n", "\n", "### 6. Multiomics Regression Models (Fine-mapping)\n", "\n", - "Our pipeline includes multiple methods for fine-mapping of QTLs. The mini-protocol is the recommended starting point for new users.\n", + "Our pipeline includes multiple methods for fine-mapping of QTLs. [Univariate fine-mapping and TWAS with SuSiE](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/univariate_fine_mapping_twas_vignette.html) generates TWAS weights and credible sets using SuSiE. [Regression with summary statistics](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/summary_stats_finemapping_vignette.html) allows for the inclusion of summary statistics from GWAS in SuSiE fine-mapping. 
[Univariate fine-mapping of functional data](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/univariate_fine_mapping_fsusie_vignette.html) uses epigenomic data to fine-map with fSuSiE.\n", "\n", - "- [Fine-mapping mini-protocol](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_miniprotocol.html) \u2014 recommended starting point\n", + "- [Fine-mapping mini-protocol](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_miniprotocol.html) \u2014 recommended starting point for new users\n", "- [Univariate fine-mapping & TWAS (SuSiE)](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/univariate_fine_mapping_twas_vignette.html) \u2014 TWAS weights and credible sets\n", "- [Multivariate multi-gene fine-mapping](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/multivariate_multigene_fine_mapping_vignette.html) \u2014 joint analysis across genes\n", "- [Summary statistics fine-mapping](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/summary_stats_finemapping_vignette.html) \u2014 include GWAS summary stats in SuSiE\n", @@ -360,7 +355,7 @@ "\n", "### 7. GWAS Integration\n", "\n", - "Methods for integrating xQTL results with GWAS to identify shared causal variants and genes associated with complex traits.\n", + "We include methods for [colocalization analysis](https://statfungen.github.io/xqtl-protocol/code/pecotmr_integration/SuSiE_enloc.html). This starts with the generation of prior probabilities followed by pairwise colocalization analysis of xQTL and GWAS fine-mapping results to identify shared causal variants. We also include [TWAS and cTWAS](https://statfungen.github.io/xqtl-protocol/code/pecotmr_integration/twas_ctwas.html) in our pipeline to identify genes associated with complex traits. 
An alternative method, [ColocBoost](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/colocboost.html), identifies shared genetic variants influencing multiple molecular traits.\n", "\n", "- [Colocalization (SuSiE-enloc)](https://statfungen.github.io/xqtl-protocol/code/pecotmr_integration/SuSiE_enloc.html) \u2014 pairwise colocalization of xQTL and GWAS fine-mapping results\n", "- [TWAS & cTWAS](https://statfungen.github.io/xqtl-protocol/code/pecotmr_integration/twas_ctwas.html) \u2014 identify genes associated with complex traits\n", @@ -368,7 +363,7 @@ "\n", "### 8. Enrichment and Validation\n", "\n", - "Enrichment analyses evaluate whether significant variants or genes are concentrated in specific biological annotations or pathways.\n", + "We utilize an [excess of overlap](https://statfungen.github.io/xqtl-protocol/code/enrichment/eoo_enrichment.html) method to evaluate the enrichment of significant variants within specific genomic annotations. [Pathway enrichment analysis (GSEA)](https://statfungen.github.io/xqtl-protocol/code/enrichment/gsea.html) identifies biological pathways that are statistically overrepresented in a given gene set, giving information on potential biological functions, disease relevance, or regulatory mechanisms associated with the gene set. [GREGOR](https://statfungen.github.io/xqtl-protocol/code/enrichment/gregor.html) provides annotation-based enrichment testing for regulatory variants. [Stratified LD Score Regression](https://statfungen.github.io/xqtl-protocol/code/enrichment/sldsc_enrichment.html) (S-LDSC) is used to quantify the contribution of different genomic functional annotations to the heritability of complex traits and assess their statistical significance. 
By integrating GWAS summary statistics with genome annotations, S-LDSC distinguishes true polygenic signals from confounding effects.\n", "\n", "- [Excess-of-overlap enrichment](https://statfungen.github.io/xqtl-protocol/code/enrichment/eoo_enrichment.html) \u2014 enrichment of significant variants within genomic annotations\n", "- [Gene set enrichment (GSEA)](https://statfungen.github.io/xqtl-protocol/code/enrichment/gsea.html) \u2014 overrepresented biological pathways in a gene set\n", From 42047691891ab038389d938e341146a59ba10e8c Mon Sep 17 00:00:00 2001 From: Jenny Date: Fri, 17 Apr 2026 12:19:16 -0400 Subject: [PATCH 5/7] Rewrite getting-started: SoS conda env + pixi install, updated analysis modules --- code/xqtl_protocol_demo.ipynb | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/code/xqtl_protocol_demo.ipynb b/code/xqtl_protocol_demo.ipynb index 12c2423f..4963ac1d 100644 --- a/code/xqtl_protocol_demo.ipynb +++ b/code/xqtl_protocol_demo.ipynb @@ -289,6 +289,7 @@ "\n", "Multiple [reference data](https://statfungen.github.io/xqtl-protocol/code/reference_data/reference_data.html) files are required before molecular phenotypes are quantified in samples. 
These include, but are not limited to, reference genomes, gene annotations, variant annotations, linkage disequilibrium data and topologically associated domains.\n", "\n", + "- [Reference data](https://statfungen.github.io/xqtl-protocol/code/reference_data/reference_data.html) \u2014 overview and required input files \u2b50 *MWE*\n", "- [Reference data preparation](https://statfungen.github.io/xqtl-protocol/code/reference_data/reference_data_preparation.html) \u2014 downloading and standardizing reference files\n", "- [Generalized TAD boundaries](https://statfungen.github.io/xqtl-protocol/code/reference_data/generalized_TADB.html) \u2014 topologically associating domain annotations\n", "- [LD reference pruning](https://statfungen.github.io/xqtl-protocol/code/reference_data/ld_prune_reference.html) \u2014 pruned LD reference panels\n", @@ -298,6 +299,9 @@ "\n", "Molecular phenotypic data is required for the generation of QTLs. We support bulk RNA-Seq, methylation and splicing phenotypes in our pipeline. Multiple reference data files are required before molecular phenotypes are quantified in samples. [Quantification of gene expression](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/bulk_expression.html) is conducted with either RNA-SeQC for gene-level counts, or RSEM for transcript-level counts. [Quantification of alternative splicing events](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/splicing.html) is conducted with leafcutter2 to identify alternatively excised introns. [Quantification of DNA methylation](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/methylation.html) is done using SeSAMe. 
Each of these molecular phenotypes then undergo phenotype specific quality control and normalization.\n", "\n", + "- [Gene expression quantification](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/bulk_expression.html) \u2014 RNA-SeQC or RSEM \u2b50 *MWE*\n", + "- [Splicing quantification](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/splicing.html) \u2014 leafcutter2\n", + "- [DNA methylation quantification](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/methylation.html) \u2014 SeSAMe\n", "- [RNA calling](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/RNA_calling.html) \u2014 read alignment and quantification\n", "- [Expression QC](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/bulk_expression_QC.html) \u2014 sample-level and gene-level quality filters\n", "- [Expression normalization](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/bulk_expression_normalization.html) \u2014 TMM, quantile normalization, inverse-normal transform\n", @@ -313,6 +317,9 @@ "\n", "[Preprocessing of covariates](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate_preprocessing.html) begins with the merging of phenotypic data with previously generated genetic principal components. 
The merged data is then used to calculate hidden factors which will later be used as additional covariates.\n", "\n", + "- [Genotype preprocessing](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype_preprocessing.html) \u2014 variant filters, PCA, GRM \u2b50 *MWE*\n", + "- [Phenotype preprocessing](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype_preprocessing.html) \u2014 annotation, imputation, formatting \u2b50 *MWE*\n", + "- [Covariate preprocessing](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate_preprocessing.html) \u2014 merge PCs, hidden factors \u2b50 *MWE*\n", "- [VCF QC](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/VCF_QC.html) \u2014 variant-level quality filters with bcftools\n", "- [GWAS QC](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/GWAS_QC.html) \u2014 additional filters for GWAS-ready variants\n", "- [PCA](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/PCA.html) \u2014 genetic principal components for unrelated samples\n", @@ -328,6 +335,7 @@ "\n", "[QTL association analysis](https://statfungen.github.io/xqtl-protocol/code/association_scan/qtl_association_testing.html) is conducted with TensorQTL. We include options for cis or trans analysis, with options to include interaction terms. 
[Hierarchical multiple testing](https://statfungen.github.io/xqtl-protocol/code/association_scan/qtl_association_postprocessing.html) may then be applied to the results to adjust p-values.\n", "\n", + "- [QTL association testing](https://statfungen.github.io/xqtl-protocol/code/association_scan/qtl_association_testing.html) \u2014 overview and configuration \u2b50 *MWE*\n", "- [TensorQTL](https://statfungen.github.io/xqtl-protocol/code/association_scan/TensorQTL/TensorQTL.html) \u2014 GPU-accelerated cis/trans scans with optional interaction terms\n", "- [Quantile regression QTL & TWAS](https://statfungen.github.io/xqtl-protocol/code/association_scan/quantile_models/qr_and_twas.html) \u2014 detect non-linear genotype\u2013phenotype effects\n", "- [Association post-processing](https://statfungen.github.io/xqtl-protocol/code/association_scan/qtl_association_postprocessing.html) \u2014 hierarchical multiple testing correction\n", @@ -343,7 +351,7 @@ "\n", "Our pipeline includes multiple methods for fine-mapping of QTLs. [Univariate fine-mapping and TWAS with SuSiE](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/univariate_fine_mapping_twas_vignette.html) generates TWAS weights and credible sets using SuSiE. [Regression with summary statistics](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/summary_stats_finemapping_vignette.html) allows for the inclusion of summary statistics from GWAS in SuSiE fine-mapping. 
[Univariate fine-mapping of functional data](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/univariate_fine_mapping_fsusie_vignette.html) uses epigenomic data to fine-map with fSuSiE.\n", "\n", - "- [Fine-mapping mini-protocol](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_miniprotocol.html) \u2014 recommended starting point for new users\n", + "- [Fine-mapping mini-protocol](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_miniprotocol.html) \u2014 recommended starting point for new users \u2b50 *MWE* \u2b50 *MWE*\n", "- [Univariate fine-mapping & TWAS (SuSiE)](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/univariate_fine_mapping_twas_vignette.html) \u2014 TWAS weights and credible sets\n", "- [Multivariate multi-gene fine-mapping](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/multivariate_multigene_fine_mapping_vignette.html) \u2014 joint analysis across genes\n", "- [Summary statistics fine-mapping](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/summary_stats_finemapping_vignette.html) \u2014 include GWAS summary stats in SuSiE\n", From b8c2a1988395047076bb3de528e8ddd4099c7c76 Mon Sep 17 00:00:00 2001 From: Jenny Empawi <80464805+jaempawi@users.noreply.github.com> Date: Fri, 17 Apr 2026 12:21:01 -0400 Subject: [PATCH 6/7] Fix formatting in fine-mapping mini-protocol list --- code/xqtl_protocol_demo.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/code/xqtl_protocol_demo.ipynb b/code/xqtl_protocol_demo.ipynb index 4963ac1d..6249e0d8 100644 --- a/code/xqtl_protocol_demo.ipynb +++ b/code/xqtl_protocol_demo.ipynb @@ -351,7 +351,7 @@ "\n", "Our pipeline includes multiple methods for fine-mapping of QTLs. [Univariate fine-mapping and TWAS with SuSiE](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/univariate_fine_mapping_twas_vignette.html) generates TWAS weights and credible sets using SuSiE. 
[Regression with summary statistics](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/summary_stats_finemapping_vignette.html) allows for the inclusion of summary statistics from GWAS in SuSiE fine-mapping. [Univariate fine-mapping of functional data](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/univariate_fine_mapping_fsusie_vignette.html) uses epigenomic data to fine-map with fSuSiE.\n", "\n", - "- [Fine-mapping mini-protocol](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_miniprotocol.html) \u2014 recommended starting point for new users \u2b50 *MWE* \u2b50 *MWE*\n", + "- [Fine-mapping mini-protocol](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_miniprotocol.html) \u2014 recommended starting point for new users \u2b50 *MWE*\n", "- [Univariate fine-mapping & TWAS (SuSiE)](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/univariate_fine_mapping_twas_vignette.html) \u2014 TWAS weights and credible sets\n", "- [Multivariate multi-gene fine-mapping](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/multivariate_multigene_fine_mapping_vignette.html) \u2014 joint analysis across genes\n", "- [Summary statistics fine-mapping](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/summary_stats_finemapping_vignette.html) \u2014 include GWAS summary stats in SuSiE\n", From 730531716a87f96bc6c9491a0c3dd76ae579474e Mon Sep 17 00:00:00 2001 From: Jenny Date: Fri, 17 Apr 2026 12:30:45 -0400 Subject: [PATCH 7/7] Update Getting Started page: pixi install, MWE callout, restructured Analysis sections --- code/xqtl_protocol_demo.ipynb | 162 +++++++++++----------------------- 1 file changed, 51 insertions(+), 111 deletions(-) diff --git a/code/xqtl_protocol_demo.ipynb b/code/xqtl_protocol_demo.ipynb index 6249e0d8..5463fcf8 100644 --- a/code/xqtl_protocol_demo.ipynb +++ b/code/xqtl_protocol_demo.ipynb @@ -10,7 +10,6 @@ "\n", "This page is a guided on-ramp. 
A minimal toy dataset of **49 de-identified samples** is used throughout the examples so you can try every pipeline end-to-end before running on real data. In about an hour you'll install the environment, clone the repo, download the demo dataset, and run your first cis-QTL scan.\n", "\n", - "\n", ":::{seealso}\n", "**New to the consortium?** Start with [How to use the resource](https://statfungen.github.io/xqtl-protocol/README.html#how-to-use-the-resource) on the homepage for the big-picture background, then come back here to set up.\n", ":::\n", @@ -18,22 +17,6 @@ "\n", "---\n", "\n", - "## At a Glance\n", - "\n", - "The protocol is modular. Each numbered pipeline is a self-contained [SoS (Script of Scripts)](https://vatlab.github.io/sos-docs/) notebook that can run independently or be chained into the full workflow.\n", - "\n", - "| Stage | What it does | Key pipelines |\n", - "|---|---|---|\n", - "| **1. Preprocess** | Clean, normalize, and align inputs | phenotype QC, genotype QC, covariate generation |\n", - "| **2. Discover** | Scan for QTLs | TensorQTL (cis/trans), APEX (interactions) |\n", - "| **3. Fine-map** | Identify credible causal variants | SuSiE, mvSuSiE, fSuSiE |\n", - "| **4. Integrate** | Link QTLs to disease and biology | coloc, cTWAS, GWAS integration, enrichment |\n", - "\n", - "Full details with links to every mini-protocol are further down in [Analysis](#analysis). For now, let's get you set up.\n", - "\n", - "\n", - "---\n", - "\n", "## Before You Start\n", "\n", "You'll need a Linux or macOS shell. 
Windows users: install [WSL2](https://learn.microsoft.com/windows/wsl/install) first, then follow the Linux path.\n", @@ -58,45 +41,17 @@ "\n", "If you don't have conda yet, install [Miniforge](https://github.com/conda-forge/miniforge) (recommended) or [Anaconda](https://www.anaconda.com/download).\n", "\n", - "**Create and activate a new environment:**\n", - "\n", "```bash\n", + "# Create and activate a new environment\n", "conda create -n sos python=3.12 -y\n", "conda activate sos\n", - "```\n", - "\n", - "**Install the SoS Workflow engine and cluster support:**\n", - "\n", - "```bash\n", - "conda install -c conda-forge sos sos-pbs\n", - "```\n", - "\n", - "- `sos` \u2014 the core SoS workflow engine for running pipelines from the command line\n", - "- `sos-pbs` \u2014 task queue support for submitting jobs to HPC schedulers (SLURM, PBS, LSF, SGE)\n", "\n", - "**Install SoS Notebook and extensions:**\n", + "# Install the full SoS suite\n", + "conda install -c conda-forge \\\n", + " sos sos-pbs sos-notebook jupyterlab-sos sos-papermill \\\n", + " sos-bash sos-python sos-r\n", "\n", - "```bash\n", - "conda install -c conda-forge sos-notebook jupyterlab-sos sos-papermill\n", - "```\n", - "\n", - "- `sos-notebook` \u2014 the SoS kernel for Jupyter, enabling multi-language notebooks\n", - "- `jupyterlab-sos` \u2014 JupyterLab extension for the SoS Notebook interface\n", - "- `sos-papermill` \u2014 Papermill extension for running SoS notebooks non-interactively from the command line\n", - "\n", - "**Install language modules:**\n", - "\n", - "```bash\n", - "conda install -c conda-forge sos-r sos-python sos-bash\n", - "```\n", - "\n", - "- `sos-r` \u2014 R language module; this will also install `r-base`, `r-irkernel`, and required R libraries automatically\n", - "- `sos-python` \u2014 Python language module for cross-kernel data exchange\n", - "- `sos-bash` \u2014 Bash language module for shell commands within SoS notebooks\n", - "\n", - "**Register the SoS kernel with 
Jupyter:**\n", - "\n", - "```bash\n", + "# Register the SoS kernel with Jupyter\n", "python -m sos_notebook.install\n", "```\n", "\n", @@ -161,8 +116,6 @@ "pixi --version\n", "```\n", "\n", - "You should see a version number. If not, open a fresh terminal.\n", - "\n", ":::{warning}\n", "**On HPC**, run the installer from a compute node with at least 50 GB of memory, not the login node. The install process can be memory-intensive and may be killed on login nodes:\n", "\n", @@ -269,7 +222,7 @@ "source": [ "## Analysis\n", "\n", - "Please visit [the homepage of the protocol website](https://statfungen.github.io/xqtl-protocol/) for the general background on this resource, in particular the [How to use the resource](https://statfungen.github.io/xqtl-protocol/README.html#how-to-use-the-resource) section. To perform a complete analysis from molecular phenotype quantification to xQTL discovery, please conduct your analysis in the order listed below, each link contains a mini-protocol for a specific task. All commands documented in each mini-protocol should be executed in the command line environment.\n", + "Please visit [the homepage of the protocol website](https://statfungen.github.io/xqtl-protocol/) for the general background on this resource, in particular the [How to use the resource](https://statfungen.github.io/xqtl-protocol/README.html#how-to-use-the-resource) section. To perform a complete analysis from molecular phenotype quantification to xQTL discovery, conduct your analysis in the order listed below. Each link contains a mini-protocol for a specific task, and all commands should be executed from the command line.\n", "\n", ":::{important}\n", "**Minimum Working Example \u2014 new users, start here.**\n", @@ -287,7 +240,7 @@ "\n", "### 1. Reference Data\n", "\n", - "Multiple [reference data](https://statfungen.github.io/xqtl-protocol/code/reference_data/reference_data.html) files are required before molecular phenotypes are quantified in samples. 
These include, but are not limited to, reference genomes, gene annotations, variant annotations, linkage disequilibrium data and topologically associated domains.\n", + "Multiple reference data files are required before molecular phenotypes are quantified \u2014 reference genomes, gene annotations, variant annotations, linkage disequilibrium data and topologically associating domains.\n", "\n", "- [Reference data](https://statfungen.github.io/xqtl-protocol/code/reference_data/reference_data.html) \u2014 overview and required input files \u2b50 *MWE*\n", "- [Reference data preparation](https://statfungen.github.io/xqtl-protocol/code/reference_data/reference_data_preparation.html) \u2014 downloading and standardizing reference files\n", @@ -297,103 +250,90 @@ "\n", "### 2. Molecular Phenotype Quantification\n", "\n", - "Molecular phenotypic data is required for the generation of QTLs. We support bulk RNA-Seq, methylation and splicing phenotypes in our pipeline. Multiple reference data files are required before molecular phenotypes are quantified in samples. [Quantification of gene expression](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/bulk_expression.html) is conducted with either RNA-SeQC for gene-level counts, or RSEM for transcript-level counts. [Quantification of alternative splicing events](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/splicing.html) is conducted with leafcutter2 to identify alternatively excised introns. [Quantification of DNA methylation](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/methylation.html) is done using SeSAMe. Each of these molecular phenotypes then undergo phenotype specific quality control and normalization.\n", + "Molecular phenotypic data is required for the generation of QTLs. We support bulk RNA-Seq, methylation and splicing phenotypes. 
Quantification of gene expression is conducted with either RNA-SeQC for gene-level counts, or RSEM for transcript-level counts. Quantification of alternative splicing events is conducted with leafcutter2 to identify alternatively excised introns. Quantification of DNA methylation is done using SeSAMe. Each phenotype then undergoes phenotype-specific quality control and normalization.\n", "\n", - "- [Gene expression quantification](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/bulk_expression.html) \u2014 RNA-SeQC or RSEM \u2b50 *MWE*\n", - "- [Splicing quantification](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/splicing.html) \u2014 leafcutter2\n", - "- [DNA methylation quantification](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/methylation.html) \u2014 SeSAMe\n", - "- [RNA calling](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/RNA_calling.html) \u2014 read alignment and quantification\n", - "- [Expression QC](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/bulk_expression_QC.html) \u2014 sample-level and gene-level quality filters\n", - "- [Expression normalization](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/bulk_expression_normalization.html) \u2014 TMM, quantile normalization, inverse-normal transform\n", - "- [Splicing calling](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/splicing_calling.html) \u2014 junction extraction and intron clustering\n", - "- [Splicing normalization](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/splicing_normalization.html) \u2014 ratio normalization across samples\n", - "- [Methylation calling](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/methylation_calling.html) \u2014 probe-level beta values\n", + "- [Gene expression 
(RNA-seq)](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/bulk_expression.html) \u2014 RNA-SeQC or RSEM \u2b50 *MWE*\n", + " - [RNA calling](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/RNA_calling.html), [Expression QC](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/bulk_expression_QC.html), [Normalization](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/bulk_expression_normalization.html)\n", + "- [Alternative splicing](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/splicing.html) \u2014 leafcutter2\n", + " - [Splicing calling](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/splicing_calling.html), [Splicing normalization](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/splicing_normalization.html)\n", + "- [DNA methylation](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/methylation.html) \u2014 SeSAMe\n", + " - [Methylation calling](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/methylation_calling.html)\n", "\n", "### 3. Data Pre-Processing\n", "\n", - "[Preprocessing of genotype data](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype_preprocessing.html) begins with the application of variant filters using bcftools. VCF files are then converted to plink format so that kinship analyses may be performed to identify unrelated individuals. Genetic principal components are then generated for unrelated samples and genotype files are formatted for later generation of quantitative trait loci.\n", + "Preprocessing of genotype data begins with the application of variant filters using bcftools. VCF files are then converted to plink format so that kinship analyses may be performed to identify unrelated individuals. 
Genetic principal components are then generated for unrelated samples and genotype files are formatted for QTL analysis. Preprocessing of phenotypic data begins with annotation of features, followed by imputation of missing entries and formatting. Preprocessing of covariates merges phenotypic data with genetic principal components, then computes hidden factors to use as additional covariates.\n", "\n", - "[Preprocessing of phenotypic data](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype_preprocessing.html) begins with annotation of features, if required. Missing entries may then be imputed using a variety of methods included in the pipeline. Last, the phenotypes are formatted for later generation of quantitative trait loci.\n", - "\n", - "[Preprocessing of covariates](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate_preprocessing.html) begins with the merging of phenotypic data with previously generated genetic principal components. 
The merged data is then used to calculate hidden factors which will later be used as additional covariates.\n", - "\n", - "- [Genotype preprocessing](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype_preprocessing.html) \u2014 variant filters, PCA, GRM \u2b50 *MWE*\n", - "- [Phenotype preprocessing](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype_preprocessing.html) \u2014 annotation, imputation, formatting \u2b50 *MWE*\n", - "- [Covariate preprocessing](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate_preprocessing.html) \u2014 merge PCs, hidden factors \u2b50 *MWE*\n", - "- [VCF QC](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/VCF_QC.html) \u2014 variant-level quality filters with bcftools\n", - "- [GWAS QC](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/GWAS_QC.html) \u2014 additional filters for GWAS-ready variants\n", - "- [PCA](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/PCA.html) \u2014 genetic principal components for unrelated samples\n", - "- [GRM](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/GRM.html) \u2014 genetic relationship matrix\n", - "- [Genotype formatting](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/genotype_formatting.html) \u2014 convert to analysis-ready format\n", - "- [Gene annotation](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype/gene_annotation.html) \u2014 annotate features with genomic coordinates\n", - "- [Phenotype imputation](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype/phenotype_imputation.html) \u2014 impute missing expression values\n", - "- [Phenotype formatting](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype/phenotype_formatting.html) \u2014 format for QTL analysis\n", - "- [Covariate 
formatting](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate/covariate_formatting.html) \u2014 merge and format covariate matrices\n", - "- [Hidden factor estimation](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate/covariate_hidden_factor.html) \u2014 compute latent factors as additional covariates\n", + "- [Genotype preprocessing](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype_preprocessing.html) \u2b50 *MWE*\n", + " - [VCF QC](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/VCF_QC.html), [GWAS QC](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/GWAS_QC.html), [PCA](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/PCA.html), [GRM](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/GRM.html), [Genotype formatting](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/genotype_formatting.html)\n", + "- [Phenotype preprocessing](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype_preprocessing.html) \u2b50 *MWE*\n", + " - [Gene annotation](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype/gene_annotation.html), [Phenotype imputation](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype/phenotype_imputation.html), [Phenotype formatting](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype/phenotype_formatting.html)\n", + "- [Covariate preprocessing](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate_preprocessing.html) \u2b50 *MWE*\n", + " - [Covariate formatting](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate/covariate_formatting.html), [Hidden factor estimation](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate/covariate_hidden_factor.html)\n", "\n", "### 4. 
QTL Association Testing\n", "\n", - "[QTL association analysis](https://statfungen.github.io/xqtl-protocol/code/association_scan/qtl_association_testing.html) is conducted with TensorQTL. We include options for cis or trans analysis, with options to include interaction terms. [Hierarchical multiple testing](https://statfungen.github.io/xqtl-protocol/code/association_scan/qtl_association_postprocessing.html) may then be applied to the results to adjust p-values.\n", + "QTL association analysis is conducted with TensorQTL. We include options for cis or trans analysis, with options to include interaction terms. Hierarchical multiple testing may then be applied to adjust p-values.\n", "\n", - "- [QTL association testing](https://statfungen.github.io/xqtl-protocol/code/association_scan/qtl_association_testing.html) \u2014 overview and configuration \u2b50 *MWE*\n", - "- [TensorQTL](https://statfungen.github.io/xqtl-protocol/code/association_scan/TensorQTL/TensorQTL.html) \u2014 GPU-accelerated cis/trans scans with optional interaction terms\n", - "- [Quantile regression QTL & TWAS](https://statfungen.github.io/xqtl-protocol/code/association_scan/quantile_models/qr_and_twas.html) \u2014 detect non-linear genotype\u2013phenotype effects\n", + "- [QTL association testing](https://statfungen.github.io/xqtl-protocol/code/association_scan/qtl_association_testing.html) \u2b50 *MWE*\n", + " - [TensorQTL](https://statfungen.github.io/xqtl-protocol/code/association_scan/TensorQTL/TensorQTL.html) \u2014 cis/trans scans with optional interaction terms\n", + " - [Quantile regression QTL & TWAS](https://statfungen.github.io/xqtl-protocol/code/association_scan/quantile_models/qr_and_twas.html) \u2014 non-linear genotype-phenotype effects\n", "- [Association post-processing](https://statfungen.github.io/xqtl-protocol/code/association_scan/qtl_association_postprocessing.html) \u2014 hierarchical multiple testing correction\n", "\n", "### 5. 
Multivariate Mixture Model\n", "\n", - "For multi-context or multi-tissue analyses, we provide a [multivariate mixture model](https://statfungen.github.io/xqtl-protocol/code/multivariate_genome/multivariate_mixture_vignette.html) framework based on MASH. This learns a data-driven mixture prior across contexts and estimates effect sizes and posterior probabilities for sharing of eQTLs across tissues.\n", + "For multi-context or multi-tissue analyses, we provide a multivariate mixture model framework based on MASH. This learns a data-driven mixture prior across contexts and estimates effect sizes and posterior probabilities for sharing of eQTLs across tissues.\n", "\n", - "- [Mixture prior estimation](https://statfungen.github.io/xqtl-protocol/code/multivariate_genome/MASH/mixture_prior.html) \u2014 learn data-driven covariance matrices\n", + "- [Multivariate mixture vignette](https://statfungen.github.io/xqtl-protocol/code/multivariate_genome/multivariate_mixture_vignette.html) \u2014 overview and walkthrough\n", + "- [Mixture prior estimation (MASH)](https://statfungen.github.io/xqtl-protocol/code/multivariate_genome/MASH/mixture_prior.html) \u2014 learn data-driven covariance matrices\n", "- [MASH model fitting](https://statfungen.github.io/xqtl-protocol/code/multivariate_genome/MASH/mash_fit.html) \u2014 fit the model and compute posterior summaries\n", "\n", "### 6. Multiomics Regression Models (Fine-mapping)\n", "\n", - "Our pipeline includes multiple methods for fine-mapping of QTLs. [Univariate fine-mapping and TWAS with SuSiE](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/univariate_fine_mapping_twas_vignette.html) generates TWAS weights and credible sets using SuSiE. [Regression with summary statistics](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/summary_stats_finemapping_vignette.html) allows for the inclusion of summary statistics from GWAS in SuSiE fine-mapping. 
[Univariate fine-mapping of functional data](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/univariate_fine_mapping_fsusie_vignette.html) uses epigenomic data to fine-map with fSuSiE.\n", + "Our pipeline includes multiple methods for fine-mapping of QTLs. Univariate fine-mapping and TWAS with SuSiE generates TWAS weights and credible sets. Regression with summary statistics allows inclusion of GWAS summary stats in SuSiE fine-mapping. Univariate fine-mapping of functional data uses epigenomic annotations with fSuSiE.\n", "\n", - "- [Fine-mapping mini-protocol](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_miniprotocol.html) \u2014 recommended starting point for new users \u2b50 *MWE*\n", - "- [Univariate fine-mapping & TWAS (SuSiE)](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/univariate_fine_mapping_twas_vignette.html) \u2014 TWAS weights and credible sets\n", - "- [Multivariate multi-gene fine-mapping](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/multivariate_multigene_fine_mapping_vignette.html) \u2014 joint analysis across genes\n", - "- [Summary statistics fine-mapping](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/summary_stats_finemapping_vignette.html) \u2014 include GWAS summary stats in SuSiE\n", - "- [Functional fine-mapping (fSuSiE)](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/univariate_fine_mapping_fsusie_vignette.html) \u2014 incorporate epigenomic annotations\n", - "- [Multivariate fine-mapping (mvSuSiE)](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/multivariate_fine_mapping_vignette.html) \u2014 multi-context credible sets\n", - "- [Multiomics regression](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/mnm_regression.html) \u2014 multi-omic multi-trait regression\n", - "- [RSS analysis](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/rss_analysis.html) \u2014 regression with summary 
statistics\n", - "- [MNM post-processing](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_postprocessing.html) \u2014 aggregate and format fine-mapping results\n", + "- [Fine-mapping mini-protocol](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_miniprotocol.html) \u2014 recommended starting point \u2b50 *MWE*\n", + "- [Univariate fine-mapping & TWAS (SuSiE)](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/univariate_fine_mapping_twas_vignette.html)\n", + "- [Multivariate multi-gene fine-mapping](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/multivariate_multigene_fine_mapping_vignette.html)\n", + "- [Summary statistics fine-mapping](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/summary_stats_finemapping_vignette.html)\n", + "- [Functional fine-mapping (fSuSiE)](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/univariate_fine_mapping_fsusie_vignette.html)\n", + "- [Multivariate fine-mapping (mvSuSiE)](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/multivariate_fine_mapping_vignette.html)\n", + "- [Multiomics regression](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/mnm_regression.html) and [RSS analysis](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/rss_analysis.html)\n", + "- [MNM post-processing](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_postprocessing.html)\n", "\n", "### 7. GWAS Integration\n", "\n", - "We include methods for [colocalization analysis](https://statfungen.github.io/xqtl-protocol/code/pecotmr_integration/SuSiE_enloc.html). This starts with the generation of prior probabilities followed by pairwise colocalization analysis of xQTL and GWAS fine-mapping results to identify shared causal variants. 
We also include [TWAS and cTWAS](https://statfungen.github.io/xqtl-protocol/code/pecotmr_integration/twas_ctwas.html) in our pipeline to identify genes associated with complex traits. An alternative method, [ColocBoost](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/colocboost.html), identifies shared genetic variants influencing multiple molecular traits.\n", + "We include methods for colocalization analysis, starting with the generation of prior probabilities followed by pairwise colocalization of xQTL and GWAS fine-mapping results to identify shared causal variants. We also include TWAS and cTWAS to identify genes associated with complex traits.\n", "\n", - "- [Colocalization (SuSiE-enloc)](https://statfungen.github.io/xqtl-protocol/code/pecotmr_integration/SuSiE_enloc.html) \u2014 pairwise colocalization of xQTL and GWAS fine-mapping results\n", - "- [TWAS & cTWAS](https://statfungen.github.io/xqtl-protocol/code/pecotmr_integration/twas_ctwas.html) \u2014 identify genes associated with complex traits\n", - "- [ColocBoost](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/colocboost.html) \u2014 shared-variant discovery across multiple molecular traits\n", + "- [Colocalization (SuSiE-enloc)](https://statfungen.github.io/xqtl-protocol/code/pecotmr_integration/SuSiE_enloc.html) \u2014 pairwise xQTL-GWAS colocalization\n", + "- [TWAS & cTWAS](https://statfungen.github.io/xqtl-protocol/code/pecotmr_integration/twas_ctwas.html) \u2014 genes associated with complex traits\n", + "- [ColocBoost](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/colocboost.html) \u2014 shared-variant discovery across molecular traits\n", "\n", "### 8. Enrichment and Validation\n", "\n", - "We utilize an [excess of overlap](https://statfungen.github.io/xqtl-protocol/code/enrichment/eoo_enrichment.html) method to evaluate the enrichment of significant variants within specific genomic annotations. 
[Pathway enrichment analysis (GSEA)](https://statfungen.github.io/xqtl-protocol/code/enrichment/gsea.html) identifies biological pathways that are statistically overrepresented in a given gene set, giving information on potential biological functions, disease relevance, or regulatory mechanisms associated with the gene set. [GREGOR](https://statfungen.github.io/xqtl-protocol/code/enrichment/gregor.html) provides annotation-based enrichment testing for regulatory variants. [Stratified LD Score Regression](https://statfungen.github.io/xqtl-protocol/code/enrichment/sldsc_enrichment.html) (S-LDSC) is used to quantify the contribution of different genomic functional annotations to the heritability of complex traits and assess their statistical significance. By integrating GWAS summary statistics with genome annotations, S-LDSC distinguishes true polygenic signals from confounding effects.\n", + "We utilize an excess of overlap method to evaluate the enrichment of significant variants within specific genomic annotations. Pathway enrichment analysis identifies biological pathways that are statistically overrepresented in a given gene set. Stratified LD Score Regression (S-LDSC) quantifies the contribution of genomic functional annotations to heritability of complex traits. 
By integrating GWAS summary statistics with genome annotations, S-LDSC distinguishes true polygenic signals from confounding effects.\n", "\n", - "- [Excess-of-overlap enrichment](https://statfungen.github.io/xqtl-protocol/code/enrichment/eoo_enrichment.html) \u2014 enrichment of significant variants within genomic annotations\n", - "- [Gene set enrichment (GSEA)](https://statfungen.github.io/xqtl-protocol/code/enrichment/gsea.html) \u2014 overrepresented biological pathways in a gene set\n", - "- [GREGOR](https://statfungen.github.io/xqtl-protocol/code/enrichment/gregor.html) \u2014 annotation-based enrichment testing for regulatory variants\n", - "- [Stratified LD Score Regression](https://statfungen.github.io/xqtl-protocol/code/enrichment/sldsc_enrichment.html) \u2014 heritability partitioning by functional annotation\n", + "- [Excess-of-overlap enrichment](https://statfungen.github.io/xqtl-protocol/code/enrichment/eoo_enrichment.html) \u2014 variant enrichment in genomic annotations\n", + "- [Gene set enrichment (GSEA)](https://statfungen.github.io/xqtl-protocol/code/enrichment/gsea.html) \u2014 overrepresented biological pathways\n", + "- [GREGOR](https://statfungen.github.io/xqtl-protocol/code/enrichment/gregor.html) \u2014 annotation-based enrichment for regulatory variants\n", + "- [Stratified LD Score Regression](https://statfungen.github.io/xqtl-protocol/code/enrichment/sldsc_enrichment.html) \u2014 heritability partitioning by annotation\n", "\n", "### 9. 
xQTL Modifier Score (EMS)\n", "\n", "The xQTL modifier score framework trains a per-variant model for prioritizing regulatory variants.\n", "\n", "- [EMS training](https://statfungen.github.io/xqtl-protocol/code/xqtl_modifier_score/ems_training.html) \u2014 fit the model using functional annotation features\n", - "- [EMS prediction](https://statfungen.github.io/xqtl-protocol/code/xqtl_modifier_score/ems_prediction.html) \u2014 apply the trained model to score new variants\n", + "- [EMS prediction](https://statfungen.github.io/xqtl-protocol/code/xqtl_modifier_score/ems_prediction.html) \u2014 score new variants\n", "\n", "### 10. Command Generator\n", "\n", "- [eQTL analysis command generator](https://statfungen.github.io/xqtl-protocol/code/commands_generator/eQTL_analysis_commands.html) \u2014 produce full pipeline commands from a single configuration file\n", "\n", + "\n", "---\n", "\n", "## Software Environment\n", "\n", - "Every protocol on this site runs inside the pixi environment configured in Steps 1\u20132. Once pixi and SoS are installed, each example \"just works\" \u2014 no per-pipeline container, no manual dependency wrangling.\n", + "Every protocol on this site runs inside the pixi environment configured in Steps 1-2. Once pixi and SoS are installed, each example \"just works\" \u2014 no per-pipeline container, no manual dependency wrangling.\n", "\n", "Need something extra? Install it into the right pixi environment:\n", "\n", @@ -431,7 +371,7 @@ "\n", "**Installer killed on HPC** \u2014 you're on a login node. Request a compute node with \u2265 50 GB memory and re-run.\n", "\n", - "**`sos: command not found`** \u2014 Step 2 didn't complete. Re-run the `pixi global install` command for SoS.\n", + "**`sos: command not found`** \u2014 Step 1 didn't complete. Re-run the `conda install` command for SoS.\n", "\n", "**`ModuleNotFoundError` during a pipeline** \u2014 install the missing package into pixi's python env with the command above.\n", "\n",