diff --git a/code/xqtl_protocol_demo.ipynb b/code/xqtl_protocol_demo.ipynb
index 35dbb303..5463fcf8 100644
--- a/code/xqtl_protocol_demo.ipynb
+++ b/code/xqtl_protocol_demo.ipynb
@@ -10,12 +10,6 @@
  "\n",
  "This page is a guided on-ramp. A minimal toy dataset of **49 de-identified samples** is used throughout the examples so you can try every pipeline end-to-end before running on real data. In about an hour you'll install the environment, clone the repo, download the demo dataset, and run your first cis-QTL scan.\n",
  "\n",
- "```{image} images/complete_workflow.png\n",
- ":alt: FunGen-xQTL analysis workflow\n",
- ":align: center\n",
- ":width: 90%\n",
- "```\n",
- "\n",
  ":::{seealso}\n",
  "**New to the consortium?** Start with [How to use the resource](https://statfungen.github.io/xqtl-protocol/README.html#how-to-use-the-resource) on the homepage for the big-picture background, then come back here to set up.\n",
  ":::\n",
@@ -23,22 +17,6 @@
  "\n",
  "---\n",
  "\n",
- "## At a Glance\n",
- "\n",
- "The protocol is modular. Each numbered pipeline is a self-contained [SoS (Script of Scripts)](https://vatlab.github.io/sos-docs/) notebook that can run independently or be chained into the full workflow.\n",
- "\n",
- "| Stage | What it does | Key pipelines |\n",
- "|---|---|---|\n",
- "| **1. Preprocess** | Clean, normalize, and align inputs | phenotype QC, genotype QC, covariate generation |\n",
- "| **2. Discover** | Scan for QTLs | TensorQTL (cis/trans), APEX (interactions) |\n",
- "| **3. Fine-map** | Identify credible causal variants | SuSiE, mvSuSiE, fSuSiE |\n",
- "| **4. Integrate** | Link QTLs to disease and biology | coloc, cTWAS, GWAS integration, enrichment |\n",
- "\n",
- "Full details with links to every mini-protocol are further down in [Analysis](#analysis). For now, let's get you set up.\n",
- "\n",
- "\n",
- "---\n",
- "\n",
  "## Before You Start\n",
  "\n",
  "You'll need a Linux or macOS shell.
Windows users: install [WSL2](https://learn.microsoft.com/windows/wsl/install) first, then follow the Linux path.\n",
@@ -64,7 +42,7 @@
  "If you don't have conda yet, install [Miniforge](https://github.com/conda-forge/miniforge) (recommended) or [Anaconda](https://www.anaconda.com/download).\n",
  "\n",
  "```bash\n",
- "# Create and activate a new environment for SoS\n",
+ "# Create and activate a new environment\n",
  "conda create -n sos python=3.12 -y\n",
  "conda activate sos\n",
  "\n",
@@ -138,8 +116,6 @@
  "pixi --version\n",
  "```\n",
  "\n",
- "You should see a version number. If not, open a fresh terminal.\n",
- "\n",
  ":::{warning}\n",
  "**On HPC**, run the installer from a compute node with at least 50 GB of memory, not the login node. The install process can be memory-intensive and may be killed on login nodes:\n",
  "\n",
@@ -246,102 +222,118 @@
  "source": [
  "## Analysis\n",
  "\n",
- "With the environment set up, here's the full protocol in order. Each link is a self-contained mini-protocol; all commands in them should be executed from the command line with `sos run pipeline/.ipynb ...`.\n",
+ "Please visit [the homepage of the protocol website](https://statfungen.github.io/xqtl-protocol/) for the general background on this resource, in particular the [How to use the resource](https://statfungen.github.io/xqtl-protocol/README.html#how-to-use-the-resource) section. To perform a complete analysis from molecular phenotype quantification to xQTL discovery, conduct your analysis in the order listed below. Each link contains a mini-protocol for a specific task, and all commands should be executed from the command line.\n",
  "\n",
  ":::{important}\n",
- "**Minimum Working Example (MWE) \u2014 new users, start here.**\n",
+ "**Minimum Working Example \u2014 new users, start here.**\n",
  "\n",
- "Every module in the repo ships a minimal `MWE`-prefixed test dataset under [Synapse `syn36416559`](https://www.synapse.org/#!Synapse:syn36416559/files/).
To go end-to-end on the demo data, run these **five** pipelines in order and skip everything else on the first pass:\n",
+ "Every module ships a minimal test dataset (prefixed with `MWE`) under [Synapse `syn36416559`](https://www.synapse.org/#!Synapse:syn36416559/files/). To go end-to-end on the demo data, run these five pipelines in order and skip everything else on the first pass:\n",
  "\n",
- "1. [`reference_data.ipynb`](https://statfungen.github.io/xqtl-protocol/code/reference_data/reference_data.html) \u2014 pull the standardized reference files\n",
- "2. [`bulk_expression.ipynb`](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/bulk_expression.html) \u2014 quantify gene expression (MWE default)\n",
- "3. [`genotype_preprocessing.ipynb`](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype_preprocessing.html) \u2192 [`phenotype_preprocessing.ipynb`](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype_preprocessing.html) \u2192 [`covariate_preprocessing.ipynb`](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate_preprocessing.html) \u2014 QC + normalization\n",
- "4. [`qtl_association_testing.ipynb`](https://statfungen.github.io/xqtl-protocol/code/association_scan/qtl_association_testing.html) \u2014 run cis-QTL with TensorQTL\n",
- "5. [`mnm_miniprotocol.ipynb`](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_miniprotocol.html) \u2014 single-trait fine-mapping + TWAS with SuSiE\n",
+ "1. [`reference_data.ipynb`](https://statfungen.github.io/xqtl-protocol/code/reference_data/reference_data.html) \u2014 prepare standardized reference files\n",
+ "2. [`bulk_expression.ipynb`](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/bulk_expression.html) \u2014 quantify gene expression\n",
+ "3.
[`genotype_preprocessing.ipynb`](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype_preprocessing.html) \u2192 [`phenotype_preprocessing.ipynb`](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype_preprocessing.html) \u2192 [`covariate_preprocessing.ipynb`](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate_preprocessing.html) \u2014 QC and normalization\n",
+ "4. [`qtl_association_testing.ipynb`](https://statfungen.github.io/xqtl-protocol/code/association_scan/qtl_association_testing.html) \u2014 cis-QTL with TensorQTL\n",
+ "5. [`mnm_miniprotocol.ipynb`](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_miniprotocol.html) \u2014 fine-mapping + TWAS with SuSiE\n",
  "\n",
- "Once this pass completes cleanly, branch out to the additional modules below (methylation, splicing, multivariate mixture, GWAS integration, enrichment, EMS) based on what your project needs.\n",
+ "Once this pass completes, branch out to the additional modules below based on what your project needs.\n",
  ":::\n",
  "\n",
  "### 1.
Reference Data\n",
  "\n",
- "Before quantifying phenotypes, set up the standardized reference files \u2014 genomes, gene annotations, variant annotations, LD maps, and topologically associated domains.\n",
+ "Multiple reference data files are required before molecular phenotypes are quantified \u2014 reference genomes, gene annotations, variant annotations, linkage disequilibrium data and topologically associated domains.\n",
  "\n",
- "- [Reference data setup](https://statfungen.github.io/xqtl-protocol/code/reference_data/reference_data.html) \u2014 main entry point \u2b50 *MWE*\n",
- "- [Reference data preparation](https://statfungen.github.io/xqtl-protocol/code/reference_data/reference_data_preparation.html) \u2014 detailed preparation steps\n",
- "- [Generalized TAD-B](https://statfungen.github.io/xqtl-protocol/code/reference_data/generalized_TADB.html) \u2014 TAD boundaries for analysis windows\n",
- "- [LD reference pruning](https://statfungen.github.io/xqtl-protocol/code/reference_data/ld_prune_reference.html) and [RSS LD sketching](https://statfungen.github.io/xqtl-protocol/code/reference_data/rss_ld_sketch.html) \u2014 advanced LD utilities\n",
+ "- [Reference data](https://statfungen.github.io/xqtl-protocol/code/reference_data/reference_data.html) \u2014 overview and required input files \u2b50 *MWE*\n",
+ "- [Reference data preparation](https://statfungen.github.io/xqtl-protocol/code/reference_data/reference_data_preparation.html) \u2014 downloading and standardizing reference files\n",
+ "- [Generalized TAD boundaries](https://statfungen.github.io/xqtl-protocol/code/reference_data/generalized_TADB.html) \u2014 topologically associating domain annotations\n",
+ "- [LD reference pruning](https://statfungen.github.io/xqtl-protocol/code/reference_data/ld_prune_reference.html) \u2014 pruned LD reference panels\n",
+ "- [RSS LD sketching](https://statfungen.github.io/xqtl-protocol/code/reference_data/rss_ld_sketch.html) \u2014 LD matrix sketches for
summary-statistics methods\n",
  "\n",
- "### 2. Molecular Phenotypes\n",
+ "### 2. Molecular Phenotype Quantification\n",
  "\n",
- "We support bulk RNA-seq, DNA methylation, and alternative splicing phenotypes. Each path has its own calling, QC, and normalization steps.\n",
+ "Molecular phenotypic data is required for the generation of QTLs. We support bulk RNA-Seq, methylation and splicing phenotypes. Quantification of gene expression is conducted with either RNA-SeQC for gene-level counts, or RSEM for transcript-level counts. Quantification of alternative splicing events is conducted with leafcutter2 to identify alternatively excised introns. Quantification of DNA methylation is done using SeSAMe. Each phenotype then undergoes phenotype-specific quality control and normalization.\n",
  "\n",
- "- **Bulk RNA-seq** \u2014 [bulk_expression mini-protocol](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/bulk_expression.html) \u2b50 *MWE*, with sub-modules for [RNA calling](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/RNA_calling.html), [QC](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/bulk_expression_QC.html), and [normalization](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/bulk_expression_normalization.html)\n",
- "- **DNA methylation** \u2014 [methylation mini-protocol](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/methylation.html) with [methylation calling via SeSAMe](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/methylation_calling.html)\n",
- "- **Alternative splicing** \u2014 [splicing mini-protocol](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/splicing.html) with [splicing calling via leafcutter2](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/splicing_calling.html) and
[normalization](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/splicing_normalization.html)\n",
+ "- [Gene expression (RNA-seq)](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/bulk_expression.html) \u2014 RNA-SeQC or RSEM \u2b50 *MWE*\n",
+ " - [RNA calling](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/RNA_calling.html), [Expression QC](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/bulk_expression_QC.html), [Normalization](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/bulk_expression_normalization.html)\n",
+ "- [Alternative splicing](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/splicing.html) \u2014 leafcutter2\n",
+ " - [Splicing calling](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/splicing_calling.html), [Splicing normalization](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/QC/splicing_normalization.html)\n",
+ "- [DNA methylation](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/methylation.html) \u2014 SeSAMe\n",
+ " - [Methylation calling](https://statfungen.github.io/xqtl-protocol/code/molecular_phenotypes/calling/methylation_calling.html)\n",
  "\n",
- "### 3. Data Pre-processing\n",
+ "### 3.
Data Pre-Processing\n",
  "\n",
- "- [Genotype preprocessing](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype_preprocessing.html) \u2b50 *MWE* \u2014 VCF QC, GWAS QC, PCA, GRM, plink formatting\n",
- "- [Phenotype preprocessing](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype_preprocessing.html) \u2b50 *MWE* \u2014 gene annotation, imputation, formatting\n",
- "- [Covariate preprocessing](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate_preprocessing.html) \u2b50 *MWE* \u2014 merge genetic PCs with phenotypes, compute hidden factors\n",
+ "Preprocessing of genotype data begins with the application of variant filters using bcftools. VCF files are then converted to plink format so that kinship analyses may be performed to identify unrelated individuals. Genetic principal components are then generated for unrelated samples and genotype files are formatted for QTL analysis. Preprocessing of phenotypic data begins with annotation of features, followed by imputation of missing entries and formatting.
Preprocessing of covariates merges phenotypic data with genetic principal components, then computes hidden factors to use as additional covariates.\n",
+ "\n",
+ "- [Genotype preprocessing](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype_preprocessing.html) \u2b50 *MWE*\n",
+ " - [VCF QC](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/VCF_QC.html), [GWAS QC](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/GWAS_QC.html), [PCA](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/PCA.html), [GRM](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/GRM.html), [Genotype formatting](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/genotype/genotype_formatting.html)\n",
+ "- [Phenotype preprocessing](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype_preprocessing.html) \u2b50 *MWE*\n",
+ " - [Gene annotation](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype/gene_annotation.html), [Phenotype imputation](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype/phenotype_imputation.html), [Phenotype formatting](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/phenotype/phenotype_formatting.html)\n",
+ "- [Covariate preprocessing](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate_preprocessing.html) \u2b50 *MWE*\n",
+ " - [Covariate formatting](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate/covariate_formatting.html), [Hidden factor estimation](https://statfungen.github.io/xqtl-protocol/code/data_preprocessing/covariate/covariate_hidden_factor.html)\n",
  "\n",
  "### 4.
QTL Association Testing\n",
  "\n",
- "- [QTL association testing](https://statfungen.github.io/xqtl-protocol/code/association_scan/qtl_association_testing.html) \u2b50 *MWE* \u2014 [TensorQTL](https://statfungen.github.io/xqtl-protocol/code/association_scan/TensorQTL/TensorQTL.html) scans (cis, trans, interaction) and [quantile regression QTL](https://statfungen.github.io/xqtl-protocol/code/association_scan/quantile_models/qr_and_twas.html)\n",
- "- [Association postprocessing](https://statfungen.github.io/xqtl-protocol/code/association_scan/qtl_association_postprocessing.html) \u2014 hierarchical multiple testing and p-value adjustment\n",
+ "QTL association analysis is conducted with TensorQTL. We include options for cis or trans analysis, with options to include interaction terms. Hierarchical multiple testing may then be applied to adjust p-values.\n",
+ "\n",
+ "- [QTL association testing](https://statfungen.github.io/xqtl-protocol/code/association_scan/qtl_association_testing.html) \u2b50 *MWE*\n",
+ " - [TensorQTL](https://statfungen.github.io/xqtl-protocol/code/association_scan/TensorQTL/TensorQTL.html) \u2014 cis/trans scans with optional interaction terms\n",
+ " - [Quantile regression QTL & TWAS](https://statfungen.github.io/xqtl-protocol/code/association_scan/quantile_models/qr_and_twas.html) \u2014 non-linear genotype-phenotype effects\n",
+ "- [Association post-processing](https://statfungen.github.io/xqtl-protocol/code/association_scan/qtl_association_postprocessing.html) \u2014 hierarchical multiple testing correction\n",
  "\n",
  "### 5. Multivariate Mixture Model\n",
  "\n",
- "Learn a data-driven mixture prior across contexts/tissues for multivariate fine-mapping.\n",
+ "For multi-context or multi-tissue analyses, we provide a multivariate mixture model framework based on MASH.
This learns a data-driven mixture prior across contexts and estimates effect sizes and posterior probabilities for sharing of eQTLs across tissues.\n",
  "\n",
- "- [Multivariate mixture vignette](https://statfungen.github.io/xqtl-protocol/code/multivariate_genome/multivariate_mixture_vignette.html) \u2014 overview\n",
- "- [Mixture prior with MASH](https://statfungen.github.io/xqtl-protocol/code/multivariate_genome/MASH/mixture_prior.html) and [MASH fit](https://statfungen.github.io/xqtl-protocol/code/multivariate_genome/MASH/mash_fit.html) \u2014 data-driven prior estimation\n",
+ "- [Multivariate mixture vignette](https://statfungen.github.io/xqtl-protocol/code/multivariate_genome/multivariate_mixture_vignette.html) \u2014 overview and walkthrough\n",
+ "- [Mixture prior estimation (MASH)](https://statfungen.github.io/xqtl-protocol/code/multivariate_genome/MASH/mixture_prior.html) \u2014 learn data-driven covariance matrices\n",
+ "- [MASH model fitting](https://statfungen.github.io/xqtl-protocol/code/multivariate_genome/MASH/mash_fit.html) \u2014 fit the model and compute posterior summaries\n",
  "\n",
- "### 6. Multiomics Regression Models\n",
+ "### 6. Multiomics Regression Models (Fine-mapping)\n",
  "\n",
- "Fine-mapping and multi-context regression \u2014 the core of the post-discovery analysis.\n",
+ "Our pipeline includes multiple methods for fine-mapping of QTLs. Univariate fine-mapping and TWAS with SuSiE generates TWAS weights and credible sets. Regression with summary statistics allows inclusion of GWAS summary stats in SuSiE fine-mapping.
Univariate fine-mapping of functional data uses epigenomic annotations with fSuSiE.\n",
  "\n",
- "- [Multi-omic regression mini-protocol](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_miniprotocol.html) \u2b50 *MWE* \u2014 start here\n",
- "- [Univariate fine-mapping + TWAS with SuSiE](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/univariate_fine_mapping_twas_vignette.html)\n",
+ "- [Fine-mapping mini-protocol](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_miniprotocol.html) \u2014 recommended starting point \u2b50 *MWE*\n",
+ "- [Univariate fine-mapping & TWAS (SuSiE)](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/univariate_fine_mapping_twas_vignette.html)\n",
  "- [Multivariate multi-gene fine-mapping](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/multivariate_multigene_fine_mapping_vignette.html)\n",
- "- [Univariate fine-mapping with fSuSiE](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/univariate_fine_mapping_fsusie_vignette.html) \u2014 functional / epigenomic data\n",
- "- [Multivariate fine-mapping vignette](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/multivariate_fine_mapping_vignette.html)\n",
- "- [Summary-statistics fine-mapping](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/summary_stats_finemapping_vignette.html)\n",
- "- [Multi-omic multi-trait regression](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/mnm_regression.html) and [RSS analysis](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/rss_analysis.html)\n",
- "- [MNM postprocessing](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_postprocessing.html)\n",
+ "- [Summary statistics fine-mapping](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/summary_stats_finemapping_vignette.html)\n",
+ "- [Functional fine-mapping
(fSuSiE)](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/univariate_fine_mapping_fsusie_vignette.html)\n",
+ "- [Multivariate fine-mapping (mvSuSiE)](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/multivariate_fine_mapping_vignette.html)\n",
+ "- [Multiomics regression](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/mnm_regression.html) and [RSS analysis](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/rss_analysis.html)\n",
+ "- [MNM post-processing](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_postprocessing.html)\n",
  "\n",
  "### 7. GWAS Integration\n",
  "\n",
- "Link xQTL signals to disease-associated loci.\n",
+ "We include methods for colocalization analysis, starting with the generation of prior probabilities followed by pairwise colocalization of xQTL and GWAS fine-mapping results to identify shared causal variants. We also include TWAS and cTWAS to identify genes associated with complex traits.\n",
  "\n",
- "- [SuSiE-enloc colocalization](https://statfungen.github.io/xqtl-protocol/code/pecotmr_integration/SuSiE_enloc.html) \u2014 pairwise colocalization of xQTL and GWAS fine-mapping\n",
- "- [TWAS / cTWAS](https://statfungen.github.io/xqtl-protocol/code/pecotmr_integration/twas_ctwas.html) \u2014 causal TWAS for complex traits\n",
- "- [Colocboost](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/colocboost.html) \u2014 shared-variant discovery across multiple molecular traits\n",
+ "- [Colocalization (SuSiE-enloc)](https://statfungen.github.io/xqtl-protocol/code/pecotmr_integration/SuSiE_enloc.html) \u2014 pairwise xQTL-GWAS colocalization\n",
+ "- [TWAS & cTWAS](https://statfungen.github.io/xqtl-protocol/code/pecotmr_integration/twas_ctwas.html) \u2014 genes associated with complex traits\n",
+ "- [ColocBoost](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/colocboost.html) \u2014 shared-variant discovery across
molecular traits\n",
  "\n",
  "### 8. Enrichment and Validation\n",
  "\n",
- "- [Excess-of-overlap enrichment](https://statfungen.github.io/xqtl-protocol/code/enrichment/eoo_enrichment.html) \u2014 significance of variants in annotation sets\n",
- "- [Pathway enrichment (GSEA)](https://statfungen.github.io/xqtl-protocol/code/enrichment/gsea.html)\n",
- "- [GREGOR](https://statfungen.github.io/xqtl-protocol/code/enrichment/gregor.html) \u2014 annotation-based enrichment for significant variants\n",
+ "We utilize an excess of overlap method to evaluate the enrichment of significant variants within specific genomic annotations. Pathway enrichment analysis identifies biological pathways that are statistically overrepresented in a given gene set. Stratified LD Score Regression (S-LDSC) quantifies the contribution of genomic functional annotations to heritability of complex traits. By integrating GWAS summary statistics with genome annotations, S-LDSC distinguishes true polygenic signals from confounding effects.\n",
+ "\n",
+ "- [Excess-of-overlap enrichment](https://statfungen.github.io/xqtl-protocol/code/enrichment/eoo_enrichment.html) \u2014 variant enrichment in genomic annotations\n",
+ "- [Gene set enrichment (GSEA)](https://statfungen.github.io/xqtl-protocol/code/enrichment/gsea.html) \u2014 overrepresented biological pathways\n",
+ "- [GREGOR](https://statfungen.github.io/xqtl-protocol/code/enrichment/gregor.html) \u2014 annotation-based enrichment for regulatory variants\n",
  "- [Stratified LD Score Regression](https://statfungen.github.io/xqtl-protocol/code/enrichment/sldsc_enrichment.html) \u2014 heritability partitioning by annotation\n",
  "\n",
  "### 9.
xQTL Modifier Score (EMS)\n",
  "\n",
- "Train and apply a per-variant score for prioritizing regulatory variants.\n",
+ "The xQTL modifier score framework trains a per-variant model for prioritizing regulatory variants.\n",
  "\n",
- "- [EMS training](https://statfungen.github.io/xqtl-protocol/code/xqtl_modifier_score/ems_training.html)\n",
- "- [EMS prediction](https://statfungen.github.io/xqtl-protocol/code/xqtl_modifier_score/ems_prediction.html)\n",
+ "- [EMS training](https://statfungen.github.io/xqtl-protocol/code/xqtl_modifier_score/ems_training.html) \u2014 fit the model using functional annotation features\n",
+ "- [EMS prediction](https://statfungen.github.io/xqtl-protocol/code/xqtl_modifier_score/ems_prediction.html) \u2014 score new variants\n",
  "\n",
- "### Command Generator (shortcut)\n",
+ "### 10. Command Generator\n",
  "\n",
- "Want to skip writing SoS commands by hand? The [eQTL analysis command generator](https://statfungen.github.io/xqtl-protocol/code/commands_generator/eQTL_analysis_commands.html) produces the full pipeline from a single configuration file \u2014 great for reproducing a run or sharing a recipe.\n",
+ "- [eQTL analysis command generator](https://statfungen.github.io/xqtl-protocol/code/commands_generator/eQTL_analysis_commands.html) \u2014 produce full pipeline commands from a single configuration file\n",
  "\n",
  "\n",
  "---\n",
  "\n",
  "## Software Environment\n",
  "\n",
- "Every protocol on this site runs inside the pixi environment configured in Steps 1\u20132. Once pixi and SoS are installed, each example \"just works\" \u2014 no per-pipeline container, no manual dependency wrangling.\n",
+ "Every protocol on this site runs inside the pixi environment configured in Steps 1-2. Once pixi and SoS are installed, each example \"just works\" \u2014 no per-pipeline container, no manual dependency wrangling.\n",
  "\n",
  "Need something extra?
Install it into the right pixi environment:\n",
  "\n",
@@ -379,7 +371,7 @@
  "\n",
  "**Installer killed on HPC** \u2014 you're on a login node. Request a compute node with \u2265 50 GB memory and re-run.\n",
  "\n",
- "**`sos: command not found`** \u2014 Step 2 didn't complete. Re-run the `pixi global install` command for SoS.\n",
+ "**`sos: command not found`** \u2014 Step 1 didn't complete. Re-run the `conda install` command for SoS.\n",
  "\n",
  "**`ModuleNotFoundError` during a pipeline** \u2014 install the missing package into pixi's python env with the command above.\n",
  "\n",