diff --git a/README.md b/README.md index b3a5a05..6e427d8 100644 --- a/README.md +++ b/README.md @@ -35,7 +35,7 @@ Learn how to perform colocalization analysis with step-by-step examples. For det If you use ColocBoost in your research, please cite: -Cao X, Sun H, Feng R, Mazumder R, Najar CFB, Li YI, de Jager PL, Bennett D, The Alzheimer's Disease Functional Genomics Consortium, Dey KK, Wang G. (2025+). Integrative multi-omics QTL colocalization maps regulatory architecture in aging human brain. bioRxiv. [https://doi.org/](https://doi.org/) +> Cao X, Sun H, Feng R, Mazumder R, Najar CFB, Li YI, de Jager PL, Bennett D, The Alzheimer's Disease Functional Genomics Consortium, Dey KK, Wang G. (2025+). Integrative multi-omics QTL colocalization maps regulatory architecture in aging human brain. bioRxiv. [https://doi.org/](https://doi.org/) ## License diff --git a/_pkgdown.yml b/_pkgdown.yml index df7b38a..b4618f4 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -25,9 +25,8 @@ articles: - Input_Data_Format - Individual_Level_Colocalization - Summary_Level_Colocalization - - ColocBoost_tutorial_basic + - Disease_Prioritized_Colocalization - ColocBoost_tutorial_advance - - ColocBoost_tutorial_GTEx - ColocBoost_tutorial_strong_colocalization - ColocBoost_tutorial_diagnostic diff --git a/data/Heterogeneous_Effect.rda b/data/Heterogeneous_Effect.rda index 00e2b1d..4bab8bb 100644 Binary files a/data/Heterogeneous_Effect.rda and b/data/Heterogeneous_Effect.rda differ diff --git a/data/Ind_5traits.rda b/data/Ind_5traits.rda index 58da259..c2e6117 100644 Binary files a/data/Ind_5traits.rda and b/data/Ind_5traits.rda differ diff --git a/data/Non_Causal_Strongest_Marginal.rda b/data/Non_Causal_Strongest_Marginal.rda index 06756e8..1ca17c0 100644 Binary files a/data/Non_Causal_Strongest_Marginal.rda and b/data/Non_Causal_Strongest_Marginal.rda differ diff --git a/data/Sumstat_5traits.rda b/data/Sumstat_5traits.rda index d61cdde..45adba2 100644 Binary files a/data/Sumstat_5traits.rda 
and b/data/Sumstat_5traits.rda differ diff --git a/data/Weaker_GWAS_Effect.rda b/data/Weaker_GWAS_Effect.rda index 98b4f46..8330d4a 100644 Binary files a/data/Weaker_GWAS_Effect.rda and b/data/Weaker_GWAS_Effect.rda differ diff --git a/vignettes/ColocBoost_tutorial_GTEx.Rmd b/vignettes/ColocBoost_tutorial_GTEx.Rmd deleted file mode 100644 index bbfec9e..0000000 --- a/vignettes/ColocBoost_tutorial_GTEx.Rmd +++ /dev/null @@ -1,29 +0,0 @@ ---- -title: "ColocBoost Tutortial (GTEx tissues)" -output: rmarkdown::html_vignette -vignette: > - %\VignetteIndexEntry{ColocBoost Tutortial (GTEx tissues)} - %\VignetteEngine{knitr::rmarkdown} - %\VignetteEncoding{UTF-8} ---- - -```{r, include = FALSE} -knitr::opts_chunk$set( - collapse = TRUE, - comment = "#>" -) -``` - -ColocBoost: Multi-omics xQTL colocalization improves the discovery of causal variants for complex diseases. - - - -```{r setup} -library(colocboost) -``` - - -## GTEx data preparations - - - diff --git a/vignettes/ColocBoost_tutorial_basic.Rmd b/vignettes/ColocBoost_tutorial_basic.Rmd deleted file mode 100644 index 47a09be..0000000 --- a/vignettes/ColocBoost_tutorial_basic.Rmd +++ /dev/null @@ -1,108 +0,0 @@ ---- -title: "ColocBoost Tutorial (Basic Usage)" -output: rmarkdown::html_vignette -vignette: > - %\VignetteIndexEntry{ColocBoost Tutorial (Basic Usage)} - %\VignetteEngine{knitr::rmarkdown} - %\VignetteEncoding{UTF-8} ---- - -```{r, include = FALSE} -knitr::opts_chunk$set( - collapse = TRUE, - comment = "#>" -) -``` - -This tutorial will guide you through using `ColocBoost` with individual-level data, summary statistics data, or the combination of individual-level data and summary statistics. - - -```{r setup} -library(colocboost) -``` - -## Individual-level data only - -This tutorial demonstrates how to analyze individual-level data using `colocboost` package, specifically with the `Ind_5traits` dataset. 
Detailed information about the `Ind_5traits` dataset, which includes 5 simulated phenotypes alongside corresponding genotype matrices, is avaiable at `url`. The dataset is specifically designed to facilitate the identification of causal variants for complex traits. - -**Loading and Analyzing Data** - -To get started, load the `Ind_5traits` dataset into your R session. Once loaded, you can proceed with the analysis using the `colocboost` function. This function requires specifying genotypes `X` and phenotypes `Y` from the dataset: -```{r individual} -data("Ind_5traits") -res <- colocboost(X = Ind_5traits$X, Y = Ind_5traits$Y) -``` -This command initiates the colocalization analysis, applying the ColocBoost methodology to identify potential genetic intersections between phenotypes and their respective genotypes. - -**Results Exploration** - -After running the analysis, you can explore the results to identify colocalized variants and review the summary statistics. This output will provide insights into which variants are colocalized across the different phenotypes and offer a comprehensive overview of the statistical results from the colocalization analysis. - -```{r indResults} -res$cos_details$cos$cos_index - -res$cos_summary - -``` - -## Summary statistics only - -This tutorial demonstrates how to analyze summary statistics data using `colocboost` package, specifically with the `Sumstat_5traits` dataset. Detailed information about the `Sumstat_5traits` dataset, which includes the summary data for 5 simulated summary statistics and one LD matrix, where the summary data is directly caluculated using the marginal association from `Ind_5traits` data, is avaiable at `url`. This dataset is designed to facilitate the identification of causal variants for complex traits using summary statistics. - - -**Loading and Analyzing Data** -To get started, load the `Sumstat_5traits` dataset into your R session. 
Note: The `Sumstat_5traits` dataset includes only one LD matrix that applies to all traits. To demonstrate handling multiple traits, we replicate this single LD matrix for each trait as follows. To analyze the data using summary statistics, apply the colocboost function specifying the summary statistics and LD matrices. -```{r sumstat} -data("Sumstat_5traits") -data("Ind_5traits") -LD <- get_cormat(Ind_5traits$X[[1]]) -res <- colocboost(sumstat = Sumstat_5traits$sumstat, LD = LD) -``` -*Note*: This step duplicates the single LD matrix into a list of five matrices, one for each trait. This is to mimic scenarios where different traits might have different LD structures. ColocBoost allows for the input of a single LD matrix if the LD across traits is consistent. For more advanced usage involving different LD matrices or more complex setups, please refer to the advanced tutorial (URL). - -**Results Exploration** (Consistent results obtained from individual-level) - -After running the analysis, you can explore the results to identify colocalized variants and review the summary statistics. This output will provide insights into which variants are colocalized across the different phenotypes and offer a comprehensive overview of the statistical results from the colocalization analysis. - -```{r sumstatResults} -res$cos_details$cos$cos_index - -res$cos_summary -``` -This section of the analysis provides insights into which variants are colocalized across the different phenotypes and offers a comprehensive overview of the statistical results from the colocalization analysis. - - - -## Mixture usage of individual-level data and summary statistics - -This tutorial provides a step-by-step guide on using both individual-level data and summary statistics within the `colocboost` package to perform multi-trait colocalization analysis. This approach is especially beneficial when comprehensive individual-level genotype and phenotype data is not available for all traits. 
- -**Loading and Analyzing Datasets** - -To get started, load both `Ind_5traits` and `Sumstat_5traits` datasets into your R session. Once loaded, we want to create a mixture usage datasets. For example, for traits 1,2,3, we use individual-level genotype and phenotype data; for traits 4 and 5, we use summary statistics and duplicated LD matrices. -```{r mixture} -data("Ind_5traits") -data("Sumstat_5traits") -X <- Ind_5traits$X[1:3] -Y <- Ind_5traits$Y[1:3] -sumstat <- Sumstat_5traits$sumstat[4:5] -LD <- get_cormat(Ind_5traits$X[[1]]) -``` -*Note*: This step duplicates the single LD matrix into a list of two matrices, one for each trait. This is to mimic scenarios where different traits might have different LD structures. ColocBoost allows for the input of a single LD matrix if the LD across traits is consistent. For more advanced usage involving different LD matrices or more complex setups, please refer to the advanced tutorial (URL). - -Once loaded, you can proceed with the analysis using the `colocboost` function. This function requires specifying genotypes `X` and phenotypes `Y` from the individual-level dataset and summary statistics `sumstat` and LD matrices `LD` from summary dataset: -```{r mixRun} -res <- colocboost(X = X, Y = Y, sumstat = sumstat, LD = LD) -``` - -**Results Exploration** (Consistent results obtained from both individual-level only and summary statistics only) - -After running the analysis, you can explore the results to identify colocalized variants and review the summary statistics. This output will provide insights into which variants are colocalized across the different phenotypes and offer a comprehensive overview of the statistical results from the colocalization analysis. 
-```{r mixResults} -res$cos_details$cos$cos_index - -res$cos_summary -``` -This section of the analysis provides insights into which variants are colocalized across the different phenotypes and offers a comprehensive overview of the statistical results from the colocalization analysis. - - diff --git a/vignettes/Disease_Prioritized_Colocalization.Rmd b/vignettes/Disease_Prioritized_Colocalization.Rmd new file mode 100644 index 0000000..ed4469f --- /dev/null +++ b/vignettes/Disease_Prioritized_Colocalization.Rmd @@ -0,0 +1,98 @@ +--- +title: "Mixed Data-type and Disease Prioritized Colocalization" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Mixed Data-type and Disease Prioritized Colocalization} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>" +) +``` + + +This vignette demonstrates how to perform multi-trait colocalization analysis using a mixed-type dataset that includes both individual-level data and summary statistics. +ColocBoost provides a flexible framework to integrate data at the individual level, at the summary statistic level, or both, +allowing it to handle scenarios where individual-level data is available for some traits (like xQTLs) and summary data is available for other traits (disease/trait GWAS). + + +```{r setup} +library(colocboost) +``` + +# 1. Loading and Analyzing Datasets + +To get started, load both the `Ind_5traits` and `Sumstat_5traits` datasets into your R session. Once loaded, create a mixed dataset as follows: + +- For traits 1, 2, 3, 4: use individual-level genotype and phenotype data. +- For trait 5: use summary statistics data. +- Note that `LD` is not included in the `Sumstat_5traits` dataset; it can be calculated from the `X` data in the `Ind_5traits` dataset. + +### Causal variant structure +The dataset features two causal variants with indices 644 and 2289. 
+ +- Causal variant 644 is associated with traits 1, 2, 3, and 4. +- Causal variant 2289 is associated with traits 2, 3, and 5 (summary level data). + +```{r load-mixed-data} +# Load example data +data(Ind_5traits) +data(Sumstat_5traits) + +# Create a mixed dataset +X <- Ind_5traits$X[1:4] +Y <- Ind_5traits$Y[1:4] +sumstat <- Sumstat_5traits$sumstat[5] +LD <- get_cormat(Ind_5traits$X[[1]]) +``` + +To analyze a single type of data, refer to the following +tutorials: [Individual Level Data Colocalization](https://statfungen.github.io/colocboost/articles/Individual_Level_Colocalization.html) and +[Summary Level Data Colocalization](https://statfungen.github.io/colocboost/articles/Summary_Level_Colocalization.html). + + +# 2. Run ColocBoost (Basic usage) + + +The preferred format for colocalization analysis in ColocBoost using a mixed-type dataset: + +- **Individual level data**: `X` and `Y` are organized as lists, matched by trait index, + - `(X[1], Y[1])` contains individuals for trait 1, + - `(X[2], Y[2])` contains individuals for trait 2, + - And so on for each trait under analysis. + +- **Summary level data**: + - `sumstat` is organized as a list of data.frames for all traits + - `LD` is a matrix of linkage disequilibrium (LD) information for all variants across all traits. + +The `colocboost` function requires specifying genotypes `X` and phenotypes `Y` from the individual-level dataset, and summary statistics `sumstat` and LD matrix `LD` from the summary dataset: + + +```{r mixd-basic} +# Run colocboost +res <- colocboost(X = X, Y = Y, sumstat = sumstat, LD = LD) + +# Identified CoS +res$cos_details$cos$cos_index +``` + +### Results Interpretation + +For comprehensive tutorials on result interpretation and advanced visualization techniques, please visit our documentation portal at FIXME (link). + + + +# 3. 
Run ColocBoost (Disease Prioritized Colocalization) + + +```{r disease-basic} +# Run colocboost +res <- colocboost(X = X, Y = Y, sumstat = sumstat, LD = LD, focal_outcome_idx = 5) + +# Identified CoS +res$cos_details$cos$cos_index +``` \ No newline at end of file diff --git a/vignettes/Individual_Level_Colocalization.Rmd b/vignettes/Individual_Level_Colocalization.Rmd index 467d9d3..cb2d430 100644 --- a/vignettes/Individual_Level_Colocalization.Rmd +++ b/vignettes/Individual_Level_Colocalization.Rmd @@ -64,6 +64,7 @@ The preferred format for colocalization analysis in ColocBoost using individual - This is particularly useful when you have a large dataset with many traits and want to focus on specific individuals for each trait. +This function requires specifying genotypes `X` and phenotypes `Y` from the dataset: ```{r multiple-matched} # Extract genotype (X) and phenotype (Y) data X <- Ind_5traits$X diff --git a/vignettes/Summary_Level_Colocalization.Rmd b/vignettes/Summary_Level_Colocalization.Rmd index 61448ab..59e5769 100644 --- a/vignettes/Summary_Level_Colocalization.Rmd +++ b/vignettes/Summary_Level_Colocalization.Rmd @@ -69,6 +69,8 @@ and the summary statistics are organized in a list. The **Basic format** us - `sumstat` is organized as a list of data.frames for all traits - `LD` is a matrix of linkage disequilibrium (LD) information for all variants across all traits. + +This function requires specifying summary statistics `sumstat` and LD matrix `LD` from the dataset: ```{r one-LD} # Extract genotype (X) and calculate LD matrix data("Ind_5traits") diff --git a/vignettes/announcements.Rmd b/vignettes/announcements.Rmd index e99d349..50ebd7b 100644 --- a/vignettes/announcements.Rmd +++ b/vignettes/announcements.Rmd @@ -8,7 +8,8 @@ vignette: > --- -## **Initial release in ColocBoost** +## Initial release in ColocBoost -We are excited to release ColocBoost, where it is now the default version for new installs. 
+ +We are excited to release ColocBoost (FIXME version), which is now the default version for new installs.