StatFunGen · xueweic · Apr 17, 2025
diff --git a/README.md b/README.md
@@ -35,7 +35,7 @@ Learn how to perform colocalization analysis with step-by-step examples. For det
 
 If you use ColocBoost in your research, please cite:
 
-Cao X, Sun H, Feng R, Mazumder R, Najar CFB, Li YI, de Jager PL, Bennett D, The Alzheimer's Disease Functional Genomics Consortium, Dey KK, Wang G. (2025+). Integrative multi-omics QTL colocalization maps regulatory architecture in aging human brain. bioRxiv. [https://doi.org/](https://doi.org/)
+> Cao X, Sun H, Feng R, Mazumder R, Najar CFB, Li YI, de Jager PL, Bennett D, The Alzheimer's Disease Functional Genomics Consortium, Dey KK, Wang G. (2025+). Integrative multi-omics QTL colocalization maps regulatory architecture in aging human brain. bioRxiv. [https://doi.org/](https://doi.org/)
 
 
 ## License

diff --git a/_pkgdown.yml b/_pkgdown.yml
@@ -25,9 +25,8 @@ articles:
     - Input_Data_Format
     - Individual_Level_Colocalization
     - Summary_Level_Colocalization
-    - ColocBoost_tutorial_basic
+    - Disease_Prioritized_Colocalization
     - ColocBoost_tutorial_advance
-    - ColocBoost_tutorial_GTEx
     - ColocBoost_tutorial_strong_colocalization
     - ColocBoost_tutorial_diagnostic
 

diff --git a/data/Heterogeneous_Effect.rda b/data/Heterogeneous_Effect.rda
diff --git a/data/Ind_5traits.rda b/data/Ind_5traits.rda
diff --git a/data/Non_Causal_Strongest_Marginal.rda b/data/Non_Causal_Strongest_Marginal.rda
diff --git a/data/Sumstat_5traits.rda b/data/Sumstat_5traits.rda
diff --git a/data/Weaker_GWAS_Effect.rda b/data/Weaker_GWAS_Effect.rda
diff --git a/vignettes/ColocBoost_tutorial_GTEx.Rmd b/vignettes/ColocBoost_tutorial_GTEx.Rmd
diff --git a/vignettes/ColocBoost_tutorial_basic.Rmd b/vignettes/ColocBoost_tutorial_basic.Rmd
diff --git a/vignettes/Disease_Prioritized_Colocalization.Rmd b/vignettes/Disease_Prioritized_Colocalization.Rmd
@@ -0,0 +1,98 @@
+---
+title: "Mixed Data-type and Disease Prioritized Colocalization"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Mixed Data-type and Disease Prioritized Colocalization}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r, include = FALSE}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#>"
+)
+```
+
+
+This vignette demonstrates how to perform multi-trait colocalization analysis on a mixed-type dataset that includes both individual-level data and summary statistics. 
+ColocBoost provides a flexible framework for integrating data at the individual level, at the summary-statistic level, or both, 
+which covers scenarios where individual-level data are available for some traits (such as xQTLs) and only summary data are available for others (such as disease/trait GWAS).
+
+
+```{r setup}
+library(colocboost)
+```
+
+# 1. Loading and Analyzing Datasets
+
+To get started, load both the `Ind_5traits` and `Sumstat_5traits` datasets into your R session. Once loaded, create a mixed dataset as follows:
+
+- For traits 1, 2, 3, 4: use individual-level genotype and phenotype data.
+- For trait 5: use summary statistics data.
+- Note that `LD` could be calculated from the `X` data in the `Ind_5traits` dataset, but it is not included in the `Sumstat_5traits` dataset.
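+
+As a rough illustration of the last point, LD can be approximated by the pairwise Pearson correlation between genotype columns. The sketch below uses a simulated genotype matrix `G_toy` (a hypothetical stand-in, not part of the package data) and base R `cor()`. It makes no claim about any additional handling that `get_cormat()` may apply.
+
+```{r toy-ld}
+set.seed(1)
+n_toy <- 200  # hypothetical number of individuals
+p_toy <- 5    # hypothetical number of variants
+
+# Simulated allele counts (0/1/2); illustrative only
+G_toy <- matrix(
+  rbinom(n_toy * p_toy, size = 2, prob = 0.3),
+  nrow = n_toy, ncol = p_toy,
+  dimnames = list(NULL, paste0("SNP", seq_len(p_toy)))
+)
+
+# LD as the variant-by-variant correlation matrix
+LD_toy <- cor(G_toy)
+dim(LD_toy)
+```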
+
+### Causal variant structure
+The dataset features two causal variants with indices 644 and 2289.
+
+- Causal variant 644 is associated with traits 1, 2, 3, and 4.
+- Causal variant 2289 is associated with traits 2, 3, and 5 (summary level data).
+
+```{r load-mixed-data}
+# Load example data
+data(Ind_5traits)
+data(Sumstat_5traits) 
+
+# Create a mixed dataset
+X <- Ind_5traits$X[1:4]
+Y <- Ind_5traits$Y[1:4]
+sumstat <- Sumstat_5traits$sumstat[5]
+LD <- get_cormat(Ind_5traits$X[[1]])
+```
+
+To analyze a single data type, refer to the following 
+tutorials: [Individual Level Data Colocalization](https://statfungen.github.io/colocboost/articles/Individual_Level_Colocalization.html) and 
+[Summary Level Data Colocalization](https://statfungen.github.io/colocboost/articles/Summary_Level_Colocalization.html).
+
+
+# 2. Run ColocBoost (Basic usage)
+
+
+The preferred format for colocalization analysis in ColocBoost using a mixed-type dataset is:
+
+- **Individual level data**: `X` and `Y` are organized as lists, matched by trait index,
+    - `(X[1], Y[1])` contains individuals for trait 1,
+    - `(X[2], Y[2])` contains individuals for trait 2,
+    - And so on for each trait under analysis.
+
+- **Summary level data**:
+    - `sumstat` is organized as a list of data.frames for all traits
+    - `LD` is a matrix of linkage disequilibrium (LD) information for all variants across all traits.
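+
+The layout above can be checked before running an analysis. The following is a minimal sketch on simulated data: `check_mixed_input()` is a hypothetical helper, not a ColocBoost function, and the `sumstat` column names (`z`, `n`, `variant`) are assumptions made for illustration.
+
+```{r toy-layout}
+set.seed(2)
+n_obs <- 100
+snps <- paste0("SNP", 1:8)
+sim_geno <- function() {
+  matrix(rnorm(n_obs * length(snps)), nrow = n_obs, dimnames = list(NULL, snps))
+}
+
+# Individual-level data: one genotype matrix and one phenotype per trait
+X_sim <- list(sim_geno(), sim_geno())
+Y_sim <- lapply(X_sim, function(x) x[, 3] + rnorm(n_obs))
+
+# Summary-level data: one data.frame per trait, plus a single LD matrix
+sumstat_sim <- list(data.frame(z = rnorm(length(snps)), n = n_obs, variant = snps))
+LD_sim <- cor(X_sim[[1]])
+
+# Hypothetical helper: TRUE when the lists are matched and LD is square
+check_mixed_input <- function(X, Y, sumstat, LD) {
+  length(X) == length(Y) &&
+    all(mapply(function(x, y) nrow(x) == NROW(y), X, Y)) &&
+    is.list(sumstat) && all(vapply(sumstat, is.data.frame, logical(1))) &&
+    nrow(LD) == ncol(LD)
+}
+
+check_mixed_input(X_sim, Y_sim, sumstat_sim, LD_sim)
+```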
+
+`colocboost()` requires genotypes `X` and phenotypes `Y` from the individual-level dataset, plus summary statistics `sumstat` and the LD matrix `LD` from the summary-level dataset:
+
+
+```{r mixed-basic}
+# Run colocboost
+res <- colocboost(X = X, Y = Y, sumstat = sumstat, LD = LD)
+
+# Identified CoS
+res$cos_details$cos$cos_index
+```
+
+### Results Interpretation
+
+For comprehensive tutorials on result interpretation and advanced visualization techniques, please visit our documentation portal at FIXME (link).
+
+
+
+# 3. Run ColocBoost (Disease Prioritized Colocalization)
+
+
+```{r disease-basic}
+# Run colocboost
+res <- colocboost(X = X, Y = Y, sumstat = sumstat, LD = LD, focal_outcome_idx = 5)
+
+# Identified CoS
+res$cos_details$cos$cos_index
+```
diff --git a/vignettes/Individual_Level_Colocalization.Rmd b/vignettes/Individual_Level_Colocalization.Rmd
@@ -64,6 +64,7 @@ The preferred format for colocalization analysis in ColocBoost using individual
     - This is particularly useful when you have a large dataset with many traits and want to focus on specific individuals for each trait.
 
 
+This function requires specifying genotypes `X` and phenotypes `Y` from the dataset:
 ```{r multiple-matched}
 # Extract genotype (X) and phenotype (Y) data
 X <- Ind_5traits$X

diff --git a/vignettes/Summary_Level_Colocalization.Rmd b/vignettes/Summary_Level_Colocalization.Rmd
@@ -69,6 +69,8 @@ and the summary statistics are organized in a list. The **Basic format** us
 - `sumstat` is organized as a list of data.frames for all traits
 - `LD` is a matrix of linkage disequilibrium (LD) information for all variants across all traits.
 
+
+This function requires specifying summary statistics `sumstat` and LD matrix `LD` from the dataset:
 ```{r one-LD}
 # Extract genotype (X) and calculate LD matrix
 data("Ind_5traits")

diff --git a/vignettes/announcements.Rmd b/vignettes/announcements.Rmd
@@ -8,7 +8,8 @@ vignette: >
 
 ---
 
-## **Initial release in ColocBoost**
+## Initial release in ColocBoost
 
-We are excited to release ColocBoost, where it is now the default version for new installs. 
+
+We are excited to release ColocBoost (FIXME version), which is now the default version for new installs.