Merged
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -35,7 +35,7 @@ Learn how to perform colocalization analysis with step-by-step examples. For det

If you use ColocBoost in your research, please cite:

Cao X, Sun H, Feng R, Mazumder R, Najar CFB, Li YI, de Jager PL, Bennett D, The Alzheimer's Disease Functional Genomics Consortium, Dey KK, Wang G. (2025+). Integrative multi-omics QTL colocalization maps regulatory architecture in aging human brain. bioRxiv. [https://doi.org/](https://doi.org/)
> Cao X, Sun H, Feng R, Mazumder R, Najar CFB, Li YI, de Jager PL, Bennett D, The Alzheimer's Disease Functional Genomics Consortium, Dey KK, Wang G. (2025+). Integrative multi-omics QTL colocalization maps regulatory architecture in aging human brain. bioRxiv. [https://doi.org/](https://doi.org/)


## License
3 changes: 1 addition & 2 deletions _pkgdown.yml
@@ -25,9 +25,8 @@ articles:
- Input_Data_Format
- Individual_Level_Colocalization
- Summary_Level_Colocalization
- ColocBoost_tutorial_basic
- Disease_Prioritized_Colocalization
- ColocBoost_tutorial_advance
- ColocBoost_tutorial_GTEx
- ColocBoost_tutorial_strong_colocalization
- ColocBoost_tutorial_diagnostic

Binary file modified data/Heterogeneous_Effect.rda
Binary file not shown.
Binary file modified data/Ind_5traits.rda
Binary file not shown.
Binary file modified data/Non_Causal_Strongest_Marginal.rda
Binary file not shown.
Binary file modified data/Sumstat_5traits.rda
Binary file not shown.
Binary file modified data/Weaker_GWAS_Effect.rda
Binary file not shown.
29 changes: 0 additions & 29 deletions vignettes/ColocBoost_tutorial_GTEx.Rmd

This file was deleted.

108 changes: 0 additions & 108 deletions vignettes/ColocBoost_tutorial_basic.Rmd

This file was deleted.

98 changes: 98 additions & 0 deletions vignettes/Disease_Prioritized_Colocalization.Rmd
@@ -0,0 +1,98 @@
---
title: "Mixed Data-type and Disease Prioritized Colocalization"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Mixed Data-type and Disease Prioritized Colocalization}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```


This vignette demonstrates how to perform multi-trait colocalization analysis on a mixed-type dataset that combines individual-level data and summary statistics.
ColocBoost provides a flexible framework for integrating data at the individual level and at the summary-statistic level,
allowing it to handle scenarios where individual-level data are available for some traits (such as xQTLs) and summary data are available for other traits (disease/trait GWAS).


```{r setup}
library(colocboost)
```

# 1. Loading and Analyzing Datasets

To get started, load both the `Ind_5traits` and `Sumstat_5traits` datasets into your R session. Once loaded, create a mixed dataset as follows:

- For traits 1, 2, 3, 4: use individual-level genotype and phenotype data.
- For trait 5: use summary statistics data.
- Note that `LD` is not included in the `Sumstat_5traits` dataset; it can be calculated from the `X` data in the `Ind_5traits` dataset.

### Causal variant structure
The dataset features two causal variants with indices 644 and 2289.

- Causal variant 644 is associated with traits 1, 2, 3, and 4.
- Causal variant 2289 is associated with traits 2, 3, and 5 (summary level data).

```{r load-mixed-data}
# Load example data
data(Ind_5traits)
data(Sumstat_5traits)

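# Create a mixed dataset:
# traits 1-4 use individual-level genotypes (X) and phenotypes (Y)
X <- Ind_5traits$X[1:4]
Y <- Ind_5traits$Y[1:4]
# trait 5 uses summary statistics
sumstat <- Sumstat_5traits$sumstat[5]
# LD matrix computed from the genotypes of the first trait
LD <- get_cormat(Ind_5traits$X[[1]])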
```

To analyze a single type of data, refer to the following
tutorials: [Individual Level Data Colocalization](https://statfungen.github.io/colocboost/articles/Individual_Level_Colocalization.html) and
[Summary Level Data Colocalization](https://statfungen.github.io/colocboost/articles/Summary_Level_Colocalization.html).


# 2. Run ColocBoost (Basic usage)


The preferred format for colocalization analysis in ColocBoost with a mixed-type dataset is:

- **Individual level data**: `X` and `Y` are organized as lists, matched by trait index,
  - `(X[1], Y[1])` contains individuals for trait 1,
  - `(X[2], Y[2])` contains individuals for trait 2,
  - And so on for each trait under analysis.

- **Summary level data**:
  - `sumstat` is organized as a list of data.frames for all traits
  - `LD` is a matrix of linkage disequilibrium (LD) information for all variants across all traits.
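
The layout described above can be illustrated with a small, self-contained sketch. Everything here is hypothetical toy data rather than the package's example datasets: the `toy_*` names are made up for illustration, the `z`, `n`, and `variant` column names are an assumption about the summary-statistics format, and base R's `cor()` is used as a stand-in for `get_cormat()`.

```{r toy-mixed-format}
# Hypothetical toy data (not from the package) that mimics the mixed-type layout.
set.seed(1)
n <- 50   # number of individuals
p <- 10   # number of variants
variants <- paste0("SNP", seq_len(p))

# Individual-level part: one genotype matrix and one phenotype per trait,
# stored in two lists that are matched by trait index.
toy_X <- lapply(1:2, function(i) {
  matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n,
         dimnames = list(NULL, variants))
})
toy_Y <- lapply(toy_X, function(x) as.matrix(0.5 * x[, 3] + rnorm(n)))

# Summary-level part: a list of data.frames plus a single LD matrix.
toy_sumstat <- list(data.frame(
  z = rnorm(p),        # assumed column: marginal z-scores
  n = n,               # assumed column: sample size
  variant = variants,  # assumed column: variant identifiers
  stringsAsFactors = FALSE
))
toy_LD <- cor(toy_X[[1]])  # base-R stand-in for get_cormat()

# Basic consistency checks on the layout.
stopifnot(
  length(toy_X) == length(toy_Y),
  all(dim(toy_LD) == c(p, p)),
  identical(colnames(toy_LD), toy_sumstat[[1]]$variant)
)
```

Checks of this kind catch mismatched list lengths or variant names before the inputs are passed on; they are a convenience, not a requirement stated by the package.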

The `colocboost` function requires genotypes `X` and phenotypes `Y` from the individual-level dataset, together with summary statistics `sumstat` and the LD matrix `LD` from the summary-level dataset:


```{r mixed-basic}
# Run colocboost
res <- colocboost(X = X, Y = Y, sumstat = sumstat, LD = LD)

# Identified CoS
res$cos_details$cos$cos_index
```
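
Since the example data are described as having causal variants at indices 644 and 2289, one quick sanity check is whether each reported CoS contains a known causal index. The sketch below uses a hypothetical `toy_cos_index` list in place of `res$cos_details$cos$cos_index`; it assumes that object is a list of integer index vectors, which should be confirmed against the actual output.

```{r toy-cos-check}
# Hypothetical stand-in for res$cos_details$cos$cos_index
# (assumed to be a list of variant-index vectors, one per CoS).
toy_cos_index <- list(
  cos1 = c(640L, 644L, 650L),
  cos2 = c(2289L, 2290L)
)

# Causal variant indices stated for the example data.
causal <- c(644L, 2289L)

# For each CoS, does it contain at least one known causal variant?
covered <- vapply(toy_cos_index, function(idx) any(causal %in% idx), logical(1))
covered
```

With real results, `toy_cos_index` would be replaced by the list returned from the `colocboost` run, provided it has the assumed shape.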

### Results Interpretation

For comprehensive tutorials on result interpretation and advanced visualization techniques, please visit our documentation portal at FIXME (link).



# 3. Run ColocBoost (Disease Prioritized Colocalization)


```{r disease-basic}
# Run colocboost with trait 5 (the summary-statistics trait) as the focal outcome
res <- colocboost(X = X, Y = Y, sumstat = sumstat, LD = LD, focal_outcome_idx = 5)

# Identified CoS
res$cos_details$cos$cos_index
```
1 change: 1 addition & 0 deletions vignettes/Individual_Level_Colocalization.Rmd
@@ -64,6 +64,7 @@ The preferred format for colocalization analysis in ColocBoost using individual
- This is particularly useful when you have a large dataset with many traits and want to focus on specific individuals for each trait.


This function requires specifying genotypes `X` and phenotypes `Y` from the dataset:
```{r multiple-matched}
# Extract genotype (X) and phenotype (Y) data
X <- Ind_5traits$X
2 changes: 2 additions & 0 deletions vignettes/Summary_Level_Colocalization.Rmd
@@ -69,6 +69,8 @@ and the summary statistics are organized in a list. The **Basic format** us
- `sumstat` is organized as a list of data.frames for all traits
- `LD` is a matrix of linkage disequilibrium (LD) information for all variants across all traits.


This function requires specifying summary statistics `sumstat` and LD matrix `LD` from the dataset:
```{r one-LD}
# Extract genotype (X) and calculate LD matrix
data("Ind_5traits")
5 changes: 3 additions & 2 deletions vignettes/announcements.Rmd
@@ -8,7 +8,8 @@ vignette: >

---

## **Initial release in ColocBoost**
## Initial release in ColocBoost

We are excited to release ColocBoost, where it is now the default version for new installs.

We are excited to release ColocBoost (FIXME version), which is now the default version for new installs.