1 change: 0 additions & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,6 @@ Authors@R: c(
person(given = "Xuewei", family = "Cao", email = "xc2270@cumc.columbia.edu", role = c("cre", "aut")),
person(given = "Haochen", family = "Sun", email = "hs3393@cumc.columbia.edu", role = "aut"),
person(given = "Ru", family = "Feng", email = "rf2872@cumc.columbia.edu", role = "aut"),
person(given = "Rahul", family = "Mazumder", email = "rahulmaz@mit.edu", role = "aut"),
person(given = "Daniel", family = "Nachun", role = "aut"),
person(given = "Kushal", family = "Dey", email = "deyk@mskcc.org", role = c("aut")),
person(given = "Gao", family = "Wang", email = "gw2411@cumc.columbia.edu", role = c("aut"))
2 changes: 1 addition & 1 deletion LICENSE
Original file line number Diff line number Diff line change
@@ -1,2 +1,2 @@
YEAR: 2025
COPYRIGHT HOLDER: StatFunGen authors
COPYRIGHT HOLDER: StatFunGen Lab at Columbia University
1 change: 1 addition & 0 deletions R/colocboost_assemble.R
Original file line number Diff line number Diff line change
Expand Up @@ -242,6 +242,7 @@ colocboost_assemble <- function(cb_obj,
if (!is.null(cb_output$ucos_details$ucos)) {
cb_output$vpa <- apply(do.call(cbind, cb_output$ucos_details$ucos_weight), 1, function(w0) 1 - prod(1 - w0))
names(cb_output$vpa) <- data_info$variables
class(cb_output) <- "colocboost"
cb_output$ucos_summary <- get_ucos_summary(cb_output)
} else {
tmp <- list("vpa" = NULL, "ucos_summary" = NULL)
6 changes: 3 additions & 3 deletions R/colocboost_plot.R
Original file line number Diff line number Diff line change
Expand Up @@ -244,12 +244,12 @@ colocboost_plot <- function(cb_output, y = "log10p",
x0 <- intersect(args$x, cs)
y1 <- args$y[match(x0, args$x)]
points(x0, y1,
pch = 4, col = adjustcolor(legend_text$col[i.uncoloc], alpha.f = 0.3),
pch = 4, col = adjustcolor(legend_text$col[uncoloc$cos_idx_to_uncoloc[i.uncoloc]], alpha.f = 0.3),
cex = 1.5, lwd = 1.5
)
texts <- c(texts, uncoloc$cos_uncoloc_texts[i.cs])
shape_col <- c(shape_col, adjustcolor(legend_text$col[i.uncoloc], alpha.f = 1))
texts_col <- c(texts_col, adjustcolor(legend_text$col[i.uncoloc], alpha.f = 0.8))
shape_col <- c(shape_col, adjustcolor(legend_text$col[uncoloc$cos_idx_to_uncoloc[i.uncoloc]], alpha.f = 1))
texts_col <- c(texts_col, adjustcolor(legend_text$col[uncoloc$cos_idx_to_uncoloc[i.uncoloc]], alpha.f = 0.8))
}
}
if (length(texts) == 0) {
10 changes: 4 additions & 6 deletions _pkgdown.yml
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,6 @@ template:

navbar:
left:
- text: "Home"
href: index.html
- text: "News"
href: articles/announcements.html
- text: "Installation"
Expand All @@ -24,16 +22,16 @@ articles:
contents:
- Input_Data_Format
- Individual_Level_Colocalization
- Summary_Level_Colocalization
- Summary_Statistics_Colocalization
- Disease_Prioritized_Colocalization
- ColocBoost_tutorial_advance
- ColocBoost_tutorial_strong_colocalization
- ColocBoost_tutorial_diagnostic
- Interpret_ColocBoost_Output
- Visualization_ColocBoost_Output

- title: internal
contents:
- announcements
- installation
- ColocBoost_tutorial_diagnostic

reference:
- title: "Example Data"
Binary file added man/figures/missing_representation.png
74 changes: 0 additions & 74 deletions vignettes/ColocBoost_tutorial_advance.Rmd

This file was deleted.

35 changes: 0 additions & 35 deletions vignettes/ColocBoost_tutorial_strong_colocalization.Rmd

This file was deleted.

59 changes: 33 additions & 26 deletions vignettes/Input_Data_Format.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -18,44 +18,51 @@ knitr::opts_chunk$set(
library(colocboost)
```

## Input Data Format

This vignette documents the required input data formats and provides examples of data included in the package.
This vignette documents the standard input data formats of `colocboost`.

### Individual Level Data
## 1. Individual Level Data

For analyses using individual-level data, the package requires matched X and Y data. Below is the format and an example from the package:
For analyses using individual-level data, the basic format for a single trait is as follows:

```{r individual-level-example}
# Load example individual-level data
data(Ind_5traits)
- `X` is an N * P matrix with N individuals and P variants. Including variant names as column names is highly recommended, especially when working with multiple X matrices and Y vectors.
- `Y` is a length N vector containing phenotype values for the same N individuals as in `X`.

# Display the structure
str(Ind_5traits)
```
The input format for multiple traits is similar, but `X` should be a list of matrices, each corresponding to a different trait, and `Y` should be a list of vectors.
For example:

- `X = list(X1, X2, X3, X4, X5)` where each `Xi` is a matrix for trait `i` - with the dimension of Ni * Pi, where Ni and Pi do not need to be the same for different traits.
- `Y = list(Y1, Y2, Y3, Y4, Y5)` where each `Yi` is a vector for trait `i` - with Ni individuals.
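
The list layout described above can be sketched with simulated data. This is only an illustration of the shapes involved: the sizes in `N` and `P`, the `variant_*` names, and the random values are assumptions made for the example, not data shipped with the package.

```r
set.seed(1)

# Hypothetical sizes: traits may differ in individuals (N) and variants (P)
N <- c(100, 120, 80)
P <- c(50, 50, 40)

# One genotype matrix per trait, with variant names as column names
X <- lapply(seq_along(N), function(i) {
  Xi <- matrix(rnorm(N[i] * P[i]), nrow = N[i], ncol = P[i])
  colnames(Xi) <- paste0("variant_", seq_len(P[i]))
  Xi
})

# One phenotype vector per trait, with one value per individual in the matching Xi
Y <- lapply(N, function(n) rnorm(n))

# Sanity check: each Xi has as many rows as Yi has entries
stopifnot(all(mapply(function(Xi, Yi) nrow(Xi) == length(Yi), X, Y)))
```

Checking that each `Xi` and `Yi` agree on the number of individuals before running an analysis is a cheap way to catch misaligned inputs.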


`colocboost` also offers flexible input options (for detailed usage with different input formats,
see [Individual Data Colocalization](https://statfungen.github.io/colocboost/articles/Individual_Level_Colocalization.html)):

#### Format Requirements
- A single N * P `X` matrix with an N * L `Y` matrix for L traits.
- Multiple X matrices and unmatched Y vectors with a mapping dictionary.

- Data should be in a data frame or matrix
- Each row represents an individual
- Columns must include matched genotype (X) and phenotype (Y) data
- Missing values should be coded as NA

### Summary Statistics
## 2. Summary Statistics

For analyses using summary statistics, the package requires a data frame with matched linkage disequilibrium (LD) information:
For analyses using summary statistics, the basic format for a single trait is as follows:

- `sumstat` is a data frame with required columns `z` or (`beta`, `sebeta`), and optional but highly recommended columns `n` and `variant`.
```{r summary-stats-example}
# Load example summary statistics data
data(Sumstat_5traits)

# Display the structure
str(Sumstat_5traits)
head(Sumstat_5traits$sumstat[[1]])
```

#### Format Requirements
- `z` or (`beta`, `sebeta`) - required: either z-score or (effect size and standard error)
- `n` - highly recommended: sample size for the summary statistics.
- `variant` - highly recommended: required if the `sumstat` data frames for different outcomes do not have the same number of variables (multiple `sumstat` and multiple LD).


- `LD` is a matrix of LD. This matrix does not need to contain the exact same variants as in `sumstat`, but the `colnames` and `rownames` of `LD` should include the `variant` names for proper alignment.

The input format for multiple traits is similar, but `sumstat` should be a list of data frames `sumstat = list(sumstat1, sumstat2, sumstat3)`.
Flexible input formats are also supported for multiple traits (for detailed usage with different input formats,
see [Summary Statistics Colocalization](https://statfungen.github.io/colocboost/articles/Summary_Level_Colocalization.html)):

- Data should be in a data frame
- Each row represents a variant
- Must include effect size, standard error, and sample size information
- LD matrix must be provided separately or calculated from the data
- One LD matrix with a superset of variants in `sumstat` for all traits is allowed.
- Multiple LD matrices, each corresponding to a different trait, are also allowed for the trait-specific LD structure.
- Multiple LD matrices and unmatched `sumstat` data frames with a mapping dictionary are also allowed.
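
A minimal sketch of the single-trait summary statistics layout described above, built from simulated values. The variant names, the sample size of 200, and the toy genotype matrix `G` used to derive `LD` are assumptions for illustration only. The final subsetting step shows why matching `variant` names to the `rownames` and `colnames` of `LD` matters; it is not a description of how the package aligns inputs internally.

```r
set.seed(1)
P <- 20
variants <- paste0("variant_", seq_len(P))

# Toy genotype matrix, used only to derive an LD (correlation) matrix
G <- matrix(rnorm(200 * P), nrow = 200, ncol = P)
LD <- cor(G)
dimnames(LD) <- list(variants, variants)

# Summary statistics for a subset of the variants covered by LD
keep <- variants[1:15]
sumstat <- data.frame(
  z = rnorm(length(keep)),  # z-scores (made-up values)
  n = 200,                  # sample size
  variant = keep,           # names used to align with LD
  stringsAsFactors = FALSE
)

# LD may contain extra variants, but must cover every variant in sumstat
stopifnot(all(sumstat$variant %in% colnames(LD)))
LD_aligned <- LD[sumstat$variant, sumstat$variant]
```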
101 changes: 101 additions & 0 deletions vignettes/Interpret_ColocBoost_Output.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,101 @@
---
title: "Interpret ColocBoost Output"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Interpret ColocBoost Output}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```


This vignette demonstrates how to interpret the output of ColocBoost, specifically how to get the summary of colocalization and filter out weak signals.

```{r setup}
library(colocboost)
```

## 1. Look at summary of colocalization

### Causal variant structure
The dataset features two causal variants with indices 644 and 2289.

- Causal variant 644 is associated with traits 1, 2, 3, and 4.
- Causal variant 2289 is associated with traits 2, 3, and 5.

```{r run-colocboost}
# Loading the Dataset
data(Ind_5traits)
# Run colocboost
res <- colocboost(X = Ind_5traits$X, Y = Ind_5traits$Y)
cos_summary <- res$cos_summary
names(cos_summary)
```

The `cos_summary` object contains the colocalization summary for all colocalization events, with each row representing a single colocalization event.
The summary includes the following columns:


- **focal_outcome**: The focal outcome being analyzed, or `FALSE` if no focal outcome exists.
- **colocalized_outcomes**: Traits that are colocalized within the 95% colocalization confidence set (CoS).
- **cos_id**: A unique identifier for each 95% colocalization confidence set (CoS).
- **purity**: The minimum absolute correlation of variants within the 95% colocalization confidence set (CoS).
- **top_variable**: The variable with the highest variant colocalization probability (VCP).
- **top_variable_vcp**: The variant colocalization probability for the top variable.
- **cos_npc**: The normalized probability of colocalization for the 95% confidence set, providing empirical evidence in favor of colocalization over a trait-specific configuration.
- **min_npc_outcome**: The minimum normalized probability among colocalized traits.
- **n_variables**: The number of variables in the 95% colocalization confidence set (CoS).
- **colocalized_index**: The indices of colocalized variables.
- **colocalized_variables**: A list of colocalized variables.
- **colocalized_variables_vcp**: The variant colocalization probabilities for all colocalized variables.
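
To make the **purity** column concrete, the sketch below computes the minimum absolute pairwise correlation among a set of variants on simulated genotypes. The data, the indices in `cos_idx`, and the use of `cor()` on raw genotypes are assumptions for illustration; the package may compute purity differently (for example, from an LD matrix).

```r
set.seed(1)
N <- 200

# Toy genotypes: three variants driven by a shared factor, plus one independent variant
shared <- rnorm(N)
X <- cbind(
  shared + rnorm(N, sd = 0.3),
  shared + rnorm(N, sd = 0.3),
  shared + rnorm(N, sd = 0.3),
  rnorm(N)
)
colnames(X) <- paste0("variant_", 1:4)

# Hypothetical indices of the variants in one CoS
cos_idx <- 1:3

# Purity as the minimum absolute correlation within the set
abs_cor <- abs(cor(X[, cos_idx]))
purity <- min(abs_cor)
```

A purity close to 1 indicates that the variants in the set are in tight LD with each other, so the set is hard to split into distinct signals.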


To obtain the summary of colocalization with a specific focus on traits of interest,
you can use the `get_cos_summary` function; see its detailed usage in this [link](https://statfungen.github.io/colocboost/reference/get_cos_summary.html).
This function allows you to filter the colocalization summary based on a particular outcome of interest,
making it easier to interpret the results for specific traits.
For example, if you are interested in the colocalization events involving the traits `Y1` and `Y2`, you can use the following code:


```{r summary-colocboost}
# Get summary table of colocalization
cos_interest_outcome <- get_cos_summary(res, interest_outcome = c("Y1", "Y2"))
```



## 2. Get strong colocalization signals

In `cos_summary`, for each 95% CoS, the `cos_npc` column provides a normalized probability of colocalization and
the `min_npc_outcome` column provides the minimum normalized probability among colocalized traits.
These two metrics serve as empirical evidence of colocalization at the CoS level and at the trait level, respectively.
The best minimal colocalization configuration can be obtained by applying cutoffs to both metrics with `get_strong_colocalization`
(arguments `cos_npc_cutoff` and `npc_outcome_cutoff`).
See detailed usage of this function in [link](https://statfungen.github.io/colocboost/reference/get_strong_colocalization.html).


```{r run-strong-colocalization}
filter_res <- get_strong_colocalization(res, cos_npc_cutoff = 0.5, npc_outcome_cutoff = 0.2)
```

The output from `get_strong_colocalization` has the same format as the output from `colocboost` and can be used directly in any downstream inference and visualization.
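
The effect of the two cutoffs can be illustrated on a toy summary table using base R only. The table `cos_summary_toy` and its values are made up, and the row-wise thresholding shown here is a simplification under that assumption: the actual `get_strong_colocalization` function operates on the full `colocboost` object and may, for instance, drop individual traits from a CoS rather than whole rows.

```r
# Toy summary table with the two metrics (values are made up)
cos_summary_toy <- data.frame(
  cos_id = c("cos1", "cos2", "cos3"),
  cos_npc = c(0.95, 0.40, 0.80),
  min_npc_outcome = c(0.60, 0.30, 0.10),
  stringsAsFactors = FALSE
)

# Same cutoff values as in the call above
cos_npc_cutoff <- 0.5
npc_outcome_cutoff <- 0.2

# Keep a CoS only if it passes both the CoS-level and the trait-level cutoff
keep <- cos_summary_toy$cos_npc >= cos_npc_cutoff &
  cos_summary_toy$min_npc_outcome >= npc_outcome_cutoff
strong <- cos_summary_toy[keep, ]
```

In this toy table, `cos2` fails the CoS-level cutoff and `cos3` fails the trait-level cutoff, so only `cos1` remains in `strong`.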


## 3. Details of ColocBoost output

The entire colocalization output from `colocboost` is stored in the `colocboost` object, which contains several components:

- `cos_summary`: A summary table for colocalization events.
- `vcp`: The variant colocalization probability for each variable.
- `cos_details`: An object with all information for colocalization results.
- `data_info`: An object with detailed information from input data.
- `model_info`: An object with detailed information for colocboost model.




Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
---
title: "Summary Level Data Colocalization"
title: "Summary Statistics Colocalization"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Summary Level Data Colocalization}
%\VignetteIndexEntry{Summary Statistics Colocalization}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---