Conversation
DanielGarbozo
commented
Nov 18, 2025
- Build the main clustering/analysis function - build_net():
- Reads all .gmt files from a user-specified directory.
- Filters enrichment results by a configurable FDR column and threshold.
- Intersects pathways from the results with gene set names present in the GMT files.
- Computes a Jaccard similarity matrix between the selected gene sets.
- Performs hierarchical clustering using the ward.D2 method.
- Determines the optimal number of clusters using the silhouette index.
- Returns a structured list with: clusters, jaccard_matrix, hclust_obj, optimal_k, and silhouette_data.
- Build the visualization function - view_net():
- Builds an igraph network from the Jaccard similarity matrix returned by build_net().
- Applies an edge_threshold on Jaccard similarity to keep only sufficiently strong connections.
- Maps cluster membership from build_net() output to node groups for visualization.
- Produces a static igraph object and an interactive visNetwork object.
- Optionally saves the interactive visualization as an HTML file when save_html_path is provided.
- Added roxygen2 documentation for both functions (build_net() and view_net()):
- Documented all parameters, returns, and internal logic.
- Included example usage with read.csv for loading enrichment results.
- Notes:
- No library() calls; all external functions are referenced via :: and @importFrom.
- No hard-coded paths, thresholds, or output directories; these are exposed as function arguments.
- No automatic writing of CSVs or plot files; file I/O is delegated to the caller.
- Implemented basic input validation and error handling:
- Checks existence of gmt_path and presence of .gmt files.
- Verifies that results_df includes the columns defined by fdr_col and pathway_col.
- Ensures that filtered pathways overlap with GMT gene set names.
- Handles edge cases with too few gene sets to form at least two clusters.
Add unified GSEA plotting function and resolve documentation issues
Replace names of GSEA functions
There was a problem hiding this comment.
Pull Request Overview
This PR implements gene set enrichment network analysis and visualization capabilities for pathway clustering analysis. The changes introduce two new main functions (build_net() and view_net()) for computing Jaccard similarity-based gene set networks and generating interactive visualizations, while also refactoring existing pathway analysis functions to be more general-purpose.
- Added
build_net()function to compute gene set clusters from GMT files using hierarchical clustering with silhouette-based optimal cluster selection - Added
view_net()function to generate static igraph and interactive visNetwork visualizations from clustering results - Refactored function names from GSEA-specific to general Pathway Analysis (PA) naming convention
Reviewed Changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 14 comments.
Show a summary per file
| File | Description |
|---|---|
| R/build_net.R | New function implementing GMT parsing, Jaccard similarity computation, hierarchical clustering, and silhouette-based cluster optimization |
| R/view_net.R | New function creating network visualizations from clustering output with configurable edge thresholds and HTML export |
| R/plot_PA.R | Refactored from plot_GSEA with unified interface supporting single/multiple comparison modes and configurable themes |
| R/merge_PA.R | Renamed from merge_GSEA with added globalVariables declaration inside the function |
| R/heatmap_PA.R | Renamed from heatmap_GSEA with added globalVariables declaration inside the function |
| man/*.Rd | Updated documentation files reflecting renamed functions and new network analysis capabilities |
| NAMESPACE | Updated exports and imports to support new clustering and visualization functions |
| DESCRIPTION | Added igraph, visNetwork, and cowplot to Imports; updated RoxygenNote version |
Comments suppressed due to low confidence (2)
R/merge_PA.R:19
utils::globalVariables()should be called at the package level (typically in a separate R file like zzz.R or globals.R), not inside individual functions. When called inside a function, it executes every time the function runs, which is inefficient and not the intended usage pattern.
R/heatmap_PA.R:29utils::globalVariables()should be called at the package level (typically in a separate R file like zzz.R or globals.R), not inside individual functions. When called inside a function, it executes every time the function runs, which is inefficient and not the intended usage pattern.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| axis.ticks.length = grid::unit(params$tick_length, "cm"), | ||
| strip.text = element_text(size = params$strip_text_size), | ||
| panel.spacing = grid::unit(params$panel_spacing_multi, "lines")) + | ||
| geom_hline(yintercept = 0, size = params$hline_size) + expand_limits(y = 0) + |
There was a problem hiding this comment.
The size parameter in geom_hline() has been deprecated in ggplot2 3.4.0 in favor of linewidth. Consider using linewidth = params$hline_size instead to avoid deprecation warnings.
| geom_hline(yintercept = 0, size = params$hline_size) + expand_limits(y = 0) + | |
| geom_hline(yintercept = 0, linewidth = params$hline_size) + expand_limits(y = 0) + |
R/build_net.R
Outdated
| #' @importFrom utils read.delim | ||
| #' @importFrom cluster silhouette | ||
| #' @importFrom dplyr filter pull distinct | ||
| #' @importFrom dplyr .data |
There was a problem hiding this comment.
Both dplyr and rlang export .data, but you're importing it from both packages. This creates redundant imports. You only need to import .data once, preferably from rlang as it's the original source.
| #' @importFrom dplyr .data |
| strip.text = element_text(size = params$strip_text_size), | ||
| panel.spacing = grid::unit(params$panel_spacing_multi, "lines")) + | ||
| geom_hline(yintercept = 0, size = params$hline_size) + expand_limits(y = 0) + | ||
| facet_wrap(~ .data$New_name, ncol = ncol_wrap, |
There was a problem hiding this comment.
The code references a column New_name that is not present in the input data and is not documented in the function parameters. This will cause an error when the else branch (multiple comparisons mode) is executed. You need to either add New_name as a parameter or document that the input data must contain this column.
DESCRIPTION
Outdated
| patchwork, | ||
| cowplot, | ||
| igraph, | ||
| visNetwork |
There was a problem hiding this comment.
The cluster package is missing from the Imports or Suggests section. The function build_net() uses cluster::silhouette() (line 212 in R/build_net.R), but this dependency is not declared in DESCRIPTION. This will cause the package to fail when users try to use the build_net() function.
| visNetwork | |
| visNetwork, | |
| cluster |
| gmt_path, | ||
| fdr_threshold = 0.25, | ||
| fdr_col, | ||
| pathway_col, | ||
| max_k_ratio = 2.5) { |
There was a problem hiding this comment.
The function parameter indentation is inconsistent - parameters on lines 82-86 have excessive indentation (many spaces) compared to the function name on line 81. Standard R style suggests aligning parameters with the opening parenthesis or using consistent 2-4 space indentation.
| gmt_path, | |
| fdr_threshold = 0.25, | |
| fdr_col, | |
| pathway_col, | |
| max_k_ratio = 2.5) { | |
| gmt_path, | |
| fdr_threshold = 0.25, | |
| fdr_col, | |
| pathway_col, | |
| max_k_ratio = 2.5) { |
| scale_y_discrete(position = "right") + facet_grid(Collection ~ ., scales = "free_y", space = "free_y") + | ||
| theme_bw() + labs(x = "NES", y = "") + | ||
| theme(axis.text.y = element_blank(), strip.background = element_rect(fill = "white", color = "black", linewidth = 1), | ||
| axis.ticks.y = element_line(size = params$tick_size), |
There was a problem hiding this comment.
The size parameter in element_line() has been deprecated in ggplot2 3.4.0 in favor of linewidth. Consider using linewidth = params$tick_size instead to avoid deprecation warnings.
| axis.ticks.y = element_line(size = params$tick_size), | |
| axis.ticks.y = element_line(linewidth = params$tick_size), |
| axis.title.y = element_text(size = params$axis_title_size), | ||
| axis.text.x = element_text(size = params$axis_text_size_x), | ||
| axis.text.y = element_text(size = params$axis_text_size_y), | ||
| axis.ticks = element_line(size = params$tick_size), |
There was a problem hiding this comment.
The size parameter in element_line() has been deprecated in ggplot2 3.4.0 in favor of linewidth. Consider using linewidth = params$tick_size instead to avoid deprecation warnings.
| axis.ticks = element_line(size = params$tick_size), | |
| axis.ticks = element_line(linewidth = params$tick_size), |
| save_html_path = NULL) { | ||
| # Basic input checks | ||
| if (!is.list(clustering_output)) { | ||
| stop("'clustering_output' must be a list as returned by 'calculate_geneset_clusters'.") |
There was a problem hiding this comment.
The error message references 'calculate_geneset_clusters' but the actual function name is 'build_net'. This inconsistency will confuse users if the error is triggered.
| stop("'clustering_output' must be a list as returned by 'calculate_geneset_clusters'.") | |
| stop("'clustering_output' must be a list as returned by 'build_net'.") |
| patchwork::plot_layout(ncol = 6, widths = params$panel_widths) | ||
| } else { | ||
| final_plot <- ggplot(data, aes(x = .data[[Comparison]], y = .data[[axis_y]], fill = .data$logFDR)) + | ||
| geom_bar(stat = "identity", color = params$bar_col, size = params$bar_size, |
There was a problem hiding this comment.
The size parameter in geom_bar() has been deprecated in ggplot2 3.4.0 in favor of linewidth for line widths. Consider using linewidth = params$bar_size instead to avoid deprecation warnings.
| geom_bar(stat = "identity", color = params$bar_col, size = params$bar_size, | |
| geom_bar(stat = "identity", color = params$bar_col, linewidth = params$bar_size, |
| edge_threshold = 0.25, | ||
| save_html_path = NULL) { |
There was a problem hiding this comment.
The function parameter indentation is inconsistent - parameters on lines 77-78 have excessive indentation (many spaces) compared to the function name on line 76. Standard R style suggests aligning parameters with the opening parenthesis or using consistent 2-4 space indentation.
| edge_threshold = 0.25, | |
| save_html_path = NULL) { | |
| edge_threshold = 0.25, | |
| save_html_path = NULL) { |