Skip to content

Dev#35

Open
DanielGarbozo wants to merge 8 commits intoBigMindLab:mainfrom
DanielGarbozo:dev
Open

Dev#35
DanielGarbozo wants to merge 8 commits intoBigMindLab:mainfrom
DanielGarbozo:dev

Conversation

@DanielGarbozo
Copy link
Collaborator

  • Build the main clustering/analysis function - build_net():
    • Reads all .gmt files from a user-specified directory.
    • Filters enrichment results by a configurable FDR column and threshold.
    • Intersects pathways from the results with gene set names present in the GMT files.
    • Computes a Jaccard similarity matrix between the selected gene sets.
    • Performs hierarchical clustering using the ward.D2 method.
    • Determines the optimal number of clusters using the silhouette index.
    • Returns a structured list with: clusters, jaccard_matrix, hclust_obj, optimal_k, and silhouette_data.
  • Build the visualization function - view_net():
    • Builds an igraph network from the Jaccard similarity matrix returned by build_net().
    • Applies an edge_threshold on Jaccard similarity to keep only sufficiently strong connections.
    • Maps cluster membership from build_net() output to node groups for visualization.
    • Produces a static igraph object and an interactive visNetwork object.
    • Optionally saves the interactive visualization as an HTML file when save_html_path is provided.
  • Added roxygen2 documentation for both functions (build_net() and view_net()):
    • Documented all parameters, returns, and internal logic.
    • Included example usage with read.csv for loading enrichment results.
  • Notes:
    • No library() calls; all external functions are referenced via :: and @importFrom.
    • No hard-coded paths, thresholds, or output directories; these are exposed as function arguments.
    • No automatic writing of CSVs or plot files; file I/O is delegated to the caller.
  • Implemented basic input validation and error handling:
    • Checks existence of gmt_path and presence of .gmt files.
    • Verifies that results_df includes the columns defined by fdr_col and pathway_col.
    • Ensures that filtered pathways overlap with GMT gene set names.
    • Handles edge cases with too few gene sets to form at least two clusters.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements gene set enrichment network analysis and visualization capabilities for pathway clustering analysis. The changes introduce two new main functions (build_net() and view_net()) for computing Jaccard similarity-based gene set networks and generating interactive visualizations, while also refactoring existing pathway analysis functions to be more general-purpose.

  • Added build_net() function to compute gene set clusters from GMT files using hierarchical clustering with silhouette-based optimal cluster selection
  • Added view_net() function to generate static igraph and interactive visNetwork visualizations from clustering results
  • Refactored function names from GSEA-specific to general Pathway Analysis (PA) naming convention

Reviewed Changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 14 comments.

Show a summary per file
File Description
R/build_net.R New function implementing GMT parsing, Jaccard similarity computation, hierarchical clustering, and silhouette-based cluster optimization
R/view_net.R New function creating network visualizations from clustering output with configurable edge thresholds and HTML export
R/plot_PA.R Refactored from plot_GSEA with unified interface supporting single/multiple comparison modes and configurable themes
R/merge_PA.R Renamed from merge_GSEA with added globalVariables declaration inside the function
R/heatmap_PA.R Renamed from heatmap_GSEA with added globalVariables declaration inside the function
man/*.Rd Updated documentation files reflecting renamed functions and new network analysis capabilities
NAMESPACE Updated exports and imports to support new clustering and visualization functions
DESCRIPTION Added igraph, visNetwork, and cowplot to Imports; updated RoxygenNote version
Comments suppressed due to low confidence (2)

R/merge_PA.R:19

  • utils::globalVariables() should be called at the package level (typically in a separate R file like zzz.R or globals.R), not inside individual functions. When called inside a function, it executes every time the function runs, which is inefficient and not the intended usage pattern.
    R/heatmap_PA.R:29
  • utils::globalVariables() should be called at the package level (typically in a separate R file like zzz.R or globals.R), not inside individual functions. When called inside a function, it executes every time the function runs, which is inefficient and not the intended usage pattern.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

axis.ticks.length = grid::unit(params$tick_length, "cm"),
strip.text = element_text(size = params$strip_text_size),
panel.spacing = grid::unit(params$panel_spacing_multi, "lines")) +
geom_hline(yintercept = 0, size = params$hline_size) + expand_limits(y = 0) +
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The size parameter in geom_hline() has been deprecated in ggplot2 3.4.0 in favor of linewidth. Consider using linewidth = params$hline_size instead to avoid deprecation warnings.

Suggested change
geom_hline(yintercept = 0, size = params$hline_size) + expand_limits(y = 0) +
geom_hline(yintercept = 0, linewidth = params$hline_size) + expand_limits(y = 0) +

Copilot uses AI. Check for mistakes.
R/build_net.R Outdated
#' @importFrom utils read.delim
#' @importFrom cluster silhouette
#' @importFrom dplyr filter pull distinct
#' @importFrom dplyr .data
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both dplyr and rlang export .data, but you're importing it from both packages. This creates redundant imports. You only need to import .data once, preferably from rlang as it's the original source.

Suggested change
#' @importFrom dplyr .data

Copilot uses AI. Check for mistakes.
strip.text = element_text(size = params$strip_text_size),
panel.spacing = grid::unit(params$panel_spacing_multi, "lines")) +
geom_hline(yintercept = 0, size = params$hline_size) + expand_limits(y = 0) +
facet_wrap(~ .data$New_name, ncol = ncol_wrap,
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code references a column New_name that is not present in the input data and is not documented in the function parameters. This will cause an error when the else branch (multiple comparisons mode) is executed. You need to either add New_name as a parameter or document that the input data must contain this column.

Copilot uses AI. Check for mistakes.
DESCRIPTION Outdated
patchwork,
cowplot,
igraph,
visNetwork
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cluster package is missing from the Imports or Suggests section. The function build_net() uses cluster::silhouette() (line 212 in R/build_net.R), but this dependency is not declared in DESCRIPTION. This will cause the package to fail when users try to use the build_net() function.

Suggested change
visNetwork
visNetwork,
cluster

Copilot uses AI. Check for mistakes.
Comment on lines +82 to +86
gmt_path,
fdr_threshold = 0.25,
fdr_col,
pathway_col,
max_k_ratio = 2.5) {
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function parameter indentation is inconsistent - parameters on lines 82-86 have excessive indentation (many spaces) compared to the function name on line 81. Standard R style suggests aligning parameters with the opening parenthesis or using consistent 2-4 space indentation.

Suggested change
gmt_path,
fdr_threshold = 0.25,
fdr_col,
pathway_col,
max_k_ratio = 2.5) {
gmt_path,
fdr_threshold = 0.25,
fdr_col,
pathway_col,
max_k_ratio = 2.5) {

Copilot uses AI. Check for mistakes.
scale_y_discrete(position = "right") + facet_grid(Collection ~ ., scales = "free_y", space = "free_y") +
theme_bw() + labs(x = "NES", y = "") +
theme(axis.text.y = element_blank(), strip.background = element_rect(fill = "white", color = "black", linewidth = 1),
axis.ticks.y = element_line(size = params$tick_size),
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The size parameter in element_line() has been deprecated in ggplot2 3.4.0 in favor of linewidth. Consider using linewidth = params$tick_size instead to avoid deprecation warnings.

Suggested change
axis.ticks.y = element_line(size = params$tick_size),
axis.ticks.y = element_line(linewidth = params$tick_size),

Copilot uses AI. Check for mistakes.
axis.title.y = element_text(size = params$axis_title_size),
axis.text.x = element_text(size = params$axis_text_size_x),
axis.text.y = element_text(size = params$axis_text_size_y),
axis.ticks = element_line(size = params$tick_size),
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The size parameter in element_line() has been deprecated in ggplot2 3.4.0 in favor of linewidth. Consider using linewidth = params$tick_size instead to avoid deprecation warnings.

Suggested change
axis.ticks = element_line(size = params$tick_size),
axis.ticks = element_line(linewidth = params$tick_size),

Copilot uses AI. Check for mistakes.
save_html_path = NULL) {
# Basic input checks
if (!is.list(clustering_output)) {
stop("'clustering_output' must be a list as returned by 'calculate_geneset_clusters'.")
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message references 'calculate_geneset_clusters' but the actual function name is 'build_net'. This inconsistency will confuse users if the error is triggered.

Suggested change
stop("'clustering_output' must be a list as returned by 'calculate_geneset_clusters'.")
stop("'clustering_output' must be a list as returned by 'build_net'.")

Copilot uses AI. Check for mistakes.
patchwork::plot_layout(ncol = 6, widths = params$panel_widths)
} else {
final_plot <- ggplot(data, aes(x = .data[[Comparison]], y = .data[[axis_y]], fill = .data$logFDR)) +
geom_bar(stat = "identity", color = params$bar_col, size = params$bar_size,
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The size parameter in geom_bar() has been deprecated in ggplot2 3.4.0 in favor of linewidth for line widths. Consider using linewidth = params$bar_size instead to avoid deprecation warnings.

Suggested change
geom_bar(stat = "identity", color = params$bar_col, size = params$bar_size,
geom_bar(stat = "identity", color = params$bar_col, linewidth = params$bar_size,

Copilot uses AI. Check for mistakes.
Comment on lines +77 to +78
edge_threshold = 0.25,
save_html_path = NULL) {
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function parameter indentation is inconsistent - parameters on lines 77-78 have excessive indentation (many spaces) compared to the function name on line 76. Standard R style suggests aligning parameters with the opening parenthesis or using consistent 2-4 space indentation.

Suggested change
edge_threshold = 0.25,
save_html_path = NULL) {
edge_threshold = 0.25,
save_html_path = NULL) {

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants