Address some of the reviewers' comments #80

Open
TuomasBorman wants to merge 1 commit into devel from review

Conversation

@TuomasBorman
Contributor

No description provided.

Comment on lines +354 to +385
# Computational environments {#sec-computational_environments}

Several computational environments are available for microbiome data science,
each distinguished by its underlying paradigms, user communities, and scope of
functionality. For example, tools such as QIIME 2 [@bolyen2019], anvi'o
[@eren2021], and mothur [@schloss2009] emphasize standardized, workflow-oriented
approaches tailored to microbiome analysis. While these platforms provide robust
and well-validated solutions for microbiome data, they are typically designed
around specific data types and predefined pipelines, which can limit flexibility
when integrating heterogeneous data modalities within a unified analytical
framework. In addition, their workflow-centric design may not fully accommodate
the iterative nature of exploratory microbiome research.
Python-based ecosystems built around libraries such as Biopython [@cock2009]
and scikit-bio [@rideout2023] provide flexible, general-purpose toolkits, but
are less comprehensive than Bioconductor. As a result, users are often required
to implement more custom scripting and assemble their own analytical workflows.

Bioconductor provides a comprehensive ecosystem of tools spanning multiple areas
of bioinformatics, extending beyond microbiome research. It facilitates
interoperability across domains by leveraging a shared data infrastructure,
enabling more seamless integration of multi-omics data. Rather than focusing on
predefined workflows, Bioconductor is oriented towards exploratory and
statistical data analysis, with components that can be flexibly combined into
custom workflows. While Python-based ecosystems benefit from rapidly evolving
machine learning and artificial intelligence tools, Bioconductor’s strengths lie
in robust statistical modeling and strong interoperability with other
computational environments. In addition, it is supported by a large, active
community and a mature, well-curated package ecosystem. Although this breadth
and complexity can introduce challenges, such as dependency management and
package compatibility, these are actively addressed through community standards,
coordinated release cycles, and continuous integration practices.

Contributor Author

The reviewer stated that we do not compare Bioconductor to other ecosystems enough. They felt that we imply Bioconductor is superior to others without providing justification.

Comment on lines +394 to 401
isolation. However, this growing complexity introduces additional overhead, as
researchers must track samples and features across multiple tables and manage an
increasing number of data elements. Moreover, input data formats vary across
different methods, leading to technical complications and time lost on data
wrangling. Bioconductor addresses these challenges through standardized data
structures supported by an ecosystem of interoperable methods, allowing the
analyst to invest more time in core analytical tasks.

Contributor Author

This aims to further clarify the idea of data containers.

-yet flexible workflows. In particular, the direct interoperability with the
+yet flexible workflows, as TreeSE inherits full compatibility with methods
+designed for SE. In particular, the direct interoperability with the
 widely adopted SE data science ecosystem distinguishes our TreeSE-based
Contributor Author

This highlights that all methods that work with SE can also be used with TreeSE.

We could consider highlighting SummarizedExperiment more (instead of TreeSE, we could talk about SE). It would emphasize that this framework is integrated into the SE ecosystem.

Comment on lines +850 to +869
[@lawrence2013]. Linking data across modalities presents an additional technical
challenge, as multi-omics data are often sparse and only partially matched
across experiments. For example, one study may include 100 samples with
metagenomic sequencing data but only 40 with matched metabolomics. In other
cases, a single sample in one modality may link to two or more samples in
another modality (*e.g.*, due to technical or spatial replication)
[see @husso2023]. These scenarios require flexible sample mappings and the
ability to accommodate heterogeneous data containers across modalities.
The MultiAssayExperiment (MAE) data container [@ramos2017] addresses these
challenges by introducing a formal mapping layer (sampleMap) that links
biological samples to their corresponding measurements in each assay
(@fig-datacontainers b). This design supports one-to-one, one-to-many, and
partially overlapping sample relationships across experiments. Importantly, the
mapping is used during downstream operations: when subsetting an MAE object
(*e.g.*, selecting samples with both microbiome and metabolome data), the
framework automatically resolves the intersection of available samples and
ensures consistent alignment across all assays. Such structured handling of
sample relationships is particularly important for integrative methods, as it
reduces the risk of sample mismatches, a common source of irreproducibility in
ad hoc multi-table analyses. For instance, diagonal integration methods
Contributor Author

This aims to clarify how MultiAssayExperiment handles the sample mapping, and the benefits of using it.
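
To make the mapping idea concrete, here is a rough, library-free Python sketch. It is not the MultiAssayExperiment API; the sample identifiers, column names, and function names are all hypothetical and only mimic what a sample map does: it records which assay columns belong to which biological sample, allows one-to-many links, and lets us restrict the data to samples present in every assay.

```python
from collections import defaultdict

# Hypothetical sample map rows: (assay, biological sample, column name in that assay).
SAMPLE_MAP = [
    ("microbiome", "S1", "mb_S1"),
    ("microbiome", "S2", "mb_S2"),       # S2 has no metabolome measurement
    ("microbiome", "S3", "mb_S3"),
    ("metabolome", "S1", "mt_S1_rep1"),
    ("metabolome", "S1", "mt_S1_rep2"),  # one-to-many: a technical replicate of S1
    ("metabolome", "S3", "mt_S3"),
]


def columns_by_assay(sample_map):
    """Group assay column names by assay and then by biological sample."""
    grouped = defaultdict(lambda: defaultdict(list))
    for assay, primary, colname in sample_map:
        grouped[assay][primary].append(colname)
    return grouped


def intersect_samples(sample_map):
    """Return the biological samples that have at least one column in every assay."""
    grouped = columns_by_assay(sample_map)
    sample_sets = [set(samples) for samples in grouped.values()]
    if not sample_sets:
        return []
    return sorted(set.intersection(*sample_sets))


def subset_to_complete(sample_map):
    """Keep only the map rows whose biological sample is present in all assays."""
    complete = set(intersect_samples(sample_map))
    return [row for row in sample_map if row[1] in complete]
```

In this toy example, subsetting to complete cases drops S2 (microbiome only) and keeps both metabolome replicates of S1, which is the kind of bookkeeping the manuscript text says the mapping layer takes over from the analyst.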

Comment on lines +1260 to +1265
omics data sets [@huber2015; @ramos2017; @amezquita2020]. By leveraging
standardized multi-assay data structures, users can apply a growing
number of integrative methods for multi-omics analysis. This reduces the
need for manual data wrangling, reduces the risk of errors (*e.g.*, sample
mismatches), and enables the creation of modular, efficient, and reproducible
workflows. Such integrative approaches in microbiome
Contributor Author

This further highlights the benefits of MultiAssayExperiment.

Comment on lines +1520 to 1526
@barker2022; @hocquet2024]. Harmonized data structures and access to validated
methods help shape such best practices by enabling transparent benchmarking and
fostering community-driven consensus, while allowing users to flexibly select
and evaluate methods appropriate for their specific data and research questions.
Community formation is actively supported through
online training resources and communication channels. Together with the
interactive applications, these resources provide accessible entry points for
Contributor Author

The reviewer stated that we could add more on best practices. We cannot list them, as they are evolving and depend on the research question, among other things. But we can clarify how the framework relates to best practices.
