1. BioNix Overview
Welcome to BioNix! This is a brief overview of what BioNix is, what it does, and why we are working on this project. For more resources to go through after reading this page, proceed to the onboarding checklist.
In bioinformatics, researchers often have to automate, organize, and execute data analysis tasks. People may assume that the output of a data analysis is reproducible, meaning that the same input yields the same results every time, since the analysis is executed by computers. However, differences in the environment an analysis is run in, such as the operating system or the software versions used, can actually cause different results.
To achieve reproducible data analysis, bioinformaticians typically make use of the following in conjunction:
1. Workflow engines: Since a data analysis is typically made up of a long series of steps, workflow engines like Snakemake are often used. As the name implies, a workflow engine represents a data analysis as a workflow, with each step as a component. It organizes and coordinates these components, making data analysis workflows easier to manage.
2. Containers/virtual machines: Virtual machines and containerization platforms like Docker allow software, servers, etc. to be executed consistently across various computing environments (e.g. different operating systems). This is pivotal to reproducibility, since the computing environment affects the output of a data analysis workflow. Overall, this makes it easier to share, install, and execute bioinformatics tools.
3. Package managers: Various software tools are used throughout the stages of a data analysis workflow; however, using different versions of a tool or of its dependencies can result in different workflow outputs. Package managers help manage the versions and dependencies of the software used to analyze data, and provide a central repository of tools to manage installation on a user’s system.
Though current practice tries to achieve reproducibility by combining these three technologies, coordinating them so that they actually produce a reproducible workflow is challenging. This is where BioNix comes in: it is a lightweight library built on the Nix package management system. As such, we can think of BioNix as consisting of two layers, the Nix layer and the BioNix layer itself:
- The Nix Layer: The Nix package management system is mainly responsible for the reproducibility part of BioNix. It provides a consistent computing environment to run a workflow.
- The BioNix Layer: A library of tools often used in bioinformatics workflows. These tools are defined so that they are easy to use with simple parameters, making it easy to create a workflow by composing the functions of various tools.
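To make the idea of "composing functions of various tools" more concrete, here is a minimal sketch in Python. It is not BioNix code and does not use the BioNix or Nix APIs: the stage names (`align`, `sort_reads`, `call_variants`), the `pipeline` helper, and the file names are all hypothetical, and each stage only builds a string describing what it would produce. The point is just to show tools as functions with simple parameters that chain into a workflow.

```python
from functools import reduce
from typing import Callable

# A stage takes the description of its input and returns the description
# of its output. Real tools would produce files; strings keep the sketch small.
Stage = Callable[[str], str]


def align(reference: str) -> Stage:
    """Hypothetical alignment tool, configured with a reference genome."""
    def stage(reads: str) -> str:
        return f"align({reads}, ref={reference})"
    return stage


def sort_reads(by: str = "coordinate") -> Stage:
    """Hypothetical sorting tool with a single simple parameter."""
    def stage(alignment: str) -> str:
        return f"sort({alignment}, by={by})"
    return stage


def call_variants(reference: str) -> Stage:
    """Hypothetical variant caller, configured with a reference genome."""
    def stage(sorted_alignment: str) -> str:
        return f"call({sorted_alignment}, ref={reference})"
    return stage


def pipeline(*stages: Stage) -> Stage:
    """Compose stages left to right into a single workflow function."""
    def run(data: str) -> str:
        return reduce(lambda acc, stage: stage(acc), stages, data)
    return run


# The workflow is itself just a function built from smaller functions.
workflow = pipeline(align("ref.fa"), sort_reads(), call_variants("ref.fa"))
print(workflow("sample.fastq"))
```

Because each configured tool is an ordinary function, swapping or reordering stages only changes the arguments to `pipeline`. BioNix expresses the same kind of composition in the Nix language rather than in Python.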
Since explaining how BioNix works requires a rather technical discussion, proceed to "2. How BioNix Works" for more information.
| Link | Description |
|---|---|
| BioNix Journal Article | Journal article with an in-depth explanation of what BioNix does and how it works. |
| Reprohackathons: promoting reproducibility in bioinformatics through training | Journal article explaining why reproducibility in bioinformatics is important. |
| A workflow reproducibility scale for automatic validation of biological interpretation results | Journal article explaining the importance of reproducibility in data analysis workflows and what is being done to achieve it. |