bito, or "Bayesian Inference of Trees via Optimization", is a Python-interface C++ library for phylogenetic variational inference so that you can express interesting parts of your phylogenetic model in Python/TensorFlow/PyTorch/etc and let bito handle the tree structure and likelihood computations for you.
"Bito" is also the name of a tree native to Africa that produces medicinal oil.
We pronounce "bito" with a long /e/ sound ("bito" rhymes with "burrito").
This library is in an experimental state. It was formerly known as "libsbn".
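To illustrate that division of labor, here is a minimal sketch of how an external likelihood engine's log-likelihoods and branch-length gradients can be plugged into a Python-side model via a custom PyTorch autograd function. This is not bito's actual API: `backend_log_likelihood`, `backend_gradient`, and `TreeLogLikelihood` are hypothetical stand-ins, with a toy likelihood so the snippet runs on its own.

```python
import numpy as np
import torch


def backend_log_likelihood(branch_lengths: np.ndarray) -> float:
    # Toy stand-in for a C++ likelihood engine: a smooth function of the
    # branch lengths with its maximum at 0.1.
    return float(-np.sum((branch_lengths - 0.1) ** 2))


def backend_gradient(branch_lengths: np.ndarray) -> np.ndarray:
    # Analytic gradient of the toy log-likelihood above.
    return -2.0 * (branch_lengths - 0.1)


class TreeLogLikelihood(torch.autograd.Function):
    """Bridge an external (non-PyTorch) log-likelihood into the autograd graph."""

    @staticmethod
    def forward(ctx, branch_lengths):
        values = branch_lengths.detach().cpu().numpy()
        # Ask the backend for the gradient up front and stash it for backward.
        grad = torch.as_tensor(backend_gradient(values), dtype=branch_lengths.dtype)
        ctx.save_for_backward(grad)
        return branch_lengths.new_tensor(backend_log_likelihood(values))

    @staticmethod
    def backward(ctx, grad_output):
        (grad,) = ctx.saved_tensors
        return grad_output * grad


# Optimize branch lengths by gradient ascent on the bridged log-likelihood.
branch_lengths = torch.full((5,), 0.5, requires_grad=True)
optimizer = torch.optim.Adam([branch_lengths], lr=0.05)
for _ in range(200):
    optimizer.zero_grad()
    loss = -TreeLogLikelihood.apply(branch_lengths)
    loss.backward()
    optimizer.step()
print(branch_lengths.detach())  # values approach 0.1
```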
- If you are on Linux, install gcc >= 7.5, which is standard in Debian Buster and Ubuntu 18.04.
- If you are on OS X, use a recent version of Xcode and install the command line tools.
We suggest using anaconda and the associated conda environment file, which will nicely install relevant dependencies:
```
conda env create -f environment.yml
conda activate bito
```
(Very optional) The notebooks require R, IRKernel, rpy2 >=3.1.0, and some R packages such as ggplot and cowplot.
For your first build, do

```
git submodule update --init --recursive
make
```
This will install the bito Python module.
You can build and run tests using make test and make fasttest (the latter excludes some slow tests).
Note that make accepts -j flags for multi-core builds: e.g. -j20 will build with 20 jobs.
- (Optional) If you modify the lexer and parser, call make bison. This assumes that you have installed Bison >= 3.4 (conda install -c conda-forge bison).
- (Optional) If you modify the test preparation scripts, call make prep. This assumes that you have installed ete3 (conda install -c etetoolkit ete3).
The following two papers will explain what this repository is about:
- Zhang & Matsen IV, NeurIPS 2018. Generalizing Tree Probability Estimation via Bayesian Networks; 👉🏽 blog post.
- Zhang & Matsen IV, ICLR 2019. Variational Bayesian Phylogenetic Inference; 👉🏽 blog post.
Our documentation consists of:
- Online documentation
- Derivations in doc/tex, which explain what's going on in the code.
We welcome your contributions! Please see our detailed contribution guidelines.
- Erick Matsen (@matsen): implementation, design, janitorial duties
- Dave H. Rich (@DaveRich): core developer
- Ognian Milanov (@ognian-): core developer
- Mathieu Fourment (@4ment): implementation of substitution models and likelihoods/gradients, design
- Seong-Hwan Jun (@junseonghwan): generalized pruning design and implementation, implementation of SBN gradients, design
- Hassan Nasif (@hrnasif): hot start for generalized pruning; gradient descent for generalized pruning
- Anna Kooperberg (@annakooperberg): refactoring the subsplit DAG
- Sho Kiami (@shokiami): refactoring the subsplit DAG
- Tanvi Ganapathy (@tanviganapathy): refactoring the subsplit DAG
- Lucy Yang (@lucyyang01): subsplit DAG visualization
- Cheng Zhang (@zcrabbit): concept, design, algorithms
- Christiaan Swanepoel (@christiaanjs): design
- Xiang Ji (@xji3): gradient expertise and node height code
- Marc Suchard (@msuchard): gradient expertise and node height code
- Michael Karcher (@mdkarcher): SBN expertise
- Eric J. Isaac (@EricJIsaac): C++ wisdom
If you are citing this library, please cite the NeurIPS and ICLR papers listed above. We require BEAGLE, so please also cite the BEAGLE papers.
We also gratefully acknowledge:
- Jaime Huerta-Cepas: several tree traversal functions are copied from ete3
- Thomas Junier: parts of the parser are copied from newick_utils
- The parser driver is derived from the Bison C++ example
In addition to the packages mentioned above, we also employ:
- cxx-prettyprint for STL container pretty printing
- Eigen
- fast-cpp-csv-parser
- Progress-CPP for a progress bar