This repository contains the data and code used to generate the results and corresponding figures of the recently released preprint "What makes the effect of protein mutations difficult to predict?".
predictability can be run on a standard computer and needs no special hardware.
A GPU is not required, but it will greatly speed up execution of the
rita.ipynb notebook.
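To decide whether the GPU speed-up applies before running rita.ipynb, a check along the following lines may help. This is a minimal sketch that assumes the notebook runs on PyTorch and that the GPU is a CUDA device. Neither is stated in this README, and the helper name gpu_available is hypothetical.

```python
def gpu_available() -> bool:
    """Return True if PyTorch is importable and reports a CUDA device.

    Assumption: the notebook relies on PyTorch with CUDA. Other GPU
    backends (for example Apple MPS) are not covered by this check.
    """
    try:
        import torch  # imported lazily so the check works without PyTorch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    if gpu_available():
        print("CUDA GPU detected.")
    else:
        print("No CUDA GPU detected; the notebook will run on CPU and be slower.")
```

If the function returns False, the notebook should still run, only more slowly.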
The predictability package is supported on macOS and Linux and has been tested on
macOS Ventura 13.5.2.
predictability requires Python ≥ 3.8. All requirements and their
versions are listed in the requirements.txt file.
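The interpreter version can be checked before installing. The sketch below only encodes the Python ≥ 3.8 requirement stated above. The names MIN_PYTHON and python_is_supported are illustrative and not part of the package.

```python
import sys

# Minimum version stated in this README.
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_supported():
        print(f"Python {current} meets the requirement.")
    else:
        print(f"Python {current} is too old; 3.8 or newer is required.")
```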
All experiments and processing of results are organized in notebooks, which can be run after
installing the predictability package. A simple demo can be found under
notebooks/demo.ipynb.
Clone the repository and install with
git clone https://github.com/florisvdf/mutation-predictability.git
cd mutation-predictability
pip install .
The Potts Regressor model of the predictability package makes
use of gremlin_cpp.
To use the Potts Regressor, make sure that gremlin_cpp is installed
and is added to $PATH.
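Whether gremlin_cpp is visible on $PATH can be verified from Python before using the Potts Regressor. The sketch below assumes the executable is named gremlin_cpp, as the text above suggests. The helper find_executable is hypothetical and not part of the predictability package.

```python
import shutil
from typing import Optional


def find_executable(name: str = "gremlin_cpp") -> Optional[str]:
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    if path is None:
        print("gremlin_cpp was not found on PATH; the Potts Regressor may not work.")
    else:
        print(f"gremlin_cpp found at {path}")
```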
Installation on a typical computer should take no longer than 10 minutes.
Results can be reproduced by executing all notebooks under the notebooks
directory. Plots can be generated by executing the notebooks/figures_for_publication
notebook. To obtain a different assignment of samples to train and test folds,
change the seed variable in the second cell of a notebook before executing it.
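The sketch below illustrates how a seed can control fold assignment. It is not the splitting code used in the notebooks. The function assign_folds and its parameters are hypothetical, and it uses only the Python standard library.

```python
import random
from typing import List


def assign_folds(n_samples: int, n_folds: int = 5, seed: int = 0) -> List[int]:
    """Assign each sample a fold index, reproducibly for a given seed."""
    rng = random.Random(seed)  # local generator; global state is untouched
    # Start from balanced labels (0, 1, ..., n_folds - 1, repeated).
    folds = [i % n_folds for i in range(n_samples)]
    # Shuffle so that fold membership depends on the seed.
    rng.shuffle(folds)
    return folds


if __name__ == "__main__":
    first = assign_folds(10, n_folds=5, seed=0)
    again = assign_folds(10, n_folds=5, seed=0)
    other = assign_folds(10, n_folds=5, seed=1)
    print("same seed, same folds:", first == again)
    print("different seed, same folds:", first == other)
```

The same seed always gives the same assignment, which keeps runs reproducible. A different seed reshuffles which samples land in the train and test folds.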