FitMut2 is an algorithm for identifying adaptive mutations that established in barcoded evolution experiments and inferring their mutational parameters (fitness effect and establishment time). Its predecessor, FitMut1, was developed in S. F. Levy, et al. Quantitative evolutionary dynamics using high-resolution lineage tracking. Nature, 519(7542): 181-186 (2015) and originally implemented in Mathematica. In this repository we have reimplemented FitMut1 in Python and additionally adapted it for higher accuracy in situations with lower sequencing coverage. If you use this software, please reference our preprint. (Code and results for this paper are stored in a shared folder in Google Drive here)
FitMut2 is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
This repository has two main scripts (aside from the implementation of FitMut1):
- fitmutsimu_run.py simulates the experimental process of a barcode-sequencing (bar-seq) evolution experiment. This can be used to test the inference algorithm on simulated data where the ground truth is known.
- fitmut2_run.py identifies adaptive mutations (as well as inferring their fitness effects and establishment times) that established in bar-seq evolution experiments from read count time series data.
A walk-through is included as the Jupyter notebook here.
- Python 3 is required. This version has been tested on a MacBook Pro (Apple M1 Chip, 8 GB Memory), with Python 3.8.5.
- Clone this repository by running
git clone https://github.com/FangfeiLi05/FitMut2.git
in terminal.
- cd to the root directory of the project (the folder containing README.md).
- Install dependencies by running
pip install -r requirements.txt
in terminal.
fitmutsimu_run.py simulates the entire experimental process of a barcode-sequencing (bar-seq) evolution experiment with serial dilution of a barcoded cell population. This simulation models all sources of noise, including growth noise, noise from cell transfers, DNA extraction, PCR, and sequencing, as Poisson randomness with the appropriate multiplicative factor.
- --lineage_number or -l: number of lineages to simulate. Each lineage begins the evolution experiment with an average size of 100 cells, where the spread is determined by variability in the pregrowth phase.
- --t_seq or -t: a .csv file, with
  - 1st column: sequenced time points measured in number of generations
  - 2nd+ columns: average number of reads per barcode for each sequenced time point (accepts multiple columns for multiple sequencing replicates with e.g. variable coverage)
- --mutation_fitness or -s: a .csv file, with
  - 1st column: total beneficial mutation rate, Ub
  - 2nd column: bin edges of the arbitrary DFE
  - 3rd column: normalized counts in each bin of the 2nd column
- --maximum_mutation_number or -max_mut_num: maximum number of mutations allowed in each single cell (default: 1)
- --t_pregrowth or -t_pre: number of generations in pre-growth (default: 16)
- --cell_num_average_bottleneck or -n_b: average number of cells per barcode transferred at each bottleneck (default: 100)
- --c or -c: half of variance introduced by cell growth and cell transfer (default: 1)
- --dna_copies or -d: average genome template copies per barcode in PCR (default: 500)
- --pcr_cycles or -p: number of cycles in PCR (default: 25)
- --output_filename or -o: prefix of output files (default: output)
- simu_output_EvoSimulation_Read_Number.csv: read number per barcode for each time point
- simu_output_EvoSimulation_Mutation_Info.csv: information of adaptive mutations that established
- simu_output_EvoSimulation_Other_Info.csv: a record of some inputs (also fraction of mutant cells of the population)
- simu_output_EvoSimulation_Bottleneck_Cell_Number.csv: bottleneck cell number per barcode for each time point
- simu_output_EvoSimulation_Bottleneck_Cell_Number_Neutral.csv: bottleneck neutral cell number per barcode for each time point
- simu_output_EvoSimulation_Saturated_Cell_Number.csv: saturated cell number per barcode for each time point
- simu_output_EvoSimulation_Saturated_Cell_Number_Neutral.csv: saturated neutral cell number per barcode for each time point
python fitmutsimu_run.py --help
python fitmutsimu_run.py -l 10000 -t simu_input_time_points.csv -s simu_input_mutation_fitness.csv -o test
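The file passed to --t_seq can be written by hand or generated. The sketch below builds one from the column description above, with the first column holding sequenced time points in generations and one further column per sequencing replicate. The helper names, the numeric values, and the absence of a header row are assumptions made for illustration; compare against the example input files shipped with the repository before relying on this layout.

```python
import csv


def build_time_point_rows(generations, reads_per_replicate):
    """Pair each sequenced time point with its average reads per barcode.

    generations: sequenced time points, in generations (1st column).
    reads_per_replicate: one list per sequencing replicate (2nd+ columns).
    """
    for replicate in reads_per_replicate:
        if len(replicate) != len(generations):
            raise ValueError("each replicate needs one value per time point")
    return [
        [t] + [replicate[i] for replicate in reads_per_replicate]
        for i, t in enumerate(generations)
    ]


def write_rows(path, rows):
    """Write rows as plain .csv (no header row: an assumption)."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)


# Illustrative values only: five time points, two replicates with
# different coverage (100 and 50 reads per barcode on average).
generations = [0, 8, 16, 24, 32]
rows = build_time_point_rows(generations, [[100] * 5, [50] * 5])

# Uncomment to create the file used in the command above:
# write_rows("simu_input_time_points.csv", rows)
```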
fitmut2_run.py identifies adaptive mutations in barcoded evolution experiments from read-count time series data, and estimates their fitness effects and establishment times.
- --input or -i: a .csv file, with each column being the read number per barcode at each sequenced time point
- --t_seq or -t: a .csv file, with
  - 1st column: sequenced time points evaluated in number of generations
  - 2nd column: number of cells transferred at each sequenced time point, multiplied by the time (in generations) between time points. This is what we call effective cell number.
- --mutation_rate or -u: total beneficial mutation rate per generation per cell (default chosen from expectation in S. cerevisiae). This choice affects the prior distribution, and using the default value in most cases should be fine. (default: 1e-5)
- --delta_t or -dt: number of generations between bottlenecks. This is approximately given by the logarithm (base 2) of the dilution factor between transfers. (default: 8)
- --c or -c: half of variance introduced by cell growth and cell transfer. In most cases the default value should suffice, unless the experimental value is measurable. (default: 1)
- --maximum_iteration_number or -n: maximum number of iterations in the self-consistent estimation of mean fitness and lineage fitnesses (default: 50)
- --opt_algorithm or -a: optimization algorithm (direct search, Nelder-Mead or differential evolution) (default: direct_search)
- --parallelize or -p: whether to use Python multiprocess module to parallelize inference across lineages (default: True)
- --save_steps or -s: whether to save the data files after each iteration of inference (default: False)
- --output_filename or -o: prefix of output files (default: output)
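Two of the inputs above are derived quantities: the effective cell number in the second column of the --t_seq file, and the value passed to --delta_t. The following sketch works through both. The function names are hypothetical, and the handling of the first time point (reusing the interval to the second time point) is an assumption, since the description above does not say which interval applies there.

```python
import math


def generations_per_cycle(dilution_factor):
    """Approximate generations between bottlenecks: log2 of the dilution."""
    return math.log2(dilution_factor)


def effective_cell_numbers(time_points, cells_transferred):
    """Cells transferred times generations between sequenced time points.

    Assumption: the first time point reuses the interval to the second
    one, because it has no preceding time point.
    """
    if len(time_points) != len(cells_transferred) or len(time_points) < 2:
        raise ValueError("need matching lists with at least two time points")
    intervals = [time_points[1] - time_points[0]]
    intervals += [b - a for a, b in zip(time_points, time_points[1:])]
    return [n * dt for n, dt in zip(cells_transferred, intervals)]


# A 1:256 dilution at each transfer corresponds to 8 generations per cycle,
# which matches the default of --delta_t.
delta_t = generations_per_cycle(256)

# Illustrative values: 1e6 cells transferred, sequenced every 8 generations.
effective = effective_cell_numbers([0, 8, 16], [1e6, 1e6, 1e6])
```

With these illustrative numbers every entry of `effective` is 8e6, which is the value that would go in the second column of the time-point file.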
- output_MutSeq_Result.csv: a .csv file, with
  - 1st column of .csv: estimated fitness effect of each lineage
  - 2nd column of .csv: estimated establishment time of each lineage
  - 3rd column of .csv: uncertainty in fitness effect
  - 4th column of .csv: uncertainty in establishment time
  - 5th column of .csv: probability of each lineage containing an adaptive mutation
  - 6th column of .csv: estimated mean fitness per sequenced time point
  - 7th column of .csv: estimated kappa (noise parameter, see preprint for definition) per sequenced time point
  - 8th column of .csv: estimated fraction of mutant cells of the population per sequenced time point
- output_Mean_fitness_Result.csv: estimated mean fitness at each iteration
- output_Cell_Number_Mutant_Estimated.csv: estimated effective number of mutant cells per barcode for each time point
- output_Cell_Number.csv: effective number of cells per barcode for each time point
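As a minimal sketch of how the result file might be post-processed, the code below keeps the lineages whose probability of carrying an adaptive mutation (5th column) reaches a cutoff, and returns their fitness effect and establishment time (1st and 2nd columns). The function names, the 0.5 cutoff, and the presence of a single header row are assumptions for illustration and are not part of FitMut2. Because columns 6 to 8 are per time point rather than per lineage, rows are read by position and short or blank rows are skipped.

```python
import csv


def load_rows(path, has_header=True):
    """Read a .csv file into lists of strings (header row is an assumption)."""
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle))
    return rows[1:] if has_header else rows


def select_adaptive(rows, threshold=0.5):
    """Return (lineage index, fitness, establishment time, probability)
    for lineages whose 5th-column probability is at least `threshold`."""
    selected = []
    for lineage_index, row in enumerate(rows):
        # Skip rows that carry no per-lineage probability.
        if len(row) < 5 or row[4] == "":
            continue
        probability = float(row[4])
        if probability >= threshold:
            selected.append(
                (lineage_index, float(row[0]), float(row[1]), probability)
            )
    return selected


# Usage, assuming the inference was run with the default output prefix:
# adaptive = select_adaptive(load_rows("output_MutSeq_Result.csv"))
```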
python fitmut2_run.py --help
python fitmut2_run.py -i simu_test_EvoSimulation_Read_Number.csv -t fitmut_input_time_points.csv -o test