This repository contains the training and inference code for the accompanying paper (citation TBD).
The implementation focuses on predicting drug-pair synergy vs antagonism across multiple bacterial strains.
Most large datasets, embeddings, and trained checkpoints are intentionally not committed to git (see .gitignore and data/README.md).
- Transformer_evo/: main model + training scripts
  - Transformer_evo/model.py: PyTorch model definitions
  - Transformer_evo/train_main.py: train/evaluate on predefined splits
  - Transformer_evo/train_ind_test.py: train on the independent training set and run inference on an external test set
  - Transformer_evo/process_data.py: utilities to build training CSVs from the supplement Excel table
- analyze_and_label.py: add human-readable label names and compute per-bacteria statistics
- data/README.md: expected input files (not versioned)
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Optional utilities (Excel export + attribution scripts):
pip install -r requirements-optional.txt

Place your datasets and split CSVs under data/ as described in data/README.md.
If you have the supplement Excel table and want to generate train_data.csv and unannotated.csv:
python Transformer_evo/process_data.py \
--train-xlsx data/train_data.xlsx \
--kmer-dir kmer_result \
--out-train-csv data/train_data.csv \
--out-unannotated-csv data/unannotated.csv

The main entrypoint is Transformer_evo/train_main.py. It expects split files like data/random_split_train.csv and data/random_split_test.csv.
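Before launching a run, it can help to confirm that both split files are present. The sketch below assumes that the naming pattern data/<split>_train.csv and data/<split>_test.csv generalizes from the random_split example; train_main.py may resolve paths differently. The helper names split_paths and missing_split_files are hypothetical and not part of this repository.

```python
from pathlib import Path


def split_paths(split, data_dir="data"):
    """Return (train_csv, test_csv) for a split name such as 'random_split'.

    Assumption: files follow the <split>_train.csv / <split>_test.csv pattern
    seen in the random_split example; the real script may differ.
    """
    base = Path(data_dir)
    return base / f"{split}_train.csv", base / f"{split}_test.csv"


def missing_split_files(split, data_dir="data"):
    """List the expected split files that do not exist on disk."""
    return [p for p in split_paths(split, data_dir) if not p.is_file()]


if __name__ == "__main__":
    missing = missing_split_files("random_split")
    if missing:
        print("Missing split files:", ", ".join(str(p) for p in missing))
    else:
        print("Both split files found.")
```

Run it from the repository root so that the relative data/ path resolves correctly.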
python Transformer_evo/train_main.py --split random_split --data-aug --epochs 100

Outputs:
- confidence scores: conf_scores_best/
- (other scripts may write checkpoints under models_best/)
python Transformer_evo/train_ind_test.py --data-aug --epochs 100

By default, this script looks for:
- data/independent_train.csv
- data/cleaned_positive_adjuvant_model_controls_04_25_2022.csv (expects column actual_strain)
- ind_data1/drug1_base.npy and ind_data2/drug2_base.npy
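The default inputs listed above can be checked up front with a short script. This is a sketch and not part of the repository: the helper check_ind_inputs is hypothetical, and all paths are assumed to be relative to the repository root.

```python
import csv
from pathlib import Path

# Default inputs named in this README; assumed relative to the repository root.
EXPECTED_FILES = [
    "data/independent_train.csv",
    "data/cleaned_positive_adjuvant_model_controls_04_25_2022.csv",
    "ind_data1/drug1_base.npy",
    "ind_data2/drug2_base.npy",
]
EXTERNAL_TEST_CSV = "data/cleaned_positive_adjuvant_model_controls_04_25_2022.csv"
REQUIRED_COLUMN = "actual_strain"


def check_ind_inputs(root="."):
    """Return a list of problems; an empty list means every check passed."""
    root = Path(root)
    problems = [
        f"missing file: {rel}"
        for rel in EXPECTED_FILES
        if not (root / rel).is_file()
    ]

    test_csv = root / EXTERNAL_TEST_CSV
    if test_csv.is_file():
        # Read only the header row to confirm the required column is present.
        with test_csv.open(newline="", encoding="utf-8") as handle:
            header = next(csv.reader(handle), [])
        if REQUIRED_COLUMN not in header:
            problems.append(f"{EXTERNAL_TEST_CSV}: no '{REQUIRED_COLUMN}' column")
    return problems


if __name__ == "__main__":
    for problem in check_ind_inputs():
        print(problem)
```

The script prints nothing when all files are present and the external test CSV has an actual_strain column.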
python analyze_and_label.py \
--input data/independent_train.csv \
--output overall_train_labeled.csv \
--stats-csv bacteria_statistics.csv \
--stats-xlsx bacteria_statistics.xlsx

If you use this code, please cite the paper:
TBD
If you need access to the data or to the pretrained/fine-tuned checkpoints used in our experiments, please contact the repository owner by email.
