Drug Combination Discovery

This repository contains the training/inference code for .

The implementation focuses on predicting whether a drug pair acts synergistically or antagonistically across multiple bacterial strains.

Most large datasets, embeddings, and trained checkpoints are intentionally not committed to git (see .gitignore and data/README.md).

Model Overview

Repository layout

- Transformer_evo/: main model and training scripts
  - Transformer_evo/model.py: PyTorch model definitions
  - Transformer_evo/train_main.py: train/evaluate on predefined splits
  - Transformer_evo/train_ind_test.py: train on the independent training set and run inference on an external test set
  - Transformer_evo/process_data.py: utilities to build training CSVs from the supplementary Excel table
- analyze_and_label.py: add human-readable label names and compute per-bacteria statistics
- data/README.md: describes the expected input files (not versioned)

Setup

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Optional utilities (Excel export + attribution scripts):

pip install -r requirements-optional.txt

Data preparation

Place your datasets and split CSVs under data/ as described in data/README.md.

If you have the supplementary Excel table and want to generate train_data.csv and unannotated.csv:

python Transformer_evo/process_data.py \
  --train-xlsx data/train_data.xlsx \
  --kmer-dir kmer_result \
  --out-train-csv data/train_data.csv \
  --out-unannotated-csv data/unannotated.csv
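The --kmer-dir argument suggests that strains are described by k-mer features precomputed in kmer_result/. The sketch below only illustrates the general idea of a normalized k-mer frequency profile; `kmer_profile` is a hypothetical helper, and we have not checked how process_data.py or the files in kmer_result/ actually encode these features.

```python
from collections import Counter
from itertools import product


def kmer_profile(sequence: str, k: int = 3) -> list[float]:
    """Hypothetical helper: normalized counts of every A/C/G/T k-mer.

    The vector is ordered lexicographically (AA..., AC..., ...) so that
    profiles computed for different strains line up position by position.
    """
    seq = sequence.upper()
    # Count overlapping k-mers, skipping windows with non-ACGT characters (e.g. N).
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )
    total = sum(counts.values())
    vocab = ("".join(p) for p in product("ACGT", repeat=k))
    # Sequences with no valid window yield an all-zero vector instead of dividing by zero.
    return [counts[kmer] / total if total else 0.0 for kmer in vocab]
```

For k = 3 this yields a 64-dimensional vector (4^3 possible k-mers), regardless of sequence length.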

Training (Transformer)

The main entry point is Transformer_evo/train_main.py. It expects split files such as data/random_split_train.csv and data/random_split_test.csv.

python Transformer_evo/train_main.py --split random_split --data-aug --epochs 100

Outputs:

- Confidence scores: conf_scores_best/
- Other scripts may write checkpoints under models_best/
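A small pre-flight check can catch a missing split file before a long training run. This sketch assumes the <split>_train.csv / <split>_test.csv naming implied by the random_split example; `split_paths` and `missing_split_files` are hypothetical helpers, and train_main.py may resolve paths differently.

```python
from pathlib import Path


def split_paths(split: str, data_dir: str = "data") -> tuple[Path, Path]:
    """Hypothetical helper: map a --split name to its train/test CSV paths.

    Assumes the naming seen in the random_split example; this is not
    taken from train_main.py itself.
    """
    root = Path(data_dir)
    return root / f"{split}_train.csv", root / f"{split}_test.csv"


def missing_split_files(split: str, data_dir: str = "data") -> list[Path]:
    """Return the expected split files that do not exist yet."""
    return [p for p in split_paths(split, data_dir) if not p.is_file()]
```

For example, `missing_split_files("random_split")` returns an empty list once both data/random_split_train.csv and data/random_split_test.csv are in place.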

Independent training + external inference

python Transformer_evo/train_ind_test.py --data-aug --epochs 100

By default, this script looks for:

- data/independent_train.csv
- data/cleaned_positive_adjuvant_model_controls_04_25_2022.csv (must contain an actual_strain column)
- ind_data1/drug1_base.npy and ind_data2/drug2_base.npy
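Before launching train_ind_test.py, it can help to confirm that the default inputs listed above exist and that the external test CSV has the actual_strain column. This is a minimal sketch; `preflight` is a hypothetical helper that only checks file presence and the CSV header, not the contents of the .npy embeddings.

```python
import csv
from pathlib import Path

# Default inputs of train_ind_test.py, relative to the repository root.
DEFAULT_INPUTS = [
    "data/independent_train.csv",
    "data/cleaned_positive_adjuvant_model_controls_04_25_2022.csv",
    "ind_data1/drug1_base.npy",
    "ind_data2/drug2_base.npy",
]
EXTERNAL_TEST = "data/cleaned_positive_adjuvant_model_controls_04_25_2022.csv"


def preflight(root: str = ".") -> list[str]:
    """Hypothetical helper: return a list of problems (empty means OK)."""
    problems = []
    base = Path(root)
    for rel in DEFAULT_INPUTS:
        if not (base / rel).is_file():
            problems.append(f"missing: {rel}")
    test_csv = base / EXTERNAL_TEST
    if test_csv.is_file():
        with test_csv.open(newline="") as fh:
            # An empty file has no header row; treat it as an empty header.
            header = next(csv.reader(fh), [])
        if "actual_strain" not in header:
            problems.append(f"{EXTERNAL_TEST} has no 'actual_strain' column")
    return problems
```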

Dataset inspection: labels + statistics

python analyze_and_label.py \
  --input data/independent_train.csv \
  --output overall_train_labeled.csv \
  --stats-csv bacteria_statistics.csv \
  --stats-xlsx bacteria_statistics.xlsx
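To illustrate the kind of per-bacteria statistics analyze_and_label.py reports, here is a minimal sketch that counts labels per strain with the standard library. The column names (`strain`, `label`) and the 0/1 to antagonism/synergy mapping are assumptions for illustration, not the script's actual schema; `per_bacteria_counts` is a hypothetical helper.

```python
import csv
from collections import Counter, defaultdict

# Hypothetical mapping from raw label values to names; check
# analyze_and_label.py for the real encoding before relying on it.
LABEL_NAMES = {"0": "antagonism", "1": "synergy"}


def per_bacteria_counts(
    path: str, strain_col: str = "strain", label_col: str = "label"
) -> dict[str, Counter]:
    """Count named labels per strain in a CSV (column names are assumptions)."""
    stats: dict[str, Counter] = defaultdict(Counter)
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            raw = row[label_col]
            # Keep unmapped values visible instead of silently dropping them.
            name = LABEL_NAMES.get(raw, f"unknown({raw})")
            stats[row[strain_col]][name] += 1
    return dict(stats)
```

Unmapped label values are kept as unknown(<value>), so an unexpected encoding shows up in the counts rather than disappearing.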

Citation

If you use this code, please cite the paper:

TBD

If you need access to the data or the pretrained/fine-tuned checkpoints used in our experiments, please contact the repository owner by email.

JunboShen/DeepSAD

Folders and files

Latest commit

History

Repository files navigation

Drug Combination Discovery

Model Overview

Repository layout

Setup

Data preparation

Training (Transformer)

Independent training + external inference

Dataset inspection: labels + statistics

Citation

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages