Standalone folder for:
- training a ranking model from your own data
- optional reaction-context model
- optional few-shot fine-tuning on in-house measurements

Contents:
- data/
  - all_experiment.csv
  - inhouse-experiment.csv
  - experiments_4-methoxystyrene_eda_clean.csv
  - 01Q_seq-candidates.csv
  - 01Q_seq-candidates_merged_4-methoxystyrene-eda.csv
- db_structures/
- train_general_ranker.py
- train_general_with_reaction_ranker.py
- finetune_01q_hybrid_ranker.py
- rdkit_morgan_featurize.py
- models/ (outputs)
- outputs/ (predictions/eval outputs)

Install dependencies:
pip install -r requirements.txt

For reaction featurization, RDKit is required. If needed, install RDKit via conda and pass --rdkit-env <env_name>.
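
To decide whether --rdkit-env is needed, it can help to check whether RDKit is importable from the interpreter that runs the scripts. The following is a minimal sketch using only the standard library; the helper name `rdkit_available` is hypothetical and not part of this folder.

```python
import importlib.util


def rdkit_available() -> bool:
    """Return True if the `rdkit` package can be found by this interpreter."""
    # find_spec returns None when a top-level package is not installed.
    return importlib.util.find_spec("rdkit") is not None


if rdkit_available():
    print("RDKit found in the current environment.")
else:
    print("RDKit not found here; install it via conda and pass --rdkit-env <env_name>.")
```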
General training CSV needs:
- amino_acid_sequence (or sequence)
- normalized_fitness (or your chosen target column)
- parent_experiment (group column for grouped split)
- db_structure_path (or structure_path)
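
The column requirements above can be illustrated with a small sketch that resolves the alternative column names and performs a grouped split, so that all rows sharing a parent_experiment value end up on the same side. This is an assumption about what a grouped split means here, not the code in train_general_ranker.py; `resolve_column`, `grouped_split`, `test_fraction`, and the toy rows are hypothetical.

```python
import random
from collections import defaultdict

# Accepted names per field, taken from the column list above.
COLUMN_ALIASES = {
    "sequence": ("amino_acid_sequence", "sequence"),
    "structure": ("db_structure_path", "structure_path"),
}


def resolve_column(header, field):
    """Return the first accepted alias for `field` that appears in `header`."""
    for name in COLUMN_ALIASES[field]:
        if name in header:
            return name
    raise KeyError(f"none of {COLUMN_ALIASES[field]} found in CSV header")


def grouped_split(rows, group_col, test_fraction=0.2, seed=0):
    """Split rows so that every group lands entirely in train or in test."""
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[group_col]].append(row)
    groups = sorted(by_group)
    random.Random(seed).shuffle(groups)
    # Hold out at least one whole group.
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [r for g in groups if g not in test_groups for r in by_group[g]]
    test = [r for g in groups if g in test_groups for r in by_group[g]]
    return train, test


# Toy rows (made-up values) standing in for a training CSV.
rows = [
    {"sequence": "MKTA", "normalized_fitness": 0.9, "parent_experiment": "expA"},
    {"sequence": "MKTV", "normalized_fitness": 0.4, "parent_experiment": "expA"},
    {"sequence": "MRTA", "normalized_fitness": 0.7, "parent_experiment": "expB"},
    {"sequence": "MRSA", "normalized_fitness": 0.2, "parent_experiment": "expC"},
]
seq_col = resolve_column(rows[0].keys(), "sequence")
train, test = grouped_split(rows, "parent_experiment")
print(seq_col, len(train), len(test))
```

Splitting by group rather than by row keeps variants of the same parent experiment out of both train and test at once, which would otherwise inflate the evaluation metrics.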
Reaction-context training additionally needs:
- smiles_reaction
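
A reaction SMILES string has the form reactants>agents>products, with molecules on each side separated by dots. The sketch below splits such a string using only the standard library, as a quick sanity check before featurization. It is not what rdkit_morgan_featurize.py does; `split_reaction_smiles` is a hypothetical helper, and it assumes the string contains no dative-bond arrows such as `->`. The example string is the one used in the scoring command later in this README.

```python
def split_reaction_smiles(reaction):
    """Split 'reactants>agents>products' into three lists of molecule SMILES."""
    parts = reaction.split(">")
    if len(parts) != 3:
        raise ValueError(f"expected exactly two '>' separators, got {len(parts) - 1}")
    # Dots separate individual molecules; empty sections give empty lists.
    reactants, agents, products = (
        [s for s in part.split(".") if s] for part in parts
    )
    return reactants, agents, products


example = (
    "C=CC1=CC=C(OC)C=C1.O=C(OCC)C=[N+]=[N-]"
    ">>COC2=CC=C(C=C2)[C@@H]3[C@@H](C(OCC)=O)C3"
)
reactants, agents, products = split_reaction_smiles(example)
print(len(reactants), len(agents), len(products))  # prints: 2 0 1
```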
Fine-tuning needs:
- candidates CSV with id, sequence column
- base prediction CSV with id, predicted_score
- measured CSV with id, measured target column
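
The three fine-tuning inputs share the id column, so they can be joined on it. The sketch below shows one way that join and a seeded support/evaluation split might look. It is an assumption about the data flow, not the logic of finetune_01q_hybrid_ranker.py; `join_on_id`, `support_eval_split`, the target column name `measured_fitness`, and all toy values are hypothetical.

```python
import random


def join_on_id(candidates, base_pred, measured, target_col):
    """Keep candidates that have both a base prediction and a measurement."""
    pred_by_id = {r["id"]: float(r["predicted_score"]) for r in base_pred}
    meas_by_id = {r["id"]: float(r[target_col]) for r in measured}
    joined = []
    for row in candidates:
        cid = row["id"]
        if cid in pred_by_id and cid in meas_by_id:
            joined.append({
                "id": cid,
                "sequence": row["sequence"],
                "predicted_score": pred_by_id[cid],
                "measured": meas_by_id[cid],
            })
    return joined


def support_eval_split(rows, support_size, seed=0):
    """Sample a reproducible support set; the remaining rows are for evaluation."""
    rng = random.Random(seed)
    support = rng.sample(rows, min(support_size, len(rows)))
    support_ids = {r["id"] for r in support}
    evaluation = [r for r in rows if r["id"] not in support_ids]
    return support, evaluation


# Toy tables (made-up values) standing in for the three CSV files.
candidates = [{"id": f"v{i}", "sequence": "MKT" + "AVLG"[i]} for i in range(4)]
base_pred = [{"id": f"v{i}", "predicted_score": str(0.1 * i)} for i in range(4)]
measured = [{"id": f"v{i}", "measured_fitness": str(0.2 * i)} for i in range(3)]

joined = join_on_id(candidates, base_pred, measured, "measured_fitness")
support, evaluation = support_eval_split(joined, support_size=2, seed=0)
print(len(joined), len(support), len(evaluation))  # prints: 3 2 1
```

Candidates without a measurement (here `v3`) drop out of the join, so they can still be scored but cannot contribute to fine-tuning or evaluation.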

Train the general ranker:
python train_general_ranker.py train \
--dataset data/all_experiment.csv \
--target-col normalized_fitness \
--group-col parent_experiment \
--model-class extratrees \
--out-model models/general_ranker.joblib \
--out-metrics models/general_ranker_metrics.json

Score candidates:
python train_general_ranker.py score \
--model models/general_ranker.joblib \
--candidates data/01Q_seq-candidates.csv \
--out outputs/general_candidate_scores.csv

Train the reaction-context model:
python train_general_with_reaction_ranker.py train \
--dataset data/all_experiment.csv \
--target-col normalized_fitness \
--group-col parent_experiment \
--reaction-col smiles_reaction \
--model-class extratrees \
--out-model models/general_with_reaction_ranker.joblib \
--out-metrics models/general_with_reaction_ranker_metrics.json

If RDKit is in a conda env:
python train_general_with_reaction_ranker.py train \
--dataset data/all_experiment.csv \
--reaction-col smiles_reaction \
--rdkit-env debase \
--model-class extratrees

Score candidates with optional fixed reaction:
python train_general_with_reaction_ranker.py score \
--model models/general_with_reaction_ranker.joblib \
--candidates data/01Q_seq-candidates.csv \
--reaction-smiles "C=CC1=CC=C(OC)C=C1.O=C(OCC)C=[N+]=[N-]>>COC2=CC=C(C=C2)[C@@H]3[C@@H](C(OCC)=O)C3" \
--out outputs/general_with_reaction_candidate_scores.csv

Few-shot fine-tune on in-house measurements:
python finetune_01q_hybrid_ranker.py \
--candidates data/01Q_seq-candidates.csv \
--actual data/01Q_seq-candidates_merged_4-methoxystyrene-eda.csv \
--base-pred outputs/general_with_reaction_candidate_scores.csv \
--support-size 24 \
--seed 0 \
--out-full outputs/hybrid_full.csv \
--out-eval outputs/hybrid_eval.csv \
--out-support outputs/hybrid_support.csv \
--out-metrics outputs/hybrid_metrics.json

Use built-in help for complete flags:
python train_general_ranker.py --help
python train_general_ranker.py train --help
python train_general_ranker.py score --help
python train_general_with_reaction_ranker.py --help
python train_general_with_reaction_ranker.py train --help
python train_general_with_reaction_ranker.py score --help
python finetune_01q_hybrid_ranker.py --help