Pretrained model with data fine-tuning; flags enable swapping out either one

YuemingLong/EnzRanker_Kit

Enzyme Ranking Repo

Standalone folder for:

  • training a ranking model on your own data
  • optionally training a reaction-context model
  • optionally few-shot fine-tuning on in-house measurements

Folder Layout

  • data/
    • all_experiment.csv
    • inhouse-experiment.csv
    • experiments_4-methoxystyrene_eda_clean.csv
    • 01Q_seq-candidates.csv
    • 01Q_seq-candidates_merged_4-methoxystyrene-eda.csv
  • db_structures/
  • train_general_ranker.py
  • train_general_with_reaction_ranker.py
  • finetune_01q_hybrid_ranker.py
  • rdkit_morgan_featurize.py
  • models/ (outputs)
  • outputs/ (predictions/eval outputs)

Install

pip install -r requirements.txt

For reaction featurization, RDKit is required. If needed, install RDKit via conda and pass --rdkit-env <env_name>.

Input Schema

The general training CSV requires these columns:

  • amino_acid_sequence (or sequence)
  • normalized_fitness (or your chosen target column)
  • parent_experiment (group column for grouped split)
  • db_structure_path (or structure_path)

Reaction-context training additionally needs:

  • smiles_reaction

Fine-tuning needs:

  • a candidates CSV with id and sequence columns
  • a base-prediction CSV with id and predicted_score columns
  • a measured CSV with id and the measured target column
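As a concrete illustration, a minimal general-training CSV with the required columns could be built like this (the sequences, fitness values, and structure paths below are placeholders, not real data):

```python
import csv

# Hypothetical example rows: each record carries the four columns
# train_general_ranker.py expects. Values are placeholders.
rows = [
    {
        "amino_acid_sequence": "MKTAYIAKQR",
        "normalized_fitness": 0.82,
        "parent_experiment": "exp_A",
        "db_structure_path": "db_structures/variant_001.pdb",
    },
    {
        "amino_acid_sequence": "MKTAYIAKQW",
        "normalized_fitness": 0.35,
        "parent_experiment": "exp_A",
        "db_structure_path": "db_structures/variant_002.pdb",
    },
]

# Write the CSV with a header row matching the schema above.
with open("example_training.csv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
```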

1) Train General Model (Seq + Structure)

python train_general_ranker.py train \
  --dataset data/all_experiment.csv \
  --target-col normalized_fitness \
  --group-col parent_experiment \
  --model-class extratrees \
  --out-model models/general_ranker.joblib \
  --out-metrics models/general_ranker_metrics.json

Score candidates:

python train_general_ranker.py score \
  --model models/general_ranker.joblib \
  --candidates data/01Q_seq-candidates.csv \
  --out outputs/general_candidate_scores.csv
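The scored CSV can then be ranked downstream. A minimal stdlib-only sketch, assuming the output has id and predicted_score columns (the names the fine-tuning step expects), picks the top-k candidates:

```python
import csv

def top_k_candidates(path, k=5):
    """Return the k candidate ids with the highest predicted_score.

    Assumes the scores CSV has 'id' and 'predicted_score' columns,
    matching the base-prediction input of finetune_01q_hybrid_ranker.py.
    """
    with open(path, newline="") as fh:
        rows = list(csv.DictReader(fh))
    # Sort descending by score; higher predicted_score = better candidate.
    rows.sort(key=lambda r: float(r["predicted_score"]), reverse=True)
    return [r["id"] for r in rows[:k]]
```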

2) Train General + Reaction Model (Optional)

python train_general_with_reaction_ranker.py train \
  --dataset data/all_experiment.csv \
  --target-col normalized_fitness \
  --group-col parent_experiment \
  --reaction-col smiles_reaction \
  --model-class extratrees \
  --out-model models/general_with_reaction_ranker.joblib \
  --out-metrics models/general_with_reaction_ranker_metrics.json

If RDKit is in a conda env:

python train_general_with_reaction_ranker.py train \
  --dataset data/all_experiment.csv \
  --reaction-col smiles_reaction \
  --rdkit-env debase \
  --model-class extratrees

Score candidates with optional fixed reaction:

python train_general_with_reaction_ranker.py score \
  --model models/general_with_reaction_ranker.joblib \
  --candidates data/01Q_seq-candidates.csv \
  --reaction-smiles "C=CC1=CC=C(OC)C=C1.O=C(OCC)C=[N+]=[N-]>>COC2=CC=C(C=C2)[C@@H]3[C@@H](C(OCC)=O)C3" \
  --out outputs/general_with_reaction_candidate_scores.csv
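Long reaction SMILES passed on the command line are easy to mangle with shell quoting. A stdlib-only sanity check (this is only a structural check, not a substitute for RDKit parsing) can verify the reactants>>products shape before running the score step:

```python
def looks_like_reaction_smiles(s):
    """Cheap structural check for a reaction SMILES string.

    Verifies there is exactly one '>>' separator with non-empty
    reactant and product sides. Chemical validity is not checked;
    use RDKit for real parsing.
    """
    parts = s.split(">>")
    if len(parts) != 2:
        return False
    reactants, products = parts
    return bool(reactants.strip()) and bool(products.strip())
```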

3) Few-Shot Fine-Tune on In-House Measurements (Optional)

python finetune_01q_hybrid_ranker.py \
  --candidates data/01Q_seq-candidates.csv \
  --actual data/01Q_seq-candidates_merged_4-methoxystyrene-eda.csv \
  --base-pred outputs/general_with_reaction_candidate_scores.csv \
  --support-size 24 \
  --seed 0 \
  --out-full outputs/hybrid_full.csv \
  --out-eval outputs/hybrid_eval.csv \
  --out-support outputs/hybrid_support.csv \
  --out-metrics outputs/hybrid_metrics.json
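One way to sanity-check a fine-tuned ranking against held-out measurements is Spearman rank correlation between predicted and measured values. A minimal stdlib sketch (assuming no tied values; for ties, use scipy.stats.spearmanr instead):

```python
def rank(values):
    """Map each value to its rank (0 = smallest). Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks

def spearman(pred, meas):
    """Spearman rank correlation between two equal-length lists.

    Uses the closed form 1 - 6*sum(d^2) / (n*(n^2 - 1)),
    valid when neither list has ties.
    """
    n = len(pred)
    rp, rm = rank(pred), rank(meas)
    d2 = sum((a - b) ** 2 for a, b in zip(rp, rm))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

A correlation near 1 means the model's ordering agrees with the measurements; near 0 means the ranking carries little signal.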

All CLI Flags

Use built-in help for complete flags:

python train_general_ranker.py --help
python train_general_ranker.py train --help
python train_general_ranker.py score --help

python train_general_with_reaction_ranker.py --help
python train_general_with_reaction_ranker.py train --help
python train_general_with_reaction_ranker.py score --help

python finetune_01q_hybrid_ranker.py --help
