HybriTE is a graph neural network for mRNA translation efficiency (TE) prediction. It models each transcript as a heterogeneous graph that combines sequence composition, RNA secondary structure from RNAplfold, and biochemical priors (RBP binding, modifications). This repository contains the code used in the HybriTE paper, including data preparation, graph construction, training, prediction, and explainability.
- Python 3.10
- ViennaRNA (RNAplfold)
Quick setup (conda):
git clone https://github.com/turgaybulut/HybriTE.git && cd HybriTE
conda env create -f environment.yaml
conda activate hybrite
pip install -e .
Download the prepared CSV files from this Google Drive folder and place them here:
- data/te/human_te_data_with_biochemicals.csv
- data/te/mouse_te_data_with_biochemicals.csv
These paths are used by the commands below.
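Before running the pipeline, it can help to confirm that both downloads landed in the expected place. The following is a minimal sketch, not part of the repository: `csv_summary` is a hypothetical helper that only assumes each file is a regular CSV with a header row, and it makes no assumption about column names.

```python
import csv
from pathlib import Path


def csv_summary(path):
    """Return (column names, number of data rows) for a CSV with a header row."""
    with Path(path).open(newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows at all
        n_rows = sum(1 for _ in reader)
    return header, n_rows


if __name__ == "__main__":
    for species in ("human", "mouse"):
        csv_path = Path("data/te") / f"{species}_te_data_with_biochemicals.csv"
        if csv_path.is_file():
            header, n_rows = csv_summary(csv_path)
            print(f"{csv_path}: {len(header)} columns, {n_rows} data rows")
        else:
            print(f"{csv_path}: not found")
```

Run it from the repository root so the relative paths resolve.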
Follow these steps in order. Paths below match the repository layout.
Convert the CSV into NumPy arrays for targets and biochemical features.
python scripts/prepare_data.py \
--input_csv data/te/human_te_data_with_biochemicals.csv \
--out_dir data/te/human \
--select_k 100
For mouse, reuse the human-selected biochemical feature set:
python scripts/prepare_data.py \
--input_csv data/te/mouse_te_data_with_biochemicals.csv \
--out_dir data/te/mouse \
--feature_meta data/te/human/meta.json
Outputs per species:
- data/te/<species>/target.npy
- data/te/<species>/feature.npy
- data/te/<species>/meta.json
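A quick way to confirm that data preparation finished for a species is to check for the three output files. This is a small sketch under the file names listed here; `missing_outputs` is a hypothetical helper, and it checks only that the files exist, not what they contain.

```python
from pathlib import Path

# File names taken from the per-species output list in this README.
EXPECTED_FILES = ("target.npy", "feature.npy", "meta.json")


def missing_outputs(species_dir):
    """Return the expected output files that are absent from species_dir."""
    root = Path(species_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    for species in ("human", "mouse"):
        missing = missing_outputs(Path("data/te") / species)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{species}: {status}")
```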
Build the transcript graphs. This step requires RNAplfold (part of ViennaRNA) on your PATH.
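Graph generation can take a while, so it may be worth checking up front that the executable is visible. A minimal sketch using only the standard library follows; `find_tool` is a hypothetical helper name, not something the repository provides.

```python
import shutil
from typing import Optional


def find_tool(name: str = "RNAplfold") -> Optional[str]:
    """Return the full path of an executable found on PATH, or None if absent."""
    return shutil.which(name)


if __name__ == "__main__":
    location = find_tool()
    if location is None:
        print("RNAplfold not found on PATH; install ViennaRNA or activate the conda environment.")
    else:
        print(f"RNAplfold found at {location}")
```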
python scripts/generate_graphs.py \
data/te/human_te_data_with_biochemicals.csv \
data/te/human/human_graph.pt
python scripts/generate_graphs.py \
data/te/mouse_te_data_with_biochemicals.csv \
data/te/mouse/mouse_graph.pt
For the no-structure ablation:
python scripts/generate_graphs_no_structure.py \
data/te/human_te_data_with_biochemicals.csv \
data/te/human/human_graph_nostruct.pt
Model settings are loaded from config.yaml and plugins/hybrite/config.yaml.
Update plugins/hybrite/config.yaml to switch between human and mouse:
- data.root: data/te/human or data/te/mouse
- data.graphs_pt: human_graph.pt or mouse_graph.pt
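The two settings always change together, so they can be derived from the species name. The sketch below works on an already-loaded configuration dictionary and assumes that `data.root` and `data.graphs_pt` denote the keys `root` and `graphs_pt` nested under `data`; that nesting, and the helper name `with_species`, are assumptions. Reading and writing the YAML file is left out.

```python
import copy

SPECIES = ("human", "mouse")


def with_species(config, species):
    """Return a copy of config with the data paths pointed at the given species."""
    if species not in SPECIES:
        raise ValueError(f"unknown species: {species!r}")
    updated = copy.deepcopy(config)  # leave the caller's dictionary untouched
    data = updated.setdefault("data", {})
    data["root"] = f"data/te/{species}"
    data["graphs_pt"] = f"{species}_graph.pt"
    return updated


if __name__ == "__main__":
    human_cfg = {"data": {"root": "data/te/human", "graphs_pt": "human_graph.pt"}}
    print(with_species(human_cfg, "mouse"))
```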
Single split:
python train.py --config config.yaml
K-fold cross-validation:
python train_fold.py --config config.yaml
Checkpoints and logs are written under results/hybrite/.
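The prediction commands in this README need the name of a concrete checkpoint file. One way to look them up is sketched below, assuming the `fold_XX/checkpoints/*.ckpt` layout that those commands use; `list_checkpoints` is a hypothetical helper, and the layout for a single-split run may differ.

```python
from pathlib import Path


def list_checkpoints(results_dir="results/hybrite"):
    """Map each fold directory name to its sorted list of .ckpt files."""
    root = Path(results_dir)
    if not root.is_dir():
        return {}
    found = {}
    for fold_dir in sorted(root.glob("fold_*")):
        ckpts = sorted((fold_dir / "checkpoints").glob("*.ckpt"))
        if ckpts:  # skip folds that have no checkpoint yet
            found[fold_dir.name] = ckpts
    return found


if __name__ == "__main__":
    for fold, ckpts in list_checkpoints().items():
        for ckpt in ckpts:
            print(f"{fold}: {ckpt}")
```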
Run inference from a saved checkpoint.
python predict.py \
--checkpoint results/hybrite/fold_00/checkpoints/<checkpoint>.ckpt \
--config config.yaml \
--output_dir results/hybrite/fold_00
Run inference for all folds:
./predict_all_folds.sh results/hybrite/
Outputs:
- predictions.csv
- ground_truth.csv
- metrics.csv
- target_metrics.csv (per-target)
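After predicting for every fold, the per-fold metrics can be averaged into one summary. The sketch below assumes each fold directory under results/hybrite/ holds its own metrics.csv with a header row, and it averages every column whose cells parse as numbers. It does not assume any particular metric names, and `mean_metrics` is a hypothetical helper rather than a repository script.

```python
import csv
from pathlib import Path
from statistics import mean


def mean_metrics(results_dir="results/hybrite", filename="metrics.csv"):
    """Average each numeric column of fold_*/<filename> across all folds."""
    root = Path(results_dir)
    if not root.is_dir():
        return {}
    values = {}
    for path in sorted(root.glob(f"fold_*/{filename}")):
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                for column, cell in row.items():
                    try:
                        number = float(cell)
                    except (TypeError, ValueError):
                        continue  # skip non-numeric cells such as labels
                    values.setdefault(column, []).append(number)
    return {column: mean(numbers) for column, numbers in values.items()}


if __name__ == "__main__":
    for column, value in mean_metrics().items():
        print(f"{column}: {value:.4f}")
```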
Generate SHAP and graph importance plots.
python scripts/explain_model.py \
--checkpoint results/hybrite/fold_00/checkpoints/<checkpoint>.ckpt \
--species human \
--data_dir data/te/human \
--output_dir figures/shap
Use a model trained on one species to predict another.
python predict_cross_species.py \
--checkpoint results/hybrite/fold_00/checkpoints/<checkpoint>.ckpt \
--train_config config.yaml \
--test_inputs data/te/mouse/mouse_graph.pt \
--test_targets data/te/mouse/target.npy \
--test_biochemical_features data/te/mouse/feature.npy \
--suffix human_to_mouse
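The same command can be assembled for any pair of species from the file layout used in this README. The sketch below only builds the argument list, using the flags shown above; `cross_species_command` is a hypothetical helper, and applying it in the mouse-to-human direction is an assumption that simply mirrors the human-to-mouse paths.

```python
import shlex
import sys


def cross_species_command(checkpoint, train_species, test_species,
                          train_config="config.yaml"):
    """Build the predict_cross_species.py argument list for one species pair."""
    test_dir = f"data/te/{test_species}"
    return [
        sys.executable, "predict_cross_species.py",
        "--checkpoint", checkpoint,
        "--train_config", train_config,
        "--test_inputs", f"{test_dir}/{test_species}_graph.pt",
        "--test_targets", f"{test_dir}/target.npy",
        "--test_biochemical_features", f"{test_dir}/feature.npy",
        "--suffix", f"{train_species}_to_{test_species}",
    ]


if __name__ == "__main__":
    # The checkpoint name stays a placeholder; substitute a real file.
    cmd = cross_species_command(
        "results/hybrite/fold_00/checkpoints/<checkpoint>.ckpt", "human", "mouse"
    )
    print(shlex.join(cmd))
```

Passing the returned list to `subprocess.run(cmd, check=True)` from the repository root would launch the prediction.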