GraphletMolNet is a graphlet-based model to predict molecular interactions of proteins and ligands.

matiollipt/gmn

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

GraphletMolNet: Graphlet-based Prediction of Protein Interactions

Dataset

The protein-protein interaction (PPI) dataset is taken from the gold-standard PPI splits of Bernett et al. (2024) and has the following properties:

  • Splits: Intra-1 (163,192 training points), Intra-0 (59,260 validation points), Intra-2 (52,048 test points)
  • No direct data leakage between splits
  • Minimized sequence similarity, measured by length-normalized bitscores, between the training, validation, and test splits
  • Redundancy reduction with CD-HIT (no two proteins within a split share more than 40% sequence similarity)
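The split sizes and the leakage property above can be sanity-checked with a few lines of Python. This is a minimal sketch, assuming each split is loaded as a list of (protein_a, protein_b) identifier pairs and that "no direct data leakage" means no protein occurs in more than one split; `split_fractions` and `shared_proteins` are hypothetical helpers, not part of this repository.

```python
from itertools import combinations

# Split sizes as listed above.
SPLIT_SIZES = {"Intra-1 (train)": 163_192, "Intra-0 (val)": 59_260, "Intra-2 (test)": 52_048}


def split_fractions(sizes: dict[str, int]) -> dict[str, float]:
    """Return each split's share of the total number of data points."""
    total = sum(sizes.values())
    return {name: n / total for name, n in sizes.items()}


def shared_proteins(splits: dict[str, list[tuple[str, str]]]) -> dict[tuple[str, str], set[str]]:
    """For every pair of splits, return the protein IDs that occur in both.

    Assumption: each split is a list of (protein_a, protein_b) ID pairs.
    An empty set for every split pair means no protein is shared.
    """
    proteins = {name: {p for pair in pairs for p in pair} for name, pairs in splits.items()}
    return {(a, b): proteins[a] & proteins[b] for a, b in combinations(proteins, 2)}


if __name__ == "__main__":
    for name, frac in split_fractions(SPLIT_SIZES).items():
        print(f"{name}: {frac:.1%}")
```

Run as a script, it prints shares of roughly 59.5%, 21.6% and 19.0% of the 274,500 data points for the training, validation and test splits.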

GraphletMolNet baseline

Graphlet graphs for each protein should be placed under data/external/graphlets. Use the training script to fit the baseline GCN model:

python scripts/train_graphlet_model.py
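The README does not describe the baseline GCN's architecture. For orientation only, here is a minimal, dependency-free sketch of a generic graph convolution using the symmetric-normalized propagation rule of Kipf and Welling, ReLU(D^-1/2 (A + I) D^-1/2 H W), followed by mean pooling into one vector per graph. It is an illustration of the technique, not the implementation in graphlet_model.py; all function names are hypothetical.

```python
import math

Matrix = list[list[float]]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Return D^-1/2 (A + I) D^-1/2 for a dense, symmetric adjacency matrix A."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python product of an (n x k) and a (k x m) matrix."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(adj: Matrix, features: Matrix, weight: Matrix) -> Matrix:
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    out = matmul(matmul(normalized_adjacency(adj), features), weight)
    return [[max(0.0, v) for v in row] for row in out]


def graph_readout(node_embeddings: Matrix) -> list[float]:
    """Mean-pool node embeddings into one fixed-size vector per graph."""
    n = len(node_embeddings)
    return [sum(col) / n for col in zip(*node_embeddings)]


if __name__ == "__main__":
    # Path graph 0-1-2 with a single input feature per node.
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    h = gcn_layer(adj, [[1.0], [1.0], [1.0]], [[1.0]])
    print(graph_readout(h))
```

Mean pooling makes the output size independent of the number of nodes, so proteins with graphs of different sizes map to vectors of equal length; how the baseline combines the two proteins of a pair is not specified here.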

Preprocess the dataset and fetch AlphaFold structures with:

python scripts/preprocess_dataset.py
python scripts/build_alphafold_db.py
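How build_alphafold_db.py retrieves structures is documented in .reports/structural_pipeline.md, not here. As a hedged illustration of fetching a single predicted structure, the sketch below downloads one model file from the public AlphaFold DB by UniProt accession. It assumes the single-fragment file name pattern `AF-<accession>-F1-model_v<N>.pdb`; the version suffix changes between database releases, so `version=4` is only a default to adjust. The helper names are hypothetical.

```python
import urllib.request
from pathlib import Path

# Assumption: AlphaFold DB file naming "AF-<UniProt accession>-F1-model_v<N>.pdb".
AFDB_BASE_URL = "https://alphafold.ebi.ac.uk/files"


def alphafold_pdb_url(uniprot_id: str, version: int = 4) -> str:
    """Build the download URL of the predicted structure for one UniProt accession."""
    return f"{AFDB_BASE_URL}/AF-{uniprot_id}-F1-model_v{version}.pdb"


def fetch_alphafold_pdb(uniprot_id: str, out_dir: Path, version: int = 4) -> Path:
    """Download one structure into out_dir unless it is already cached there."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"AF-{uniprot_id}-F1-model_v{version}.pdb"
    if not target.exists():
        urllib.request.urlretrieve(alphafold_pdb_url(uniprot_id, version), target)
    return target
```

Skipping files that already exist keeps repeated runs cheap when the same protein appears in many interaction pairs.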

See .reports/structural_pipeline.md for details.

A summary of the model is provided in .reports/report.md.

Directory structure

project/
├─ fetch.py
├─ logger_setup.py
├─ paths.py          ← sole path registry
├─ datasets.py       ← GraphDataset + graphlet loaders
├─ models.py
├─ graphlet_model.py
├─ pdb_utils.py
└─ protein_embedding.py
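The tree marks paths.py as the sole path registry: other modules import locations from it instead of hard-coding them. A minimal sketch of that pattern, assuming every path is derived from a single project root; only data/external/graphlets and .reports/ come from this README, while the class name `ProjectPaths` and the `structure_dir` entry are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ProjectPaths:
    """Single place that derives every project path from one root directory."""

    root: Path

    @property
    def graphlet_dir(self) -> Path:
        # Directory named in this README for per-protein graphlet graphs.
        return self.root / "data" / "external" / "graphlets"

    @property
    def reports_dir(self) -> Path:
        # Directory holding report.md and structural_pipeline.md.
        return self.root / ".reports"

    @property
    def structure_dir(self) -> Path:
        # Hypothetical location for downloaded AlphaFold structures.
        return self.root / "data" / "external" / "alphafold"
```

Freezing the dataclass keeps the registry read-only, and tests can point a `ProjectPaths` instance at a temporary directory without touching the real data.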