GraphletMolNet: Graphlet-based Prediction of Protein Interactions

Dataset

The PPI dataset is taken from the gold-standard PPI splits of Bernett et al. (2024) and has the following properties:

Splits: Intra-1 (163,192 training points), Intra-0 (59,260 validation points), Intra-2 (52,048 test points)
No direct data leakage between splits
Minimized sequence similarity (measured by length-normalized bitscores) between the training, validation, and test sets
Redundancy reduction with CD-HIT (no two proteins within a split share more than 40% sequence similarity)
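Direct leakage in this setting is usually understood as the same protein appearing in more than one split. The following is a minimal sketch of such a check. It assumes each split is available as a list of (protein_a, protein_b) ID pairs. The function names are hypothetical and not part of the repository, and the IDs in the toy example are made up.

```python
from itertools import chain


def proteins_in(pairs):
    """Return the set of protein IDs appearing in a list of (a, b) pairs."""
    return set(chain.from_iterable(pairs))


def check_no_overlap(splits):
    """Map each (split_x, split_y) pair to the proteins the two splits share.

    An empty result means no protein occurs in more than one split.
    """
    names = list(splits)
    proteins = {name: proteins_in(pairs) for name, pairs in splits.items()}
    overlaps = {}
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            shared = proteins[x] & proteins[y]
            if shared:
                overlaps[(x, y)] = shared
    return overlaps


# Toy splits with made-up IDs; "P3" deliberately leaks from train into test.
splits = {
    "train": [("P1", "P2"), ("P2", "P3")],
    "val": [("Q1", "Q2")],
    "test": [("R1", "R2"), ("R1", "P3")],
}
print(check_no_overlap(splits))
```

On the real Intra-0/1/2 splits the result should be empty. A non-empty result points to the proteins to inspect.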

GraphletMolNet baseline

Graphlet graphs for each protein should be placed under data/external/graphlets. Use the training script to fit the baseline GCN model:

python scripts/train_graphlet_model.py
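The baseline is described only as a GCN. Assume it follows the widely used graph convolution of Kipf and Welling. In that case each layer computes H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), where A is the adjacency matrix of a graphlet graph, D is the degree matrix of A + I, H holds the node features, and W is a learned weight matrix. The dependency-free sketch below illustrates that rule on a single graph, followed by a mean-pool readout. It is not the code in graphlet_model.py, and all function names are hypothetical.

```python
import math


def normalized_adjacency(n, edges):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of an undirected graph."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0  # self-loops, so each node keeps its own features
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]  # degrees of A + I
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Plain-list matrix product x @ y."""
    return [[sum(xi[k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for xi in x]


def gcn_layer(a_hat, h, w):
    """One propagation step: ReLU(A_hat @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]


def mean_pool(h):
    """Average node embeddings into a single graph-level vector."""
    return [sum(col) / len(h) for col in zip(*h)]


# Toy 3-node path graph 0-1-2 with one feature per node and identity weights.
a_hat = normalized_adjacency(3, [(0, 1), (1, 2)])
h1 = gcn_layer(a_hat, [[1.0], [1.0], [1.0]], [[1.0]])
print(mean_pool(h1))
```

For interaction prediction, the pooled vectors of the two proteins in a pair would typically be combined, for example by concatenation, and passed to a classifier. How the baseline does this is not stated here.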

Preprocess the dataset and fetch AlphaFold structures with:

python scripts/preprocess_dataset.py
python scripts/build_alphafold_db.py
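This README does not document how build_alphafold_db.py retrieves structures. The sketch below assumes models are downloaded from the public AlphaFold Protein Structure Database by UniProt accession. The function names, the default model version (v4), and the AF-<accession>-F<fragment>-model_v<version> file naming are assumptions to verify against the AlphaFold DB documentation before use.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Assumed public download location of the AlphaFold Protein Structure Database.
AFDB_BASE = "https://alphafold.ebi.ac.uk/files"


def alphafold_model_url(uniprot_id, version=4, fragment=1, fmt="pdb"):
    """Build a download URL for one predicted model (naming scheme assumed)."""
    if fmt not in ("pdb", "cif"):
        raise ValueError(f"unsupported format: {fmt}")
    return f"{AFDB_BASE}/AF-{uniprot_id}-F{fragment}-model_v{version}.{fmt}"


def fetch_structure(uniprot_id, out_dir, **url_kwargs):
    """Download one model into out_dir, skipping files that already exist."""
    url = alphafold_model_url(uniprot_id, **url_kwargs)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / url.rsplit("/", 1)[1]
    if not target.exists():  # simple cache: never re-download a file
        urlretrieve(url, target)
    return target


# Example accession (human hemoglobin subunit alpha); no download happens here.
print(alphafold_model_url("P69905"))
```

Skipping existing files keeps the step restartable when a long download run is interrupted.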

See .reports/structural_pipeline.md for details.

A summary of the model is provided in .reports/report.md.

Directory structure

project/
├─ fetch.py
├─ logger_setup.py
├─ paths.py          ← sole path registry
├─ datasets.py       ← GraphDataset + graphlet loaders
├─ models.py
├─ graphlet_model.py
├─ pdb_utils.py
└─ protein_embedding.py
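paths.py is described as the sole path registry: other modules import locations from it instead of hard-coding strings. The following is a minimal sketch of such a module, assuming it sits at the project root. Only data/external/graphlets and .reports come from this README. The constant names, the graphlet_path helper, and the .pt file suffix are hypothetical.

```python
from pathlib import Path

# Resolve the project root from this module's location; fall back to the
# current working directory when run interactively (e.g. in a notebook).
try:
    PROJECT_ROOT = Path(__file__).resolve().parent
except NameError:
    PROJECT_ROOT = Path.cwd()

DATA_DIR = PROJECT_ROOT / "data"
GRAPHLET_DIR = DATA_DIR / "external" / "graphlets"  # location named in this README
REPORTS_DIR = PROJECT_ROOT / ".reports"


def graphlet_path(protein_id, suffix=".pt"):
    """Hypothetical per-protein graphlet file location (suffix is an assumption)."""
    return GRAPHLET_DIR / f"{protein_id}{suffix}"


print(graphlet_path("P69905"))
```

Other modules would then use, for example, `from paths import GRAPHLET_DIR`, so relocating the data directory means changing a single file.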
