TKG-DTI is a knowledge graph-based framework for predicting high-confidence drug-target interactions (DTIs) in cancer therapeutics. The system integrates multiple biomedical data sources into a heterogeneous knowledge graph and uses graph-based machine learning models (ComplEx² and Path-GNN) to predict novel DTIs.
- Knowledge Graph Construction: Automated pipeline to build heterogeneous biomedical knowledge graphs from multiple curated sources (Targetome, CTD, OmniPath, LINCS L1000)
- ComplEx² Model: Heterogeneous adaptation of the ComplEx² knowledge graph embedding model with normalized log-probabilities
- Path-GNN Model: Diffusion-based graph neural network for interpretable DTI predictions
- Cross-Validation: K-fold cross-validation framework with aggregated prediction scoring
- Reproducible Workflows: Snakemake pipelines for end-to-end reproducibility
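To make the "normalized log-probabilities" idea in the ComplEx² bullet more concrete, here is a minimal NumPy sketch. It assumes that ComplEx² squares the standard ComplEx score so it is non-negative and then normalizes over all candidate (head, tail) pairs; that reading, the brute-force normalizer, and the names `complex_score` and `normalized_log_probs` are illustrative assumptions, not the `tkgdti` implementation, which is heterogeneous and may compute the normalizer differently.

```python
import numpy as np


def complex_score(h, r, t):
    """Standard ComplEx score: Re(<h, r, conj(t)>) for complex embeddings."""
    return np.real(np.sum(h * r * np.conj(t), axis=-1))


def normalized_log_probs(ent, rel):
    """Log-probabilities over all (head, tail) pairs for a single relation.

    Assumption: the score is squared to make it non-negative, then divided
    by the sum over every candidate pair (brute force, toy-sized only).
    """
    # scores[i, j] = score(ent[i], rel, ent[j]) via broadcasting
    scores = complex_score(ent[:, None, :], rel[None, None, :], ent[None, :, :])
    squared = scores ** 2
    return np.log(squared) - np.log(squared.sum())


# Hypothetical toy embeddings: 5 entities, 4 complex dimensions.
rng = np.random.default_rng(0)
n_entities, dim = 5, 4
ent = rng.normal(size=(n_entities, dim)) + 1j * rng.normal(size=(n_entities, dim))
rel = rng.normal(size=dim) + 1j * rng.normal(size=dim)

log_p = normalized_log_probs(ent, rel)
total_probability = np.exp(log_p).sum()  # should be 1 up to rounding
```

Because the probabilities sum to one across all pairs, scores for different candidate interactions are directly comparable, which is the practical appeal of a normalized model over raw embedding scores.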
# Clone the repository
git clone https://github.com/biodev/TKG-DTI.git
cd TKG-DTI
# Create conda environment
mamba env create -f environment.yml
mamba activate tkgdti
# Install the package
pip install -e .
We borrow a few helper functions from the GSNN repo; we plan to remove this dependency in the future. To install GSNN:
pip install git+https://github.com/nathanieljevans/GSNN
See data_availability.md for complete data requirements.
# Download publicly available datasets (CTD, UniProt, BeatAML)
# Edit ROOT path in script first
bash scripts/get_tkg_raw_files.sh
Note: The Expanded Cancer Targetome and LINCS L1000 data must be downloaded separately. See data_availability.md for access instructions.
Edit the configuration file for your workflow:
# Edit paths and parameters
vim workflow/full-tkg/config.yaml
cd workflow/full-tkg
snakemake -j 1 # Use -j N for N parallel jobs
TKG-DTI/
├── tkgdti/ # Python package
│ ├── data/ # Data loading and graph construction
│ ├── models/ # ComplEx² and GNN model implementations
│ ├── train/ # Training utilities
│ ├── eval/ # Evaluation metrics
│ └── embed/ # Drug/protein embedding utilities
├── workflow/ # Snakemake workflows
│ ├── full-tkg/ # Full TKG workflow
│ ├── aml-tkg/ # AML-focused workflow
│ ├── hetero-a/ # HeteroA baseline
│ └── scripts/ # KG construction scripts (steps 01-10)
├── scripts/ # Utility scripts
├── docs/ # Documentation
├── environment.yml # Conda environment specification
└── data_availability.md # Data sources and access instructions
- TKG-DTI Methods: Detailed workflow and model documentation
- GNN Architecture: Path-based GNN approach for interpretable predictions
- Aggregation Guide: Cross-fold prediction aggregation and filtering
- KG Design: Original knowledge graph design proposal
- Data Availability: Required datasets and access instructions
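As a rough illustration of the cross-fold aggregation and filtering that the Aggregation Guide covers, the sketch below averages each drug-target score across folds and keeps only pairs that appear in enough folds with a high enough mean. The function name `aggregate_folds`, the thresholds, and the drug and gene labels are all hypothetical; the actual aggregation rule used by TKG-DTI is described in the guide and may differ.

```python
from collections import defaultdict
from statistics import mean


def aggregate_folds(fold_predictions, min_mean_score=0.5, min_folds=2):
    """Average each (drug, target) score across folds and keep confident pairs.

    fold_predictions: list of dicts, one per fold, mapping
    (drug, target) -> predicted score in [0, 1].
    """
    scores = defaultdict(list)
    for fold in fold_predictions:
        for pair, score in fold.items():
            scores[pair].append(score)

    # Keep pairs scored in at least `min_folds` folds with a high enough mean.
    kept = {
        pair: mean(values)
        for pair, values in scores.items()
        if len(values) >= min_folds and mean(values) >= min_mean_score
    }
    # Rank from most to least confident.
    return dict(sorted(kept.items(), key=lambda item: item[1], reverse=True))


# Hypothetical per-fold predictions for three folds.
folds = [
    {("drugA", "GENE1"): 0.9, ("drugB", "GENE2"): 0.25},
    {("drugA", "GENE1"): 0.7, ("drugB", "GENE2"): 0.75},
    {("drugA", "GENE1"): 0.8, ("drugC", "GENE3"): 0.95},
]
result = aggregate_folds(folds)
```

In this toy input, the pair seen in only one fold is dropped despite its high score, which shows why requiring agreement across folds is a stricter filter than thresholding a single model's output.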
If you use TKG-DTI in your research, please cite:
Bottomly, D.*, Evans, N.* & McWeeney, S. K. Expanding the set of high evidence drug-target interactions in the Cancer Targetome. In preparation.
Copyright (C) 2026. Oregon Health & Science University Knight Cancer Institute
This project is licensed under the GNU General Public License v3.0 - see the LICENSE.md file for details.
This work was supported by the Edward P. Evans Foundation (EvansMDS Artificial Intelligence and Machine Learning in MDS award).
For questions or issues, please open a GitHub issue or contact the maintainer at evansna@ohsu.edu.