This repository contains the source code for the course project "Relation-Aware Attention and Path-Aware Encoding for Knowledge Graph Completion".
```
src/
  data_loader.py            # generic KG dataset loader (train/valid/test .txt)
  model_rgcn.py             # baseline R-GCN with DistMult decoder
  model_pargcn.py           # proposed PA-RGCN (relation-attention + path encoder)
  train.py                  # training / evaluation loop
results/
  experiment_plan.json      # canonical run list for the project
  external_baselines.json   # reference baselines not implemented here
  results.json              # metrics produced by src/train.py
  logs/                     # per-run training logs written by the runner
  make_results.py           # generates LaTeX tables from actual outputs
scripts/
  run_experiments.py        # executes the experiment plan
  run_pipeline.py           # train -> regenerate report tables -> optional PDF
report/
  pa_rgcn_report.pdf        # final EMNLP-style report (6+ pages)
  pa_rgcn_report.tex        # LaTeX source for the report
  generated/                # auto-generated tables consumed by the report
```
All three datasets are released as triples in tab-separated files
(train.txt, valid.txt, test.txt) with one triple per line:

```
head	relation	tail
```
Recommended sources (all are small and CPU-friendly):
| Dataset | # Entities | # Relations | # Triples | Source |
|---|---|---|---|---|
| UMLS | 135 | 46 | ~6,500 | https://github.com/TimDettmers/ConvE/tree/master/UMLS |
| Kinships | 104 | 25 | ~10,700 | https://github.com/TimDettmers/ConvE/tree/master/kinship |
| WN18RR | 40,943 | 11 | ~93,000 | https://github.com/TimDettmers/ConvE/tree/master/WN18RR |
Put each dataset under data/<name>/, for example:

```
data/umls/train.txt
data/umls/valid.txt
data/umls/test.txt
```
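A loader for this triple format can be sketched as below. This is a minimal illustration of the file layout; the project's actual loader lives in src/data_loader.py and `load_triples` is a hypothetical helper name.

```python
from pathlib import Path


def load_triples(path):
    """Read tab-separated (head, relation, tail) triples, one per line."""
    triples = []
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        head, relation, tail = line.split("\t")
        triples.append((head, relation, tail))
    return triples
```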
```
python >= 3.9
torch >= 1.13
```

No other dependencies are required.
```sh
# run the full experiment plan and regenerate report tables
python3 scripts/run_pipeline.py

# regenerate report tables only from the current results.json
python3 scripts/run_pipeline.py --skip-train

# run just one dataset or one named run
python3 scripts/run_experiments.py --only-dataset umls
python3 scripts/run_experiments.py --only-run "PA-RGCN (full)"

# optionally compile the report PDF after regenerating the tables
python3 scripts/run_pipeline.py --compile-report
```

The canonical experiment list lives in results/experiment_plan.json.
The report tables are generated automatically from results/results.json
plus results/external_baselines.json. Missing experiment outputs are
rendered as `--` in the LaTeX tables until the corresponding runs finish.
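That fallback behaviour can be illustrated with a minimal sketch. The real table generator is results/make_results.py; `fmt_metric` and its signature are purely illustrative.

```python
def fmt_metric(results, run_name, metric):
    """Format one LaTeX table cell, falling back to '--' for missing runs."""
    value = results.get(run_name, {}).get(metric)
    return f"{value:.3f}" if value is not None else "--"
```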
The training script reports filtered MRR and Hits@{1,3,10} for tail prediction and keeps the checkpoint with the best validation score across the periodic evaluations it performs.
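For reference, "filtered" means that for a query (h, r, ?), all entities known to be true tails of (h, r) elsewhere in train/valid/test are excluded from the candidate ranking. A self-contained sketch of the per-query metrics (illustrative, not the project's exact implementation):

```python
def filtered_tail_metrics(scores, true_tail, known_tails):
    """Filtered rank metrics for one tail-prediction query (h, r, ?).

    scores:      dict mapping candidate entity -> model score
    true_tail:   the gold tail entity for this test triple
    known_tails: other entities t' with (h, r, t') in train/valid/test;
                 these are filtered out before ranking
    """
    target = scores[true_tail]
    # count candidates scored strictly above the target, ignoring known true tails
    better = sum(
        1 for e, s in scores.items()
        if e != true_tail and e not in known_tails and s > target
    )
    rank = better + 1
    return {
        "mrr": 1.0 / rank,
        **{f"hits@{k}": float(rank <= k) for k in (1, 3, 10)},
    }
```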
Default values are tuned on the UMLS validation set and reused across datasets:
| Hyperparameter | Value |
|---|---|
| hidden dim | 128 |
| # bases | 4 |
| layers | 2 |
| dropout | 0.2 |
| optimizer | Adam, lr=1e-2 |
| batch size | 1024 |
| # negatives | 10 |
| max path length | 2 |
| max paths per triple | 3 |
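These defaults could be collected into a plain config dict passed to the trainer; the key names here are illustrative, so check src/train.py for the actual argument names.

```python
# Default hyperparameters tuned on the UMLS validation set (key names illustrative)
DEFAULTS = {
    "hidden_dim": 128,
    "num_bases": 4,
    "num_layers": 2,
    "dropout": 0.2,
    "optimizer": "adam",
    "lr": 1e-2,
    "batch_size": 1024,
    "num_negatives": 10,
    "max_path_length": 2,
    "max_paths_per_triple": 3,
}
```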
If anything is unclear, see the project report in report/ for
additional details.