This repository contains the source code for the course project "Relation-Aware Attention and Path-Aware Encoding for Knowledge Graph Completion".
```
src/
  data_loader.py            # generic KG dataset loader (train/valid/test .txt)
  model_rgcn.py             # baseline R-GCN with DistMult decoder
  model_pargcn.py           # proposed PA-RGCN (relation-attention + path encoder)
  train.py                  # training / evaluation loop
results/
  experiment_plan.json      # canonical run list for the project
  external_baselines.json   # reference baselines not implemented here
  results.json              # metrics produced by src/train.py
  logs/                     # per-run training logs written by the runner
  make_results.py           # generates LaTeX tables from actual outputs
scripts/
  run_experiments.py        # executes the experiment plan
  run_pipeline.py           # train -> regenerate report tables -> optional PDF
report/
  pa_rgcn_report.pdf        # final EMNLP-style report (6+ pages)
  pa_rgcn_report.tex        # LaTeX source for the report
  generated/                # auto-generated tables consumed by the report
```
All three datasets are released as triples in tab-separated files
(train.txt, valid.txt, test.txt) with one triple per line:

```
head	relation	tail
```
Recommended sources (all are small and CPU-friendly):
| Dataset | # Entities | # Relations | # Triples | Source |
|---|---|---|---|---|
| UMLS | 135 | 46 | ~6,500 | https://github.com/TimDettmers/ConvE/tree/master/UMLS |
| Kinships | 104 | 25 | ~10,700 | https://github.com/TimDettmers/ConvE/tree/master/kinship |
| WN18RR | 40,943 | 11 | ~93,000 | https://github.com/TimDettmers/ConvE/tree/master/WN18RR |
Put each dataset under data/<name>/, for example:

```
data/umls/train.txt
data/umls/valid.txt
data/umls/test.txt
```
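A loader for this triple format can be sketched as below. This is a minimal illustration of the file layout; the project's actual loader lives in src/data_loader.py and `load_triples` is a hypothetical helper name.

```python
from pathlib import Path


def load_triples(path):
    """Read tab-separated (head, relation, tail) triples, one per line."""
    triples = []
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        head, relation, tail = line.split("\t")
        triples.append((head, relation, tail))
    return triples
```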
```
python >= 3.9
torch >= 1.13
```

No other dependencies are required.
```sh
# run the full experiment plan and regenerate report tables
python3 scripts/run_pipeline.py

# regenerate report tables only from the current results.json
python3 scripts/run_pipeline.py --skip-train

# run just one dataset or one named run
python3 scripts/run_experiments.py --only-dataset umls
python3 scripts/run_experiments.py --only-run "PA-RGCN (full)"

# optionally compile the report PDF after regenerating the tables
python3 scripts/run_pipeline.py --compile-report
```

The canonical experiment list lives in results/experiment_plan.json.
The report tables are generated automatically from results/results.json
plus results/external_baselines.json. Missing experiment outputs are
rendered as `--` in the LaTeX tables until the corresponding runs finish.
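That fallback behaviour can be illustrated with a minimal sketch. The real table generator is results/make_results.py; `fmt_metric` and its signature are purely illustrative.

```python
def fmt_metric(results, run_name, metric):
    """Format one LaTeX table cell, falling back to '--' for missing runs."""
    value = results.get(run_name, {}).get(metric)
    return f"{value:.3f}" if value is not None else "--"
```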
The training script reports filtered MRR and Hits@{1,3,10} for tail prediction and keeps the checkpoint with the best validation score across the periodic evaluations it performs.
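For reference, "filtered" means that for a query (h, r, ?), all entities known to be true tails of (h, r) elsewhere in train/valid/test are excluded from the candidate ranking. A self-contained sketch of the per-query metrics (illustrative, not the project's exact implementation):

```python
def filtered_tail_metrics(scores, true_tail, known_tails):
    """Filtered rank metrics for one tail-prediction query (h, r, ?).

    scores:      dict mapping candidate entity -> model score
    true_tail:   the gold tail entity for this test triple
    known_tails: other entities t' with (h, r, t') in train/valid/test;
                 these are filtered out before ranking
    """
    target = scores[true_tail]
    # count candidates scored strictly above the target, ignoring known true tails
    better = sum(
        1 for e, s in scores.items()
        if e != true_tail and e not in known_tails and s > target
    )
    rank = better + 1
    return {
        "mrr": 1.0 / rank,
        **{f"hits@{k}": float(rank <= k) for k in (1, 3, 10)},
    }
```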
Default values are tuned on the UMLS validation set and reused across datasets:
| Hyperparameter | Value |
|---|---|
| hidden dim | 128 |
| # bases | 4 |
| layers | 2 |
| dropout | 0.2 |
| optimizer | Adam, lr=1e-2 |
| batch size | 1024 |
| # negatives | 10 |
| max path length | 2 |
| max paths per triple | 3 |
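These defaults could be collected into a plain config dict passed to the trainer; the key names here are illustrative, so check src/train.py for the actual argument names.

```python
# Default hyperparameters tuned on the UMLS validation set (key names illustrative)
DEFAULTS = {
    "hidden_dim": 128,
    "num_bases": 4,
    "num_layers": 2,
    "dropout": 0.2,
    "optimizer": "adam",
    "lr": 1e-2,
    "batch_size": 1024,
    "num_negatives": 10,
    "max_path_length": 2,
    "max_paths_per_triple": 3,
}
```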
If anything is unclear, see the project report in report/ for
additional details.