PA-RGCN: Path-Aware Relational GCN for Knowledge Graph Completion

This repository contains the source code for the course project "Relation-Aware Attention and Path-Aware Encoding for Knowledge Graph Completion".

Contents

src/
  data_loader.py     # generic KG dataset loader (train/valid/test .txt)
  model_rgcn.py      # baseline R-GCN with DistMult decoder
  model_pargcn.py    # proposed PA-RGCN (relation-attention + path encoder)
  train.py           # training / evaluation loop
results/
  experiment_plan.json       # canonical run list for the project
  external_baselines.json    # reference baselines not implemented here
  results.json               # metrics produced by src/train.py
  logs/                      # per-run training logs written by the runner
  make_results.py            # generates LaTeX tables from actual outputs
scripts/
  run_experiments.py         # executes the experiment plan
  run_pipeline.py            # train -> regenerate report tables -> optional PDF
report/
  pa_rgcn_report.pdf # final EMNLP-style report (6+ pages)
  pa_rgcn_report.tex # LaTeX source for the report
  generated/         # auto-generated tables consumed by the report

Datasets

All three datasets are released as triples in tab-separated files (train.txt, valid.txt, test.txt) with lines of the form

head    relation    tail

Recommended sources (all are small and CPU-friendly):

| Dataset  | # Entities | # Relations | # Triples | Source |
|----------|-----------:|------------:|----------:|--------|
| UMLS     | 135        | 46          | ~6,500    | https://github.com/TimDettmers/ConvE/tree/master/UMLS |
| Kinships | 104        | 25          | ~10,700   | https://github.com/TimDettmers/ConvE/tree/master/kinship |
| WN18RR   | 40,943     | 11          | ~93,000   | https://github.com/TimDettmers/ConvE/tree/master/WN18RR |

Put each dataset under data/<name>/:

data/umls/train.txt
data/umls/valid.txt
data/umls/test.txt
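A loader for this layout can be sketched as follows (a minimal sketch; `load_triples` and `load_dataset` are illustrative names and not necessarily the API exposed by src/data_loader.py):

```python
from pathlib import Path

def load_triples(path):
    """Read one split file of tab-separated (head, relation, tail) triples."""
    triples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) == 3:  # skip blank or malformed lines
                triples.append(tuple(parts))
    return triples

def load_dataset(root):
    """Load the train/valid/test splits from a data/<name>/ directory."""
    root = Path(root)
    return {split: load_triples(root / f"{split}.txt")
            for split in ("train", "valid", "test")}
```

Entity and relation strings would then be mapped to integer ids before training; that step is dataset-independent given this triple format.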

Requirements

python >= 3.9
torch >= 1.13

No other dependencies.

Reproducing the results

# run the full experiment plan and regenerate report tables
python3 scripts/run_pipeline.py

# regenerate report tables only from the current results.json
python3 scripts/run_pipeline.py --skip-train

# run just one dataset or one named run
python3 scripts/run_experiments.py --only-dataset umls
python3 scripts/run_experiments.py --only-run "PA-RGCN (full)"

# optionally compile the report PDF after regenerating the tables
python3 scripts/run_pipeline.py --compile-report

The canonical experiment list lives in results/experiment_plan.json. The report tables are generated automatically from results/results.json plus results/external_baselines.json. Missing experiment outputs are rendered as -- in the LaTeX tables until the corresponding runs finish.
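The `--` fallback can be sketched like this (a hypothetical helper; make_results.py's actual function names and results.json schema may differ):

```python
def fmt_metric(results, run, metric):
    """Render one LaTeX table cell; absent runs or metrics become '--'."""
    value = results.get(run, {}).get(metric)
    return "--" if value is None else f"{value:.3f}"

# Hypothetical partial results: only MRR has been produced so far.
results = {"PA-RGCN (full)": {"mrr": 0.912}}
cells = [fmt_metric(results, "PA-RGCN (full)", m) for m in ("mrr", "hits@10")]
row = " & ".join(cells)  # -> "0.912 & --"
```

Regenerating the tables after each run therefore never fails on incomplete experiments; unfinished cells simply stay blank.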

The current training script reports filtered MRR and Hits@{1,3,10} for tail prediction; it evaluates on the validation set periodically during training and keeps the checkpoint with the best validation score.
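In terms of per-triple filtered ranks (the 1-based rank of the true tail after masking all other known true tails from the candidate list), the reported metrics reduce to the following (a sketch; `kg_metrics` is an illustrative name, not the function in src/train.py):

```python
def kg_metrics(ranks, ks=(1, 3, 10)):
    """Filtered MRR and Hits@k from 1-based per-triple ranks."""
    n = len(ranks)
    mrr = sum(1.0 / r for r in ranks) / n          # mean reciprocal rank
    hits = {k: sum(r <= k for r in ranks) / n for k in ks}
    return mrr, hits
```

For example, ranks [1, 2, 10] give MRR = (1 + 0.5 + 0.1) / 3 and Hits@10 = 1.0.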

Hyperparameters

Default values are tuned on the UMLS validation set and reused across datasets:

| Hyperparameter       | Value         |
|----------------------|---------------|
| hidden dim           | 128           |
| # bases              | 4             |
| layers               | 2             |
| dropout              | 0.2           |
| optimizer            | Adam, lr=1e-2 |
| batch size           | 1024          |
| # negatives          | 10            |
| max path length      | 2             |
| max paths per triple | 3             |
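Gathered in one place, these defaults amount to a config like the following (a sketch; the key names are assumptions, and the authoritative values are the table above and src/train.py):

```python
# Default hyperparameters, tuned on the UMLS validation set (key names assumed).
DEFAULTS = {
    "hidden_dim": 128,
    "num_bases": 4,
    "num_layers": 2,
    "dropout": 0.2,
    "lr": 1e-2,              # Adam
    "batch_size": 1024,
    "num_negatives": 10,
    "max_path_length": 2,
    "max_paths_per_triple": 3,
}
```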

Contact

If anything is unclear, see the project report in report/ for additional details.
