Imagine is a post-hoc local explainability tool for Link Prediction (LP) on Knowledge Graphs (KGs) performed through embedding-based models. It explains a prediction by providing Additive Counterfactual Explanations (ACEs) made of additional triples: triples that are neither explicitly stated in, nor entailed by, the KG, yet are not assumed to be false under the Open World Assumption.
Imagine is structured into four components:
- Triple Builder: generates additional triples featuring the subject of the prediction
- Pre-Filter: selects the most useful additional triples
- Explanation Builder: combines the pre-filtered triples into candidate explanations and identifies sufficiently relevant ones
- Relevance Engine: estimates the relevance of a candidate explanation
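To picture how the four components interact, here is a minimal sketch of the pipeline; every name, criterion, and threshold below is a hypothetical placeholder for illustration, not the actual Imagine API.

```python
# Illustrative sketch of the Imagine pipeline; all names and criteria here
# are hypothetical placeholders, not the actual Imagine implementation.

def explain(prediction, kg):
    """prediction: an (s, p, o) triple; kg: a set of (s, p, o) triples."""
    s, _, _ = prediction

    # Triple Builder: generate additional triples featuring s, i.e. triples
    # absent from the KG (here, naively recombining observed p's and o's).
    predicates = {p for (_, p, _) in kg}
    objects = {o for (_, _, o) in kg}
    additional = {(s, p, o) for p in predicates for o in objects} - kg

    # Pre-Filter: select the most useful additional triples
    # (placeholder criterion: keep an arbitrary small subset).
    prefiltered = sorted(additional)[:10]

    # Explanation Builder: combine pre-filtered triples into candidate
    # explanations, growing them until one is sufficiently relevant.
    for size in range(1, len(prefiltered) + 1):
        candidate = prefiltered[:size]

        # Relevance Engine: estimate the relevance of the candidate
        # (placeholder: pretend relevance grows with candidate size).
        relevance = len(candidate)
        if relevance >= 3:
            return candidate
    return None


if __name__ == "__main__":
    kg = {("Rome", "capitalOf", "Italy"), ("Italy", "partOf", "Europe")}
    print(explain(("Rome", "locatedIn", "Europe"), kg))
```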
```
# Clone the repository
git clone https://github.com/rbarile17/imagine.git

# Navigate to the repository directory
cd imagine

# Install the required dependencies
pip install -r requirements.txt
```

Follow the steps in this section to run the pipeline.
All commands require the parameters `dataset` and `model`.
The datasets DB50K, DB100K, and YAGO4-20 are available in `data`.
You can also experiment with your own datasets (structured as explained in the `data` README)!
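For a custom dataset, loading triples is straightforward if it follows the common one-triple-per-line, tab-separated layout; note that this layout and the path below are assumptions for illustration, and the `data` README is the authoritative reference.

```python
# Load a triples file assuming a tab-separated <subject, predicate, object>
# layout (an assumption for illustration; see the data README for the
# authoritative format).
def load_triples(path):
    with open(path, encoding="utf-8") as f:
        return [tuple(line.rstrip("\n").split("\t")) for line in f if line.strip()]

triples = load_triples("data/DB50K/train.txt")  # hypothetical path
print(len(triples), "triples loaded")
```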
The supported models are ComplEx, ConvE, and TransE.
You can extend the class `Model` to add new models!
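As a sketch of what such an extension might look like (the actual `Model` interface is defined in `src` and may differ; the class below subclasses `torch.nn.Module` as a stand-in, and its method names are assumptions for illustration):

```python
# Hypothetical sketch of adding a new model; the real abstract interface of
# Model lives in src and may differ from the names assumed here.
import torch

class DistMult(torch.nn.Module):  # stand-in for extending the repo's Model class
    def __init__(self, num_entities, num_relations, dim):
        super().__init__()
        self.entity_embeddings = torch.nn.Embedding(num_entities, dim)
        self.relation_embeddings = torch.nn.Embedding(num_relations, dim)

    def score(self, s, p, o):
        # DistMult scoring function: <e_s, w_p, e_o> (tri-linear dot product)
        e_s = self.entity_embeddings(s)
        w_p = self.relation_embeddings(p)
        e_o = self.entity_embeddings(o)
        return (e_s * w_p * e_o).sum(dim=-1)
```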
Run the commands with the --help option to inspect the possible values for all the parameters!
Create a `<model>_<dataset>.json` file in `configs` specifying the configuration for training, explanation, and evaluation. Check out the `configs` README for information and examples on configurations.
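A configuration could look like the following sketch. The top-level sections mirror the three phases named above, and the values are taken from the hyper-parameter table below, but every key name here is an assumption; check the `configs` README for the real schema.

```python
# Hypothetical sketch of configs/ConvE_DB50K.json; the three sections follow
# the phases named above, but all key names are assumptions — see the
# configs README for the real schema.
import json

config = {
    "training": {"dimension": 200, "epochs": 65, "learning_rate": 0.030},
    "explanation": {},
    "evaluation": {},
}

with open("configs/ConvE_DB50K.json", "w") as f:
    json.dump(config, f, indent=2)
```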
```
python -m src.link_prediction.train --dataset <dataset> --model <model> --valid <validation_epochs>
```

`<validation_epochs>` is the frequency (in epochs) at which the model is evaluated on the validation set to determine whether to apply early stopping.
```
python -m src.link_prediction.test --dataset <dataset> --model <model>
```

```
python -m src.select_preds --dataset <dataset> --model <model> --pred-rank <pred_rank>
```

`<pred_rank>` specifies which predictions to select based on their rank; choose between:
- `any`
- `first`
- `notfirst`
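The intent of `<pred_rank>` can be stated in a few lines of illustrative code; here the rank of a prediction is assumed to be the position of the correct entity in the model's sorted candidate list (rank 1 is best), consistent with standard LP evaluation. The helper below is hypothetical, not repo code.

```python
# Illustrative semantics of --pred-rank (hypothetical helper, not repo code).
def select_predictions(preds_with_ranks, pred_rank):
    """preds_with_ranks: iterable of (prediction, rank) pairs; rank 1 is best."""
    if pred_rank == "any":
        return [p for p, _ in preds_with_ranks]
    if pred_rank == "first":
        return [p for p, r in preds_with_ranks if r == 1]
    if pred_rank == "notfirst":
        return [p for p, r in preds_with_ranks if r > 1]
    raise ValueError(f"unknown pred_rank: {pred_rank}")
```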
The commands in this section also require:
- `<method>`: the explanation method
- `<mode>`: the explanation mode
- `<summarization>` (to specify solely if the method is one of `imagine`, `wimagine`, `ikelpie++`, `kelpie++`): the summarization solution to adopt in the Explanation Builder; choose between the following values (ordered by increasing granularity):
  - `simulation`
  - `bisimulation`
  - `no` (default)
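For intuition on the summarization options: simulation and bisimulation both merge entities that play equivalent structural roles in the KG, with bisimulation being the finer of the two. A toy bisimulation-style summary can be computed by textbook partition refinement, as sketched below; this is a standard algorithm shown for illustration, not Imagine's actual implementation.

```python
# Toy bisimulation-style summarization by partition refinement
# (textbook algorithm for illustration; not Imagine's implementation).
def bisimulation_blocks(entities, triples):
    """triples: iterable of (s, p, o); returns a mapping entity -> block id."""
    # Start from a single block, then refine by outgoing
    # (predicate, block-of-object) signatures until a fixpoint is reached.
    block_of = {e: 0 for e in entities}
    while True:
        signatures = {
            e: (block_of[e],
                frozenset((p, block_of[o]) for (s, p, o) in triples if s == e))
            for e in entities
        }
        # Re-number blocks according to the refined signatures.
        new_ids = {}
        new_block_of = {
            e: new_ids.setdefault(signatures[e], len(new_ids)) for e in entities
        }
        if new_block_of == block_of:
            return new_block_of
        block_of = new_block_of


triples = [("a", "r", "b"), ("c", "r", "b"), ("b", "q", "a")]
# "a" and "c" end up in the same block: both only have an r-edge to "b".
print(bisimulation_blocks({"a", "b", "c"}, triples))
```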
```
python -m src.explain --method <method> --dataset <dataset> --model <model> --mode <mode> --summarization <summarization>
```

```
python -m src.evaluation.verify_explanations --method <method> --dataset <dataset> --model <model> --mode <mode> --summarization <summarization>
```

```
python -m src.evaluation.compute_metrics --method <method> --dataset <dataset> --model <model> --mode <mode> --summarization <summarization>
```

To reproduce the experiments in the paper use:
- the datasets DB50K, DB100K, YAGO4-20
- our configs specifying the hyperparameters found as described in Appendix B of the paper
- our pre-trained models
- our sampled correct predictions
We report the hyper-parameters that we adopted in all phases of the experimental evaluation.
| Model | Parameter | DB50K | DB100K | YAGO4-20 |
|---|---|---|---|---|
| TransE | $D$ | 64 | 64 | 6 |
| | $p$ | 2 | 1 | 2 |
| | $Ep$ | 60 | 165 | 45 |
| | $Lr$ | 0.003 | 0.002 | 0.042 |
| | $\gamma$ | 10 | 2 | 2 |
| | $N$ | 5 | 15 | 10 |
| ConvE | $D$ | 200 | 200 | 200 |
| | $Drop$ $in$ | 0.1 | 0 | 0.2 |
| | $Drop$ $h$ | 0 | 0.1 | 0 |
| | $Drop$ $feat$ | 0 | 0.2 | 0 |
| | $Ep$ | 65 | 210 | 210 |
| | $Lr$ | 0.030 | 0.013 | 0.007 |
| ComplEx | $D$ | 256 | 256 | 256 |
| | $Ep$ | 39 | 259 | 149 |
| | $Lr$ | 0.046 | 0.029 | 0.015 |
Note that:
- $D$ is the embedding dimension; in the models that we adopted, entity and relation embeddings always have the same dimension
- $p$ is the exponent of the $p$-norm
- $Lr$ is the learning rate
- $B$ is the batch size
- $Ep$ is the number of epochs
- $\gamma$ is the margin in the Pairwise Ranking Loss
- $N$ is the number of negative triples generated for each positive triple
- $\omega$ is the size of the convolutional kernels
- $Drop$ is the training dropout rate, specifically:
  - $in$ is the input dropout
  - $h$ is the dropout applied after a hidden layer
  - $feat$ is the feature dropout
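For reference, the Pairwise Ranking Loss in which $\gamma$ appears is typically defined as follows (this is the standard textbook form, stated here as an assumption about the exact variant used):

$$\mathcal{L} = \sum_{(s,p,o) \in \mathcal{T}} \; \sum_{(s',p,o') \in \mathcal{N}_{(s,p,o)}} \max\bigl(0,\; \gamma + f(s,p,o) - f(s',p,o')\bigr)$$

where $\mathcal{T}$ is the set of training triples, $\mathcal{N}_{(s,p,o)}$ contains the $N$ negative triples generated for $(s,p,o)$, and $f$ is the model's dissimilarity score; for TransE, $f(s,p,o) = \lVert \mathbf{e}_s + \mathbf{w}_p - \mathbf{e}_o \rVert_p$.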
We adopted Random Search to find the values of the hyper-parameters. Exceptions are given by
```
├── README.md          <- The top-level README for developers using this project.
├── data
├── notebooks          <- Jupyter notebooks.
├── requirements.txt   <- The requirements file for reproducing the environment
└── src                <- Source code.
```