This repo contains the code for the paper PAM: Paraphrase AMR-Centric Evaluation Metric, by Afonso Sousa & Henrique Lopes Cardoso (ACL Findings 2025).
Paraphrasing is rooted in semantics, which makes evaluating paraphrase generation systems hard. Current paraphrase generators are typically evaluated with metrics borrowed from adjacent text-to-text tasks, such as machine translation or text summarization. These metrics tend to be tied to the surface form of the reference text, which is not ideal for paraphrases: we typically want lexical variation while preserving semantics. To address this problem, and inspired by learned similarity evaluation on plain text, we propose PAM, a Paraphrase AMR-Centric Evaluation Metric. PAM uses AMR graphs extracted from the input text; these semantic structures are agnostic to the surface form, making the resulting metric more robust to variations in syntax or lexicon. Additionally, we evaluated PAM on different semantic textual similarity datasets and found that it improves correlation with human semantic scores compared to other AMR-based metrics.
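As a toy illustration of why graph-based comparison is less sensitive to surface form (this is not PAM itself; the AMRs and the naive extractor below are hand-written examples):

```python
import re

def concepts(amr: str) -> set[str]:
    """Extract concept labels (the token after each '/') from a
    linearized AMR string. A naive toy extractor, not a real AMR parser."""
    return set(re.findall(r"/\s*([\w-]+)", amr))

def jaccard(a: set, b: set) -> float:
    """Overlap between two concept sets."""
    return len(a & b) / len(a | b)

# Hand-written AMRs for two paraphrases with different surface realizations:
amr_a = "(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))"
amr_b = "(w / want-01 :ARG1 (g / go-02 :ARG0 (b / boy)) :ARG0 b)"

# Despite the reordering, the underlying concepts are identical.
print(jaccard(concepts(amr_a), concepts(amr_b)))  # 1.0
```

Real AMR metrics operate on the full graph structure (concepts, roles, and reentrancies), but the idea is the same: comparison happens at the semantic level rather than on the token sequence.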
First, create a fresh conda environment with all required dependencies:

```shell
conda env create -f environment.yml
```
Additionally, most scripts require a pretrained AMR parser. We used `parse_xfm_bart_large` from here. Download it, rename the directory to `amr_parser`, and place it in the repository root.
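A minimal sketch of that step (the download path is a placeholder; adjust it to wherever you saved the checkpoint):

```shell
# Place the parser checkpoint in the repository root under amr_parser/.
mkdir -p amr_parser
# mv /path/to/parse_xfm_bart_large/* amr_parser/   # uncomment after downloading
ls -d amr_parser
```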
Follow the instructions in data/README to extract the third-party data into the `data/` folder, using the layout below:
```
data
└── dataset_name
    └── main
        └── raw
            ├── src.dev.amr
            ├── src.test.amr
            ├── tgt.dev.amr
            └── tgt.test.amr
```
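A sketch for creating that skeleton for one dataset (`dataset_name` is a placeholder; use your actual dataset's name):

```shell
# Create the expected directory layout and the four AMR split files.
mkdir -p data/dataset_name/main/raw
touch data/dataset_name/main/raw/src.dev.amr \
      data/dataset_name/main/raw/src.test.amr \
      data/dataset_name/main/raw/tgt.dev.amr \
      data/dataset_name/main/raw/tgt.test.amr
ls data/dataset_name/main/raw
```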
Then use merge_dataset.sh to merge the information into a JSON file. For the example above, the output file should be placed under `main/`.
To train/test PAM or any other model referred to in the paper, run the corresponding script. For example:

```shell
sh ./scripts/train_pam.sh
sh ./scripts/test_pam.sh
```
To further fine-tune the trained model on Quora Question Pairs (QQP), run:

```shell
sh ./scripts/paraphrase_finetune.sh
```
For many experiments reported in the paper, we used third-party libraries integrated into our source code, which require you to extract them to the root directory and potentially install the respective packages -- for example, AlignScore.
Others, like WWLK, were computed using the original source code.
Some files were used for smaller, single experiments:
- computing static embeddings
- plotting PAM and SBERT score distribution
- auxiliary code to compute AMR similarity metrics
- computing computational cost
- computing statistics for the ETPC dataset
This project used code and took inspiration from the following open source projects:
This project is released under the MIT License.