Seahorse dataset

Seahorse is a dataset for multilingual, multifaceted summarization evaluation. It contains 96K summaries with human ratings along 6 quality dimensions: comprehensibility, repetition, grammar, attribution, main ideas, and conciseness. The summaries cover 6 languages, 9 systems, and 4 datasets.

More details can be found in the paper, which can be cited as follows:

@misc{clark2023seahorse,
      title={SEAHORSE: A Multilingual, Multifaceted Dataset for Summarization Evaluation}, 
      author={Elizabeth Clark and Shruti Rijhwani and Sebastian Gehrmann and Joshua Maynez and Roee Aharoni and Vitaly Nikolaev and Thibault Sellam and Aditya Siddhant and Dipanjan Das and Ankur P. Parikh},
      year={2023},
      eprint={2305.13194},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

The Seahorse dataset is released under the CC-BY 4.0 license.

You can download the dataset here: https://storage.googleapis.com/seahorse-public/seahorse_data.zip

Dataset description

The dataset is split into 3 .tsv files: the train, validation, and test sets.

Each file contains the following information:

gem_id: The ID of the article that was used to generate the summary (see Retrieving articles from GEM for more details)
worker_lang: The language ID (de, es-ES, en-US, ru, tr, vi)
summary: The generated summary
model: The source of the summary (either reference or the summarization model)
question1-6: 6 columns with annotator ratings, one for each of the 6 dimensions of quality (comprehensibility, repetition, grammar, attribution, main idea(s), and conciseness). If question1 is No, there will be no ratings for the remaining questions.

Here is an example entry:

xlsum_english-validation-6416	en-US	Schools in England, Wales and Scotland are being urged to bring back overseas exchange trips.	t5_base	Yes	Yes	Yes	Yes	Yes	Yes
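As a rough illustration of the column layout described above, the following sketch parses the example entry into a dictionary. The column names are taken from the description; whether the released .tsv files carry a header row, and how missing ratings are encoded when question1 is No, are not stated here, so the padding behaviour in `parse_row` (a hypothetical helper) is an assumption.

```python
# Column names as listed in the dataset description.
COLUMNS = ["gem_id", "worker_lang", "summary", "model"] + [
    f"question{i}" for i in range(1, 7)
]


def parse_row(line):
    """Split one tab-separated Seahorse row into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    # Assumption: if trailing ratings are absent, treat them as empty strings.
    fields += [""] * (len(COLUMNS) - len(fields))
    return dict(zip(COLUMNS, fields))


# The example entry from this README, rebuilt with explicit tab separators.
example = "\t".join([
    "xlsum_english-validation-6416",
    "en-US",
    "Schools in England, Wales and Scotland are being urged to bring back "
    "overseas exchange trips.",
    "t5_base",
    "Yes", "Yes", "Yes", "Yes", "Yes", "Yes",
])

row = parse_row(example)
ratings = [row[f"question{i}"] for i in range(1, 7)]
print(row["gem_id"], row["model"], ratings)
```

The same function can be applied line by line to each of the three .tsv files once they are unzipped.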

There is also a directory called duplicates, which contains the items that received multiple annotations. Note that this data should NOT be used for training metrics, as there may be overlap between the train/dev/test sets.

Retrieving articles from GEM

If you would like to access the articles that the Seahorse summaries are based on, you will need to retrieve them using their GEM ids.

The xsum, mlsum, and xlsum articles can all be retrieved through GEM on HuggingFace. The gem_id column points to the article in the GEM datasets.
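One way to attach articles to Seahorse rows is a plain lookup on gem_id, sketched below. The helper names (`index_by_gem_id`, `attach_articles`) and the `text_field` parameter are hypothetical, and the stand-in record replaces a real GEM download. In practice the records might come from a call along the lines of `datasets.load_dataset("GEM/xlsum", "english", split="validation")`, but that call and the name of the article field are assumptions that should be checked against the GEM dataset cards.

```python
def index_by_gem_id(records):
    """Map gem_id -> record for any iterable of dicts that carry a 'gem_id' key."""
    return {rec["gem_id"]: rec for rec in records}


def attach_articles(seahorse_rows, gem_records, text_field="text"):
    """Return copies of the Seahorse rows with an 'article' key added.

    'article' is None when no GEM record has a matching gem_id.
    text_field is an assumption: the article field name may differ per dataset.
    """
    lookup = index_by_gem_id(gem_records)
    joined = []
    for row in seahorse_rows:
        rec = lookup.get(row["gem_id"])
        joined.append({**row, "article": rec[text_field] if rec else None})
    return joined


# Stand-in GEM records (placeholder text, not real data).
gem_records = [
    {"gem_id": "xlsum_english-validation-6416", "text": "<article text>"},
]
seahorse_rows = [
    {"gem_id": "xlsum_english-validation-6416", "model": "t5_base"},
    {"gem_id": "xlsum_english-validation-0", "model": "reference"},
]

joined = attach_articles(seahorse_rows, gem_records)
print([r["article"] for r in joined])
```

Building the lookup once keeps the join linear in the number of rows, which matters when matching many summaries against a full GEM split.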

The wikilingua article ids come from a previous version of the GEM dataset and should be retrieved using TensorFlow datasets. Here's an example of how to load the English wikilingua dataset into a dataframe:

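import tensorflow_datasets as tfds

# Language config and split of the GEM wikilingua data to load.
lang = 'english_en'
orig_split = 'validation'

# Load the dataset through TFDS, then convert it to a pandas DataFrame.
ds, info = tfds.load(f'huggingface:gem/wiki_lingua_{lang}', split=orig_split, with_info=True)
hfdf = tfds.as_dataframe(ds, info)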

Leaderboard

We are maintaining a leaderboard with official results on our test set.

Please do not incorporate any part of the Seahorse validation set into your training data. Use it only for validation and hyperparameter tuning, as development sets are typically used.

We report results on two metrics: Pearson correlation ($\rho$) and area under the ROC curve (roc).

Model	Link	Q1 $\rho$	Q1 roc	Q2 $\rho$	Q2 roc	Q3 $\rho$	Q3 roc	Q4 $\rho$	Q4 roc	Q5 $\rho$	Q5 roc	Q6 $\rho$	Q6 roc
mT5-seahorse	[Clark et al. 2023]	0.52	0.90	0.86	0.98	0.45	0.84	0.59	0.85	0.50	0.80	0.52	0.81
mT5-XNLI	[Honovich et al. 2022, Conneau et al. 2018]	-	-	-	-	-	-	0.43	0.78	-	-	-	-
ROUGE-L	[Lin et al. 2004]	0.04	0.54	0.06	0.54	-0.03	0.43	0.13	0.55	0.03	0.54	0.02	0.54
Majority Class	-	-	0.5	-	0.5	-	0.5	-	0.5	-	0.5	-	0.5
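For readers who want to sanity-check their own numbers, here is a minimal sketch of the two metrics in plain Python. It assumes Yes/No ratings are encoded as 1/0 and that the metric outputs one score per summary; it is not the official evaluation script, and the toy labels and scores are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("Pearson correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Equals the probability that a random positive is scored above a random
    negative, with ties counted as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = annotator answered Yes, 0 = No; scores are metric outputs.
labels = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]

r = pearson(labels, scores)
auc = roc_auc(labels, scores)
print(f"pearson={r:.3f} roc_auc={auc:.3f}")
```

A constant predictor gets an AUC of 0.5 under this definition and has no defined Pearson correlation, which is consistent with the Majority Class row in the table.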

Leaderboard Submission

If you want to submit to the leaderboard, please email your results to the contact address below.

Contact

Please email eaclark@google.com if you have any questions about the dataset.
