This repository was archived by the owner on Jan 29, 2024. It is now read-only.

Add more than one acceptable answer in our Question-Answering dataset #617

@FrancescoCasalegno

Description

Context

  • In Question-Answering: collect example questions + run first analysis with pre-trained QA models #612 we started creating a Question-Answering dataset for evaluating and training Extractive QA models.
  • This was done following the style of popular datasets for Extractive QA like SQuAD.
  • However, for the sake of simplicity, we only annotated one ground-truth answer for each sample.
  • In SQuAD, however, multiple ground-truth answers are annotated for each sample. We should do the same: without a complete set of acceptable answers, our evaluation is biased, the EM score in particular but also F1 (see the sketch after this list).
  • For instance
    • Question: "What is the population size of Geneva?"
    • Context: "It is estimated that about 200,000 people live in Geneva."
    • Acceptable Answers: ["200,000", "about 200,000", "200,000 people", "about 200,000 people"]
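
To make the bias concrete, here is a minimal sketch, in the spirit of the official SQuAD evaluation script rather than our actual code, showing that EM and F1 are each computed as the maximum over all acceptable answers; an incomplete answer set can therefore only lower the scores:

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and
    articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def f1(prediction: str, gold: str) -> float:
    """Token-level F1 between a prediction and one gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def score(prediction: str, acceptable_answers: list[str]) -> tuple[float, float]:
    """EM and F1 are each the max over all acceptable answers, as in SQuAD."""
    em = max(float(normalize(prediction) == normalize(a)) for a in acceptable_answers)
    best_f1 = max(f1(prediction, a) for a in acceptable_answers)
    return em, best_f1

# With only one annotated answer, a correct but differently phrased
# prediction is penalized; with the full set it gets full credit.
print(score("about 200,000", ["200,000"]))                # (0.0, ~0.67)
print(score("about 200,000", ["200,000", "about 200,000",
                              "200,000 people", "about 200,000 people"]))  # (1.0, 1.0)
```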

Actions

  • Add more than one acceptable answer for each sample in our QA dataset (see the hypothetical before/after sketch below).
  • Re-run the evaluation and compare results. We should expect higher scores for all models, since predictions that differ from the single annotated answer only in phrasing will now match one of the acceptable answers.
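
As an illustration of the annotation change, here is a hypothetical before/after of one sample, assuming a SQuAD-v1.1-style schema; the field names and character offsets are assumptions for illustration, not necessarily our dataset's actual layout:

```python
# Before: a single annotated ground-truth answer per sample.
sample_before = {
    "question": "What is the population size of Geneva?",
    "context": "It is estimated that about 200,000 people live in Geneva.",
    "answers": [{"text": "200,000", "answer_start": 27}],
}

# After: every acceptable span is annotated, so EM/F1 (taken as the
# max over answers) no longer penalize valid alternative phrasings.
sample_after = {
    "question": "What is the population size of Geneva?",
    "context": "It is estimated that about 200,000 people live in Geneva.",
    "answers": [
        {"text": "200,000", "answer_start": 27},
        {"text": "about 200,000", "answer_start": 21},
        {"text": "200,000 people", "answer_start": 27},
        {"text": "about 200,000 people", "answer_start": 21},
    ],
}
```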
