This repository was archived by the owner on Jan 29, 2024. It is now read-only.

Add more than one acceptable answer in our Question-Answering dataset #617

@FrancescoCasalegno

Description

Context

  • In Question-Answering: collect example questions + run first analysis with pre-trained QA models #612 we started creating a Question-Answering dataset for evaluating and training Extractive QA models.
  • This was done following the style of popular datasets for Extractive QA like SQuAD.
  • However, for the sake of simplicity, we only annotated one ground-truth answer for each sample.
  • In SQuAD, however, multiple ground-truth answers are annotated for each sample. We should do the same: without a complete set of acceptable answers, our evaluation is biased, the EM score in particular but also F1 (see the sketch after this list).
  • For instance
    • Question: "What is the population size of Geneva?"
    • Context: "It is estimated that about 200,000 people live in Geneva."
    • Acceptable Answers: ["200,000", "about 200,000", "200,000 people", "about 200,000 people"]
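
To make the bias concrete, here is a minimal sketch, in the spirit of the official SQuAD evaluation script rather than our actual code, showing that EM and F1 are each computed as the maximum over all acceptable answers; an incomplete answer set can therefore only lower the scores:

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and
    articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def f1(prediction: str, gold: str) -> float:
    """Token-level F1 between a prediction and one gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def score(prediction: str, acceptable_answers: list[str]) -> tuple[float, float]:
    """EM and F1 are each the max over all acceptable answers, as in SQuAD."""
    em = max(float(normalize(prediction) == normalize(a)) for a in acceptable_answers)
    best_f1 = max(f1(prediction, a) for a in acceptable_answers)
    return em, best_f1

# With only one annotated answer, a correct but differently phrased
# prediction is penalized; with the full set it gets full credit.
print(score("about 200,000", ["200,000"]))                # (0.0, ~0.67)
print(score("about 200,000", ["200,000", "about 200,000",
                              "200,000 people", "about 200,000 people"]))  # (1.0, 1.0)
```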

Actions

  • Add more than one acceptable answer for each sample in our QA dataset (see the hypothetical before/after sketch below).
  • Re-run the evaluation and compare results. We should expect higher scores for all models, since predictions that differ from the single annotated answer only in phrasing will now match one of the acceptable answers.
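
As an illustration of the annotation change, here is a hypothetical before/after of one sample, assuming a SQuAD-v1.1-style schema; the field names and character offsets are assumptions for illustration, not necessarily our dataset's actual layout:

```python
# Before: a single annotated ground-truth answer per sample.
sample_before = {
    "question": "What is the population size of Geneva?",
    "context": "It is estimated that about 200,000 people live in Geneva.",
    "answers": [{"text": "200,000", "answer_start": 27}],
}

# After: every acceptable span is annotated, so EM/F1 (taken as the
# max over answers) no longer penalize valid alternative phrasings.
sample_after = {
    "question": "What is the population size of Geneva?",
    "context": "It is estimated that about 200,000 people live in Geneva.",
    "answers": [
        {"text": "200,000", "answer_start": 27},
        {"text": "about 200,000", "answer_start": 21},
        {"text": "200,000 people", "answer_start": 27},
        {"text": "about 200,000 people", "answer_start": 21},
    ],
}
```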
