Evaluation functions for our dataset. We start by evaluating each question type separately, then add a function that runs the full evaluation across all types.
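A minimal sketch of how the per-type and full evaluation could fit together. The record fields (`question_type`, `prediction`, `answer`), the function names, and the use of exact-match accuracy as the metric are all assumptions made for illustration; the dataset's actual schema and scoring rule may differ.

```python
from collections import defaultdict


def evaluate_by_type(examples):
    """Return exact-match accuracy for each question type.

    Each example is assumed (hypothetically) to be a dict with the keys
    "question_type", "prediction", and "answer".
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        qtype = ex["question_type"]
        total[qtype] += 1
        if ex["prediction"] == ex["answer"]:
            correct[qtype] += 1
    return {qtype: correct[qtype] / total[qtype] for qtype in total}


def evaluate_full(examples):
    """Return overall accuracy together with the per-type breakdown."""
    if not examples:
        # Nothing to score: report zero overall and an empty breakdown.
        return {"overall": 0.0, "by_type": {}}
    n_correct = sum(ex["prediction"] == ex["answer"] for ex in examples)
    return {
        "overall": n_correct / len(examples),
        "by_type": evaluate_by_type(examples),
    }
```

Keeping the per-type function separate means a single question type can be inspected on its own, while the full evaluation reuses it instead of duplicating the counting logic. Note that the overall score here is computed over all examples, so question types with more examples weigh more than they would in an average of the per-type scores.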