Paper: FIBER: A Multilingual Evaluation Resource for Factual Inference Bias
FIBER: Factual Inference Bias Evaluation Resource
FIBER is a high-quality dataset designed to evaluate language-model factual inference bias across three languages: English (en), Italian (it), and Turkish (tr).
It contains both single-entity and multi-entity question–answer sets derived from structured world knowledge domains (e.g., capitals, car brands, time zones, official languages).
For each prompt, two sets are defined over the candidate answers: a gold set (the true answers) and a surface set (all possible answers). For each candidate in the surface set, the model's token-level log probabilities are computed sequentially: the model predicts each token's probability given the preceding tokens, and the resulting log probabilities are summed to produce an overall score for that candidate. After all candidates are scored, they are ranked by their total scores, and Average Precision (AP) is calculated from this ranking. This procedure is repeated for every subject, topic, query type, and query target in the dataset.
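The ranking step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's evaluation script: the function names, the candidate labels, and the log-probability values are all made up, and AP is assumed to follow the standard definition (mean of precision at each rank where a gold answer appears, divided by the size of the gold set).

```python
from typing import Dict, List, Set


def sequence_score(token_logprobs: List[float]) -> float:
    """Score a candidate as the sum of its token-level log probabilities."""
    return sum(token_logprobs)


def average_precision(scores: Dict[str, float], gold: Set[str]) -> float:
    """Rank candidates by score (highest first) and compute Average Precision."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, candidate in enumerate(ranked, start=1):
        if candidate in gold:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(gold) if gold else 0.0


# Hypothetical per-token log probabilities for a three-candidate surface set.
token_logprobs = {
    "cand_a": [-0.2, -0.3],  # total -0.5
    "cand_b": [-1.0, -0.5],  # total -1.5
    "cand_c": [-0.4, -0.4],  # total -0.8
}
scores = {name: sequence_score(lp) for name, lp in token_logprobs.items()}

# Assume cand_a and cand_b form the gold set.
ap = average_precision(scores, gold={"cand_a", "cand_b"})
```

Here the ranking is cand_a, cand_c, cand_b, so the gold answers sit at ranks 1 and 3 and AP is (1/1 + 2/3) / 2. In a real run the per-token log probabilities would come from the model under test rather than from a hand-written dictionary.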
The following Python libraries are required for evaluation:
- torch
- transformers
- huggingface-hub
To install them automatically, run the following command in your terminal:
make requirements
Before testing a model, provide the following in config.json:
- Hugging Face token (hugging_face_token)
- Hugging Face ID of the model you want to test (model_id)
- Input directory (dataset_dir); leave it as dataset if you have not changed the file structure
- Output directory (results_dir); results/<MODEL_NAME> is suggested
Example entries are provided below:
- hugging_face_token: YOUR_HUGGING_FACE_TOKEN
- model_id: google/gemma-3-27b-it
- dataset_dir: dataset
- output_dir: results/gemma-3-27b
Do not end paths with a forward slash: write dataset and results/gemma-3-27b, not dataset/ or results/gemma-3-27b/.
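A quick sanity check of the configuration before a long evaluation run might look like the sketch below. It assumes config.json is a flat JSON object with the keys listed above. Because the output directory appears in this document under two names (results_dir and output_dir), the sketch accepts either one. The helper name check_config is hypothetical and not part of the repository.

```python
import json
from typing import Dict, List

REQUIRED_KEYS = ("hugging_face_token", "model_id", "dataset_dir")
# Assumption: either name may be used for the output directory.
OUTPUT_KEYS = ("results_dir", "output_dir")


def check_config(config: Dict[str, str]) -> List[str]:
    """Return a list of problems found; an empty list means the config looks fine."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    if not any(key in config for key in OUTPUT_KEYS):
        problems.append("missing output directory key")
    # Paths must not end with a forward slash.
    for key in ("dataset_dir",) + OUTPUT_KEYS:
        if config.get(key, "").endswith("/"):
            problems.append(f"{key} must not end with '/'")
    return problems


# The example entries from above, written as JSON.
example = json.loads("""
{
  "hugging_face_token": "YOUR_HUGGING_FACE_TOKEN",
  "model_id": "google/gemma-3-27b-it",
  "dataset_dir": "dataset",
  "output_dir": "results/gemma-3-27b"
}
""")

ok_problems = check_config(example)
bad_problems = check_config({**example, "dataset_dir": "dataset/"})
```

With the example entries the check returns an empty list; adding a trailing slash to dataset_dir produces a single message naming that key.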
Before running the script, make sure that you have installed the dependencies and filled in config.json. To start evaluation, run the following command in your terminal:
make run
Each file name follows the structure <topic>_<language>_<index>.json:
| Component | Example | Meaning |
|---|---|---|
| <topic> | countries_official_languages | Knowledge domain |
| <language> | en, tr, it | Dataset language |
| <index> | 0, 1, 2, 3, or 0_0, 0_1 | Split or subset index |
For Turkish (_tr_) files, index 0 is split into two subsets (_0_0 and _0_1) for grammatical reasons.
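Going by the components in the table and the file names in the dataset, a file name reads as <topic>_<language>_<index>.json. The sketch below splits a name into those three parts with a regular expression. The pattern and the helper parse_file_name are illustrative assumptions, not code shipped with FIBER.

```python
import re
from typing import NamedTuple, Optional

# <topic>_<language>_<index>.json, where <index> is "N" or "N_M".
FILE_NAME = re.compile(
    r"^(?P<topic>.+)_(?P<language>en|it|tr)_(?P<index>\d+(?:_\d+)?)\.json$"
)


class DatasetFile(NamedTuple):
    topic: str
    language: str
    index: str


def parse_file_name(name: str) -> Optional[DatasetFile]:
    """Split a dataset file name into topic, language, and index."""
    match = FILE_NAME.match(name)
    if match is None:
        return None
    return DatasetFile(match["topic"], match["language"], match["index"])


english = parse_file_name("countries_official_languages_en_2.json")
turkish = parse_file_name("ccTLD_tr_0_1.json")
```

The index is kept as a string so that Turkish subset indices such as 0_1 survive unchanged.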
The directory tree of the dataset is as follows.
.
├── multi_entity # Queries with multiple correct answers.
│   ├── car_brands
│   │   ├── car_brands_en_0.json
│   │   ├── car_brands_en_1.json
│   │   ├── car_brands_en_2.json
│   │   ├── car_brands_en_3.json
│   │   ├── car_brands_it_0.json
│   │   ├── car_brands_it_1.json
│   │   ├── car_brands_it_2.json
│   │   ├── car_brands_it_3.json
│   │   ├── car_brands_tr_0_0.json
│   │   ├── car_brands_tr_0_1.json
│   │   ├── car_brands_tr_1.json
│   │   ├── car_brands_tr_2.json
│   │   └── car_brands_tr_3.json
│   ├── countries_heritages
│   │   ├── countries_heritages_en_0.json
│   │   ├── countries_heritages_en_1.json
│   │   ├── countries_heritages_en_2.json
│   │   ├── countries_heritages_en_3.json
│   │   ├── countries_heritages_it_0.json
│   │   ├── countries_heritages_it_1.json
│   │   ├── countries_heritages_it_2.json
│   │   ├── countries_heritages_it_3.json
│   │   ├── countries_heritages_tr_0_0.json
│   │   ├── countries_heritages_tr_0_1.json
│   │   ├── countries_heritages_tr_1.json
│   │   ├── countries_heritages_tr_2.json
│   │   └── countries_heritages_tr_3.json
│   ├── countries_neighbors
│   │   ├── countries_neighbors_en_0.json
│   │   ├── countries_neighbors_en_1.json
│   │   ├── countries_neighbors_en_2.json
│   │   ├── countries_neighbors_en_3.json
│   │   ├── countries_neighbors_it_0.json
│   │   ├── countries_neighbors_it_1.json
│   │   ├── countries_neighbors_it_2.json
│   │   ├── countries_neighbors_it_3.json
│   │   ├── countries_neighbors_tr_0_0.json
│   │   ├── countries_neighbors_tr_0_1.json
│   │   ├── countries_neighbors_tr_1.json
│   │   ├── countries_neighbors_tr_2.json
│   │   └── countries_neighbors_tr_3.json
│   ├── countries_official_languages
│   │   ├── countries_official_languages_en_0.json
│   │   ├── countries_official_languages_en_1.json
│   │   ├── countries_official_languages_en_2.json
│   │   ├── countries_official_languages_en_3.json
│   │   ├── countries_official_languages_it_0.json
│   │   ├── countries_official_languages_it_1.json
│   │   ├── countries_official_languages_it_2.json
│   │   ├── countries_official_languages_it_3.json
│   │   ├── countries_official_languages_tr_0_0.json
│   │   ├── countries_official_languages_tr_0_1.json
│   │   ├── countries_official_languages_tr_1.json
│   │   ├── countries_official_languages_tr_2.json
│   │   └── countries_official_languages_tr_3.json
│   ├── countries_timezones
│   │   ├── countries_timezones_en_0.json
│   │   ├── countries_timezones_en_1.json
│   │   ├── countries_timezones_en_2.json
│   │   ├── countries_timezones_en_3.json
│   │   ├── countries_timezones_it_0.json
│   │   ├── countries_timezones_it_1.json
│   │   ├── countries_timezones_it_2.json
│   │   ├── countries_timezones_it_3.json
│   │   ├── countries_timezones_tr_0_0.json
│   │   ├── countries_timezones_tr_0_1.json
│   │   ├── countries_timezones_tr_1.json
│   │   ├── countries_timezones_tr_2.json
│   │   └── countries_timezones_tr_3.json
│   ├── mobile_network_operators
│   │   ├── mobile_network_operators_en_0.json
│   │   ├── mobile_network_operators_en_1.json
│   │   ├── mobile_network_operators_en_2.json
│   │   ├── mobile_network_operators_en_3.json
│   │   ├── mobile_network_operators_it_0.json
│   │   ├── mobile_network_operators_it_1.json
│   │   ├── mobile_network_operators_it_2.json
│   │   ├── mobile_network_operators_it_3.json
│   │   ├── mobile_network_operators_tr_0_0.json
│   │   ├── mobile_network_operators_tr_0_1.json
│   │   ├── mobile_network_operators_tr_1.json
│   │   ├── mobile_network_operators_tr_2.json
│   │   └── mobile_network_operators_tr_3.json
│   ├── polyglot_celebs
│   │   ├── polyglot_celebs_en_0.json
│   │   ├── polyglot_celebs_en_1.json
│   │   ├── polyglot_celebs_en_2.json
│   │   ├── polyglot_celebs_en_3.json
│   │   ├── polyglot_celebs_it_0.json
│   │   ├── polyglot_celebs_it_1.json
│   │   ├── polyglot_celebs_it_2.json
│   │   ├── polyglot_celebs_it_3.json
│   │   ├── polyglot_celebs_tr_0_0.json
│   │   ├── polyglot_celebs_tr_0_1.json
│   │   ├── polyglot_celebs_tr_1.json
│   │   ├── polyglot_celebs_tr_2.json
│   │   └── polyglot_celebs_tr_3.json
│   └── top_500_universities
│       ├── top_500_universities_en_0.json
│       ├── top_500_universities_en_1.json
│       ├── top_500_universities_en_2.json
│       ├── top_500_universities_en_3.json
│       ├── top_500_universities_it_0.json
│       ├── top_500_universities_it_1.json
│       ├── top_500_universities_it_2.json
│       ├── top_500_universities_it_3.json
│       ├── top_500_universities_tr_0_0.json
│       ├── top_500_universities_tr_0_1.json
│       ├── top_500_universities_tr_1.json
│       ├── top_500_universities_tr_2.json
│       └── top_500_universities_tr_3.json
└── single_entity # Queries with a single correct answer.
    ├── atomic_numbers
    │   ├── atomic_numbers_en_0.json
    │   ├── atomic_numbers_en_1.json
    │   ├── atomic_numbers_it_0.json
    │   ├── atomic_numbers_it_1.json
    │   ├── atomic_numbers_tr_0_0.json
    │   ├── atomic_numbers_tr_0_1.json
    │   └── atomic_numbers_tr_1.json
    ├── capital_cities
    │   ├── capital_cities_en_0.json
    │   ├── capital_cities_en_1.json
    │   ├── capital_cities_it_0.json
    │   ├── capital_cities_it_1.json
    │   ├── capital_cities_tr_0_0.json
    │   ├── capital_cities_tr_0_1.json
    │   └── capital_cities_tr_1.json
    ├── ccTLD
    │   ├── ccTLD_en_0.json
    │   ├── ccTLD_en_1.json
    │   ├── ccTLD_it_0.json
    │   ├── ccTLD_it_1.json
    │   ├── ccTLD_tr_0_0.json
    │   ├── ccTLD_tr_0_1.json
    │   └── ccTLD_tr_1.json
    ├── chemical_symbols
    │   ├── chemical_symbols_en_0.json
    │   ├── chemical_symbols_en_1.json
    │   ├── chemical_symbols_it_0.json
    │   ├── chemical_symbols_it_1.json
    │   ├── chemical_symbols_tr_0_0.json
    │   ├── chemical_symbols_tr_0_1.json
    │   └── chemical_symbols_tr_1.json
    ├── founding_locations
    │   ├── founding_locations_en_0.json
    │   ├── founding_locations_en_1.json
    │   ├── founding_locations_it_0.json
    │   ├── founding_locations_it_1.json
    │   ├── founding_locations_tr_0_0.json
    │   ├── founding_locations_tr_0_1.json
    │   └── founding_locations_tr_1.json
    ├── locations_of_sites
    │   ├── locations_of_sites_en_0.json
    │   ├── locations_of_sites_en_1.json
    │   ├── locations_of_sites_it_0.json
    │   ├── locations_of_sites_it_1.json
    │   ├── locations_of_sites_tr_0_0.json
    │   ├── locations_of_sites_tr_0_1.json
    │   └── locations_of_sites_tr_1.json
    ├── original_langs_of_books
    │   ├── original_langs_of_books_en_0.json
    │   ├── original_langs_of_books_en_1.json
    │   ├── original_langs_of_books_it_0.json
    │   ├── original_langs_of_books_it_1.json
    │   ├── original_langs_of_books_tr_0_0.json
    │   ├── original_langs_of_books_tr_0_1.json
    │   └── original_langs_of_books_tr_1.json
    └── product_maker
        ├── product_maker_en_0.json
        ├── product_maker_en_1.json
        ├── product_maker_it_0.json
        ├── product_maker_it_1.json
        ├── product_maker_tr_0_0.json
        ├── product_maker_tr_0_1.json
        └── product_maker_tr_1.json
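The layout above is regular enough to generate programmatically, which can help when checking that a download is complete. The sketch below rebuilds the expected relative paths for one topic. It assumes what the tree shows: four splits (0 to 3) per language under multi_entity, two splits (0 and 1) under single_entity, and Turkish split 0 replaced by 0_0 and 0_1. The names LANGUAGES, SPLITS, indices_for, and expected_files are illustrative, not part of the repository.

```python
from typing import List

LANGUAGES = ("en", "it", "tr")
# Number of splits per language, read off the directory tree.
SPLITS = {"multi_entity": 4, "single_entity": 2}


def indices_for(language: str, n_splits: int) -> List[str]:
    """Return the index suffixes for one language."""
    indices = [str(i) for i in range(n_splits)]
    if language == "tr":
        # Turkish split 0 is stored as two subsets.
        indices = ["0_0", "0_1"] + indices[1:]
    return indices


def expected_files(entity_type: str, topic: str) -> List[str]:
    """List the relative paths expected for one topic."""
    n_splits = SPLITS[entity_type]
    return [
        f"{entity_type}/{topic}/{topic}_{language}_{index}.json"
        for language in LANGUAGES
        for index in indices_for(language, n_splits)
    ]


car_brand_files = expected_files("multi_entity", "car_brands")
cctld_files = expected_files("single_entity", "ccTLD")
```

Comparing these lists against the files actually present under dataset_dir is one way to spot a missing or misnamed file before evaluation starts.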
If you use FIBER in your research, please cite:
@misc{munis2025fibermultilingualevaluationresource,
title = {FIBER: A Multilingual Evaluation Resource for Factual Inference Bias},
author = {Evren Ayberk Munis and Deniz Yılmaz and Arianna Muti and Çağrı Toraman},
year = {2025},
eprint = {2512.11110},
archivePrefix= {arXiv},
primaryClass = {cs.CL},
url = {https://arxiv.org/abs/2512.11110},
}