
Duplicates in Qrels yield incorrect nDCG scores #52

@lgienapp

Description

When duplicated rows are present in the qrel data, the ideal DCG (iDCG) used to normalize nDCG is severely overestimated: each duplicate is treated as a separate judged document, so its relevance score is counted again when the ideal ranking is sorted. This inflates iDCG and therefore yields incorrect (too low) nDCG scores. MWE with TREC-DL20 data attached.

import ir_datasets
import pandas as pd
import trectools

qrels_df = (
    pd.DataFrame(ir_datasets.load("msmarco-passage/trec-dl-2020/judged").qrels_iter())
    .loc[:, ["query_id", "doc_id", "relevance"]]
    .rename({"query_id": "query", "doc_id": "docid", "relevance": "rel"}, axis="columns")
)

qrels = trectools.TrecQrel()
qrels.qrels_data = qrels_df

run = trectools.TrecRun("../data/external/dl20-passages-runs/input.1.gz") # replace with your local path if needed; randomly chosen run

print("Normal Qrels: ", trectools.TrecEval(run, qrels).get_ndcg())

qrels.qrels_data = pd.concat([qrels_df, qrels_df])

print("Duplicated Qrels: ", trectools.TrecEval(run, qrels).get_ndcg())

Normal Qrels: 0.18245952802407694
Duplicated Qrels: 0.006991538267262153
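
To illustrate the mechanism described above without any dependency on trectools, here is a minimal sketch in plain Python. It is not trectools' implementation: the helper names (`dcg`, `ndcg`, `dedupe`), the linear gain, the log2 discount, and the single-query toy data are all assumptions made for illustration. It only shows that building the ideal ranking from raw qrel rows counts duplicated judgments twice, and that keeping one judgment per document restores the score.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with a log2 rank discount (assumed formulation)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranking, qrels_rows):
    """nDCG where the ideal ranking is built from the raw qrel rows.

    No deduplication happens here, so a repeated (doc, rel) row contributes
    to the ideal DCG once per occurrence.
    """
    rel = {doc: r for doc, r in qrels_rows}  # relevance lookup for the run
    gains = [rel.get(doc, 0) for doc in ranking]
    ideal = sorted((r for _, r in qrels_rows), reverse=True)
    idcg = dcg(ideal)
    return dcg(gains) / idcg if idcg > 0 else 0.0


def dedupe(qrels_rows):
    """Keep one judgment per document (first occurrence wins)."""
    seen = {}
    for doc, r in qrels_rows:
        seen.setdefault(doc, r)
    return list(seen.items())


# Hypothetical toy data: one query, three judged documents, a perfect ranking.
qrels_rows = [("d1", 3), ("d2", 2), ("d3", 0)]
ranking = ["d1", "d2", "d3"]

clean = ndcg(ranking, qrels_rows)
duplicated = ndcg(ranking, qrels_rows + qrels_rows)
repaired = ndcg(ranking, dedupe(qrels_rows + qrels_rows))

print("clean:     ", clean)
print("duplicated:", duplicated)
print("repaired:  ", repaired)
```

With the DataFrame from the MWE, the pandas counterpart of `dedupe` would presumably be `qrels_df.drop_duplicates(subset=["query", "docid"])` applied before assigning `qrels.qrels_data`; that is a possible workaround on the caller's side, not a fix in the library.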
