When duplicated rows are present in the qrel data, the ideal DCG (iDCG) used to normalize nDCG is severely overestimated, because the duplicates are treated as separate documents (each contributing its positive relevance score) when the ideal ranking is sorted. The inflated iDCG in turn deflates the reported nDCG and produces incorrect results. MWE with TREC-DL20 data attached.
import pandas as pd
import trectools
import ir_datasets

qrels_df = (
    pd.DataFrame(ir_datasets.load("msmarco-passage/trec-dl-2020/judged").qrels_iter())
    .loc[:, ["query_id", "doc_id", "relevance"]]
    .rename({"query_id": "query", "doc_id": "docid", "relevance": "rel"}, axis="columns")
)
qrels = trectools.TrecQrel()
qrels.qrels_data = qrels_df
run = trectools.TrecRun("../data/external/dl20-passages-runs/input.1.gz") # replace with your local path if needed; randomly chosen run
print("Normal Qrels: ", trectools.TrecEval(run, qrels).get_ndcg())
qrels.qrels_data = pd.concat([qrels_df, qrels_df])
print("Duplicated Qrels: ", trectools.TrecEval(run, qrels).get_ndcg())

Output:
Normal Qrels: 0.18245952802407694
Duplicated Qrels: 0.006991538267262153
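
To illustrate the mechanism without the TREC-DL20 data, here is a minimal pure-Python sketch. It is not trectools' actual implementation: the helpers `dcg` and `ndcg`, the `log2(rank + 1)` discount, and the toy qrels are all assumptions made for illustration. It only shows that building the ideal ranking from every qrel row, repeats included, inflates iDCG and lowers nDCG for an otherwise perfect ranking.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount (assumed formulation)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranked_docids, qrels_rows):
    """nDCG for a single query; qrels_rows is a list of (docid, rel) pairs.

    The ideal ranking is built from every row, so repeated rows are
    counted as if they were distinct relevant documents.
    """
    rel = dict(qrels_rows)
    gains = [rel.get(docid, 0) for docid in ranked_docids]
    ideal = sorted((r for _, r in qrels_rows), reverse=True)
    idcg = dcg(ideal)
    return dcg(gains) / idcg if idcg > 0 else 0.0


# Hypothetical toy data: three judged documents, ranked perfectly by the run.
qrels_rows = [("d1", 3), ("d2", 2), ("d3", 0)]
ranking = ["d1", "d2", "d3"]

clean = ndcg(ranking, qrels_rows)
duplicated = ndcg(ranking, qrels_rows + qrels_rows)
# Dropping exact duplicate rows (order-preserving) restores the clean score.
deduplicated = ndcg(ranking, list(dict.fromkeys(qrels_rows + qrels_rows)))

print("clean:", clean)
print("duplicated:", duplicated)
print("deduplicated:", deduplicated)
```

On a qrels DataFrame like the one in the MWE, calling `qrels_df.drop_duplicates(subset=["query", "docid"])` before assigning `qrels.qrels_data` would be the analogous workaround, assuming the duplicated rows carry identical relevance labels.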