DrugBench is a comprehensive benchmark comprising more than 8,000 chain-of-thought (CoT)-driven multiple-choice questions (MCQs) derived from more than 50,000 PubMed abstracts spanning the full drug R&D life cycle, covering publications from 2020 to 2025.
This repository hosts the dataset, code, and pipeline for constructing DrugBench, a large-scale benchmark designed to evaluate large language models (LLMs) in the domain of drug discovery. The benchmark is built from expert-curated and PubMed-derived question–answer pairs, combining chain-of-thought (CoT) reasoning generation, clustering-based abstract selection, and iterative expert–LLM refinement to ensure both scientific rigor and domain relevance.
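For intuition, the clustering-based abstract selection step might look something like the sketch below: abstracts are embedded as vectors, clustered, and the abstract nearest each cluster centroid is kept as a representative. The actual pipeline's embedding model, clustering algorithm, and selection criteria are not described here, so every detail in this sketch (bag-of-words vectors, plain k-means, centroid-nearest selection) is an illustrative assumption, not the repository's implementation.

```python
import math
import random
from collections import Counter

def vectorize(texts):
    """Unit-normalized bag-of-words vectors over a shared vocabulary.
    (A stand-in for whatever embedding model the real pipeline uses.)"""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vecs = []
    for t in texts:
        v = [0.0] * len(vocab)
        for w, c in Counter(t.lower().split()).items():
            v[index[w]] = float(c)
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vecs.append([x / norm for x in v])
    return vecs

def kmeans(vecs, k, iters=20, seed=0):
    """Plain Lloyd's k-means with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vecs, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vecs:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(v, centroids[c])))
            clusters[j].append(v)
        for j, members in enumerate(clusters):
            if members:  # keep old centroid if the cluster went empty
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids

def select_representatives(abstracts, k, seed=0):
    """Pick one representative abstract per cluster: the one closest
    to each centroid in the (toy) embedding space."""
    vecs = vectorize(abstracts)
    centroids = kmeans(vecs, k, seed=seed)
    picked = []
    for c in centroids:
        j = min(range(len(vecs)),
                key=lambda i: sum((a - b) ** 2
                                  for a, b in zip(vecs[i], c)))
        picked.append(abstracts[j])
    return picked

abstracts = [
    "kinase inhibitor binding affinity in oncology targets",
    "kinase inhibitor selectivity and oncology efficacy",
    "pharmacokinetics of oral antibiotics in clinical trials",
    "clinical trial pharmacokinetics of antibiotic dosing",
]
reps = select_representatives(abstracts, k=2)
```

In a real pipeline the representatives would then be passed to downstream MCQ and CoT generation; this toy version only illustrates the deduplication-by-clustering idea.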
Upon acceptance of the related manuscript, all datasets and source code will be made publicly available here to support reproducibility, benchmarking, and further research in AI for drug discovery.