DrugBench is a comprehensive benchmark comprising more than 8,000 chain-of-thought (CoT)-driven multiple-choice questions (MCQs) derived from more than 50,000 PubMed abstracts spanning the full drug R&D life cycle, covering publications from 2020 to 2025.
This repository hosts the dataset, code, and pipeline for constructing DrugBench, a large-scale benchmark designed to evaluate large language models (LLMs) in the domain of drug discovery. The benchmark is built from expert-curated and PubMed-derived question–answer pairs, combining chain-of-thought (CoT) reasoning generation, clustering-based abstract selection, and iterative expert–LLM refinement to ensure both scientific rigor and domain relevance.
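For intuition, the clustering-based abstract selection step might look something like the sketch below: abstracts are embedded as vectors, clustered, and the abstract nearest each cluster centroid is kept as a representative. The actual pipeline's embedding model, clustering algorithm, and selection criteria are not described here, so every detail in this sketch (bag-of-words vectors, plain k-means, centroid-nearest selection) is an illustrative assumption, not the repository's implementation.

```python
import math
import random
from collections import Counter

def vectorize(texts):
    """Unit-normalized bag-of-words vectors over a shared vocabulary.
    (A stand-in for whatever embedding model the real pipeline uses.)"""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vecs = []
    for t in texts:
        v = [0.0] * len(vocab)
        for w, c in Counter(t.lower().split()).items():
            v[index[w]] = float(c)
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vecs.append([x / norm for x in v])
    return vecs

def kmeans(vecs, k, iters=20, seed=0):
    """Plain Lloyd's k-means with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vecs, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vecs:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(v, centroids[c])))
            clusters[j].append(v)
        for j, members in enumerate(clusters):
            if members:  # keep old centroid if the cluster went empty
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids

def select_representatives(abstracts, k, seed=0):
    """Pick one representative abstract per cluster: the one closest
    to each centroid in the (toy) embedding space."""
    vecs = vectorize(abstracts)
    centroids = kmeans(vecs, k, seed=seed)
    picked = []
    for c in centroids:
        j = min(range(len(vecs)),
                key=lambda i: sum((a - b) ** 2
                                  for a, b in zip(vecs[i], c)))
        picked.append(abstracts[j])
    return picked

abstracts = [
    "kinase inhibitor binding affinity in oncology targets",
    "kinase inhibitor selectivity and oncology efficacy",
    "pharmacokinetics of oral antibiotics in clinical trials",
    "clinical trial pharmacokinetics of antibiotic dosing",
]
reps = select_representatives(abstracts, k=2)
```

In a real pipeline the representatives would then be passed to downstream MCQ and CoT generation; this toy version only illustrates the deduplication-by-clustering idea.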
Upon acceptance of the related manuscript, all datasets and source code will be made publicly available here to support reproducibility, benchmarking, and further research in AI for drug discovery.