mmrag-strategy-bench

A lightweight benchmark to compare three multimodal RAG retrieval patterns on the same corpus.

Why this project

This repo distills ideas from small open-source multimodal RAG demos and turns them into a reproducible benchmark:

  • Strategy A (raw_multimodal): retrieve with text + table + image summary signals directly.
  • Strategy B (summary_first): convert all modalities into text-like summaries, then retrieve as pure text.
  • Strategy C (hybrid_rerank): summary-first coarse retrieval, then multimodal rerank.

This mirrors common industry choices when building practical multimodal RAG systems for PDFs/reports.
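
A minimal sketch of the Strategy C control flow, using a plain token-overlap scorer. The function and field names below are illustrative assumptions, not the package's actual API.

from collections import Counter

def _overlap(query: str, text: str) -> float:
    # Count query tokens that also appear in the candidate text.
    q, t = Counter(query.lower().split()), Counter(text.lower().split())
    return float(sum(min(q[w], t[w]) for w in q))

def hybrid_rerank(query: str, docs: list[dict], coarse_k: int = 10, top_k: int = 3) -> list[dict]:
    def summary_text(d: dict) -> str:
        return f'{d.get("text", "")} {d.get("image_summary", "")}'

    def full_text(d: dict) -> str:
        tags = " ".join(d.get("image_tags", []))
        return " ".join([d.get("text", ""), d.get("table", ""), d.get("image_summary", ""), tags])

    # Stage 1 (summary_first): coarse retrieval over text-like summaries only.
    coarse = sorted(docs, key=lambda d: _overlap(query, summary_text(d)), reverse=True)[:coarse_k]
    # Stage 2: rerank the coarse pool using all modality signals (text + table + image summary + tags).
    return sorted(coarse, key=lambda d: _overlap(query, full_text(d)), reverse=True)[:top_k]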

Highlights

  • Pure Python, no heavy model dependencies.
  • Supports JSONL corpora with text/table/image_summary fields.
  • Built-in retrieval metrics: Recall@K and MRR.
  • CLI for search and evaluation.

Install

python -m venv .venv
source .venv/bin/activate
pip install -e .

Data format

Each line of the corpus JSONL is one document record:

{
  "id": "doc-1",
  "source": "paper-A.pdf#p2",
  "text": "...",
  "table": "...",
  "image_summary": "...",
  "image_tags": ["chart", "accuracy"]
}
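
A minimal loading sketch for this format (standard-library only; the load_corpus name is illustrative, not the package API):

import json

def load_corpus(path: str) -> list[dict]:
    # One JSON object per line; blank lines are skipped.
    docs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                docs.append(json.loads(line))
    return docs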

QA file (JSON list):

[
  {
    "question": "Which model gets the best OCR F1 in 2025 benchmark?",
    "gold_doc_ids": ["doc-2"]
  }
]
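
QA entries can be cross-checked against the corpus before evaluation; a small sanity-check sketch (load_qa is an illustrative name, not the package API):

import json

def load_qa(path: str, corpus_ids: set[str]) -> list[dict]:
    # The QA file is a single JSON list of {"question", "gold_doc_ids"} objects.
    with open(path, encoding="utf-8") as f:
        qa = json.load(f)
    for item in qa:
        missing = [doc_id for doc_id in item["gold_doc_ids"] if doc_id not in corpus_ids]
        if missing:
            raise ValueError(f"gold_doc_ids missing from corpus: {missing}")
    return qa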

Quickstart

Run a search:

mmrag-bench search \
  --corpus data/sample_corpus.jsonl \
  --query "Which model has the best OCR F1?" \
  --strategy hybrid_rerank \
  --top-k 3

Run evaluation:

mmrag-bench eval \
  --corpus data/sample_corpus.jsonl \
  --qa data/sample_qa.json \
  --top-k 3

Example output

strategy=raw_multimodal recall@3=1.0000 mrr=1.0000
strategy=summary_first recall@3=1.0000 mrr=0.8333
strategy=hybrid_rerank recall@3=1.0000 mrr=1.0000
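
The eval command presumably averages per-query scores over the QA set; the per-query definitions of Recall@K and MRR are the standard ones, sketched below with illustrative function names:

def recall_at_k(ranked_ids: list[str], gold_ids: set[str], k: int) -> float:
    # Fraction of gold documents that appear among the top-k retrieved ids.
    if not gold_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in gold_ids)
    return hits / len(gold_ids)

def mrr(ranked_ids: list[str], gold_ids: set[str]) -> float:
    # Reciprocal rank of the first gold document; 0.0 if none is retrieved.
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in gold_ids:
            return 1.0 / rank
    return 0.0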

Repository layout

  • src/mmrag_strategy_bench/ core package
  • data/ sample corpus and sample QA set
  • tests/ sanity tests

Acknowledgement

Conceptually inspired by small open-source multimodal RAG demos. This implementation is original and intentionally lightweight for educational benchmarking.

License

MIT
