Local biomedical retrieval and grounded question answering over PubMed.
Companion repository for the paper: Integrating AI and IR Paradigms for Sustainable and Trustworthy Accurate Access to Large Scale Biomedical Information — presented at ECIR 2026, Delft, The Netherlands.
- Overview
- What This Repository Provides
- Architecture
- Data: PubMed Processing and Qdrant Indexing
- Models
- Installation and Setup
- Running the Application
- Demo
- Repository Structure
- Citation
- License
Accessing reliable biomedical information at scale is a critical challenge. Researchers and clinicians need precise, evidence-grounded answers to their questions, drawn from the vast and continuously growing body of published literature. General-purpose LLMs can hallucinate facts; purely keyword-based search engines return documents but do not extract or synthesise the answer.
LocalBioRAG bridges this gap. It is a fully local, end-to-end Retrieval-Augmented Generation (RAG) system designed for biomedical question answering over the entire PubMed corpus (~38 million abstracts). Unlike cloud-based solutions, the complete pipeline — retrieval, passage extraction, and answer generation — runs on your own infrastructure with no external API calls, ensuring data privacy, reproducibility, and full control over every component.
| What | Why it matters |
|---|---|
| Complete application code | A working Flask web app that you can deploy on a GPU server and use immediately. |
| Hybrid retrieval pipeline | Combines BM25 sparse search with BGE-M3 dense re-ranking for high-recall, high-precision document retrieval. |
| LLM-based snippet extraction | A fine-tuned LoRA adapter on Llama 3.1 8B extracts exact supporting passages from each retrieved abstract. |
| Evidence-grounded answer generation | A second LoRA adapter generates concise answers using a context-windowing strategy that preserves snippet provenance. |
| Indexing guidance | Step-by-step instructions to reproduce the full PubMed index in Qdrant, so you can rebuild the system from scratch. |
| Web UI | A clean, responsive interface with example queries, Excel export, and a loading experience designed for live demos. |
The pipeline consists of three stages executed sequentially for each user query:

**Stage 1 — Hybrid retrieval**
- The user query is sent to Qdrant, which performs a BM25 sparse search and returns the top 100 candidate documents.
- Each candidate already carries a pre-computed BGE-M3 dense vector. The query is encoded with the same model, and cosine similarity is computed against all 100 candidates.
- A hybrid score (BM25_normalised × cosine_similarity) is calculated, and the top 10 documents are selected.

**Stage 2 — Snippet extraction**
- For each of the 10 documents, both title and abstract are passed to Llama 3.1 8B Instruct equipped with a LoRA adapter fine-tuned for snippet extraction (trained on BioASQ data).
- The model returns exact text spans enclosed in `[BS]`/`[ES]` tags. Documents without any extracted snippet are filtered out.

**Stage 3 — Answer generation**
- The user can request an evidence-grounded answer. The top 3 documents are formatted using a context-windowing strategy: for each snippet, up to 20 words before and 10 words after are kept from the original abstract.
- A second LoRA adapter (multi-task, also trained on BioASQ) generates a concise (≤ 200 words) answer grounded in the extracted evidence.
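As an illustration of the hybrid scoring step above, here is a minimal NumPy sketch. The function name and the exact BM25 normalisation are assumptions for illustration; the repository's `retrieval_logic.py` is the authoritative implementation.

```python
import numpy as np

def hybrid_rank(bm25_scores, query_vec, doc_vecs, top_k=10):
    """Illustrative re-ranking: normalised BM25 x cosine similarity.

    bm25_scores: BM25 scores for the candidate documents (length n)
    query_vec:   dense query vector (e.g. BGE-M3, 1024-dim)
    doc_vecs:    pre-computed document vectors, shape (n, dim)
    Returns the indices of the top_k documents by hybrid score.
    """
    bm25 = np.asarray(bm25_scores, dtype=float)
    bm25_norm = bm25 / bm25.max()  # assumed max-normalisation into [0, 1]

    q = np.asarray(query_vec, dtype=float)
    d = np.asarray(doc_vecs, dtype=float)
    cos = (d @ q) / (np.linalg.norm(d, axis=1) * np.linalg.norm(q))

    hybrid = bm25_norm * cos  # BM25_normalised x cosine_similarity
    return np.argsort(hybrid)[::-1][:top_k]
```

In the full system this re-ranking runs over the 100 BM25 candidates returned by Qdrant and keeps the top 10.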
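The snippet tags and the context-windowing strategy described above can be sketched as follows. The helper names are hypothetical; the `[BS]`/`[ES]` tag format and the 20-before/10-after window sizes follow the description in this section.

```python
import re

def parse_snippets(model_output: str) -> list:
    """Extract all spans enclosed in [BS]...[ES] tags from the model output."""
    return [m.strip() for m in re.findall(r"\[BS\](.*?)\[ES\]", model_output, re.S)]

def context_window(abstract: str, snippet: str, before: int = 20, after: int = 10) -> str:
    """Keep up to `before` words before and `after` words after the snippet,
    taken from the original abstract (word-level matching, illustrative only)."""
    words = abstract.split()
    snip = snippet.split()
    n = len(snip)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == snip:
            start = max(0, i - before)
            end = min(len(words), i + n + after)
            return " ".join(words[start:end])
    return snippet  # fallback: snippet not found verbatim in the abstract
```

Documents whose model output contains no `[BS]`/`[ES]` pair yield an empty list and are filtered out.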
To reproduce the full system you need to index PubMed into a Qdrant collection. Below are the detailed steps.
PubMed distributes its complete set of abstracts as XML files via the NLM FTP server:
- Baseline files (annual full snapshot): https://ftp.ncbi.nlm.nih.gov/pubmed/baseline/
- Update files (daily incremental): https://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/
- Documentation: https://pubmed.ncbi.nlm.nih.gov/download/
Download all `.xml.gz` files from the baseline directory. Each file contains thousands of `<PubmedArticle>` records.
From each record, extract the following fields:
| Field name | Source in XML | Description |
|---|---|---|
| `pubmed_id` | `<PMID>` | Unique PubMed identifier |
| `article_title` | `<ArticleTitle>` | Title of the article |
| `abstract_text` | `<AbstractText>` | Full abstract (concatenate all `<AbstractText>` elements) |
| `year` | `<PubDate><Year>` | Publication year |
Title and abstract must be encoded separately with BGE-M3 and then combined into a single 1024-dimensional vector using a weighted average (0.2 × title + 0.8 × abstract):
```python
import numpy as np
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

title_emb = model.encode([article_title], return_dense=True,
                         return_sparse=False, return_colbert_vecs=False)
abstract_emb = model.encode([abstract_text], return_dense=True,
                            return_sparse=False, return_colbert_vecs=False)

dense_vector = (
    0.2 * np.array(title_emb["dense_vecs"][0])
    + 0.8 * np.array(abstract_emb["dense_vecs"][0])
).tolist()  # list of 1024 floats
```

Why a weighted average? Biomedical abstracts carry the bulk of the factual content, while titles are shorter and more general. The 80/20 split reflects this asymmetry and was found to yield better retrieval quality on BioASQ evaluation data.
Install Qdrant following the official documentation. A Docker deployment is the easiest path:
```shell
docker pull qdrant/qdrant
docker run -p 6333:6333 -p 6334:6334 \
  -v $(pwd)/qdrant_storage:/qdrant/storage qdrant/qdrant
```

Create a collection with the required vector configuration.
The collection must be named `pubmed_BGE` (or update the `COLLECTION_NAME` constant in `retrieval_logic.py`):
```python
from qdrant_client import QdrantClient
from qdrant_client.http import models

client = QdrantClient(host="localhost", port=6333)

client.create_collection(
    collection_name="pubmed_BGE",
    vectors_config={
        "dense_vector_BGE": models.VectorParams(
            size=1024,
            distance=models.Distance.COSINE,
            on_disk=True,  # recommended for ~38M vectors
        ),
    },
    sparse_vectors_config={
        "sparse_vector": models.SparseVectorParams(
            modifier=models.Modifier.IDF,
        ),
    },
)
```

Upload each record as a Qdrant point containing:

- The pre-computed dense vector under `"dense_vector_BGE"`.
- A sparse BM25 vector under `"sparse_vector"`, computed server-side by Qdrant. Pass the concatenated title + abstract as a `models.Document` with `model="Qdrant/bm25"`. The `avg_len` option sets the average document length (in characters) used by BM25 length normalisation; the value `792.69` was measured on the full PubMed corpus.
- The payload fields needed by the application.
```python
import uuid

AVG_DOCUMENT_LENGTH = 792.69  # avg chars of title + abstract across PubMed

full_text = f"{article_title} {abstract_text}"

client.upsert(
    collection_name="pubmed_BGE",
    points=[
        models.PointStruct(
            id=str(uuid.uuid4()),
            vector={
                "dense_vector_BGE": dense_vector,
                "sparse_vector": models.Document(
                    text=full_text,
                    model="Qdrant/bm25",
                    options={"avg_len": AVG_DOCUMENT_LENGTH},
                ),
            },
            payload={
                "pubmed_id": pubmed_id,
                "article_title": article_title,
                "abstract_text": abstract_text,
                "year": year,
            },
        )
    ],
)
```

Tip: Process files in batches (e.g. 100 points per `upsert` call) and keep a checkpoint of processed files to allow resuming after interruptions. The full PubMed baseline (~38 M records) can take several hours to index.
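The batching and checkpointing tip can be sketched as follows. The checkpoint file name and helper names are illustrative assumptions, not part of the repository:

```python
import json
from pathlib import Path

CHECKPOINT = Path("indexed_files.json")  # hypothetical checkpoint file

def load_done() -> set:
    """Return the set of baseline files already fully indexed."""
    if CHECKPOINT.exists():
        return set(json.loads(CHECKPOINT.read_text()))
    return set()

def mark_done(done: set, filename: str) -> None:
    """Record a finished file so indexing can resume after an interruption."""
    done.add(filename)
    CHECKPOINT.write_text(json.dumps(sorted(done)))

def chunked(points, size=100):
    """Yield successive batches of `size` points, one upsert call each."""
    for i in range(0, len(points), size):
        yield points[i:i + size]
```

A driver loop would then skip files in `load_done()`, call `client.upsert` once per batch from `chunked(...)`, and call `mark_done` after each file completes.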
Summary of required Qdrant schema:

| Type | Name | Details |
|---|---|---|
| Named dense vector | `dense_vector_BGE` | 1024-dim, cosine distance, on-disk (BGE-M3 weighted avg) |
| Named sparse vector | `sparse_vector` | BM25 with IDF modifier (`avg_len` = 792.69) |
| Payload field | `pubmed_id` | string |
| Payload field | `article_title` | string |
| Payload field | `abstract_text` | string |
| Payload field | `year` | string |
| Model | Link |
|---|---|
| Llama 3.1 8B Instruct | unsloth/Llama-3.1-8B-Instruct |
| Model | Link |
|---|---|
| BGE-M3 | BAAI/bge-m3 |
| Adapter | Purpose | Link |
|---|---|---|
| Snippet Extraction LoRA | Extracts relevant passages from titles and abstracts | sag-uniroma2/bioasq-snippet-extraction-lora |
| Answer Generation LoRA | Generates evidence-grounded answers from extracted passages | sag-uniroma2/bioasq-answer-generation-lora |
Note: Replace the placeholder links above with the actual Hugging Face repository URLs once the adapters are published.
- Python 3.10+
- A CUDA-capable GPU (recommended: ≥ 24 GB VRAM for Llama 3.1 8B with LoRA)
- A running Qdrant instance with the indexed PubMed collection (see above)
```shell
git clone https://github.com/your-org/LocalBioRAG.git
cd LocalBioRAG
pip install -r requirements.txt
```

Open `retrieval_logic.py` and update the configuration block at the top of the file:
```python
QDRANT_HOST = "localhost"  # your Qdrant server
QDRANT_PORT = 6333
MODEL_PATH = "/path/to/Llama-3.1-8B-Instruct"  # or HuggingFace repo ID
LORA_PATH_EXTR = "/path/to/snippet-extraction-lora"
LORA_PATH_ANSW = "/path/to/answer-generation-lora"
EMBEDDING_MODEL_PATH = "/path/to/bge-m3"
```

Set the web UI credentials via environment variables:

```shell
export APP_USERNAME="your_username"
export APP_PASSWORD="your_password"
```

If not set, the defaults (`sagdemo` / `Demo2026!`) are used.
```shell
python app.py
```

The server starts on `http://0.0.0.0:5000`. Open it in your browser and authenticate with the configured credentials.
demo1.mp4
demo2.mp4
```
LocalBioRAG/
├── app.py                 # Flask web server (routes, auth)
├── retrieval_logic.py     # Hybrid retrieval, snippet extraction, answer generation
├── requirements.txt       # Python dependencies
├── templates/
│   └── index.html         # Main UI template
├── static/
│   ├── style.css          # Stylesheet
│   └── script.js          # Client-side logic (search, modal, Excel export)
├── assets/
│   └── architecture.png   # Architecture diagram
└── README.md              # This file
```
If you use this code or system in your research, please cite:
```bibtex
@InProceedings{10.1007/978-3-032-21324-2_31,
  author="Borazio, Federico
  and Labbate, Francesco
  and Croce, Danilo
  and Basili, Roberto",
  editor="Campos, Ricardo
  and Jatowt, Adam
  and Lan, Yanyan
  and Aliannejadi, Mohammad
  and Bauer, Christine
  and MacAvaney, Sean
  and Anand, Avishek
  and Ren, Zhaochun
  and Verberne, Suzan
  and Bai, Nan
  and Mansoury, Masoud",
  title="Integrating AI and IR Paradigms for Sustainable and Trustworthy Accurate Access to Large Scale Biomedical Information",
  booktitle="Advances in Information Retrieval",
  year="2026",
  publisher="Springer Nature Switzerland",
  address="Cham",
  pages="398--412",
  isbn="978-3-032-21324-2"
}
```

This project is released under the MIT License.
Developed by SAG · University of Rome Tor Vergata
