PDF RAG Chatbot

This is a Streamlit-based RAG app for chatting with a PDF.

The app reads a PDF, breaks it into sentence-based chunks, stores embeddings in FAISS, reranks the retrieved chunks, and then uses OpenRouter for the final answer. The goal is to keep the answers tied to the uploaded document instead of letting the model guess.

What it does

extracts text from PDF files with PyMuPDF
uses semantic chunking with overlap instead of tiny fixed chunks
retrieves multiple relevant chunks instead of just one
reranks the retrieved chunks before sending context to the model
returns a grounded answer with citations and a confidence score
refuses when there is not enough support in the document

Pipeline Flow

flowchart TD
    A["User uploads PDF"] --> B["PDF text extraction - PyMuPDF"]
    B --> C["Sentence splitting and text cleanup"]
    C --> D["Semantic chunking with overlap"]
    D --> E["Embedding generation - BAAI/bge-small-en-v1.5"]
    E --> F["FAISS vector index"]

    G["User asks a question"] --> H["Query embedding"]
    H --> I["Top-k retrieval from FAISS"]
    F --> I

    I --> J["Cross-encoder reranking - ms-marco-MiniLM-L-6-v2"]
    J --> K["Best context chunks selected"]

    K --> L["LLM synthesis - OpenRouter + Qwen"]
    G --> L

    L --> M["Structured JSON output"]
    M --> N["Answer"]
    M --> O["Citations"]
    M --> P["Confidence score"]
    M --> Q["Refusal if evidence is weak"]

Stack

Streamlit
FAISS
BAAI/bge-small-en-v1.5 for embeddings
cross-encoder/ms-marco-MiniLM-L-6-v2 for reranking
OpenRouter with qwen/qwen3.6-plus:free by default

Run locally

pip install -r requirements.txt
streamlit run app.py

Set your API key before starting the app, or paste it in the sidebar after launch.

Windows PowerShell:

$env:OPENROUTER_API_KEY="your-key"
$env:OPENROUTER_MODEL="qwen/qwen3.6-plus:free"
streamlit run app.py

macOS / Linux:

export OPENROUTER_API_KEY="your-key"
export OPENROUTER_MODEL="qwen/qwen3.6-plus:free"
streamlit run app.py

How the pipeline works

Read the PDF
Split the text into sentence spans
Build semantic chunks with overlap
Embed the chunks and store them in FAISS
Retrieve top-k chunks for the user query
Rerank those chunks with a cross-encoder
Send the best chunks to OpenRouter for the final answer
Show the answer, citations, and retrieval diagnostics

Notes

The OpenRouter call is only used for final answer generation.
Retrieval, chunking, and reranking all run locally.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
rag		rag
.gitignore		.gitignore
README.md		README.md
app.py		app.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PDF RAG Chatbot

What it does

Pipeline Flow

Stack

Run locally

How the pipeline works

Notes

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

PDF RAG Chatbot

What it does

Pipeline Flow

Stack

Run locally

How the pipeline works

Notes

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages