
PDF RAG Chatbot

This is a Streamlit-based RAG app for chatting with a PDF.

The app reads a PDF, breaks it into sentence-based chunks, stores embeddings in FAISS, reranks the retrieved chunks, and then uses OpenRouter for the final answer. The goal is to keep the answers tied to the uploaded document instead of letting the model guess.

What it does

  • extracts text from PDF files with PyMuPDF
  • uses semantic chunking with overlap instead of tiny fixed chunks
  • retrieves multiple relevant chunks instead of just one
  • reranks the retrieved chunks before sending context to the model
  • returns a grounded answer with citations and a confidence score
  • refuses to answer when the document does not contain enough supporting evidence
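For illustration, sentence-based chunking with overlap can be sketched roughly like this (the splitting regex, size limit, and overlap count are assumptions, not the app's exact code):

```python
import re

def chunk_sentences(text, max_chars=800, overlap_sents=2):
    """Group sentences into chunks of up to max_chars characters,
    repeating the last overlap_sents sentences at the start of the
    next chunk so context is not cut mid-thought.
    (Illustrative sketch; the app's actual parameters may differ.)"""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], []
    for sent in sentences:
        if current and sum(len(s) for s in current) + len(sent) > max_chars:
            chunks.append(" ".join(current))
            current = current[-overlap_sents:]  # carry the overlap forward
        current.append(sent)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because each new chunk starts with the tail of the previous one, a fact that straddles a chunk boundary still appears whole in at least one chunk.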

Pipeline Flow

flowchart TD
    A["User uploads PDF"] --> B["PDF text extraction - PyMuPDF"]
    B --> C["Sentence splitting and text cleanup"]
    C --> D["Semantic chunking with overlap"]
    D --> E["Embedding generation - BAAI/bge-small-en-v1.5"]
    E --> F["FAISS vector index"]

    G["User asks a question"] --> H["Query embedding"]
    H --> I["Top-k retrieval from FAISS"]
    F --> I

    I --> J["Cross-encoder reranking - ms-marco-MiniLM-L-6-v2"]
    J --> K["Best context chunks selected"]

    K --> L["LLM synthesis - OpenRouter + Qwen"]
    G --> L

    L --> M["Structured JSON output"]
    M --> N["Answer"]
    M --> O["Citations"]
    M --> P["Confidence score"]
    M --> Q["Refusal if evidence is weak"]
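The "Structured JSON output" stage can be pictured with a hypothetical response; the field names below are illustrative, not necessarily the app's exact schema:

```python
import json

# Hypothetical example of the structured JSON the model is asked to
# return (field names are assumptions, not the app's exact schema).
raw = """
{
  "answer": "The warranty period is 24 months.",
  "citations": [{"chunk_id": 3, "quote": "a 24-month limited warranty"}],
  "confidence": 0.82,
  "refusal": null
}
"""
result = json.loads(raw)
if result["refusal"] or result["confidence"] < 0.3:
    print("Refusing: not enough support in the document.")
else:
    print(f'{result["answer"]} (confidence {result["confidence"]:.2f})')
```

Keeping the answer, citations, and confidence in one parsed object is what lets the UI show citations next to the answer and refuse when the evidence is weak.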


Stack

  • Streamlit
  • FAISS
  • BAAI/bge-small-en-v1.5 for embeddings
  • cross-encoder/ms-marco-MiniLM-L-6-v2 for reranking
  • OpenRouter with qwen/qwen3.6-plus:free by default

Run locally

pip install -r requirements.txt
streamlit run app.py

Set your API key before starting the app, or paste it in the sidebar after launch.

Windows PowerShell:

$env:OPENROUTER_API_KEY="your-key"
$env:OPENROUTER_MODEL="qwen/qwen3.6-plus:free"
streamlit run app.py

macOS / Linux:

export OPENROUTER_API_KEY="your-key"
export OPENROUTER_MODEL="qwen/qwen3.6-plus:free"
streamlit run app.py
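Inside the app, these variables are presumably read with standard environment lookups; a minimal sketch (the helper name and fallback logic are assumptions, not code from app.py):

```python
import os

def load_openrouter_config(env):
    """Return (api_key, model), falling back to the default free model.
    An empty key signals that the user must paste one in the sidebar."""
    key = env.get("OPENROUTER_API_KEY", "")
    model = env.get("OPENROUTER_MODEL", "qwen/qwen3.6-plus:free")
    return key, model

# Example: read from the real process environment.
api_key, model = load_openrouter_config(dict(os.environ))
```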

How the pipeline works

  1. Read the PDF
  2. Split the text into sentence spans
  3. Build semantic chunks with overlap
  4. Embed the chunks and store them in FAISS
  5. Retrieve top-k chunks for the user query
  6. Rerank those chunks with a cross-encoder
  7. Send the best chunks to OpenRouter for the final answer
  8. Show the answer, citations, and retrieval diagnostics
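Steps 4-6 above can be sketched end to end with toy stand-ins: a NumPy brute-force search takes the place of FAISS, and the `embed()` and overlap-scoring functions are hypothetical substitutes for bge-small-en-v1.5 and the ms-marco cross-encoder, so the flow runs without any model downloads:

```python
import hashlib
import numpy as np

def embed(texts, dim=16):
    # Deterministic bag-of-words hashing into a small unit vector
    # (a toy stand-in for a real sentence-embedding model).
    vecs = np.zeros((len(texts), dim))
    for i, text in enumerate(texts):
        for word in text.lower().split():
            bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
            vecs[i, bucket] += 1.0
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.clip(norms, 1e-9, None)

def retrieve(query, chunks, k=3):
    # Step 5: inner-product top-k, as a faiss.IndexFlatIP search would compute.
    scores = embed(chunks) @ embed([query])[0]
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

def rerank(query, candidates, top_n=2):
    # Step 6: a cross-encoder scores (query, chunk) pairs jointly;
    # plain word overlap stands in for that score here.
    q = set(query.lower().split())
    return sorted(candidates,
                  key=lambda c: len(q & set(c.lower().split())),
                  reverse=True)[:top_n]
```

The real app swaps `embed()` for the SentenceTransformer encoder, the dot product for a FAISS index, and the overlap score for the cross-encoder's predictions, but the retrieve-then-rerank control flow is the same.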

Notes

  • The OpenRouter call is used only for final answer generation.
  • Retrieval, chunking, and reranking all run locally.
