This is a Streamlit-based RAG app for chatting with a PDF.
The app reads a PDF, breaks it into sentence-based chunks, stores embeddings in FAISS, reranks the retrieved chunks, and then uses OpenRouter for the final answer. The goal is to keep the answers tied to the uploaded document instead of letting the model guess.
- extracts text from PDF files with
PyMuPDF - uses semantic chunking with overlap instead of tiny fixed chunks
- retrieves multiple relevant chunks instead of just one
- reranks the retrieved chunks before sending context to the model
- returns a grounded answer with citations and a confidence score
- refuses when there is not enough support in the document
flowchart TD
A["User uploads PDF"] --> B["PDF text extraction - PyMuPDF"]
B --> C["Sentence splitting and text cleanup"]
C --> D["Semantic chunking with overlap"]
D --> E["Embedding generation - BAAI/bge-small-en-v1.5"]
E --> F["FAISS vector index"]
G["User asks a question"] --> H["Query embedding"]
H --> I["Top-k retrieval from FAISS"]
F --> I
I --> J["Cross-encoder reranking - ms-marco-MiniLM-L-6-v2"]
J --> K["Best context chunks selected"]
K --> L["LLM synthesis - OpenRouter + Qwen"]
G --> L
L --> M["Structured JSON output"]
M --> N["Answer"]
M --> O["Citations"]
M --> P["Confidence score"]
M --> Q["Refusal if evidence is weak"]
- Streamlit
- FAISS
BAAI/bge-small-en-v1.5for embeddingscross-encoder/ms-marco-MiniLM-L-6-v2for reranking- OpenRouter with
qwen/qwen3.6-plus:freeby default
pip install -r requirements.txt
streamlit run app.pySet your API key before starting the app, or paste it in the sidebar after launch.
Windows PowerShell:
$env:OPENROUTER_API_KEY="your-key"
$env:OPENROUTER_MODEL="qwen/qwen3.6-plus:free"
streamlit run app.pymacOS / Linux:
export OPENROUTER_API_KEY="your-key"
export OPENROUTER_MODEL="qwen/qwen3.6-plus:free"
streamlit run app.py- Read the PDF
- Split the text into sentence spans
- Build semantic chunks with overlap
- Embed the chunks and store them in FAISS
- Retrieve top-k chunks for the user query
- Rerank those chunks with a cross-encoder
- Send the best chunks to OpenRouter for the final answer
- Show the answer, citations, and retrieval diagnostics
- The OpenRouter call is only used for final answer generation.
- Retrieval, chunking, and reranking all run locally.