DocChat is a fully local, private, offline PDF chatbot built with RAG (Retrieval Augmented Generation). Upload one or more PDFs and ask questions — only the relevant parts of the document are sent to the model, not the whole thing.
No API keys. No data leaves your machine.
Instead of dumping the entire PDF into the prompt, DocChat uses a two-step RAG pipeline:
**Indexing (on upload)**
- Extract text from PDF using PyMuPDF
- Split text into overlapping chunks
- Embed each chunk into a vector using sentence-transformers (runs locally on CPU)
- Store chunks + vectors in ChromaDB
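The indexing steps above can be sketched in a few lines of Python. This is an illustrative sketch, not DocChat's actual code — the function names, chunk size, and overlap are assumptions:

```python
def split_into_chunks(text, chunk_size=500, overlap=100):
    """Split text into overlapping character chunks (sizes are assumptions)."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

def index_pdf(path, collection, model):
    """Extract, chunk, embed, and store one PDF in a ChromaDB collection."""
    import fitz  # PyMuPDF; imported lazily so the chunker is usable on its own

    # 1. Extract text from every page with PyMuPDF
    text = "".join(page.get_text() for page in fitz.open(path))
    # 2. Split into overlapping chunks
    chunks = split_into_chunks(text)
    # 3. Embed each chunk locally with sentence-transformers
    vectors = model.encode(chunks).tolist()
    # 4. Store chunks + vectors, tagging each with its source file
    collection.add(
        ids=[f"{path}-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=vectors,
        metadatas=[{"source": path}] * len(chunks),
    )
```

Here `model` would be `SentenceTransformer("all-MiniLM-L6-v2")` and `collection` a collection from an in-memory `chromadb.Client()`.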
**Retrieval (on each question)**
- Embed the user's question using the same model
- Query ChromaDB for the 3 most semantically similar chunks
- Send only those chunks to Ollama as context
- Return the answer
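The retrieval side can be sketched similarly. The prompt template and the use of the `ollama` Python client are assumptions for illustration:

```python
def build_prompt(question, chunks):
    """Assemble the final prompt from retrieved chunks (template is an assumption)."""
    context = "\n\n".join(chunks)
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"

def answer(question, collection, model, k=3):
    """Embed the question, fetch the top-k chunks, and ask Ollama."""
    import ollama  # local Ollama client; imported lazily for the same reason as above

    # Embed the question with the same model used at index time
    q_vec = model.encode([question]).tolist()
    # Fetch the k most semantically similar chunks from ChromaDB
    hits = collection.query(query_embeddings=q_vec, n_results=k)
    # Send only those chunks to the model as context
    reply = ollama.chat(
        model="qwen3.5:0.8b",
        messages=[{"role": "user",
                   "content": build_prompt(question, hits["documents"][0])}],
    )
    return reply["message"]["content"]
```

Because only `k` small chunks reach the prompt, the context stays bounded no matter how large the uploaded PDFs are.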
This means DocChat works on large documents without overwhelming the model's context window.
- 100% local — no API keys, no internet required after setup
- Multi-document support — upload multiple PDFs and search across all of them
- RAG pipeline built from scratch — no LangChain, no n8n
- Source tracking — knows which chunk came from which file
- Model thinking display — see the model's reasoning process
- Clean, minimal dark UI
| Component | Library |
|---|---|
| UI | Gradio |
| PDF extraction | PyMuPDF |
| Chunking | plain Python |
| Embeddings | sentence-transformers (all-MiniLM-L6-v2) |
| Vector storage | ChromaDB |
| LLM inference | Ollama (qwen3.5:0.8b) |
1. Install requirements:

   ```
   pip install -r requirements.txt
   ```

2. Start Ollama and pull the model:

   ```
   ollama serve
   ollama pull qwen3.5:0.8b
   ```

3. Run the app:

   ```
   python ui.py
   ```

4. Open http://127.0.0.1:7860 in your browser, upload a PDF, and start chatting.
```
docker build -t docchat .
docker run -p 7860:7860 --add-host=host.docker.internal:host-gateway docchat
```

Ollama must be running on your host machine.
| File | Purpose |
|---|---|
| ui.py | Gradio interface and app logic |
| extract.py | PDF extraction, chunking, embedding, ChromaDB, Ollama |
| requirements.txt | Python dependencies |
| Dockerfile | Container setup |
- Embedding runs on CPU — no GPU required
- Tested with qwen3.5:0.8b, but any Ollama model works
- ChromaDB stores chunks in memory per session — reloading the app clears the index