A local Retrieval-Augmented Generation (RAG) application that lets you chat with PDF documents using a fully local LLM stack — no cloud API keys required.
```
PDF files → Chunked → Embedded (MiniLM) → FAISS vector store

User question → Embed → Similarity search → Top-k chunks → LLM (Ollama/Llama3) → Answer
```
Conversation history is preserved per session using LangChain's `RunnableWithMessageHistory`, so the model maintains context across turns.
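A minimal sketch of how that wiring can look, assuming LangChain's standard APIs. The prompt wording, variable names, and file layout below are illustrative, not the repo's actual `helpers/chain_handler.py`:

```python
# Illustrative sketch; the repo's chain_handler.py / session_handler.py may differ.
from pathlib import Path

from langchain_community.chat_message_histories import FileChatMessageHistory
from langchain_community.chat_models import ChatOllama
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory

prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer using only this context:\n{context}"),
    MessagesPlaceholder("history"),      # prior turns are injected here
    ("human", "{question}"),
])
llm = ChatOllama(model="llama3", base_url="http://127.0.0.1:11434")
chain = prompt | llm

def get_session_history(session_id: str) -> FileChatMessageHistory:
    # JSON-backed history, matching the sessions/ directory described below
    Path("sessions").mkdir(exist_ok=True)
    return FileChatMessageHistory(f"sessions/{session_id}.json")

chat = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="question",
    history_messages_key="history",
)

reply = chat.invoke(
    {"question": "What is the document about?", "context": "..."},
    config={"configurable": {"session_id": "demo"}},
)
```

Backing the history with `FileChatMessageHistory` is one way to get the JSON-on-disk persistence described here; the repo's `session_handler.py` may implement it differently.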
- Chat with one or more PDF documents via a Streamlit web UI or CLI
- Fully local — LLM runs via Ollama, embeddings via HuggingFace sentence-transformers
- Duplicate detection — new documents are only indexed if not already in the vector store (see the sketch after this list)
- Session memory — conversation history persisted to JSON and reloaded on next run
- File upload — drag-and-drop PDFs directly in the web UI
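The duplicate check lives in `helpers/docs_db_handler.py`. One common way to implement it (an assumption for illustration, not necessarily the repo's exact logic) is to fingerprint each file's bytes and skip anything whose hash is already recorded:

```python
# Assumed approach: content-hash fingerprints; docs_db_handler.py may differ.
import hashlib
from pathlib import Path

def file_fingerprint(path: Path) -> str:
    """Stable ID derived from the PDF's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def filter_new_files(paths: list[Path], indexed: set[str]) -> list[Path]:
    """Keep only files whose fingerprint is not yet in the store."""
    return [p for p in paths if file_fingerprint(p) not in indexed]
```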
```
rag-ollama/
├── app.py                  # CLI entry point
├── webui.py                # Streamlit web UI entry point
├── requirements.txt
├── data/                   # Place PDF files here
├── db/                     # FAISS vector store (auto-created, gitignored)
├── sessions/               # Session JSON history (auto-created, gitignored)
└── helpers/
    ├── __init__.py
    ├── chain_handler.py    # LangChain RAG chain setup
    ├── docs_db_handler.py  # FAISS init, load, dedup logic
    ├── embedder.py         # HuggingFace embeddings wrapper
    ├── indexer.py          # PDF loading + text splitting
    ├── retriever.py        # Vector similarity retrieval
    └── session_handler.py  # Session history load/save
```
- Python 3.9+
- Ollama installed and running
- Llama 3 model pulled via Ollama
```bash
# 1. Clone
git clone https://github.com/techanvconsulting/rag-ollama.git
cd rag-ollama

# 2. Create and activate a virtual environment
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Pull the LLM model via Ollama
ollama pull llama3
```

To launch the web UI:

```bash
streamlit run webui.py
```

Open http://localhost:8501. Upload PDFs via the sidebar, or drop them in `data/` beforehand.

To use the CLI instead:

```bash
python app.py
```

Type your question at the prompt; type `exit` to quit.
| Setting | Location | Default |
|---|---|---|
| LLM model | `helpers/chain_handler.py` | `llama3` |
| Embedding model | `app.py` / `webui.py` | `sentence-transformers/all-MiniLM-L12-v2` |
| Chunk size | `helpers/indexer.py` | 1000 chars, 80 overlap |
| Retrieved docs (k) | `app.py` / `webui.py` | 5 |
| Ollama base URL | `helpers/chain_handler.py` | `http://127.0.0.1:11434` |
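Each setting maps to a small code change. A few illustrative one-liners, assuming the project uses LangChain's standard splitter, embeddings, and retriever APIs (the exact variable names in the repo may differ):

```python
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Embedding model (app.py / webui.py)
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L12-v2"
)

# Chunk size and overlap (helpers/indexer.py)
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=80)

# Retrieved docs k (app.py / webui.py); vectorstore is the loaded FAISS index
# retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
```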
To swap the LLM, change the model name in `helpers/chain_handler.py`:

```python
llm = ChatOllama(model="mistral", base_url="http://127.0.0.1:11434", keep_alive=-1)
```

Any model available via `ollama list` works.
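Pull the new model first so Ollama has the weights available locally:

```bash
ollama pull mistral
```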
| Package | Purpose |
|---|---|
| `langchain-community` | LangChain integrations (Ollama, FAISS, loaders) |
| `langchain-huggingface` | HuggingFace embeddings |
| `faiss-cpu` | Local vector store |
| `sentence-transformers` | Embedding model |
| `streamlit` | Web UI |
| `pypdf` | PDF parsing |
| `langchainhub` | Prompt hub access |
| Resource | Link |
|---|---|
| Ollama | ollama.com |
| Ollama docs | docs.ollama.com |
| LangChain Python | docs.langchain.com |
| LangChain FAISS integration | python.langchain.com/docs/integrations/vectorstores/faiss |
| LangChain ChatOllama | python.langchain.com/docs/integrations/chat/ollama |
| RunnableWithMessageHistory | python.langchain.com/docs/how_to/message_history |
| FAISS (Facebook Research) | github.com/facebookresearch/faiss |
| all-MiniLM-L12-v2 model card | huggingface.co/sentence-transformers/all-MiniLM-L12-v2 |
| Streamlit docs | docs.streamlit.io |
| pypdf docs | pypdf.readthedocs.io |
- **`connection refused` on Ollama** — start the server first: `ollama serve` (see the check after this list)
- **Empty / "I don't know" answers** — the retrieved chunks may not contain relevant content; add more PDFs or reduce the chunk size in `helpers/indexer.py`
- **`ModuleNotFoundError`** — always run from the project root (`rag-ollama/`), not from inside `helpers/`
- **Slow first run** — `all-MiniLM-L12-v2` downloads ~120 MB from HuggingFace on first use
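To verify the Ollama server is reachable before digging further, hit its root endpoint:

```bash
curl http://127.0.0.1:11434
# Expected response: Ollama is running
```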
- Streamlit web UI
- Conversation memory (per-session JSON)
- Duplicate document detection
- Proper Python package structure (`helpers/` as a package)
- Support for `.txt`, `.md`, `.docx` files
- Model selector in UI
- Source citation in answers
Contributions welcome. Fork the repo, create a branch, and open a pull request. For larger changes, open an issue first.
Open an issue at github.com/techanvconsulting/rag-ollama.