RAG App with Ollama + LangChain + FAISS

A local Retrieval-Augmented Generation (RAG) application that lets you chat with PDF documents using a fully local LLM stack — no cloud API keys required.

How It Works

PDF files → Chunked → Embedded (MiniLM) → FAISS vector store
User question → Embed → Similarity search → Top-k chunks → LLM (Ollama/Llama3) → Answer
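
A condensed sketch of the two pipelines in LangChain terms (illustrative only; the real code is split across the helpers/ modules, and the PDF path below is a placeholder):

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Indexing: load a PDF, split into chunks, embed, store in FAISS
docs = PyPDFLoader("data/example.pdf").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=80   # defaults per the Configuration table
).split_documents(docs)
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L12-v2"
)
vectorstore = FAISS.from_documents(chunks, embeddings)

# Query: embed the question and retrieve the top-k chunks for the LLM prompt
top_chunks = vectorstore.similarity_search("What does chapter 2 cover?", k=5)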

Conversation history is preserved per session using LangChain's RunnableWithMessageHistory, so the model maintains context across turns.
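
A minimal sketch of that pattern (assumptions: the session handler backs history with something like LangChain's FileChatMessageHistory, and a plain ChatOllama stands in for the full RAG chain):

from langchain_core.messages import HumanMessage
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import FileChatMessageHistory
from langchain_community.chat_models import ChatOllama

llm = ChatOllama(model="llama3")  # stand-in for the chain built in helpers/chain_handler.py

def get_session_history(session_id: str):
    # One JSON file per session, persisted under sessions/ and reloaded on the next run
    return FileChatMessageHistory(f"sessions/{session_id}.json")

chat = RunnableWithMessageHistory(llm, get_session_history)
answer = chat.invoke(
    [HumanMessage("Summarize the document.")],
    config={"configurable": {"session_id": "default"}},
)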

Features

  • Chat with one or more PDF documents via a Streamlit web UI or CLI
  • Fully local — LLM runs via Ollama, embeddings via HuggingFace sentence-transformers
  • Duplicate detection — new documents are indexed only if they are not already in the vector store (see the sketch after this list)
  • Session memory — conversation history persisted to JSON and reloaded on next run
  • File upload — drag-and-drop PDFs directly in the web UI
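
The duplicate check could look roughly like the following (the actual logic lives in helpers/docs_db_handler.py; this sketch reads FAISS's in-memory docstore through a private attribute, purely for illustration):

def filter_new_docs(vectorstore, docs):
    # Source paths already indexed, read from FAISS's InMemoryDocstore (private API)
    existing = {
        d.metadata.get("source")
        for d in vectorstore.docstore._dict.values()
    }
    # Keep only chunks whose source PDF is not in the store yet
    return [d for d in docs if d.metadata.get("source") not in existing]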

Project Structure

rag-ollama/
├── app.py                  # CLI entry point
├── webui.py                # Streamlit web UI entry point
├── requirements.txt
├── data/                   # Place PDF files here
├── db/                     # FAISS vector store (auto-created, gitignored)
├── sessions/               # Session JSON history (auto-created, gitignored)
└── helpers/
    ├── __init__.py
    ├── chain_handler.py    # LangChain RAG chain setup
    ├── docs_db_handler.py  # FAISS init, load, dedup logic
    ├── embedder.py         # HuggingFace embeddings wrapper
    ├── indexer.py          # PDF loading + text splitting
    ├── retriever.py        # Vector similarity retrieval
    └── session_handler.py  # Session history load/save

Prerequisites

  • Python 3.9+
  • Ollama installed and running
  • Llama 3 model pulled via Ollama

Installation

# 1. Clone
git clone https://github.com/techanvconsulting/rag-ollama.git
cd rag-ollama

# 2. Create and activate virtual environment
python -m venv venv
source venv/bin/activate        # Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Pull the LLM model via Ollama
ollama pull llama3

Running

Web UI (recommended)

streamlit run webui.py

Open http://localhost:8501. Upload PDFs via the sidebar or drop them in data/ beforehand.
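
The upload path in webui.py presumably follows Streamlit's standard file-uploader pattern; a minimal sketch (widget label and save path are assumptions):

import streamlit as st

# Sidebar drag-and-drop; saved files land in data/ so the indexer can pick them up
uploaded = st.sidebar.file_uploader(
    "Upload PDFs", type="pdf", accept_multiple_files=True
)
for f in uploaded or []:
    with open(f"data/{f.name}", "wb") as out:
        out.write(f.getvalue())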

CLI

python app.py

Type your question at the prompt. Type exit to quit.

Configuration

Setting              Location                   Default
LLM model            helpers/chain_handler.py   llama3
Embedding model      app.py / webui.py          sentence-transformers/all-MiniLM-L12-v2
Chunk size           helpers/indexer.py         1000 chars, 80 overlap
Retrieved docs (k)   app.py / webui.py          5
Ollama base URL      helpers/chain_handler.py   http://127.0.0.1:11434

To swap the LLM, change the model name in helpers/chain_handler.py:

llm = ChatOllama(model="mistral", base_url="http://127.0.0.1:11434", keep_alive=-1)

Any model available via ollama list works.
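
The embedding model can be swapped the same way in app.py / webui.py. Note that a different model produces vectors of a different dimension, so delete db/ and re-index afterwards (the model name below is just an example):

from langchain_huggingface import HuggingFaceEmbeddings

# Any sentence-transformers model name from HuggingFace works here
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")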

Dependencies

Package                  Purpose
langchain-community      LangChain integrations (Ollama, FAISS, loaders)
langchain-huggingface    HuggingFace embeddings
faiss-cpu                Local vector store
sentence-transformers    Embedding model
streamlit                Web UI
pypdf                    PDF parsing
langchainhub             Prompt hub access

Reference Documentation

Resource                        Link
Ollama                          ollama.com
Ollama docs                     docs.ollama.com
LangChain Python                docs.langchain.com
LangChain FAISS integration     python.langchain.com/docs/integrations/vectorstores/faiss
LangChain ChatOllama            python.langchain.com/docs/integrations/chat/ollama
RunnableWithMessageHistory      python.langchain.com/docs/how_to/message_history
FAISS (Facebook Research)       github.com/facebookresearch/faiss
all-MiniLM-L12-v2 model card    huggingface.co/sentence-transformers/all-MiniLM-L12-v2
Streamlit docs                  docs.streamlit.io
pypdf docs                      pypdf.readthedocs.io

Troubleshooting

connection refused on Ollama — Start the server first: ollama serve

Empty / "I don't know" answers — Retrieved chunks may not contain relevant content. Add more PDFs or reduce chunk size in helpers/indexer.py.

ModuleNotFoundError — Always run from the project root (rag-ollama/), not from inside helpers/.

Slow first run — all-MiniLM-L12-v2 downloads ~120 MB from HuggingFace on first use.

Roadmap

  • [x] Streamlit web UI
  • [x] Conversation memory (per-session JSON)
  • [x] Duplicate document detection
  • [x] Proper Python package structure (helpers/ as package)
  • [ ] Support for .txt, .md, .docx files
  • [ ] Model selector in UI
  • [ ] Source citation in answers

Contributing

Contributions welcome. Fork the repo, create a branch, and open a pull request. For larger changes, open an issue first.

Contact

Open an issue at github.com/techanvconsulting/rag-ollama.
