A privacy-focused, local-first RAG (Retrieval-Augmented Generation) application for chatting with your documents and external knowledge sources using a local LLM. Upload PDFs, eBooks, Markdown, or text files and ask questions - all processing happens on your machine.
Connect to external knowledge via Model Context Protocol (MCP) servers like Microsoft Learn, security compliance frameworks, or export control regulations.
- 📚 Multi-Book Support: Upload up to 5 books per session (PDF/EPUB/MD/TXT/RST/HTML, up to 100MB each)
- 📄 Enhanced Parsing: Optional Docling integration for better PDF parsing and Office document support (DOCX/PPTX/XLSX)
- 🔌 MCP Integration: Connect to external knowledge sources via Model Context Protocol
  - Aegis MCP: Security compliance frameworks (NIST 800-53, OWASP, DOE)
  - Microsoft Learn MCP: Microsoft documentation and training content
  - Export Control MCP: EAR, ITAR, sanctions screening, classification assistance
- 🔒 Privacy First: All data stays local - no cloud services or external APIs (MCP sources optional)
- 💬 Interactive Chat: Ask questions and get answers with source citations
- 🎯 Source Attribution: See exactly which book passages informed each answer
- 🤖 Model Switcher: Choose the best Ollama model for your content type
- 🚀 Modern Stack: FastAPI backend + Vue 3 frontend
- 🧠 Local LLM: Powered by Ollama (multiple models supported)
- 🔍 Vector Search: ChromaDB for semantic similarity search
- 📝 Session Management: Multiple users with isolated sessions
- 📖 Hierarchical Chunking: Optimized for large technical books
- 🔗 Contextual Retrieval: Neighboring chunks for better continuity
- Python 3.10+ with type hints
- FastAPI - Modern async web framework
- ChromaDB - Vector database for embeddings
- Ollama - Local LLM inference and embeddings
- sentence-transformers - Alternative CPU-based embeddings
- MCP SDK - Model Context Protocol for external knowledge sources
- pypdf & EbookLib - PDF/EPUB parsing
- Docling (optional) - Enhanced PDF parsing, Office docs, OCR
- Markdown, HTML, RST, TXT - Text format support
- Vue 3 with Composition API
- TypeScript - Type-safe frontend code
- Vite - Fast build tool with HMR
- Tailwind CSS - Utility-first styling
- Flat Structure - No layers, add abstractions only after Rule of Three
- TDD - Test business logic and edge cases
- Security by Design - OWASP guidelines built in from the start
- Python 3.10+
- Node.js 18+ and npm
- Ollama - Install from ollama.ai
- Hardware:
- 16GB+ RAM recommended
- GPU optional but recommended for faster inference
```bash
git clone <repository-url>
cd local-rag

# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -e ".[dev]"

# Optional: Install enhanced parsing (adds ~2GB for Docling + dependencies)
pip install -e ".[enhanced]"
```

```bash
cd frontend
npm install
cd ..
```

Linux:

```bash
curl -fsSL https://ollama.ai/install.sh | sh
```

macOS:

```bash
# Download from https://ollama.ai/download/mac
# Or use Homebrew
brew install ollama
```

Windows: Download the installer from https://ollama.ai/download/windows
```bash
ollama serve
```

Recommended Setup:

```bash
# LLM Model - Best for RAG chat
ollama pull gemma3:12b

# Embedding Model - High quality semantic search
ollama pull mxbai-embed-large
```

Alternative Models:

```bash
# Good alternatives
ollama pull llama3.1:8b
ollama pull mistral:7b-instruct

# For code-heavy technical books
ollama pull codellama:7b

# Alternative embedding model (smaller)
ollama pull nomic-embed-text
```

See the Models Guide section below for detailed recommendations.
```bash
./start-servers.sh
```

This script starts all services (Ollama, backend, frontend) and shows:
- Frontend: http://localhost:5173
- Backend API: http://localhost:8000
- API Docs: http://localhost:8000/docs
Press Ctrl+C to stop all servers.
Backend:

```bash
source .venv/bin/activate
uvicorn src.api.main:app --reload --port 8000
```

Frontend:

```bash
cd frontend
npm run dev
```

- Upload Books: Drag and drop files (PDF, EPUB, MD, TXT, RST, HTML - max 100MB each, up to 5 books). With enhanced parsing enabled: also DOCX, PPTX, XLSX, and images (PNG/JPG/TIFF with OCR).
- Select Model: Click the model selector to choose the best LLM for your content
- Ask Questions: Type your question in the chat interface
- View Sources: Click on source citations to see the exact passages used
- Adjust Retrieval %: Control how much of the book to search (0.5-10%); see the sketch after this list
  - 1-2%: Best for specific questions ("What is encapsulation?")
  - 5-10%: Better for broad questions ("What is this book about?")
- Delete Books: Remove individual books or clear all with "Clear All"
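Under the hood, the retrieval percentage plausibly maps to a chunk count along these lines (an illustrative sketch, not necessarily the project's exact formula):

```python
def chunks_to_retrieve(total_chunks: int, percent: float) -> int:
    """Convert a retrieval percentage into a number of chunks to fetch."""
    # Always retrieve at least one chunk, even for tiny books.
    return max(1, round(total_chunks * percent / 100))


# A 2,000-chunk book at 1% vs. 10%:
print(chunks_to_retrieve(2000, 1.0))   # 20 focused chunks
print(chunks_to_retrieve(2000, 10.0))  # 200 chunks for broad questions
```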
```
local-rag/
├── src/
│   ├── models/          # Book, Chunk, Query, Response, exceptions
│   ├── parsers/         # PDF, EPUB, MD, TXT, HTML, RST + DocumentParser + chunker
│   ├── embeddings/      # Ollama, SentenceTransformer
│   ├── vectorstore/     # ChromaDB
│   ├── llm/             # Ollama client + prompt builder
│   ├── mcp/             # MCP client, adapters (Aegis, MSLearn), manager
│   ├── services/        # Ingestion, Query, Session
│   └── api/             # FastAPI (routes/, schemas/, middleware/)
├── tests/               # Mirrors src/ structure
├── frontend/
│   ├── src/
│   │   ├── components/  # Vue components
│   │   ├── composables/ # Reusable logic
│   │   └── types/       # TypeScript types
│   └── package.json
├── docs/                # Documentation
└── pyproject.toml       # Python dependencies
```
```bash
# All tests
pytest

# Unit tests only
pytest tests/unit/ -v

# With coverage
pytest --cov=src --cov-report=term-missing
```

```bash
# Linting
ruff check src/ tests/

# Type checking
mypy src/

# Security scan
bandit -r src/ -c pyproject.toml
```

- `GET /api/health` - Health check (includes MCP source status)
- `POST /api/books` - Upload books
- `GET /api/books` - List books in session
- `DELETE /api/books/{book_id}` - Delete a book
- `DELETE /api/books` - Clear session
- `POST /api/chat` - Send a query
- `POST /api/chat/stream` - Stream chat responses (SSE)
All endpoints require a `session-id` header for session isolation.
When sending a chat query, you can specify which knowledge source to use (see the example after the table):
| Source | Description |
|---|---|
| `books` | Search only your uploaded books (default) |
| `compliance` | Search Aegis MCP (NIST, OWASP, DOE frameworks) |
| `mslearn` | Search Microsoft Learn documentation |
| `all` | Search all available sources |
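For example, a chat query from Python might look like this (a minimal sketch; the JSON field names `question` and `source` are assumptions, so check the interactive docs at `/docs` for the exact schema):

```python
import requests

# Ask a question against the Microsoft Learn source. The "session-id"
# header isolates this session's books and history from other users.
resp = requests.post(
    "http://localhost:8000/api/chat",
    headers={"session-id": "my-session"},
    json={"question": "What is Azure Key Vault?", "source": "mslearn"},
    timeout=120,
)
resp.raise_for_status()
print(resp.json())
```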
For the best RAG experience, we recommend:
| Component | Model | Size | Why |
|---|---|---|---|
| LLM | gemma3:12b | 8.1GB | Excellent reasoning and context understanding |
| Embeddings | mxbai-embed-large | 669MB | High-quality 1024-dim embeddings |
| Model | Size | Best For | Notes |
|---|---|---|---|
| gemma3:12b | 8.1GB | General RAG (recommended) | Excellent reasoning, great with technical content |
| llama3.1:8b | 4.9GB | Balanced option | Great instruction following, good with context |
| mistral:7b-instruct | 4.4GB | Fast responses | Good quality, faster inference |
| codellama:7b | 3.8GB | Technical/code books | Specialized for programming content |
| deepseek-r1:14b | 9GB | Alternative | Deep reasoning, good with technical content |
| Model | Dimensions | Size | Notes |
|---|---|---|---|
| mxbai-embed-large | 1024 | 669MB | Best quality (recommended) |
| nomic-embed-text | 768 | 274MB | Good quality, smaller |
| all-MiniLM-L6-v2 | 384 | - | CPU-based, no Ollama needed |
Important: If you change embedding models, you must re-upload your books. Embeddings are not compatible across different dimensions.
```bash
# List installed models
ollama list

# Pull a model
ollama pull gemma3:12b

# Remove a model
ollama rm <model-name>
```

- 8B models: 8GB RAM minimum, 16GB recommended, GPU helps significantly
- 12B models: 16GB RAM, benefits greatly from GPU
- 70B models: 64GB+ RAM, runs on CPU (slow) or needs 48GB+ VRAM
Configuration via environment variables or a `.env` file:
```bash
# Paths
UPLOAD_DIR=./data/uploads
CHROMA_PERSIST_DIR=./data/chroma

# Limits
MAX_FILE_SIZE_MB=100
MAX_BOOKS_PER_SESSION=5
CHUNK_SIZE=512
CHUNK_OVERLAP=50

# Models (recommended settings)
EMBEDDING_MODEL=mxbai-embed-large
LLM_MODEL=gemma3:12b
OLLAMA_BASE_URL=http://localhost:11434

# RAG Settings
TOP_K_CHUNKS=5
NEIGHBOR_WINDOW=1

# Enhanced Parsing (requires pip install -e ".[enhanced]")
USE_DOCLING_PARSER=false

# MCP Integration (optional)
# Aegis MCP - Security compliance frameworks
AEGIS_MCP_TRANSPORT=http  # 'http' or 'stdio'
AEGIS_MCP_URL=http://localhost:8765/mcp

# Microsoft Learn MCP
MSLEARN_MCP_ENABLED=true
MSLEARN_MCP_URL=https://learn.microsoft.com/api/mcp

# Export Control MCP - EAR, ITAR, sanctions screening
EXPORT_CONTROL_MCP_TRANSPORT=stdio  # 'http' or 'stdio'
EXPORT_CONTROL_MCP_COMMAND=uv
EXPORT_CONTROL_MCP_ARGS=run python -m export_control_mcp.server
EXPORT_CONTROL_MCP_WORKING_DIR=/path/to/export-assist-mcp
```

For better document parsing quality and additional format support, you can enable Docling:

```bash
pip install -e ".[enhanced]"
export USE_DOCLING_PARSER=true
```

Or add to your `.env` file:

```
USE_DOCLING_PARSER=true
```

| Feature | Standard | With Docling |
|---|---|---|
| PDF parsing | Basic text extraction | Structure-aware with headings, tables |
| Table extraction | Often garbled | High-accuracy preservation |
| Chunking | Character-based | Token-based, semantic boundaries |
| DOCX support | Not available | Full support |
| PPTX support | Not available | Full support |
| XLSX support | Not available | Full support |
| Image OCR | Not available | PNG, JPG, TIFF with text extraction |
Standard (always available):
- PDF, EPUB, Markdown, TXT, RST, HTML
With Docling enabled:
- All standard formats (PDF uses enhanced parsing)
- Microsoft Office: DOCX, PPTX, XLSX
- Images with OCR: PNG, JPG, JPEG, TIFF
- Slower processing: Docling performs deeper analysis (first upload only)
- Larger install: Adds ~2GB for PyTorch and model dependencies
- Better quality: Significantly improved results for complex PDFs with tables, figures, and structured content
Note: Changing `EMBEDDING_MODEL` requires clearing ChromaDB (`rm -rf ./data/chroma/*`) and re-uploading books.
KnowledgeHub supports the Model Context Protocol (MCP) to connect to external knowledge sources alongside your local documents.
| Source | Description | Configuration |
|---|---|---|
| Aegis MCP | Security compliance (NIST 800-53, OWASP, DOE) | Requires local Aegis server |
| Microsoft Learn | Microsoft documentation and training | Public endpoint, no setup needed |
Microsoft Learn MCP provides access to Microsoft's documentation. To enable:
```bash
export MSLEARN_MCP_ENABLED=true
```

Or add to `.env`:

```
MSLEARN_MCP_ENABLED=true
```

The frontend will automatically show "MS Learn" as a source option.
Aegis MCP provides security compliance frameworks. You'll need to run an Aegis MCP server:
```bash
# HTTP transport (recommended)
export AEGIS_MCP_TRANSPORT=http
export AEGIS_MCP_URL=http://localhost:8765/mcp

# Or stdio transport
export AEGIS_MCP_TRANSPORT=stdio
export AEGIS_MCP_COMMAND=aegis-mcp
```

Export Control MCP provides access to export compliance regulations (EAR, ITAR), sanctions screening (OFAC, BIS), and classification assistance. See export-assist-mcp for setup.
```bash
# Clone and setup export-assist-mcp
git clone https://github.com/michaelalber/export-assist-mcp
cd export-assist-mcp

# Configure KnowledgeHub to use it
export EXPORT_CONTROL_MCP_TRANSPORT=stdio
export EXPORT_CONTROL_MCP_COMMAND=uv
export EXPORT_CONTROL_MCP_ARGS="run python -m export_control_mcp.server"
export EXPORT_CONTROL_MCP_WORKING_DIR=/path/to/export-assist-mcp
```

The frontend will automatically show "Export Control" as a source option when configured.
- Source Selection: Choose your knowledge source in the chat interface (Books, Compliance, MS Learn, Export Control, or All)
- Health Check: The `/api/health` endpoint reports which MCP sources are available
- Combined Queries: Select "All" to search books and all configured MCP sources together
- Streaming: All sources support real-time streaming responses
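For example, the streaming endpoint can be consumed as Server-Sent Events (a minimal sketch using `requests`; the JSON field names are the same assumptions as in the earlier chat example):

```python
import requests

# Stream a chat response incrementally over SSE.
with requests.post(
    "http://localhost:8000/api/chat/stream",
    headers={"session-id": "my-session"},
    json={"question": "Summarize chapter 1", "source": "books"},
    stream=True,
    timeout=300,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        # SSE frames arrive as "data: <payload>" lines separated by blanks.
        if line and line.startswith("data: "):
            print(line[len("data: "):], flush=True)
```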
```
┌──────────────────────────────────────────────────────┐
│                     QueryService                     │
│            Routes queries based on source            │
└─────────────────────────┬────────────────────────────┘
                          │
┌─────────────────────────▼────────────────────────────┐
│                      MCPManager                      │
│           Registers and routes to adapters           │
└────────┬──────────────────┬──────────────────┬───────┘
         │                  │                  │
┌────────▼────────┐ ┌───────▼───────┐ ┌────────▼────────┐
│  AegisAdapter   │ │MSLearnAdapter │ │ExportCtlAdapter │
│  (compliance)   │ │   (mslearn)   │ │ (export_control)│
└────────┬────────┘ └───────┬───────┘ └────────┬────────┘
         │                  │                  │
┌────────▼──────────────────▼──────────────────▼───────┐
│                    BaseMCPClient                     │
│             Handles stdio/HTTP transport             │
└──────────────────────────────────────────────────────┘
```
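In code terms, the routing above might look roughly like this (an illustrative sketch; the actual interfaces in `src/mcp/` may differ):

```python
from typing import Protocol


class MCPAdapter(Protocol):
    """What the manager expects of each adapter (names are illustrative)."""

    async def search(self, question: str) -> list[str]: ...


class MCPManager:
    """Registers adapters under a source name and routes queries to them."""

    def __init__(self) -> None:
        self._adapters: dict[str, MCPAdapter] = {}

    def register(self, source: str, adapter: MCPAdapter) -> None:
        self._adapters[source] = adapter

    async def query(self, source: str, question: str) -> list[str]:
        # "all" fans out to every registered adapter; otherwise route to one.
        if source == "all":
            results: list[str] = []
            for adapter in self._adapters.values():
                results.extend(await adapter.search(question))
            return results
        return await self._adapters[source].search(question)
```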
- File type validation (PDF, EPUB, MD, TXT, RST, HTML; with Docling: DOCX, PPTX, XLSX, PNG, JPG, TIFF)
- MIME type verification via magic bytes (PDF: `%PDF`, EPUB/Office: `PK`, images: format headers) or UTF-8 validation for text files; see the sketch after this list
- File size limits (100MB default, configurable)
- Filename sanitization (path traversal prevention, special chars removed)
- Upload directory isolation
- CORS configuration for local development
- Session-based data isolation
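A minimal sketch of that magic-byte check (the function and constants here are illustrative, not the project's actual API):

```python
MAGIC_BYTES = {
    b"%PDF": "pdf",        # PDF files begin with %PDF
    b"PK\x03\x04": "zip",  # EPUB and Office formats are ZIP containers
}


def sniff_file_type(data: bytes) -> str | None:
    """Return a coarse file type from leading magic bytes, or None."""
    for magic, kind in MAGIC_BYTES.items():
        if data.startswith(magic):
            return kind
    # Fall back to treating the upload as text if it decodes as UTF-8.
    try:
        data.decode("utf-8")
        return "text"
    except UnicodeDecodeError:
        return None
```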
- GPU Acceleration: Ollama will use a GPU if available (significantly faster)
- Model Selection:
  - `gemma3:12b`: Recommended for RAG, excellent reasoning
  - `llama3.1:8b`: Good balance of speed and quality
  - `mistral:7b-instruct`: Fastest inference
- Retrieval Percentage:
  - 1-2%: Fast, focused answers for specific questions
  - 5%: Good for topic summaries
  - 10%: Comprehensive, best for "what is this book about" questions
- Embedding Model: `mxbai-embed-large` provides better semantic matching than smaller models
- Chunking: Hierarchical chunking with a neighbor window provides better context continuity (see the sketch below)
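The neighbor window works roughly like this (an illustrative sketch assuming chunks are stored with sequential indices):

```python
def with_neighbors(hits: list[int], total: int, window: int = 1) -> list[int]:
    """Expand each retrieved chunk index by `window` neighbors per side.

    With window=1 (the NEIGHBOR_WINDOW default), a hit on chunk 7 also
    pulls chunks 6 and 8, so the LLM sees surrounding context instead of
    an isolated passage.
    """
    expanded: set[int] = set()
    for i in hits:
        for j in range(i - window, i + window + 1):
            if 0 <= j < total:
                expanded.add(j)
    return sorted(expanded)


# Top-k hits at chunks 7 and 42 of a 100-chunk book:
print(with_neighbors([7, 42], total=100))  # [6, 7, 8, 41, 42, 43]
```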
Ollama Connection Error:
- Ensure Ollama is running: `ollama serve`
- Check the model is pulled: `ollama list`
- Verify Ollama is on port 11434: `curl http://localhost:11434/api/tags`
Model Not Showing in Selector:
- Refresh the page after pulling new models
- Ensure Ollama service is running
- Check backend logs for Ollama connection errors
ChromaDB Issues:
- Delete the `./data/chroma` directory to reset
- Ensure sufficient disk space
- For large documents, ensure 5GB+ free space
Embedding Dimension Mismatch Error:
- This happens when you change embedding models after uploading books
- Clear ChromaDB: `rm -rf ./data/chroma/*`
- Clear browser session storage (F12 → Application → Session Storage → Clear)
- Re-upload your books
Large Document Upload Failures:
- Check file size limit in config (`MAX_FILE_SIZE_MB`)
- Ensure sufficient RAM for document processing
- Monitor backend logs for batch processing progress
Frontend API Errors:
- Verify backend is running on port 8000
- Check browser console for CORS issues
- Ensure the `session-id` header is being sent
Slow Query Performance:
- Reduce Top-K value for faster queries
- Use smaller models (7B) for better speed
- Enable GPU acceleration in Ollama
- Consider reducing `NEIGHBOR_WINDOW` in config
MCP Source Not Available:
- Check the `/api/health` endpoint for MCP source status
- Verify environment variables are set correctly
- For Aegis: ensure the MCP server is running
- For MS Learn: check network connectivity
- Review backend logs for MCP connection errors
MIT License - see LICENSE file for details
This is a personal/educational project, but suggestions and feedback are welcome via issues.
- Built with FastAPI
- Powered by Ollama
- Vector storage by ChromaDB
- Embeddings by sentence-transformers
- External knowledge via Model Context Protocol