local-rag

A local RAG system that indexes documents from filesystem folders, builds a hybrid search index (dense vectors + keyword), and exposes retrieval via MCP to Claude Desktop and Claude Code. Everything runs locally -- a single Python process plus a Qdrant Docker container. No cloud infrastructure required.

Features

  • Hybrid search -- dense vector (BGE-M3) + keyword search with RRF fusion (see the sketch after this list)
  • ONNX cross-encoder reranking for high-precision results
  • MCP integration -- works as a tool for Claude Desktop and Claude Code, with server instructions, enriched tool descriptions, and built-in prompts (research, discover, catch-up)
  • Multi-format parsing -- PDF and DOCX via Docling (with OCR support), plus TXT and MD
  • Filesystem watching -- automatic re-indexing when documents change
  • Document summarization -- shells out to any LLM CLI tool (claude, etc.)
  • Semantic chunking (opt-in) -- sentence-embedding boundary detection using a simplified Max-Min algorithm with BGE-M3, as an alternative to fixed-size chunking
  • Auto-generated questions -- an LLM generates 3 questions per chunk at index time, enriching dense vectors and keyword search for better recall
  • Fully local -- all models run on-device, no API keys needed for core search
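
The hybrid-search bullet above fuses two independently ranked result lists. Here is a minimal sketch of Reciprocal Rank Fusion with the conventional k=60 constant; the fusion local-rag actually performs (including layer weighting across lanes) lives in src/rag/retrieval/ and may differ in detail:

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document ids into one scored list (illustrative)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each lane contributes 1 / (k + rank); documents that rank well
            # in both lanes accumulate the highest fused score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

dense_hits = ["doc-a", "doc-b", "doc-c"]    # from the vector lane
keyword_hits = ["doc-b", "doc-d", "doc-a"]  # from the keyword lane
print(rrf_fuse([dense_hits, keyword_hits]))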

Prerequisites

  • Python 3.11+
  • Docker (for Qdrant)
  • 4GB+ RAM for embedding and reranker models
  • macOS or Linux

Quick Start

# 1. Clone and install
git clone https://github.com/dgourlay/local-rag.git
cd local-rag
make setup           # creates venv, installs deps, starts Qdrant

# 2. Activate venv and configure
source .venv/bin/activate
rag init             # interactive wizard: pick folders, detect LLM CLI

# 3. Index and search
rag index            # scans folders, parses, chunks, embeds (downloads models on first run)
rag search "your query here"

That's it. Models (BGE-M3 embeddings ~1.5GB, BGE reranker ~1.2GB) download automatically on first use; the on-disk cache ends up larger (see Storage locations under Cleanup & Uninstall).

To pre-download models so the first search is instant:

make download-models

MCP Integration

The primary use case is running local-rag as an MCP tool server so your LLM can search your documents.

Claude Code

Option A -- auto-install:

rag mcp-config --install claude-code

Option B -- manual: Add this to your ~/.claude.json (or project-level .mcp.json), replacing the Python path with your venv path:

{
  "mcpServers": {
    "local-rag": {
      "command": "/path/to/local-rag/.venv/bin/python",
      "args": ["-m", "rag.cli", "serve"]
    }
  }
}

To find your exact Python path, run rag mcp-config --print.

After adding the config, restart Claude Code. The tools will be available immediately -- Claude will automatically use them when you ask about your documents.

Claude Desktop

Option A -- auto-install:

rag mcp-config --install claude-desktop

This writes to ~/Library/Application Support/Claude/claude_desktop_config.json.

Option B -- manual: Open Claude Desktop settings, go to Developer > MCP Servers, and add:

{
  "local-rag": {
    "command": "/path/to/local-rag/.venv/bin/python",
    "args": ["-m", "rag.cli", "serve"]
  }
}

Restart Claude Desktop after adding the config.

Kiro

Option A -- auto-install:

rag mcp-config --install kiro

This writes to ~/.kiro/settings/mcp.json (user-level). For project-level config, use Option B with .kiro/settings/mcp.json in your project root.

Option B -- manual: Add to ~/.kiro/settings/mcp.json:

{
  "mcpServers": {
    "local-rag": {
      "command": "/path/to/local-rag/.venv/bin/python",
      "args": ["-m", "rag.cli", "serve"]
    }
  }
}

Option C -- kiro-cli:

kiro-cli mcp add \
  --name "local-rag" \
  --scope global \
  --command "/path/to/local-rag/.venv/bin/python" \
  --args "-m rag.cli serve"

Run rag mcp-config --print to get the exact Python path for your venv.

MCP prompts (slash commands)

The server registers three prompts, available as slash commands in Claude Code:

Prompt      Description
/research   Deep research on a topic -- scouts documents, extracts evidence, synthesizes findings
/discover   Explore what's in the index -- browse recent documents, discover topics
/catch-up   Summarize recent changes in a folder or across all indexed documents

The server also provides ~600 words of instructions that walk the LLM through a recommended scout-then-search-then-drill-down workflow, offer query tips, and list the configured folders.

Available MCP tools

Once connected, your LLM has access to these tools (with enriched descriptions for better LLM tool selection):

Tool                    Description
search_documents        Hybrid search with reranking. Accepts query, optional folder_filter, date_filter, top_k, format ("text" or "json"). Returns cited evidence passages with scores.
quick_search            Lightweight document-level scan returning document summaries. Faster than search_documents for broad queries.
get_document_context    Get a document overview (summary + sections) by doc_id, or a chunk with surrounding context by chunk_id.
list_recent_documents   List recently indexed documents, optionally filtered by folder.
get_sync_status         Check indexing health: total files, indexed count, errors, per-folder breakdown.

Verifying the connection

After setup, ask your LLM something like:

"What documents do I have indexed?" (uses list_recent_documents)

"Search my documents for gate operations procedures" (uses search_documents)

If it responds with content from your documents, the MCP connection is working. If not, check rag doctor and verify the Python path in your MCP config points to the correct venv.
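
To exercise the server outside of any client app, a minimal smoke test with the official MCP Python SDK also works. This is a sketch, not part of local-rag: it assumes pip install mcp, and the interpreter path should come from rag mcp-config --print:

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Same command/args as the MCP config entries above; adjust the path.
    server = StdioServerParameters(
        command="/path/to/local-rag/.venv/bin/python",
        args=["-m", "rag.cli", "serve"],
    )
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])  # expect the 5 tools above
            result = await session.call_tool(
                "search_documents", {"query": "gate operations procedures"}
            )
            print(result.content)

asyncio.run(main())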

CLI Reference

Command                    Description
rag init                   Interactive setup wizard -- configures folders, LLM CLI, Qdrant
rag index                  Scan and process all documents in configured folders
rag index --reindex        Purge all index data and re-process everything from scratch
rag index --reindex FILE   Clear index state for a single file and re-process it
rag serve                  Start the MCP server (stdio transport by default)
rag watch                  Filesystem watcher -- auto-indexes on document changes
rag status                 Dashboard showing document/chunk/error counts, MCP health, liveness
rag doctor                 Health check -- verifies Qdrant, models, folders
rag search "query"         CLI search for testing (--debug for lane/weight details, --top-k N)
rag mcp-config             Print or install MCP config (--print, --install)

Configuration

Config is TOML, resolved in this order (first match wins):

  1. RAG_CONFIG_PATH environment variable
  2. ./config.toml in the current directory
  3. ~/.config/local-rag/config.toml (default, created by rag init)

Only [folders].paths is required. Everything else has sensible defaults. See config.example.toml for all options with defaults.

Minimal config:

[folders]
paths = ["~/Documents"]

Full config with all defaults shown:

[folders]
paths = ["~/Documents", "~/Work"]
extensions = ["pdf", "docx", "txt", "md"]
ignore = ["**/node_modules", "**/.git", "**/venv", "**/__pycache__"]

[database]
path = "~/.local/share/local-rag/metadata.db"

[qdrant]
url = "http://localhost:6333"
collection = "documents"

[embedding]
model = "BAAI/bge-m3"
dimensions = 1024
batch_size = 32
cache_dir = "~/.cache/local-rag/models"

[reranker]
model_path = "~/.cache/local-rag/models/bge-reranker-v2-m3"
top_k_candidates = 30
top_k_final = 10

[summarization]
enabled = true
command = "claude"
args = ["--print", "--max-tokens", "2048"]
timeout_seconds = 60

[chunking]
strategy = "fixed"              # "fixed" (default) or "semantic"
similarity_threshold = 0.35     # semantic only: boundary detection threshold
max_chunk_tokens = 768          # semantic only: max tokens per chunk

[questions]
enabled = true                  # auto-generate 3 questions per chunk at index time

[mcp]
transport = "stdio"
host = "127.0.0.1"
port = 8080

[watcher]
poll_interval_seconds = 5
debounce_seconds = 2
use_polling = false
batch_window_seconds = 10
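
The [chunking] options map onto a simple idea: start a new chunk wherever the embedding similarity between consecutive sentences drops below similarity_threshold. A minimal sketch of that boundary rule, assuming a sentence-embedding callable; local-rag's actual simplified Max-Min implementation in chunker_semantic.py differs in detail and also enforces max_chunk_tokens:

import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_chunks(
    sentences: list[str],
    embed,                    # stand-in for a BGE-M3 sentence embedder
    threshold: float = 0.35,  # maps to [chunking].similarity_threshold
) -> list[list[str]]:
    """Group sentences into chunks, splitting at semantic boundaries."""
    chunks: list[list[str]] = [[sentences[0]]]
    prev = embed(sentences[0])
    for sent in sentences[1:]:
        vec = embed(sent)
        if cosine(prev, vec) < threshold:
            chunks.append([])  # similarity dropped: semantic boundary
        chunks[-1].append(sent)
        prev = vec
    return chunks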

Architecture

A single Python process handles indexing: filesystem watching (watchdog) -> Docling parsing (in a subprocess for memory isolation) -> normalization -> dedup -> chunking (fixed 512-token or opt-in semantic via BGE-M3 sentence embeddings) -> embedding (BGE-M3, 1024-dim) -> auto-question generation (an LLM generates 3 questions per chunk, prepended before embedding) -> summarization (LLM CLI) -> Qdrant indexing.

Retrieval: 3-lane prefetch (document summaries, section summaries, chunks) -> RRF fusion with layer weighting -> recency boost -> ONNX cross-encoder reranker -> cited evidence returned to the calling LLM.
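
The auto-question step is worth spelling out: the text sent to the embedder is the generated questions plus the original chunk. A rough sketch of the idea; the prompt wording, CLI flags, and subprocess handling here are hypothetical, and the real stage lives in src/rag/pipeline/:

import subprocess

def enrich_chunk(chunk_text: str) -> str:
    """Prepend LLM-generated questions so query-like phrasings get indexed."""
    prompt = (
        "Write 3 short questions this passage answers, one per line:\n\n"
        + chunk_text
    )
    # Shells out to an LLM CLI (hypothetically reusing [summarization].command).
    questions = subprocess.run(
        ["claude", "--print"],
        input=prompt, capture_output=True, text=True, timeout=60,
    ).stdout.strip()
    # Both the dense vector and the keyword index now see the questions.
    return questions + "\n\n" + chunk_text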

src/rag/
  cli.py               # CLI entry points (click)
  config.py            # TOML config loader
  init.py              # Setup wizard
  types.py             # Pydantic models, enums, type aliases
  protocols.py         # Protocol classes (Embedder, Summarizer, etc.)
  results.py           # Discriminated union Result types
  sync/                # Filesystem scanner + watcher
  pipeline/            # classify -> parse -> normalize -> dedup -> chunk -> embed -> questions -> summarize -> index
    parser/            # Docling (PDF/DOCX) + text fallback (TXT/MD)
    chunker_semantic.py # Semantic chunking (opt-in, Max-Min algorithm)
  retrieval/           # 3-lane prefetch + RRF + layer weighting + reranker + citations
  mcp/                 # MCP server (stdio + HTTP) + 5 tools + 3 prompts + server instructions
    prompts.py         # MCP prompts (research, discover, catch-up)
  db/                  # SQLite + Qdrant clients
migrations/            # SQL schema
tests/                 # Unit + e2e tests

Development

make lint       # ruff check + format check + mypy strict
make test       # unit tests (fast, no Docker)
make test-e2e   # end-to-end (requires Qdrant + models)
make test-all   # lint + test + test-e2e
make format     # auto-format with ruff

Cleanup & Uninstall

local-rag stores ML models, config, and data across several directories (~6.5 GB total, mostly models).

Storage locations

What                       Path                                                  Size
Embedding model (BGE-M3)   ~/.cache/local-rag/models/models--BAAI--bge-m3/       ~4.3 GB
Reranker model (ONNX)      ~/.cache/local-rag/models/bge-reranker-v2-m3/         ~2.1 GB
SQLite database            ~/.local/share/local-rag/metadata.db                  ~1 MB
Config file                ~/.config/local-rag/config.toml                       < 1 KB
Qdrant data                Docker volume local-rag_qdrant_data                   Varies

Full uninstall

# 1. Stop and remove Qdrant container + data
docker compose down -v

# 2. Remove cached models (~6.4 GB)
rm -rf ~/.cache/local-rag

# 3. Remove database and application data
rm -rf ~/.local/share/local-rag

# 4. Remove config
rm -rf ~/.config/local-rag

# 5. Remove MCP config entries (if installed)
#    Edit the relevant file and remove the "local-rag" entry:
#    Claude Code:    ~/.claude.json
#    Claude Desktop: ~/Library/Application Support/Claude/claude_desktop_config.json
#    Kiro:           ~/.kiro/settings/mcp.json

# 6. Uninstall the Python package
pip uninstall local-rag

Free disk space (keep local-rag installed)

# Remove models only (~6.4 GB, re-downloads on next use)
rm -rf ~/.cache/local-rag/models

# Re-download when ready
make download-models

Troubleshooting

"No config file found" -- Run rag init, or copy config.example.toml to ~/.config/local-rag/config.toml and edit it.

Qdrant connection refused -- Ensure Qdrant is running: docker compose up -d

Model download fails -- Check internet. Clear cache and retry:

rm -rf ~/.cache/local-rag/models/
rag index              # re-downloads everything

Search returns no results -- Check rag status. If zero documents, run rag index. If documents are indexed, try rag search "query" --debug to see what's happening at each stage.

MCP not working in Claude Desktop -- Run rag mcp-config --print and verify the Python path points to your venv. Restart Claude Desktop after config changes.
