Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
40 changes: 24 additions & 16 deletions CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

## Project Overview

obsidian-notes-rag is an MCP (Model Context Protocol) server that provides semantic search over Obsidian notes. It uses OpenAI embeddings by default (or Ollama for local processing) with ChromaDB for vector storage.
obsidian-notes-rag is an MCP (Model Context Protocol) server that provides semantic search over Obsidian notes. It uses OpenAI embeddings by default (or Ollama/LM Studio for local processing) with sqlite-vec for vector storage.

**PyPI:** https://pypi.org/project/obsidian-notes-rag/

Expand All @@ -21,43 +21,45 @@ uv run pytest -v
uv run pyright

# Interactive setup wizard
uv run obsidian-notes-rag setup
uv run obsidian-rag setup

# Index vault (manual refresh)
uv run obsidian-notes-rag index
uv run obsidian-rag index

# Run the MCP server (stdio transport)
uv run obsidian-notes-rag serve
uv run obsidian-rag serve

# Watch vault for changes
uv run obsidian-notes-rag watch
uv run obsidian-rag watch

# Search from CLI
uv run obsidian-notes-rag search "query"
uv run obsidian-rag search "query"
obsidian-rag similar "Path/To/Note.md"
obsidian-rag context "Path/To/Note.md"
```

## Architecture

### Data Flow

```
Obsidian Vault → VaultIndexer → Embedder (OpenAI/Ollama) → VectorStore (ChromaDB)
Obsidian Vault → VaultIndexer → Embedder (OpenAI/Ollama/LMStudio) → VectorStore (sqlite-vec)
MCP Client ← FastMCP Server ← search_notes/get_similar/etc.
```

### Key Components (src/obsidian_rag/)

- **config.py**: `Config` dataclass, `load_config()`/`save_config()` for TOML config file, cross-platform paths via `platformdirs`
- **indexer.py**: `VaultIndexer` scans markdown files, `chunk_by_heading()` splits content by `##`/`###` headings, `OpenAIEmbedder` and `OllamaEmbedder` generate embeddings, `create_embedder()` factory selects provider
- **store.py**: `VectorStore` wraps ChromaDB with cosine similarity search, handles upsert/delete by file path
- **indexer.py**: `VaultIndexer` scans markdown files, `chunk_markdown()` uses Chonkie RecursiveChunker with markdown-aware rules, `OpenAIEmbedder`/`OllamaEmbedder`/`LMStudioEmbedder` generate embeddings, `create_embedder()` factory selects provider
- **store.py**: `VectorStore` wraps sqlite-vec with KNN vector search, two tables (chunks metadata + chunks_vec virtual table), handles upsert/delete by file path
- **server.py**: FastMCP server exposing 5 tools: `search_notes`, `get_similar`, `get_note_context`, `get_stats`, `reindex`
- **watcher.py**: `VaultWatcher` uses watchdog with debouncing (default 2s) to incrementally re-index on file changes
- **cli.py**: Click-based CLI with `setup` wizard, `--provider` option, commands for indexing, searching, watching, and service management
- **cli.py**: Click-based CLI with `setup` wizard, `--provider` option, commands for indexing, searching, similar, context, watching, and service management

### Chunking Strategy

Files are split by `##` and `###` headings. Chunks smaller than `min_chunk_size` (default 100 chars) merge with the previous chunk. Files without headings become a single chunk.
Files are chunked using Chonkie's RecursiveChunker with markdown-aware splitting rules. Maximum chunk size is 1500 tokens with a minimum of 50 characters per chunk. The chunker splits by heading levels > paragraphs > lines > sentences > words.

### Metadata

Expand All @@ -73,15 +75,21 @@ Environment variables (override config file):
- `OPENAI_API_KEY` - OpenAI API key (required for default provider)
- `OBSIDIAN_RAG_PROVIDER` - `openai` (default) or `ollama`
- `OBSIDIAN_RAG_VAULT` - Path to Obsidian vault
- `OBSIDIAN_RAG_DATA` - ChromaDB storage path
- `OBSIDIAN_RAG_DATA` - sqlite-vec storage path
- `OBSIDIAN_RAG_OLLAMA_URL` - Ollama API (default: `http://localhost:11434`)
- `OBSIDIAN_RAG_MODEL` - Override embedding model

## Testing

Tests are in `tests/`. Current coverage focuses on `test_indexer.py` for frontmatter parsing and chunking logic.
Tests are in `tests/`:
- `test_store.py` - VectorStore contract tests (sqlite-vec backend)
- `test_indexer.py` - Frontmatter parsing and chunk_markdown tests
- `test_cli.py` - CLI command tests

```bash
# Run a specific test
uv run pytest tests/test_indexer.py::TestChunkByHeading::test_multiple_headings -v
# Run all tests
uv run pytest -v

# Run a specific test file
uv run pytest tests/test_store.py -v
```
Loading
Loading