feat!: replace ChromaDB with sqlite-vec, extend CLI #10

Merged: ernestkoe merged 12 commits into main from feat/cli-extension-and-skill on Feb 15, 2026

Conversation

@ernestkoe
Collaborator

Summary

  • Replace ChromaDB with sqlite-vec for fully local, telemetry-free vector storage (~200KB, pure C, no transitive deps like posthog/grpc/opentelemetry)
  • Replace custom chunk_by_heading() with Chonkie RecursiveChunker using markdown-aware splitting rules (headings > paragraphs > lines > sentences)
  • Add CLI commands similar and context, plus a --path-filter option on index
  • Add obsidian-rag short alias alongside obsidian-notes-rag
  • Fix similar/context note lookup with direct SQL query (get_by_file) instead of broken KNN+filter pattern

Details

sqlite-vec migration

  • Two-table schema: chunks (metadata) + chunks_vec (vec0 virtual table for KNN)
  • Contract tests written first, then backend swapped — same 11 tests validate both implementations
  • Net dependency change: -1,417 lines in uv.lock, 63 packages removed
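
The contract-test approach described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual test suite: `DictStore`, `SqliteStore`, `run_contract`, and the method names other than `get_by_file` are hypothetical, and the vector table is left out so the example runs on the standard library alone.

```python
import sqlite3


class DictStore:
    """Hypothetical in-memory backend, standing in for the old implementation."""

    def __init__(self):
        self._rows = {}

    def upsert(self, chunk_id, file_path, text):
        self._rows[chunk_id] = (file_path, text)

    def get_by_file(self, file_path):
        return sorted(t for (p, t) in self._rows.values() if p == file_path)

    def delete_by_file(self, file_path):
        self._rows = {k: v for k, v in self._rows.items() if v[0] != file_path}

    def count(self):
        return len(self._rows)


class SqliteStore:
    """Hypothetical SQLite backend exposing the same interface."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE chunks (id TEXT PRIMARY KEY, file_path TEXT, text TEXT)"
        )

    def upsert(self, chunk_id, file_path, text):
        self._db.execute(
            "INSERT OR REPLACE INTO chunks (id, file_path, text) VALUES (?, ?, ?)",
            (chunk_id, file_path, text),
        )

    def get_by_file(self, file_path):
        rows = self._db.execute(
            "SELECT text FROM chunks WHERE file_path = ? ORDER BY text", (file_path,)
        )
        return [r[0] for r in rows]

    def delete_by_file(self, file_path):
        self._db.execute("DELETE FROM chunks WHERE file_path = ?", (file_path,))

    def count(self):
        return self._db.execute("SELECT COUNT(*) FROM chunks").fetchone()[0]


def run_contract(make_store):
    """Assertions every backend must satisfy, regardless of how it stores data."""
    store = make_store()
    store.upsert("a#0", "A.md", "alpha")
    store.upsert("a#1", "A.md", "beta")
    store.upsert("b#0", "B.md", "gamma")
    assert store.count() == 3
    assert store.get_by_file("A.md") == ["alpha", "beta"]

    store.upsert("a#0", "A.md", "alpha v2")  # same id replaces the row
    assert store.count() == 3

    store.delete_by_file("A.md")
    assert store.count() == 1
    assert store.get_by_file("A.md") == []
    return True


# The same contract runs unchanged against both backends.
results = {cls.__name__: run_contract(cls) for cls in (DictStore, SqliteStore)}
```

Because the contract only touches the public interface, swapping the backend leaves the assertions untouched, which is what makes a storage migration like this one checkable.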

Chonkie chunker

  • Inline markdown rules (no huggingface_hub dependency)
  • Lazy singleton chunker instance for performance
  • Splits by: h1 > h2 > h3 > h4 > paragraphs > lines > sentences > words
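
To make the split order concrete, here is a small pure-Python sketch of recursive splitting with a lazily built singleton. It does not use Chonkie and is not the project's code: `SketchChunker`, `get_chunker`, the regex patterns, and the character-based size limit are all assumptions chosen for illustration, and unlike a real chunker it does not merge small neighbouring pieces back together.

```python
import re
from functools import lru_cache

# Separator patterns from coarsest to finest, mirroring the order listed above.
# All are zero-width (lookahead/lookbehind), so splitting never drops text.
LEVELS = [
    r"(?m)^(?=# )",      # h1
    r"(?m)^(?=## )",     # h2
    r"(?m)^(?=### )",    # h3
    r"(?m)^(?=#### )",   # h4
    r"(?<=\n\n)",        # paragraphs
    r"(?<=\n)",          # lines
    r"(?<=[.!?] )",      # sentences
    r"(?<= )",           # words
]


class SketchChunker:
    """Illustrative recursive splitter; size is measured in characters."""

    def __init__(self, max_chars):
        self.max_chars = max_chars
        self._patterns = [re.compile(p) for p in LEVELS]

    def chunk(self, text):
        return self._split(text, 0)

    def _split(self, text, level):
        # Stop when the piece fits or no finer separator is left.
        if len(text) <= self.max_chars or level >= len(self._patterns):
            return [text]
        pieces = [p for p in self._patterns[level].split(text) if p]
        chunks = []
        for piece in pieces:
            chunks.extend(self._split(piece, level + 1))
        return chunks


@lru_cache(maxsize=1)
def get_chunker():
    """Build the chunker on first use and reuse it afterwards."""
    return SketchChunker(max_chars=40)


doc = (
    "# Title\n\nIntro paragraph.\n\n"
    "## Section\n\nFirst sentence here. Second sentence here.\n"
)
chunks = get_chunker().chunk(doc)
```

A piece is only handed to a finer level when it is still too large, so short sections stay whole while long ones are broken at the most meaningful boundary available.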

CLI extensions

  • obsidian-rag search "query" — semantic search
  • obsidian-rag similar "Path/To/Note.md" — find related notes
  • obsidian-rag context "Path/To/Note.md" — show note + related context
  • obsidian-rag index --path-filter "Daily Notes/" — selective re-indexing
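
The command surface listed above could be wired up along these lines. This is only a sketch using the standard library's `argparse`; the PR does not say which CLI framework the project uses, and `build_parser`, `select_files`, and the prefix-match reading of `--path-filter` are assumptions.

```python
import argparse


def build_parser():
    """Subcommands matching the list above; argument names are illustrative."""
    parser = argparse.ArgumentParser(prog="obsidian-rag")
    sub = parser.add_subparsers(dest="command", required=True)

    search = sub.add_parser("search", help="semantic search")
    search.add_argument("query")

    similar = sub.add_parser("similar", help="find related notes")
    similar.add_argument("note_path")

    context = sub.add_parser("context", help="show note + related context")
    context.add_argument("note_path")

    index = sub.add_parser("index", help="index or re-index the vault")
    index.add_argument("--path-filter", default=None,
                       help="only index notes under this path")
    return parser


def select_files(all_files, path_filter):
    """Assumed semantics: keep vault-relative paths that start with the filter."""
    if path_filter is None:
        return list(all_files)
    return [f for f in all_files if f.startswith(path_filter)]


args = build_parser().parse_args(["index", "--path-filter", "Daily Notes/"])
files = ["Daily Notes/2026-02-14.md", "Projects/RAG.md"]
selected = select_files(files, args.path_filter)
```

With no filter, every file is selected, so a plain `index` run would still cover the whole vault under this reading.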

Test plan

  • 25 tests passing (test_store.py, test_indexer.py, test_cli.py)
  • Smoke tested: uv tool install --force, re-indexed 1203 files / 2162 chunks
  • Verified search, similar, context, stats, index --path-filter commands
  • Verified MCP server get_similar and get_note_context tools updated

🤖 Generated with Claude Code

ernestkoe and others added 12 commits February 14, 2026 16:15
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Uses the built-in markdown recipe which splits by heading levels,
then paragraphs, then lines, then sentences.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Removes chromadb and its heavy transitive dependencies (fastapi, grpc,
opentelemetry, posthog telemetry). Defines markdown chunking rules inline
to avoid huggingface_hub dependency that was previously pulled in by chromadb.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The similar and context commands were failing because they embedded the
file path string as a vector query then filtered by file_path. With
sqlite-vec's post-filter KNN, the target file was rarely in the top-k
results. Added get_by_file() for direct SQL lookup on the indexed
file_path column - no vector search needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
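
The failure described in the get_by_file() commit message above can be reproduced in miniature. The sketch below is an illustration under assumptions, not the project's code: it uses toy 2-d vectors and a brute-force nearest-neighbour search in place of the vec0 table, and `knn_then_filter` and the sample rows are hypothetical. Only the `chunks` table, the `file_path` column, and the `get_by_file` name come from the PR.

```python
import math
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, file_path TEXT, text TEXT)")
db.execute("CREATE INDEX idx_chunks_file_path ON chunks (file_path)")

# Toy embeddings keyed by chunk id; a real store would keep these in chunks_vec.
vectors = {}
rows = [
    (1, "Notes/Target.md", "target chunk", (0.0, 1.0)),
    (2, "Notes/A.md", "near the query", (1.0, 0.0)),
    (3, "Notes/B.md", "also near", (0.9, 0.1)),
    (4, "Notes/C.md", "still near", (0.8, 0.2)),
]
for chunk_id, path, text, vec in rows:
    db.execute("INSERT INTO chunks VALUES (?, ?, ?)", (chunk_id, path, text))
    vectors[chunk_id] = vec


def knn_then_filter(query_vec, file_path, k):
    """Take the k nearest chunks first, then drop those from other files."""
    nearest = sorted(vectors, key=lambda cid: math.dist(vectors[cid], query_vec))[:k]
    hits = []
    for cid in nearest:
        path, text = db.execute(
            "SELECT file_path, text FROM chunks WHERE id = ?", (cid,)
        ).fetchone()
        if path == file_path:
            hits.append(text)
    return hits


def get_by_file(file_path):
    """Direct lookup on the indexed column; no vector search involved."""
    cur = db.execute(
        "SELECT text FROM chunks WHERE file_path = ? ORDER BY id", (file_path,)
    )
    return [r[0] for r in cur]


# An embedding of the path string has no reason to land near the note's chunks.
path_query_vec = (1.0, 0.0)
broken = knn_then_filter(path_query_vec, "Notes/Target.md", k=2)
fixed = get_by_file("Notes/Target.md")
```

Here the post-filter approach returns nothing because the target's chunk never makes the top-k cut, while the direct query finds it regardless of where its embedding sits.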
@ernestkoe changed the title from "feat: replace ChromaDB with sqlite-vec, extend CLI" to "feat!: replace ChromaDB with sqlite-vec, extend CLI" on Feb 15, 2026
@ernestkoe merged commit e0bc157 into main on Feb 15, 2026
1 of 2 checks passed