
feat(search): pluggable vector/embedding search with hybrid FTS5+RRF #139

Open

jtomaszon wants to merge 4 commits into Gentleman-Programming:main from scaledb-io:feature/vector-search

Conversation

@jtomaszon

Summary

Adds semantic search capability alongside existing FTS5 keyword search, following the architecture endorsed in #21 and #24.

  • New internal/embedding/ package — Provider interface with Ollama and OpenAI implementations. Pure-Go cosine similarity, binary serialization, and Reciprocal Rank Fusion (RRF) merge. No new dependencies beyond net/http.
  • Hybrid search — When an embedding provider is configured, Search() runs FTS5 + vector cosine similarity and merges results via RRF (k=60). Falls back to FTS5-only when no provider is set — zero overhead for existing users.
  • Async embedding on save/update — AddObservation and UpdateObservation trigger background embedding generation. Content changes re-embed automatically.
  • observation_embeddings table — Created on migration, nullable. Separate from observations to keep FTS5 scans fast.
  • CLI flags — --embedding-provider=ollama|openai|none, --embedding-model, --embedding-url, plus ENGRAM_EMBEDDING_* env vars.
  • engram backfill-embeddings — Bulk-embeds existing observations that don't have embeddings yet.

Context

This implements the approach discussed by @Gentleman-Programming in #21 and #24:

  1. FTS5 stays default — zero overhead when no provider configured
  2. Vector search as opt-in via pluggable providers
  3. Hybrid search merging FTS5 + vector results

Design decisions

| Decision | Rationale |
| --- | --- |
| FTS5 stays default | Existing behavior unchanged when no provider is configured |
| Brute-force cosine | Personal memory is typically <10K observations. Benchmark: 564 ns/op at 768 dims (~56 ms for 100K observations) |
| RRF over weighted linear | Rank-based; no need to normalize BM25 and cosine scores to the same scale |
| Separate embeddings table | Large BLOBs (~3 KB each) would bloat observation table scans |
| Async embedding | Network calls (50-500 ms) shouldn't block observation saves |
| No new Go deps | Providers use stdlib net/http only; CGO_ENABLED=0 preserved |
| Agent-agnostic providers | Users plug in Ollama (local/free) or OpenAI (cloud) via config |

Stats

  • 1,495 lines added across 9 files (6 new + 3 modified)
  • 33 new tests (22 embedding + 11 store integration)
  • All 346 existing tests pass — zero regressions
  • Benchmark: BenchmarkCosineSimilarity768: 564.5 ns/op, 0 allocs

Refs #21, #24

Test plan

  • All existing 346 tests pass unchanged
  • New embedding provider tests with mock HTTP servers
  • Cosine similarity correctness (identical=1, orthogonal=0, opposite=-1)
  • RRF merge ordering verified with known inputs
  • Store integration: hybrid search returns results from both FTS5 and vector
  • Store integration: vector search respects project/scope/type filters
  • Backfill generates embeddings for all unembedded observations
  • No-provider path returns FTS5-only results (backward compatible)
  • Embedding table created on migration

🤖 Generated with Claude Code

Javier Zon and others added 4 commits March 31, 2026 21:58
Add semantic search capability alongside existing FTS5 keyword search.
When an embedding provider is configured, observations are embedded on
save/update and search results merge FTS5 and vector cosine similarity
via Reciprocal Rank Fusion (k=60). Falls back to FTS5-only when no
provider is configured — zero overhead for existing users.

New internal/embedding package:
- Provider interface with Ollama and OpenAI implementations
- Pure-Go cosine similarity and binary serialization (no CGO)
- RRF merge for combining ranked result lists

Store changes:
- observation_embeddings table (created on migration, nullable)
- Async embedding generation on AddObservation/UpdateObservation
- Hybrid Search: FTS5 → vector scan → RRF merge → unified results
- BackfillEmbeddings for bulk embedding existing observations

CLI:
- --embedding-provider, --embedding-model, --embedding-url flags
- ENGRAM_EMBEDDING_PROVIDER/MODEL/URL/API_KEY env vars
- engram backfill-embeddings command

Refs: Gentleman-Programming#21, Gentleman-Programming#24

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cmdServe was missing the configureEmbeddings call, so embedding
env vars (ENGRAM_EMBEDDING_PROVIDER etc.) were ignored when running
`engram serve`. Now both serve and mcp commands honor embedding config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

nomic-embed-text has an 8192 token context window (~6K chars of mixed
prose/code). Observations exceeding this limit were silently failing.
Now we truncate to 6000 chars and log a clear warning with the original
and truncated sizes so users know to split large observations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Each provider now reports its own maximum text length via MaxChars().
Ollama uses empirically tested limits per model (e.g., nomic-embed-text
6000 chars for mixed markdown/code). OpenAI uses token-based estimates.
Truncation logs a clear warning with model name, original and truncated
sizes.

This replaces the previous hardcoded 6000 char global constant with
provider-aware limits, so larger-context models (Voyage 32K, Cohere
128K) won't have their input unnecessarily truncated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>