
feat(search): pluggable vector/embedding search with hybrid FTS5+RRF #139

Open

jtomaszon wants to merge 4 commits into Gentleman-Programming:main from scaledb-io:feature/vector-search

Conversation

@jtomaszon

Summary

Adds semantic search capability alongside existing FTS5 keyword search, following the architecture endorsed in #21 and #24.

  • New internal/embedding/ package — Provider interface with Ollama and OpenAI implementations. Pure-Go cosine similarity, binary serialization, and Reciprocal Rank Fusion (RRF) merge. No new dependencies beyond net/http.
  • Hybrid search — When an embedding provider is configured, Search() runs FTS5 + vector cosine similarity and merges results via RRF (k=60). Falls back to FTS5-only when no provider is set — zero overhead for existing users.
  • Async embedding on save/update — AddObservation and UpdateObservation trigger background embedding generation. Content changes re-embed automatically.
  • observation_embeddings table — Created on migration, nullable. Separate from observations to keep FTS5 scans fast.
  • CLI flags — --embedding-provider=ollama|openai|none, --embedding-model, --embedding-url, plus ENGRAM_EMBEDDING_* env vars.
  • engram backfill-embeddings — Bulk-embeds existing observations that don't have embeddings yet.

Context

This implements the approach discussed by @Gentleman-Programming in #21 and #24:

  1. FTS5 stays default — zero overhead when no provider configured
  2. Vector search as opt-in via pluggable providers
  3. Hybrid search merging FTS5 + vector results

Design decisions

| Decision | Rationale |
| --- | --- |
| FTS5 stays default | Existing behavior unchanged when no provider is configured |
| Brute-force cosine | Personal memory is typically <10K observations. Benchmark: 564 ns/op at 768 dims (~56 ms for 100K observations) |
| RRF over weighted linear | Rank-based; no need to normalize BM25 and cosine scores to the same scale |
| Separate embeddings table | Large BLOBs (~3 KB each) would bloat observation table scans |
| Async embedding | Network calls (50-500 ms) shouldn't block observation saves |
| No new Go deps | Providers use stdlib net/http only; CGO_ENABLED=0 preserved |
| Agent-agnostic providers | Users plug in Ollama (local/free) or OpenAI (cloud) via config |

Stats

  • 1,495 lines added across 9 files (6 new + 3 modified)
  • 33 new tests (22 embedding + 11 store integration)
  • All 346 existing tests pass — zero regressions
  • Benchmark: BenchmarkCosineSimilarity768: 564.5 ns/op, 0 allocs

Refs #21, #24

Test plan

  • All existing 346 tests pass unchanged
  • New embedding provider tests with mock HTTP servers
  • Cosine similarity correctness (identical=1, orthogonal=0, opposite=-1)
  • RRF merge ordering verified with known inputs
  • Store integration: hybrid search returns results from both FTS5 and vector
  • Store integration: vector search respects project/scope/type filters
  • Backfill generates embeddings for all unembedded observations
  • No-provider path returns FTS5-only results (backward compatible)
  • Embedding table created on migration

🤖 Generated with Claude Code

Javier Zon and others added 4 commits March 31, 2026 21:58
Add semantic search capability alongside existing FTS5 keyword search.
When an embedding provider is configured, observations are embedded on
save/update and search results merge FTS5 and vector cosine similarity
via Reciprocal Rank Fusion (k=60). Falls back to FTS5-only when no
provider is configured — zero overhead for existing users.

New internal/embedding package:
- Provider interface with Ollama and OpenAI implementations
- Pure-Go cosine similarity and binary serialization (no CGO)
- RRF merge for combining ranked result lists

Store changes:
- observation_embeddings table (created on migration, nullable)
- Async embedding generation on AddObservation/UpdateObservation
- Hybrid Search: FTS5 → vector scan → RRF merge → unified results
- BackfillEmbeddings for bulk embedding existing observations

CLI:
- --embedding-provider, --embedding-model, --embedding-url flags
- ENGRAM_EMBEDDING_PROVIDER/MODEL/URL/API_KEY env vars
- engram backfill-embeddings command

Refs: Gentleman-Programming#21, Gentleman-Programming#24

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cmdServe was missing the configureEmbeddings call, so embedding
env vars (ENGRAM_EMBEDDING_PROVIDER etc.) were ignored when running
`engram serve`. Now both serve and mcp commands honor embedding config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

nomic-embed-text has an 8192 token context window (~6K chars of mixed
prose/code). Observations exceeding this limit were silently failing.
Now we truncate to 6000 chars and log a clear warning with the original
and truncated sizes so users know to split large observations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Each provider now reports its own maximum text length via MaxChars().
Ollama uses empirically tested limits per model (e.g., nomic-embed-text
6000 chars for mixed markdown/code). OpenAI uses token-based estimates.
Truncation logs a clear warning with model name, original and truncated
sizes.

This replaces the previous hardcoded 6000 char global constant with
provider-aware limits, so larger-context models (Voyage 32K, Cohere
128K) won't have their input unnecessarily truncated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>