feat(search): pluggable vector/embedding search with hybrid FTS5+RRF #139
Open
jtomaszon wants to merge 4 commits into Gentleman-Programming:main from
Conversation
Add semantic search capability alongside existing FTS5 keyword search. When an embedding provider is configured, observations are embedded on save/update and search results merge FTS5 and vector cosine similarity via Reciprocal Rank Fusion (k=60). Falls back to FTS5-only when no provider is configured — zero overhead for existing users.

New internal/embedding package:
- Provider interface with Ollama and OpenAI implementations
- Pure-Go cosine similarity and binary serialization (no CGO)
- RRF merge for combining ranked result lists

Store changes:
- observation_embeddings table (created on migration, nullable)
- Async embedding generation on AddObservation/UpdateObservation
- Hybrid Search: FTS5 → vector scan → RRF merge → unified results
- BackfillEmbeddings for bulk embedding existing observations

CLI:
- --embedding-provider, --embedding-model, --embedding-url flags
- ENGRAM_EMBEDDING_PROVIDER/MODEL/URL/API_KEY env vars
- engram backfill-embeddings command

Refs: Gentleman-Programming#21, Gentleman-Programming#24

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cmdServe was missing the configureEmbeddings call, so embedding env vars (ENGRAM_EMBEDDING_PROVIDER etc.) were ignored when running `engram serve`. Now both serve and mcp commands honor embedding config. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
nomic-embed-text has an 8192 token context window (~6K chars of mixed prose/code). Observations exceeding this limit were silently failing. Now we truncate to 6000 chars and log a clear warning with the original and truncated sizes so users know to split large observations. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each provider now reports its own maximum text length via MaxChars(). Ollama uses empirically tested limits per model (e.g., nomic-embed-text 6000 chars for mixed markdown/code). OpenAI uses token-based estimates. Truncation logs a clear warning with model name, original and truncated sizes. This replaces the previous hardcoded 6000 char global constant with provider-aware limits, so larger-context models (Voyage 32K, Cohere 128K) won't have their input unnecessarily truncated. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
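The provider-aware truncation can be sketched as follows. This is a simplified standalone illustration (byte-length truncation, hypothetical type names) of the mechanism the commit describes, not the PR's actual implementation:

```go
package main

import (
	"fmt"
	"log"
)

// Provider is a sketch of the interface described in the PR;
// MaxChars reports the provider/model-specific input limit.
type Provider interface {
	MaxChars() int
}

type ollamaProvider struct{ model string }

// Per the commit message, nomic-embed-text is limited to 6000 chars
// of mixed markdown/code; the default here is an assumption.
func (p ollamaProvider) MaxChars() int {
	switch p.model {
	case "nomic-embed-text":
		return 6000
	default:
		return 6000
	}
}

// truncateForProvider clamps text to the provider's limit and logs a
// warning with the model name plus original and truncated sizes.
// (For simplicity this truncates on bytes; real code should avoid
// splitting a multi-byte UTF-8 rune.)
func truncateForProvider(p Provider, model, text string) string {
	limit := p.MaxChars()
	if len(text) <= limit {
		return text
	}
	log.Printf("warning: truncating embedding input for %s: %d -> %d chars",
		model, len(text), limit)
	return text[:limit]
}

func main() {
	p := ollamaProvider{model: "nomic-embed-text"}
	out := truncateForProvider(p, p.model, string(make([]byte, 7000)))
	fmt.Println(len(out)) // 6000
}
```

Because the limit lives on the Provider, a future Voyage or Cohere implementation simply returns its own larger MaxChars and no global constant needs changing.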
Summary
Adds semantic search capability alongside existing FTS5 keyword search, following the architecture endorsed in #21 and #24.
- internal/embedding package — Provider interface with Ollama and OpenAI implementations. Pure-Go cosine similarity, binary serialization, and Reciprocal Rank Fusion (RRF) merge. No new dependencies beyond net/http.
- Search() runs FTS5 + vector cosine similarity and merges results via RRF (k=60). Falls back to FTS5-only when no provider is set — zero overhead for existing users.
- AddObservation and UpdateObservation trigger background embedding generation. Content changes re-embed automatically.
- observation_embeddings table — created on migration, nullable. Separate from observations to keep FTS5 scans fast.
- --embedding-provider=ollama|openai|none, --embedding-model, --embedding-url, plus ENGRAM_EMBEDDING_* env vars.
- engram backfill-embeddings — bulk-embeds existing observations that don't have embeddings yet.

Context
This implements the approach discussed by @Gentleman-Programming in #21 and #24:
Design decisions
net/http only. CGO_ENABLED=0 preserved.

Stats
BenchmarkCosineSimilarity768: 564.5 ns/op, 0 allocs

Refs #21, #24
Test plan
🤖 Generated with Claude Code