Startup should not block MCP availability on full embedding rebuild #27
Summary
The MCP server currently runs a full reindexObservations() on startup and on hot reload. On larger projects with API embeddings enabled, this ties MCP availability to corpus-wide embedding throughput.
In practice this can block the entire Memorix MCP from becoming available to clients like OpenCode, even though lexical/fulltext search could be ready immediately.
Why this matters
For large corpora, startup currently does all of the following before the MCP is usable:
- resets the Orama DB
- reloads all observations
- batch-generates embeddings for the full corpus
- waits out provider retries and rate limits before tools become available
That is the wrong coupling. MCP startup should only need a usable search index, not a complete vector rebuild.
Observed impact
In an OpenCode session that depended on Memorix MCP, the session would stall during tool resolution until Memorix finished startup work. After switching startup to lexical hydration only, the same session recovered immediately:
2026-03-27T214400.log:274service=session.prompt status=started resolveTools
2026-03-27T214400.log:907[memorix] Prepared search index for 8024 observations in project: RaviTharuma/opencode
2026-03-27T214400.log:921service=mcp key=mcpm_memorix toolCount=34 create() successfully created client
2026-03-27T214400.log:929service=session.prompt status=completed duration=9825 resolveTools
Before the patch, startup followed the heavy rebuild path and was sensitive to embedding-provider delays and rate limits on a corpus of roughly 8k to 10k observations.
Root cause
createMemorixServer() currently uses the heavy rebuild path on startup and hot reload.
That path is appropriate for explicit rebuilds, but not for startup:
- startup only needs lexical/fulltext search to be available immediately
- vector enrichment can happen asynchronously afterward
Proposed fix
Introduce a lighter startup path that:
- resets the in-memory DB
- hydrates the lexical/BM25 index from persisted observations
- does not call batch embedding generation during startup
- queues active observations for existing background vector backfill when embeddings are enabled
Keep the current heavy full rebuild path for explicit reindex operations.
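A minimal sketch of what the lighter path could look like, for discussion only. Every name here (Observation, LexicalIndex, prepareSearchIndex, backfillQueue) is hypothetical and not taken from the Memorix codebase, and the plain keyword index is a stand-in for the real Orama/BM25 index. The point is only the shape: reset, hydrate lexical search synchronously, and queue active observations for embedding instead of awaiting any provider call.

```typescript
// Hypothetical types and names for illustration; not Memorix's actual API.
interface Observation {
  id: number;
  text: string;
  active: boolean;
}

// Stand-in for the lexical index: a simple token -> observation-id map.
// The real index would be Orama with BM25 scoring.
class LexicalIndex {
  private postings = new Map<string, Set<number>>();

  reset(): void {
    this.postings.clear();
  }

  add(obs: Observation): void {
    const tokens = obs.text.toLowerCase().split(/\W+/).filter(Boolean);
    for (const token of tokens) {
      let ids = this.postings.get(token);
      if (!ids) {
        ids = new Set<number>();
        this.postings.set(token, ids);
      }
      ids.add(obs.id);
    }
  }

  search(term: string): number[] {
    const ids = this.postings.get(term.toLowerCase());
    return ids ? Array.from(ids) : [];
  }
}

// Light startup path: no embedding calls happen here.
// Returns the number of observations made searchable.
function prepareSearchIndex(
  index: LexicalIndex,
  observations: Observation[],
  embeddingsEnabled: boolean,
  backfillQueue: number[], // assumed to be drained by a background worker
): number {
  index.reset();
  for (const obs of observations) {
    index.add(obs);
  }
  if (embeddingsEnabled) {
    for (const obs of observations) {
      if (obs.active) backfillQueue.push(obs.id);
    }
  }
  return observations.length;
}
```

With this shape, startup cost scales with reading and tokenizing the persisted observations, while provider retries and rate limits only affect how quickly the background queue drains.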
Verification from the patch branch
- tests/memory/prepare-search-index.test.ts
- npx vitest run tests/integration/release-blockers.test.ts
- bunx tsc --noEmit
- npm run build
I already have a patch prepared on my fork and will open a PR linked to this issue.