Startup should not block MCP availability on full embedding rebuild #27

@RaviTharuma

Summary

The MCP server currently runs a full reindexObservations() during startup and hot reload. On larger projects with API embeddings enabled, that couples MCP availability to corpus-wide embedding throughput.

In practice this can block the entire Memorix MCP from becoming available to clients like OpenCode, even though lexical/fulltext search could be ready immediately.

Why this matters

For large corpora, startup currently does all of the following before the MCP is usable:

  • resets the Orama DB
  • reloads all observations
  • batch-generates embeddings for the full corpus
  • waits on provider retries / rate limits before tools become available

That is the wrong coupling. MCP startup should only need a usable search index, not a complete vector rebuild.

Observed impact

In an OpenCode session that depended on Memorix MCP, the session would stall during tool resolution until Memorix finished startup work. After switching startup to lexical hydration only, the same session recovered immediately:

  • 2026-03-27T214400.log:274 service=session.prompt status=started resolveTools
  • 2026-03-27T214400.log:907 [memorix] Prepared search index for 8024 observations in project: RaviTharuma/opencode
  • 2026-03-27T214400.log:921 service=mcp key=mcpm_memorix toolCount=34 create() successfully created client
  • 2026-03-27T214400.log:929 service=session.prompt status=completed duration=9825 resolveTools

Before the patch, startup followed the heavy rebuild path and became sensitive to embedding-provider delays / rate limits on a roughly 8k to 10k observation corpus.

Root cause

createMemorixServer() currently uses the heavy rebuild path on startup and hot reload.

That path is appropriate for explicit rebuilds, but not for startup:

  • startup only needs lexical/fulltext search to be available immediately
  • vector enrichment can happen asynchronously afterward
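To illustrate what "asynchronously afterward" could look like, here is a minimal sketch of a background backfill queue whose enqueue call returns immediately, so provider retries and rate limits never sit on the startup path. All names (VectorBackfill, Embedder, whenIdle) are hypothetical and only for illustration; this is not the existing Memorix backfill code.

```typescript
// Hypothetical sketch: illustrative names only, not Memorix's real API.
type Embedder = (text: string) => Promise<number[]>;

interface PendingItem {
  id: number;
  text: string;
}

class VectorBackfill {
  readonly vectors = new Map<number, number[]>();
  readonly failed: number[] = [];
  private queue: PendingItem[] = [];
  private idle: Promise<void> = Promise.resolve();
  private draining = false;
  private embed: Embedder;
  private maxAttempts: number;

  constructor(embed: Embedder, maxAttempts = 3) {
    this.embed = embed;
    this.maxAttempts = maxAttempts;
  }

  // Returns immediately; the caller never awaits embedding work.
  enqueue(item: PendingItem): void {
    this.queue.push(item);
    if (!this.draining) {
      this.draining = true;
      this.idle = this.drain();
    }
  }

  // Resolves once everything queued so far has been processed.
  whenIdle(): Promise<void> {
    return this.idle;
  }

  private async drain(): Promise<void> {
    let item: PendingItem | undefined;
    while ((item = this.queue.shift()) !== undefined) {
      await this.embedWithRetry(item);
    }
    this.draining = false;
  }

  private async embedWithRetry(item: PendingItem): Promise<void> {
    for (let attempt = 1; attempt <= this.maxAttempts; attempt++) {
      try {
        this.vectors.set(item.id, await this.embed(item.text));
        return;
      } catch {
        // Provider error or rate limit: back off before the next attempt.
        if (attempt < this.maxAttempts) {
          await new Promise<void>((resolve) => setTimeout(resolve, 10 * attempt));
        }
      }
    }
    // Give up on this item; lexical search still covers it.
    this.failed.push(item.id);
  }
}
```

With this shape, a slow or rate-limited provider only delays vector enrichment; anything that runs right after enqueue() proceeds without waiting.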

Proposed fix

Introduce a lighter startup path that:

  • resets the in-memory DB
  • hydrates the lexical/BM25 index from persisted observations
  • does not call batch embedding generation during startup
  • queues active observations for existing background vector backfill when embeddings are enabled

Keep the current heavy full rebuild path for explicit reindex operations.
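A rough sketch of the lighter startup path described above, under stated assumptions: the function name prepareSearchIndex is guessed from the log line and test filename in this issue, the LexicalIndex class is a plain inverted index standing in for the Orama DB (not BM25, and not Orama's API), and the Observation shape is invented for the example.

```typescript
// Hypothetical sketch: every name here is illustrative, not Memorix's real API.
interface Observation {
  id: number;
  text: string;
  status: "active" | "archived";
}

// Stand-in for the in-memory search DB: term -> ids of observations containing it.
class LexicalIndex {
  private postings = new Map<string, Set<number>>();

  reset(): void {
    this.postings.clear();
  }

  add(obs: Observation): void {
    const terms = obs.text.toLowerCase().split(/\W+/).filter(Boolean);
    for (const term of terms) {
      let ids = this.postings.get(term);
      if (!ids) {
        ids = new Set<number>();
        this.postings.set(term, ids);
      }
      ids.add(obs.id);
    }
  }

  search(term: string): number[] {
    return Array.from(this.postings.get(term.toLowerCase()) ?? []);
  }
}

interface StartupResult {
  indexed: number;
  queuedForBackfill: number[];
}

// Startup path: lexical hydration only. No embedding call happens here.
function prepareSearchIndex(
  index: LexicalIndex,
  persisted: Observation[],
  embeddingsEnabled: boolean,
): StartupResult {
  index.reset();
  for (const obs of persisted) {
    index.add(obs);
  }
  // Only active observations are handed to the background vector backfill.
  const queuedForBackfill = embeddingsEnabled
    ? persisted.filter((o) => o.status === "active").map((o) => o.id)
    : [];
  return { indexed: persisted.length, queuedForBackfill };
}
```

The explicit reindex operation would keep calling the heavy rebuild path; only startup and hot reload would switch to a function of this shape.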

Verification from the patch branch

  • tests/memory/prepare-search-index.test.ts
  • npx vitest run tests/integration/release-blockers.test.ts
  • bunx tsc --noEmit
  • npm run build

I already have a patch prepared on my fork and will open a PR linked to this issue.
