Avoid blocking startup on corpus embedding rebuild #28
RaviTharuma wants to merge 2 commits into AVIDS2:main
I reviewed this one manually as well. The startup direction is good and I agree with the goal: MCP availability should not be blocked on a corpus-wide embedding rebuild. I do think there is one issue to address before merge:
So my current read is:
Once that is fixed and rebased on the latest
Thanks for iterating on this. The startup direction still looks right, and I agree with the goal of getting lexical/BM25 search available before a full embedding rebuild finishes.
One blocker is still present though:
That means after startup / hot reload,
So from the maintainer side, I’m not comfortable merging this yet as-is. If you can fix that part and rebase on the latest
Remove the active-only filter in hydrateIndex() so resolved and archived observations are indexed at startup. Status filtering belongs at query time (searchObservations), not index time.
Add 4-case hydrate-index test covering:
- indexes active + resolved + archived observations
- stores status field faithfully for per-status queries
- skips malformed observations gracefully
- idempotent re-hydration is a no-op
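For readers without the diff open, here is a minimal sketch of the "index everything, filter at query time" idea from the commit message. Only the names hydrateIndex and searchObservations come from this PR. The types, the in-memory Map, the substring match, and the "active" default are assumptions made for illustration and are not the actual Memorix code.

```typescript
// Hypothetical shapes; the real Memorix types may differ.
type Status = "active" | "resolved" | "archived";

interface Observation {
  id: number;
  title: string;
  status: Status;
}

// In-memory stand-in for the lexical index (assumption, not the real store).
const index = new Map<number, Observation>();

// Runtime check so malformed records are skipped instead of throwing.
function isValid(o: unknown): o is Observation {
  if (typeof o !== "object" || o === null) return false;
  const c = o as Record<string, unknown>;
  return (
    typeof c.id === "number" &&
    typeof c.title === "string" &&
    (c.status === "active" || c.status === "resolved" || c.status === "archived")
  );
}

// Index every well-formed observation regardless of status.
// Returns how many documents were newly added, so a second call returns 0.
function hydrateIndex(observations: unknown[]): number {
  let added = 0;
  for (const o of observations) {
    if (!isValid(o)) continue; // skip malformed input
    if (index.has(o.id)) continue; // already indexed: re-hydration is a no-op
    index.set(o.id, o);
    added++;
  }
  return added;
}

// Status filtering happens here, at query time. Defaulting to "active" is an assumption.
function searchObservations(query: string, status: Status | "all" = "active"): Observation[] {
  const q = query.toLowerCase();
  return Array.from(index.values()).filter(
    (o) => (status === "all" || o.status === status) && o.title.toLowerCase().includes(q),
  );
}
```

With this shape, a per-status query such as searchObservations("auth", "resolved") can only return results if resolved observations were indexed in the first place, which is the point of dropping the index-time filter.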
0e2cd3c to f8e7315
Rebased on current
What changed:
All 4 hydration tests pass. Full suite shows no regressions (883 pass, same baseline).
Thanks for the update — the startup direction is still the right one, and the active-only hydration issue I called out earlier is fixed. I do still see one follow-up problem before I'd merge this:
So from the maintainer side my current read is:
If you want to keep the lexical-first startup model, I think this needs one more pass so the backfill policy for non-active observations is explicit rather than an accidental regression.
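To illustrate what an "explicit" backfill policy could look like, here is a hedged sketch. It does not reproduce the specific problem raised in the review. Every name in it (BackfillPolicy, planBackfill, hasEmbedding, batchSize) is hypothetical and does not exist in Memorix as far as this thread shows.

```typescript
type Status = "active" | "resolved" | "archived";

// Hypothetical record shape: only the fields this sketch needs.
interface Observation {
  id: number;
  status: Status;
  hasEmbedding: boolean;
}

// Hypothetical policy object: states up front which statuses get
// background embedding backfill, instead of leaving it implicit.
interface BackfillPolicy {
  statuses: ReadonlySet<Status>;
  batchSize: number;
}

const DEFAULT_POLICY: BackfillPolicy = {
  statuses: new Set<Status>(["active", "resolved", "archived"]),
  batchSize: 2,
};

// Returns batches of observation ids that still need an embedding
// and whose status is covered by the policy.
function planBackfill(
  observations: Observation[],
  policy: BackfillPolicy = DEFAULT_POLICY,
): number[][] {
  const size = Math.max(1, policy.batchSize); // guard against a zero or negative batch size
  const pending = observations
    .filter((o) => !o.hasEmbedding && policy.statuses.has(o.status))
    .map((o) => o.id);
  const batches: number[][] = [];
  for (let i = 0; i < pending.length; i += size) {
    batches.push(pending.slice(i, i + size));
  }
  return batches;
}
```

Making the status set a named value means excluding archived observations from backfill would be a visible decision in code review rather than a side effect of a filter elsewhere.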
Summary
This changes Memorix startup and hot-reload to prepare the lexical search index without blocking on a corpus-wide embedding rebuild.
Fixes #27.
Problem
createMemorixServer() used the heavy reindexObservations() path during normal startup and file-watch reloads.
On large projects with API embeddings enabled, that meant MCP availability depended on:
That is too expensive for startup. Clients need the MCP and lexical search to come up first.
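The difference between the two paths can be sketched as follows. The function names prepareSearchIndex and reindexObservations are the ones used in this PR, but the bodies, the Doc and Embed types, the tokenizer, and the two Maps are simplified assumptions and do not reflect the real implementation in src/memory/observations.ts.

```typescript
// Hypothetical types; only the two function names below come from this PR.
interface Doc {
  id: number;
  text: string;
}

// Stand-in for an embedding provider call (network-bound in practice).
type Embed = (texts: string[]) => Promise<number[][]>;

const lexicalIndex = new Map<number, string[]>();
const vectors = new Map<number, number[]>();

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\W+/).filter(Boolean);
}

// Startup / hot-reload path: synchronous, lexical only, never calls the provider.
function prepareSearchIndex(docs: Doc[]): number {
  for (const d of docs) {
    lexicalIndex.set(d.id, tokenize(d.text));
  }
  return lexicalIndex.size;
}

// Explicit reindex path: lexical hydration plus batch embedding generation.
// Its latency depends on the provider, which is why startup should not await it.
async function reindexObservations(docs: Doc[], embed: Embed): Promise<number> {
  prepareSearchIndex(docs);
  const embedded = await embed(docs.map((d) => d.text));
  docs.forEach((d, i) => vectors.set(d.id, embedded[i]));
  return vectors.size;
}
```

In this sketch, a server that calls only prepareSearchIndex() at startup can answer lexical queries immediately, and the slower reindexObservations() stays available for an explicit rebuild.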
What changed
- prepareSearchIndex() in src/memory/observations.ts
- prepareSearchIndex() instead of the heavy rebuild path
- prepareSearchIndex() hydrates the lexical/BM25 index and skips batch embedding generation
- tests/memory/prepare-search-index.test.ts
Why this approach
This keeps the heavy rebuild path available for explicit reindexing while removing startup's dependency on embedding-provider latency.
That makes the behavior provider-agnostic:
Verification
- npm test -- tests/memory/prepare-search-index.test.ts tests/memory/reindex-embeddings.test.ts
- npx vitest run tests/integration/release-blockers.test.ts
- bunx tsc --noEmit
- npm run build
Notes
Running the full npm test suite from a fork checkout still exposes two existing git-remote expectation tests (tests/git/extractor.test.ts, tests/git/subdir-scan.test.ts) that assert the upstream repo id instead of the current fork remote. I did not change that behavior in this PR.