@justincasher
Owner

Previously, BM25 indices were built from scratch at runtime by loading all 380K+ declaration names from the database. This caused slow MCP server startup (~30s just for BM25 building).

Changes

  • Add build_bm25_indices() to extraction pipeline
  • Extraction pipeline now builds both FAISS and BM25 indices
  • SearchEngine loads pre-built BM25 indices from disk
  • Add BM25 index paths to Config
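For readers who want to see the shape of the change, here is a minimal sketch of the build-once, load-at-startup pattern described above. It is not the code from this PR: it uses a small hand-written BM25 and `pickle` for persistence, and the names `tokenize`, `build_bm25_index`, `save_index`, `load_index`, and `search` are hypothetical stand-ins for illustration only. The real `build_bm25_indices()` may use a different library, tokenizer, and file format.

```python
import math
import pickle
import re
from collections import Counter
from pathlib import Path


def tokenize(name: str) -> list[str]:
    """Split a declaration name such as 'Nat.add_comm' into lowercase tokens."""
    return [t.lower() for t in re.split(r"[._\s]+", name) if t]


def build_bm25_index(names: list[str]) -> dict:
    """Extraction-time step: precompute all BM25 statistics once."""
    docs = [tokenize(n) for n in names]
    n_docs = len(docs)
    # Document frequency: in how many names each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {
        "names": names,
        "term_freqs": [Counter(doc) for doc in docs],
        "doc_lens": [len(doc) for doc in docs],
        "avg_len": sum(len(d) for d in docs) / max(n_docs, 1),
        "idf": {
            t: math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            for t, df in doc_freq.items()
        },
    }


def save_index(index: dict, path: Path) -> None:
    """Write the prebuilt index to disk (hypothetical pickle format)."""
    with path.open("wb") as f:
        pickle.dump(index, f)


def load_index(path: Path) -> dict:
    """Startup-time step: read the prebuilt index, no database scan needed."""
    with path.open("rb") as f:
        return pickle.load(f)


def search(index: dict, query: str, k1: float = 1.5, b: float = 0.75,
           top_k: int = 5) -> list[str]:
    """Rank names by BM25 score against the query; zero-score names are dropped."""
    q_terms = tokenize(query)
    scored = []
    for name, tf, dl in zip(index["names"], index["term_freqs"], index["doc_lens"]):
        score = 0.0
        for term in q_terms:
            f = tf.get(term, 0)
            if f == 0:
                continue
            norm = f + k1 * (1 - b + b * dl / index["avg_len"])
            score += index["idf"][term] * f * (k1 + 1) / norm
        if score > 0:
            scored.append((score, name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]
```

The point of the split is that `build_bm25_index` and `save_index` run once in the extraction pipeline, while the server only calls `load_index`. Note that `pickle` should only be used to load files the pipeline itself produced.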

Impact

MCP server startup no longer pays the ~30s BM25 build cost: the indices are loaded from disk instead of being rebuilt from 380K+ declaration names on every start.

Previously, BM25 indices were built from scratch at runtime by loading
all 380K+ declaration names from the database. This caused slow MCP
server startup times.

Changes:
- Add build_bm25_indices() to extraction pipeline (index.py)
- Extraction pipeline now builds both FAISS and BM25 indices
- SearchEngine loads pre-built BM25 indices from disk
- Add BM25 index paths to Config
- Update tests for new extraction_path parameter and BM25 mocking
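As a rough illustration of the test change in the last bullet, the sketch below shows one way a test could mock BM25 loading so that no index file is needed on disk. The `SearchEngine` class here is a hypothetical stand-in, not the project's class: the `_load_bm25` method, the `bm25.pkl` file name, and the way `extraction_path` is used are all assumptions made for the example.

```python
import pickle
from pathlib import Path
from unittest.mock import MagicMock, patch


class SearchEngine:
    """Hypothetical stand-in for the real class; only the loading path is modeled."""

    def __init__(self, extraction_path: Path):
        self.extraction_path = Path(extraction_path)
        # Assumed file name; the real index paths live in Config.
        self.bm25 = self._load_bm25(self.extraction_path / "bm25.pkl")

    def _load_bm25(self, path: Path):
        with path.open("rb") as f:
            return pickle.load(f)


def test_engine_uses_prebuilt_index():
    fake_index = MagicMock(name="fake_bm25_index")
    # Replace the loader so the test never touches the filesystem.
    with patch.object(SearchEngine, "_load_bm25", return_value=fake_index) as loader:
        engine = SearchEngine(Path("/nonexistent"))
    assert engine.bm25 is fake_index
    loader.assert_called_once_with(Path("/nonexistent") / "bm25.pkl")
```

Patching the loader rather than the index library keeps the test independent of whichever BM25 implementation the pipeline uses.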
@justincasher justincasher merged commit 49e5072 into main Jan 28, 2026
2 checks passed
@justincasher justincasher deleted the feat/prebuilt-bm25-indices branch January 28, 2026 05:11