fix(astro): add batch-size override and ECONNRESET retry to incremental search indexer#133

Merged
chris-c-thomas merged 1 commit into main from fix/search-indexer-batch-and-retry on Apr 25, 2026

@chris-c-thomas
Owner

Summary

  • Add --batch-size <n> CLI flag and MEILI_BATCH_SIZE env var to override the default 500 docs/batch in index-search-incremental.ts
  • Add --verbose-batches flag that logs first/last doc ID of every flushed batch (with stdout force-flush so the last logged ID survives a crash) — pair with --batch-size 1 to bisect poison docs
  • Add flushWithRetry() to BatchIndexer that handles ECONNRESET during Meilisearch OOM crashes by waiting for /health and retrying on the original taskUid rather than resubmitting

Why

Meilisearch can silently restart mid-task under memory pressure. On the 7.6 GiB VPS we observed ~60s crash cycles during FR bulk upserts. The indexer would die outright on the first ECONNRESET, even though the submitted task is persisted in LMDB and would typically resume on server recovery. The new retry waits for /health to return available (up to 180s) and resumes waiting on the original task instead of giving up.
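
For readers who want the shape of the retry at a glance, here is a minimal sketch. It is not the code merged in this PR: the `TaskClient` interface, `isConnReset`, `waitForHealth`, and the option names are hypothetical stand-ins for the real Meilisearch client calls. Resubmitting the batch when the POST itself fails before any taskUid is known is an assumption of this sketch. The 5-attempt cap and the 180s health timeout are the values stated in this PR.

```typescript
// Hypothetical minimal client surface; the real script talks to Meilisearch.
interface TaskClient {
  addDocuments(docs: unknown[]): Promise<{ taskUid: number }>;
  waitForTask(taskUid: number): Promise<void>;
  isHealthy(): Promise<boolean>; // e.g. /health reporting "available"
}

interface RetryOptions {
  maxAttempts?: number;
  healthTimeoutMs?: number;
  pollMs?: number;
}

// Detect ECONNRESET either directly on the error or on its cause.
function isConnReset(err: unknown): boolean {
  const e = err as { code?: string; cause?: { code?: string } } | null;
  return e?.code === "ECONNRESET" || e?.cause?.code === "ECONNRESET";
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll the health check until it succeeds or the deadline passes.
async function waitForHealth(client: TaskClient, timeoutMs: number, pollMs: number): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      if (await client.isHealthy()) return;
    } catch {
      // Server still restarting; keep polling.
    }
    await sleep(pollMs);
  }
  throw new Error(`server not healthy after ${timeoutMs} ms`);
}

async function flushWithRetry(client: TaskClient, docs: unknown[], opts: RetryOptions = {}): Promise<number> {
  const { maxAttempts = 5, healthTimeoutMs = 180_000, pollMs = 1_000 } = opts;
  let taskUid: number | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // Submit only if no task was accepted yet; otherwise reuse the original taskUid.
      if (taskUid === undefined) {
        taskUid = (await client.addDocuments(docs)).taskUid;
      }
      await client.waitForTask(taskUid);
      return taskUid;
    } catch (err) {
      // Anything other than ECONNRESET, or the final attempt, is fatal.
      if (!isConnReset(err) || attempt === maxAttempts) throw err;
      await waitForHealth(client, healthTimeoutMs, pollMs);
    }
  }
  throw new Error("unreachable");
}
```

The key design point is that `taskUid` lives outside the loop, so a reset during polling leads to another wait on the same task rather than a second submission.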

This work was field-tested on the production VPS but never made it back to main. apps/astro/CLAUDE.md:161 already documents these flags as if they exist. This PR brings the code in line with the docs.

Scope

Only apps/astro/scripts/index-search-incremental.ts is modified. The full-reindex sibling apps/astro/scripts/index-search.ts has the same BatchIndexer shape and the same OOM-vulnerable addDocuments + waitForTask pattern — it should get the same treatment as a follow-up. Scoped out here because only the incremental script was field-validated.

Test plan

  • npx tsx scripts/index-search-incremental.ts --batch-size abc rejects non-integer
  • MEILI_BATCH_SIZE=xyz npx tsx scripts/index-search-incremental.ts rejects non-integer
  • --batch-size 1 --verbose-batches --set-checkpoint --source fr parses flags and writes checkpoint
  • Typecheck passes (pre-existing gray-matter cache errors are unrelated and documented)
  • Run a real --source fr --batch-size 100 --verbose-batches after merge to confirm retry path on next deploy
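
The first two checks in the test plan exercise strict integer validation of the batch size. A rough sketch of that kind of validation follows. The helper names `parsePositiveInt` and `resolveBatchSize` are hypothetical, and the precedence (CLI flag over env var over the default of 500) is an assumption rather than something this PR states.

```typescript
const DEFAULT_BATCH_SIZE = 500;

// Hypothetical helper: accept only plain positive integers such as "1" or "100".
function parsePositiveInt(raw: string, label: string): number {
  if (!/^[1-9]\d*$/.test(raw.trim())) {
    throw new Error(`${label} must be a positive integer, got "${raw}"`);
  }
  return Number(raw);
}

// Assumed precedence: --batch-size flag, then MEILI_BATCH_SIZE, then the default.
function resolveBatchSize(argv: string[], env: Record<string, string | undefined>): number {
  const flagIndex = argv.indexOf("--batch-size");
  if (flagIndex !== -1) {
    const value = argv[flagIndex + 1];
    if (value === undefined) throw new Error("--batch-size requires a value");
    return parsePositiveInt(value, "--batch-size");
  }
  const fromEnv = env.MEILI_BATCH_SIZE;
  if (fromEnv !== undefined && fromEnv !== "") {
    return parsePositiveInt(fromEnv, "MEILI_BATCH_SIZE");
  }
  return DEFAULT_BATCH_SIZE;
}
```

In a script this would typically be called as `resolveBatchSize(process.argv.slice(2), process.env)`. Inputs such as `abc` or `xyz` throw instead of silently falling back to the default.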

…al search indexer

Meilisearch can silently restart mid-task under memory pressure (observed
~60s crash cycles during FR bulk upserts on the 7.6 GiB VPS), causing
ECONNRESET on either the addDocuments POST or the waitForTask polling
that follows. The previous indexer died outright on the first failure
even though the submitted task was already persisted in LMDB and would
typically resume on server recovery.

Changes (apps/astro/scripts/index-search-incremental.ts):

- New flushWithRetry() in BatchIndexer waits for /health to return
  "available" (up to 180s) and retries the wait on the original taskUid
  rather than resubmitting the batch. Up to 5 attempts per flush.
- New --batch-size <n> CLI flag and MEILI_BATCH_SIZE env var override
  the default of 500 docs/batch. Smaller batches reduce per-flush
  Meilisearch memory and let crash recovery happen between batches
  instead of inside one.
- New --verbose-batches flag prints the first/last doc ID of every
  flushed batch, with stdout force-flushed so the last logged ID is
  durable through a crash. Combined with --batch-size 1 this isolates
  poison documents.

apps/astro/CLAUDE.md already documents these flags; this commit brings
the code in line with the documentation. The full-reindex sibling
(index-search.ts) has the same OOM-vulnerable pattern and should get
the same treatment in a follow-up — scoped out of this PR because only
the incremental script was field-validated on the VPS.
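
The --verbose-batches behavior described in the commit message can be illustrated with a small sketch. The log format, the `Doc` shape, and the function names are hypothetical. How the real script force-flushes stdout is not shown in this PR, so the sink is injected here.

```typescript
interface Doc {
  id: string;
}

// Build one log line naming the first and last document ID of a batch.
function formatBatchLine(batchNo: number, docs: Doc[]): string {
  const first = docs[0];
  const last = docs[docs.length - 1];
  if (!first || !last) return `[batch ${batchNo}] empty`;
  return `[batch ${batchNo}] ${docs.length} docs, first=${first.id} last=${last.id}`;
}

// The sink is injected so the caller chooses how durable the write is.
// One synchronous option in Node is (line) => fs.writeSync(1, line),
// which writes to stdout's file descriptor before returning.
function logBatch(batchNo: number, docs: Doc[], write: (line: string) => void): void {
  write(formatBatchLine(batchNo, docs) + "\n");
}
```

With a batch size of 1, first and last are the same ID, so the final line written before a crash names the suspect document.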

@chris-c-thomas chris-c-thomas self-assigned this Apr 25, 2026
@chris-c-thomas chris-c-thomas merged commit b92640a into main Apr 25, 2026
9 checks passed