diff --git a/docs/backlog.md b/docs/backlog.md index 083df4f..f26f931 100644 --- a/docs/backlog.md +++ b/docs/backlog.md @@ -99,6 +99,7 @@ | BL-036 | LanceDB ANN fast-path for large scopes | P2 | planned | TBD | TBD | 新增 `LANCEDB_OPENCODE_PRO_VECTOR_INDEX_THRESHOLD` (預設 1000);當 scope entries ≥ 閾值時自動建立 IVF_PQ 向量索引;`memory_stats` 揭露 `searchMode` 欄位;`pruneScope` 超過 `maxEntriesPerScope` 時發出警告日誌 [Surface: Plugin] | | BL-037 | Event table TTL / archival | P1 | done | bl-037-event-ttl-archival | openspec/changes/archive/2026-04-03-bl-037-event-ttl-archival/specs/event-ttl/ | 為 `effectiveness_events` 建立保留期與歸檔機制,降低長期 local store 成本 [Surface: Plugin] | | BL-048 | LanceDB 索引衝突修復與備份安全機制 | P1 | **done** | bl-048-lancedb-index-recovery | openspec/changes/bl-048-lancedb-index-recovery/ | 修復 ensureIndexes() 重試邏輯 + 可選定期備份 config [Surface: Plugin] v0.6.1 | +| BL-051 | FTS/Vector index concurrent-process race condition fix | P0 | **done** | fix-fts-index-race-condition | openspec/changes/archive/2026-04-05-fix-fts-index-race-condition/ | 修復多进程同时启动时的 index 创建冲突;commit-conflict 检测 + re-verify + jitter + final-pass [Surface: Plugin] v0.6.1 | | BL-049 | Embedder 錯誤容忍與 graceful degradation | P1 | **done** | bl-049-embedder-error-tolerance | openspec/changes/archive/2026-04-03-bl-049-embedder-error-tolerance/ | embedder 失敗時的重試/延遲 + 搜尋時 BM25 fallback [Surface: Plugin] | | BL-050 | 內建 embedding 模型(transformers.js) | P1 | proposed | TBD | TBD | 新增 TransformersEmbedder,提供離線 embedding 能力 [Surface: Plugin] | diff --git a/docs/roadmap.md b/docs/roadmap.md index ee7889b..b4addf4 100644 --- a/docs/roadmap.md +++ b/docs/roadmap.md @@ -405,17 +405,18 @@ OpenCode 要從「有長期記憶的工具」進化成「會累積團隊工作 5. eval harness for learning quality(Surface: Test-infra)→ BL-032 6. A/B testing framework(Surface: Test-infra + Docs)→ BL-033 7. success pattern 的 playbook surface 強化(Surface: Plugin)→ BL-040 -8. `effectiveness_events` TTL / archival(Surface: Plugin)→ BL-037 +8. `effectiveness_events` TTL / archival(Surface: Plugin)→ BL-037 ✅ DONE 9. backoff / cooldown signal ingestion(Surface: Plugin;**blocked by upstream events**)→ BL-021 10. 條件式 user/team precedence(僅在多使用者需求成立時) 11. Tool registration 模組化拆分(Surface: Plugin)→ BL-041 ✅ DONE -12. Episodic 更新流程 DRY 化(Surface: Plugin)→ BL-043 +12. Episodic 更新流程 DRY 化(Surface: Plugin)→ BL-043 ✅ DONE 13. Duplicate consolidation 擴充性重構(Surface: Plugin)→ BL-044 ✅ DONE 14. Scope cache 記憶體治理(Surface: Plugin)→ BL-045 ✅ DONE -15. DB row runtime schema validation(Surface: Plugin + Test-infra)→ BL-046 +15. DB row runtime schema validation(Surface: Plugin + Test-infra)→ BL-046 ✅ DONE 16. LanceDB 索引衝突修復與備份安全機制(Surface: Plugin)→ BL-048 ✅ DONE v0.6.1 17. Embedder 錯誤容忍與 graceful degradation(Surface: Plugin)→ BL-049 ✅ DONE -18. 內建 embedding 模型(transformers.js)(Surface: Plugin)→ BL-050 ⚠️ 研究完成,待實作 +18. FTS/Vector index concurrent-process race condition fix(Surface: Plugin)→ BL-051 ✅ DONE v0.6.1 +19. 內建 embedding 模型(transformers.js)(Surface: Plugin)→ BL-050 ⚠️ 研究完成,待實作 ### P2 16. large-scope retrieval 的 ANN fast-path(Surface: Plugin)→ BL-036 diff --git a/openspec/changes/archive/2026-04-05-fix-fts-index-race-condition/.openspec.yaml b/openspec/changes/archive/2026-04-05-fix-fts-index-race-condition/.openspec.yaml new file mode 100644 index 0000000..c551aea --- /dev/null +++ b/openspec/changes/archive/2026-04-05-fix-fts-index-race-condition/.openspec.yaml @@ -0,0 +1,2 @@ +schema: spec-driven +created: 2026-04-05 diff --git a/openspec/changes/archive/2026-04-05-fix-fts-index-race-condition/design.md b/openspec/changes/archive/2026-04-05-fix-fts-index-race-condition/design.md new file mode 100644 index 0000000..6bd0929 --- /dev/null +++ b/openspec/changes/archive/2026-04-05-fix-fts-index-race-condition/design.md @@ -0,0 +1,49 @@ +## Context + +`MemoryStore.ensureIndexes()` creates vector and FTS indexes at startup via `createVectorIndexWithRetry()` and `createFtsIndexWithRetry()`. Both methods follow check-then-create: they call `listIndices()` to see if the index exists, then call `createIndex()` if not. When two or more processes initialize simultaneously against the same LanceDB table, both processes pass the check (no index found), both call `createIndex()`, and only one wins. The loser receives a `Retryable commit conflict` error from LanceDB. + +The current retry loop (3 attempts, exponential backoff) treats all errors identically. It does not detect the "another process created it" signal, so after exhausting retries it sets `indexState.fts = false` / `indexState.vector = false` even though the index was successfully created by the concurrent process. + +## Goals / Non-Goals + +**Goals:** +- Detect commit-conflict errors as a distinct, recoverable case +- Re-verify index existence immediately after catching a commit-conflict, and adopt the index as success if present +- Add per-attempt randomized jitter to backoff to reduce re-collision probability +- Perform a final existence check after all retries are exhausted before declaring failure + +**Non-Goals:** +- Process-level mutex or file-lock coordination across OS processes +- Changes to LanceDB API or its conflict semantics +- Changes to fallback search behavior when index is truly absent +- New configuration knobs exposed to users + +## Decisions + +### Decision 1: Re-verify after commit-conflict instead of a distributed lock + +**Chosen**: Catch the specific `Retryable commit conflict` string in the error message, then immediately call `listIndices()` to check if another process created the index. + +**Alternative considered**: File-based lock (e.g., `flock` on a sentinel file). Rejected because it adds OS-level complexity, is not portable across all deployment modes (Docker volumes, network mounts), and is overkill for a soft problem that resolves itself within milliseconds. + +**Rationale**: LanceDB itself labels this error as `Retryable`. The correct response is to retry verification, not to block. Re-reading `listIndices()` is cheap and stateless. + +### Decision 2: Jitter on backoff delay + +**Chosen**: Add `Math.random() * baseDelay` to the computed backoff delay for every retry attempt. + +**Rationale**: Without jitter, two processes with identical startup timing will collide again on every retry at the same moment (thundering herd). A simple uniform jitter breaks this synchronization. + +Formula: `delay = baseDelay * 2^attempt + Math.random() * baseDelay` + +### Decision 3: Final-pass existence check after all retries + +**Chosen**: After the retry loop completes without success, call `listIndices()` one more time before writing `indexState.{vector|fts} = false`. + +**Rationale**: The last retry may itself trigger a commit-conflict that creates the index via the other process. Without a final check, we declare failure on a successfully-created index. + +## Risks / Trade-offs + +- [`listIndices()` overhead`] Adds 1–3 extra `listIndices()` calls per startup per index when conflicts occur → Mitigation: calls only happen in the catch block and final-pass, not in the happy path. Cost is negligible. +- [`Error string matching`] Conflict detection relies on matching the LanceDB error message string, which could change across LanceDB versions → Mitigation: use substring check (`includes`), not exact match. If the string changes, the code degrades gracefully to current behavior (no regression). +- [`No cross-host guarantee`] If LanceDB is used against a shared network path with many hosts, jitter alone may be insufficient → Mitigation: out of scope; this fix targets the common single-machine multi-process case. diff --git a/openspec/changes/archive/2026-04-05-fix-fts-index-race-condition/proposal.md b/openspec/changes/archive/2026-04-05-fix-fts-index-race-condition/proposal.md new file mode 100644 index 0000000..8a10e97 --- /dev/null +++ b/openspec/changes/archive/2026-04-05-fix-fts-index-race-condition/proposal.md @@ -0,0 +1,25 @@ +## Why + +When multiple OpenCode processes start simultaneously (e.g., opening multiple tabs or sessions), each instance calls `ensureIndexes()` on the same LanceDB table. The existing check-then-create pattern has a TOCTOU (Time-Of-Check-Time-Of-Use) race window: two processes both see no index, both attempt creation, and the second one fails with `Retryable commit conflict for version N`. Because the retry logic does not re-verify index existence after a conflict failure, the index is incorrectly declared failed even though it was successfully created by the concurrent process — causing a permanent false-negative fallback to BM25-only or vector-only mode. + +## What Changes + +- Detect `Retryable commit conflict` errors as a distinct case from general failures +- After a commit-conflict catch, re-verify index existence before counting the attempt as failed +- If the index exists post-conflict, adopt it as success (treat as created by concurrent process) +- Add randomized jitter to retry backoff to prevent thundering-herd re-collision +- Final-pass existence check after all retries exhausted, to prevent false-negative reporting + +## Capabilities + +### New Capabilities + + +### Modified Capabilities +- `index-retry`: Add concurrent-process conflict resolution scenarios — commit-conflict errors must re-verify index existence before being counted as failures, and jitter must be added to retry backoff + +## Impact + +- `src/store.ts`: `createVectorIndexWithRetry()` and `createFtsIndexWithRetry()` — error handling in catch blocks, backoff delay formula +- No API surface changes, no config changes, no schema changes +- Fixes false-negative `indexState.fts = false` / `indexState.vector = false` when another process wins the race diff --git a/openspec/changes/archive/2026-04-05-fix-fts-index-race-condition/specs/index-retry/spec.md b/openspec/changes/archive/2026-04-05-fix-fts-index-race-condition/specs/index-retry/spec.md new file mode 100644 index 0000000..2ac3680 --- /dev/null +++ b/openspec/changes/archive/2026-04-05-fix-fts-index-race-condition/specs/index-retry/spec.md @@ -0,0 +1,58 @@ +## ADDED Requirements + +### Requirement: Commit-conflict errors trigger index re-verification + +When a `createIndex` call fails with a LanceDB commit-conflict error, the system SHALL re-verify index existence before counting the attempt as a failure, because the conflict indicates a concurrent process may have successfully created the index. + +Runtime Surface: internal-api +Entrypoint: `src/store.ts` -> `MemoryStore.createVectorIndexWithRetry()`, `MemoryStore.createFtsIndexWithRetry()` + +#### Scenario: Commit-conflict error — index created by concurrent process +- **WHEN** `createIndex()` throws an error whose message contains `"Retryable commit conflict"` or `"preempted by concurrent transaction"` +- **AND** a subsequent `listIndices()` call shows the index now exists +- **THEN** the system SHALL mark the index state as enabled (`indexState.vector = true` or `indexState.fts = true`) and return without counting the attempt as a failure + +#### Scenario: Commit-conflict error — index still absent after re-verification +- **WHEN** `createIndex()` throws a commit-conflict error +- **AND** a subsequent `listIndices()` call shows the index still does not exist +- **THEN** the system SHALL proceed to the next retry attempt (not mark as permanently failed) + +#### Scenario: Non-conflict transient error — no re-verification +- **WHEN** `createIndex()` throws an error that does NOT contain a commit-conflict message +- **THEN** the system SHALL proceed with the existing retry logic unchanged (no extra `listIndices()` call) + +--- + +### Requirement: Retry backoff includes randomized jitter + +To prevent thundering-herd re-collision when multiple processes retry simultaneously, the system SHALL add randomized jitter to each retry delay. + +Runtime Surface: internal-api +Entrypoint: `src/store.ts` -> `MemoryStore.createVectorIndexWithRetry()`, `MemoryStore.createFtsIndexWithRetry()` + +#### Scenario: Retry delay includes jitter +- **WHEN** an index creation attempt fails and a retry delay is computed +- **THEN** the delay SHALL be `baseDelay * 2^attempt + Math.random() * baseDelay` milliseconds, where `baseDelay = 500` + +#### Scenario: Jitter is non-deterministic across concurrent processes +- **WHEN** two processes compute retry delays for the same attempt number +- **THEN** their delays SHALL differ by a random amount up to `baseDelay` milliseconds, reducing the probability of simultaneous re-collision + +--- + +### Requirement: Final existence check before declaring index failure + +After exhausting all retry attempts without success, the system SHALL perform one final `listIndices()` check before writing a failure state, to prevent false-negative reporting when the last retry's conflict caused another process to succeed. + +Runtime Surface: internal-api +Entrypoint: `src/store.ts` -> `MemoryStore.createVectorIndexWithRetry()`, `MemoryStore.createFtsIndexWithRetry()` + +#### Scenario: Index found on final check — adopt as success +- **WHEN** all retry attempts complete without the local process succeeding +- **AND** a final `listIndices()` call shows the index now exists +- **THEN** the system SHALL mark the index state as enabled and log that the index was adopted from a concurrent process + +#### Scenario: Index absent on final check — declare failure +- **WHEN** all retry attempts complete without the local process succeeding +- **AND** a final `listIndices()` call shows the index still does not exist +- **THEN** the system SHALL mark the index state as disabled and log a structured error with the last error message diff --git a/openspec/changes/archive/2026-04-05-fix-fts-index-race-condition/tasks.md b/openspec/changes/archive/2026-04-05-fix-fts-index-race-condition/tasks.md new file mode 100644 index 0000000..cfc0da2 --- /dev/null +++ b/openspec/changes/archive/2026-04-05-fix-fts-index-race-condition/tasks.md @@ -0,0 +1,24 @@ +## 1. Fix createVectorIndexWithRetry + +- [x] 1.1 Add commit-conflict detection helper: extract `isCommitConflict(errorMsg: string): boolean` inline check using `includes("Retryable commit conflict") || includes("preempted by concurrent transaction")` +- [x] 1.2 In the catch block, if `isCommitConflict`, call `listIndices()` and return success if the vector index now exists +- [x] 1.3 Replace fixed backoff delay `baseDelay * 2^attempt` with jittered formula `baseDelay * 2^attempt + Math.random() * baseDelay` +- [x] 1.4 After the retry loop exits without success, call `listIndices()` one final time and adopt the index as success if present; only then write `indexState.vector = false` + +## 2. Fix createFtsIndexWithRetry + +- [x] 2.1 Apply the same commit-conflict detection in the FTS catch block: call `listIndices()` on conflict and return success if the FTS index now exists +- [x] 2.2 Apply the same jittered backoff formula to FTS retry delays +- [x] 2.3 Apply the same final-pass existence check after FTS retry loop exhaustion + +## 3. Tests + +- [x] 3.1 Add unit test: `createVectorIndexWithRetry` — commit-conflict on first attempt, index exists on re-verify → `indexState.vector = true` +- [x] 3.2 Add unit test: `createFtsIndexWithRetry` — commit-conflict on first attempt, index exists on re-verify → `indexState.fts = true` +- [x] 3.3 Add unit test: `createFtsIndexWithRetry` — all attempts fail with non-conflict error, final check shows index absent → `indexState.fts = false` +- [x] 3.4 Add unit test: final-pass check — all retries exhausted, final `listIndices()` shows index present → `indexState.fts = true` + +## 4. Validation + +- [x] 4.1 Run `npm test` and confirm all new and existing tests pass +- [x] 4.2 Run `npm run build` and confirm zero TypeScript errors diff --git a/openspec/specs/index-retry/spec.md b/openspec/specs/index-retry/spec.md index 6a94e16..42c6fb5 100644 --- a/openspec/specs/index-retry/spec.md +++ b/openspec/specs/index-retry/spec.md @@ -1,7 +1,8 @@ # index-retry Specification ## Purpose -TBD - created by archiving change bl-048-lancedb-index-recovery. Update Purpose after archive. +Handles index creation retry logic for LanceDB tables, including detection and recovery from concurrent-process commit conflicts, exponential backoff with jitter, and graceful fallback when indexes cannot be created. + ## Requirements ### Requirement: Index retry with exponential backoff @@ -73,3 +74,62 @@ Entrypoint: `src/store.ts` -> `MemoryStore.searchMemories()` - **WHEN** `indexState.fts = false` - **THEN** the system SHALL fall back to vector-only search without error +--- + +### Requirement: Commit-conflict errors trigger index re-verification + +When a `createIndex` call fails with a LanceDB commit-conflict error, the system SHALL re-verify index existence before counting the attempt as a failure, because the conflict indicates a concurrent process may have successfully created the index. + +Runtime Surface: internal-api +Entrypoint: `src/store.ts` -> `MemoryStore.createVectorIndexWithRetry()`, `MemoryStore.createFtsIndexWithRetry()` + +#### Scenario: Commit-conflict error — index created by concurrent process +- **WHEN** `createIndex()` throws an error whose message contains `"Retryable commit conflict"` or `"preempted by concurrent transaction"` +- **AND** a subsequent `listIndices()` call shows the index now exists +- **THEN** the system SHALL mark the index state as enabled (`indexState.vector = true` or `indexState.fts = true`) and return without counting the attempt as a failure + +#### Scenario: Commit-conflict error — index still absent after re-verification +- **WHEN** `createIndex()` throws a commit-conflict error +- **AND** a subsequent `listIndices()` call shows the index still does not exist +- **THEN** the system SHALL proceed to the next retry attempt (not mark as permanently failed) + +#### Scenario: Non-conflict transient error — no re-verification +- **WHEN** `createIndex()` throws an error that does NOT contain a commit-conflict message +- **THEN** the system SHALL proceed with the existing retry logic unchanged (no extra `listIndices()` call) + +--- + +### Requirement: Retry backoff includes randomized jitter + +To prevent thundering-herd re-collision when multiple processes retry simultaneously, the system SHALL add randomized jitter to each retry delay. + +Runtime Surface: internal-api +Entrypoint: `src/store.ts` -> `MemoryStore.createVectorIndexWithRetry()`, `MemoryStore.createFtsIndexWithRetry()` + +#### Scenario: Retry delay includes jitter +- **WHEN** an index creation attempt fails and a retry delay is computed +- **THEN** the delay SHALL be `baseDelay * 2^attempt + Math.random() * baseDelay` milliseconds, where `baseDelay = 500` + +#### Scenario: Jitter is non-deterministic across concurrent processes +- **WHEN** two processes compute retry delays for the same attempt number +- **THEN** their delays SHALL differ by a random amount up to `baseDelay` milliseconds, reducing the probability of simultaneous re-collision + +--- + +### Requirement: Final existence check before declaring index failure + +After exhausting all retry attempts without success, the system SHALL perform one final `listIndices()` check before writing a failure state, to prevent false-negative reporting when the last retry's conflict caused another process to succeed. + +Runtime Surface: internal-api +Entrypoint: `src/store.ts` -> `MemoryStore.createVectorIndexWithRetry()`, `MemoryStore.createFtsIndexWithRetry()` + +#### Scenario: Index found on final check — adopt as success +- **WHEN** all retry attempts complete without the local process succeeding +- **AND** a final `listIndices()` call shows the index now exists +- **THEN** the system SHALL mark the index state as enabled and log that the index was adopted from a concurrent process + +#### Scenario: Index absent on final check — declare failure +- **WHEN** all retry attempts complete without the local process succeeding +- **AND** a final `listIndices()` call shows the index still does not exist +- **THEN** the system SHALL mark the index state as disabled and log a structured error with the last error message + diff --git a/src/store.ts b/src/store.ts index d6244f2..7cfea0e 100644 --- a/src/store.ts +++ b/src/store.ts @@ -2043,7 +2043,20 @@ export class MemoryStore { } /** - * Create vector index with exponential backoff retry and existence check + * Returns true if the error message indicates a LanceDB retryable commit conflict, + * meaning another concurrent process may have already created the same index. + */ + private isCommitConflict(errorMsg: string): boolean { + return ( + errorMsg.includes("Retryable commit conflict") || + errorMsg.includes("preempted by concurrent transaction") + ); + } + + /** + * Create vector index with exponential backoff retry and existence check. + * Handles concurrent-process commit conflicts by re-verifying index existence + * after each conflict error, and adds jitter to avoid thundering-herd re-collision. */ private async createVectorIndexWithRetry(table: LanceTable): Promise { const maxRetries = 3; @@ -2056,6 +2069,8 @@ export class MemoryStore { return; } + let lastErrorMsg = ""; + for (let attempt = 0; attempt < maxRetries; attempt++) { this.indexState.vectorRetries = attempt + 1; try { @@ -2064,21 +2079,44 @@ export class MemoryStore { this.indexState.vector = true; return; } catch (error) { - const errorMsg = error instanceof Error ? error.message : String(error); + lastErrorMsg = error instanceof Error ? error.message : String(error); + + // Commit conflict: another process may have just created the index — re-verify. + if (this.isCommitConflict(lastErrorMsg)) { + const updatedIndices = await table.listIndices(); + if (updatedIndices.some(idx => idx.name === "vector")) { + console.log(`[store] Vector index created by concurrent process, adopting it (attempt ${attempt + 1})`); + this.indexState.vector = true; + return; + } + } + if (attempt < maxRetries - 1) { - const delay = baseDelay * Math.pow(2, attempt); - console.warn(`[store] Vector index creation failed (attempt ${attempt + 1}/${maxRetries}): ${errorMsg}. Retrying in ${delay}ms...`); + // Jitter prevents thundering-herd re-collision among concurrent processes. + const delay = baseDelay * Math.pow(2, attempt) + Math.random() * baseDelay; + console.warn(`[store] Vector index creation failed (attempt ${attempt + 1}/${maxRetries}): ${lastErrorMsg}. Retrying in ${Math.round(delay)}ms...`); await new Promise((resolve) => setTimeout(resolve, delay)); - } else { - console.error(`[store] Vector index creation failed after ${maxRetries} attempts: ${errorMsg}. Falling back to in-memory search.`); - this.indexState.vector = false; } } } + + // Final-pass existence check: the last retry's conflict may have caused another + // process to succeed even though our own call threw. + const finalIndices = await table.listIndices(); + if (finalIndices.some(idx => idx.name === "vector")) { + console.log("[store] Vector index found on final check (created by concurrent process), adopting it"); + this.indexState.vector = true; + return; + } + + console.error(`[store] Vector index creation failed after ${maxRetries} attempts: ${lastErrorMsg}. Falling back to in-memory search.`); + this.indexState.vector = false; } /** - * Create FTS index with exponential backoff retry and existence check + * Create FTS index with exponential backoff retry and existence check. + * Handles concurrent-process commit conflicts by re-verifying index existence + * after each conflict error, and adds jitter to avoid thundering-herd re-collision. */ private async createFtsIndexWithRetry(table: LanceTable): Promise { const maxRetries = 3; @@ -2091,6 +2129,8 @@ export class MemoryStore { return; } + let lastErrorMsg = ""; + for (let attempt = 0; attempt < maxRetries; attempt++) { this.indexState.ftsRetries = attempt + 1; try { @@ -2106,18 +2146,41 @@ export class MemoryStore { this.indexState.ftsError = ""; return; } catch (error) { - const errorMsg = error instanceof Error ? error.message : String(error); + lastErrorMsg = error instanceof Error ? error.message : String(error); + + // Commit conflict: another process may have just created the index — re-verify. + if (this.isCommitConflict(lastErrorMsg)) { + const updatedIndices = await table.listIndices(); + if (updatedIndices.some(idx => idx.name === "text")) { + console.log(`[store] FTS index created by concurrent process, adopting it (attempt ${attempt + 1})`); + this.indexState.fts = true; + this.indexState.ftsError = ""; + return; + } + } + if (attempt < maxRetries - 1) { - const delay = baseDelay * Math.pow(2, attempt); - console.warn(`[store] FTS index creation failed (attempt ${attempt + 1}/${maxRetries}): ${errorMsg}. Retrying in ${delay}ms...`); + // Jitter prevents thundering-herd re-collision among concurrent processes. + const delay = baseDelay * Math.pow(2, attempt) + Math.random() * baseDelay; + console.warn(`[store] FTS index creation failed (attempt ${attempt + 1}/${maxRetries}): ${lastErrorMsg}. Retrying in ${Math.round(delay)}ms...`); await new Promise((resolve) => setTimeout(resolve, delay)); - } else { - console.error(`[store] FTS index creation failed after ${maxRetries} attempts: ${errorMsg}. Falling back to vector-only search.`); - this.indexState.fts = false; - this.indexState.ftsError = errorMsg; } } } + + // Final-pass existence check: the last retry's conflict may have caused another + // process to succeed even though our own call threw. + const finalIndices = await table.listIndices(); + if (finalIndices.some(idx => idx.name === "text")) { + console.log("[store] FTS index found on final check (created by concurrent process), adopting it"); + this.indexState.fts = true; + this.indexState.ftsError = ""; + return; + } + + console.error(`[store] FTS index creation failed after ${maxRetries} attempts: ${lastErrorMsg}. Falling back to vector-only search.`); + this.indexState.fts = false; + this.indexState.ftsError = lastErrorMsg; } private async ensureMemoriesTableCompatibility(): Promise { diff --git a/test/unit/index-race-condition.test.ts b/test/unit/index-race-condition.test.ts new file mode 100644 index 0000000..dfdd625 --- /dev/null +++ b/test/unit/index-race-condition.test.ts @@ -0,0 +1,151 @@ +import assert from "node:assert/strict"; +import test from "node:test"; +import { MemoryStore } from "../../src/store.js"; + +interface MockTable { + listIndices(): Promise>; + createIndex(column: string, options?: Record): Promise; + add(rows: unknown[]): Promise; + delete(filter: string): Promise; + schema(): Promise<{ fields: Array<{ name: string }> }>; + query(): { + where(expr: string): ReturnType; + select(columns: string[]): ReturnType; + limit(n: number): ReturnType; + toArray(): Promise>>; + }; +} + +function makeMockTable(overrides: Partial = {}): MockTable { + const q: ReturnType = { + where(_: string) { return q; }, + select(_: string[]) { return q; }, + limit(_: number) { return q; }, + async toArray() { return []; }, + }; + const base: MockTable = { + async listIndices() { return []; }, + async createIndex() {}, + async add() {}, + async delete() {}, + async schema() { return { fields: [] }; }, + query() { return q; }, + }; + return { ...base, ...overrides }; +} + +function asInternal(store: MemoryStore): Record { + return store as unknown as Record; +} + +function makeStore(): MemoryStore { + const store = new MemoryStore("/tmp/unused-no-init"); + const internal = asInternal(store); + internal.indexState = { vector: false, fts: false, ftsError: "", vectorRetries: 0, ftsRetries: 0 }; + internal.lancedb = null; + return store; +} + +test("isCommitConflict: detects 'Retryable commit conflict' message", () => { + const store = makeStore(); + const fn = (asInternal(store).isCommitConflict as (msg: string) => boolean).bind(store); + + assert.ok(fn("lance error: Retryable commit conflict for version 3010: This CreateIndex transaction was preempted by concurrent transaction")); + assert.ok(fn("Retryable commit conflict")); + assert.ok(!fn("Not enough rows to train PQ")); + assert.ok(!fn("Creating empty vector indices with train=False")); +}); + +test("isCommitConflict: detects 'preempted by concurrent transaction' message", () => { + const store = makeStore(); + const fn = (asInternal(store).isCommitConflict as (msg: string) => boolean).bind(store); + + assert.ok(fn("preempted by concurrent transaction CreateIndex at version 42")); + assert.ok(!fn("timeout error")); +}); + +test("createVectorIndexWithRetry: commit-conflict on first attempt, index exists on re-verify → vector = true", async () => { + const store = makeStore(); + + let listCallCount = 0; + const table = makeMockTable({ + async listIndices() { + listCallCount++; + if (listCallCount === 1) return []; + return [{ name: "vector" }]; + }, + async createIndex() { + throw new Error("Retryable commit conflict for version 99: preempted by concurrent transaction CreateIndex"); + }, + }); + + const internal = asInternal(store); + await (internal.createVectorIndexWithRetry as (t: MockTable) => Promise).call(store, table); + + const indexState = internal.indexState as { vector: boolean }; + assert.strictEqual(indexState.vector, true, "vector index should be adopted after commit-conflict re-verify"); +}); + +test("createFtsIndexWithRetry: commit-conflict on first attempt, index exists on re-verify → fts = true", async () => { + const store = makeStore(); + + let listCallCount = 0; + const table = makeMockTable({ + async listIndices() { + listCallCount++; + if (listCallCount === 1) return []; + return [{ name: "text" }]; + }, + async createIndex() { + throw new Error("Retryable commit conflict for version 42: preempted by concurrent transaction"); + }, + }); + + const internal = asInternal(store); + await (internal.createFtsIndexWithRetry as (t: MockTable) => Promise).call(store, table); + + const indexState = internal.indexState as { fts: boolean; ftsError: string }; + assert.strictEqual(indexState.fts, true, "FTS index should be adopted after commit-conflict re-verify"); + assert.strictEqual(indexState.ftsError, "", "ftsError should be cleared"); +}); + +test("createFtsIndexWithRetry: non-conflict error with absent index on final check → fts = false", async () => { + const store = makeStore(); + + const table = makeMockTable({ + async listIndices() { return []; }, + async createIndex() { + throw new Error("Not enough rows to train PQ. Requires 256 rows but only 1 available"); + }, + }); + + const internal = asInternal(store); + await (internal.createFtsIndexWithRetry as (t: MockTable) => Promise).call(store, table); + + const indexState = internal.indexState as { fts: boolean; ftsError: string }; + assert.strictEqual(indexState.fts, false, "FTS index should be false when all retries fail with non-conflict error"); + assert.ok(indexState.ftsError.includes("Not enough rows"), "ftsError should contain the last error message"); +}); + +test("createFtsIndexWithRetry: final-pass check adopts index created by concurrent process after retries exhausted", async () => { + const store = makeStore(); + + let listCallCount = 0; + const table = makeMockTable({ + async listIndices() { + listCallCount++; + if (listCallCount <= 4) return []; + return [{ name: "text" }]; + }, + async createIndex() { + throw new Error("Retryable commit conflict for version 77"); + }, + }); + + const internal = asInternal(store); + await (internal.createFtsIndexWithRetry as (t: MockTable) => Promise).call(store, table); + + const indexState = internal.indexState as { fts: boolean; ftsError: string }; + assert.strictEqual(indexState.fts, true, "FTS index should be adopted via final-pass check after all retries exhausted"); + assert.strictEqual(indexState.ftsError, "", "ftsError should be cleared when adopted via final-pass"); +});