fix(rvf): add string ID mapping to NodeBackend — silent data loss#200

Merged
ruvnet merged 1 commit into main from fix/rvf-string-id-mapping on Feb 22, 2026

@ruvnet ruvnet commented Feb 22, 2026

Summary

  • Fixes agentic-flow#114: RvfBackend silently drops vectors with non-numeric string IDs (UUIDs, hex hashes)
  • Adds bidirectional string↔numeric label mapping to NodeBackend in @ruvector/rvf
  • Persists mappings to {path}.idmap.json sidecar file (survives restarts)
  • Fixes query() returning numeric labels instead of original string IDs
  • Fixes delete() silently failing for non-numeric IDs
  • Bumps @ruvector/rvf 0.1.9 → 0.2.0

Root Cause

NodeBackend.ingestBatch() used Number(e.id) to convert string IDs to i64 for the N-API layer. For non-numeric strings Number() returns NaN, and the native HNSW silently drops entries whose ID is NaN. No error is thrown, so the caller never learns that vectors were lost.
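
The conversion failure is easy to reproduce in isolation. The snippet below is a minimal sketch, not code from the PR: the sample IDs are hypothetical, and only the use of Number() on a string ID is taken from the description above.

```typescript
// Hypothetical sample IDs: one numeric string, one UUID, one hex-like hash.
const ids: string[] = [
  "42",
  "550e8400-e29b-41d4-a716-446655440000",
  "deadbeef",
];

// Mirrors the conversion described above: Number(id) on each string ID.
const converted: number[] = ids.map((id) => Number(id));

// Every ID that became NaN can no longer be told apart from any other.
const lost: string[] = ids.filter((id) => Number.isNaN(Number(id)));

console.log(converted);
console.log(lost);
```

Only "42" survives the conversion; the UUID and the hash both collapse to NaN, which is why a separate mapping from string IDs to numeric labels is needed.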

Fix

Follows the exact idToLabel/labelToId pattern from HNSWLibBackend:

  • resolveLabel(id): allocates sequential numeric labels for string IDs
  • query(): reverse-maps numeric labels to original string IDs
  • delete(): resolves string IDs to labels before calling native layer
  • Mappings saved/loaded from {storePath}.idmap.json
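
The mapping steps listed above can be sketched as a small in-memory class. This is an illustration under stated assumptions, not the NodeBackend implementation: the class name IdLabelMap, the methods resolveId() and remove(), and the choice to start labels at 1 are all hypothetical. Only the resolveLabel name and the idToLabel/labelToId pair come from the PR description.

```typescript
// Hypothetical helper illustrating a bidirectional string <-> numeric mapping.
class IdLabelMap {
  private idToLabel = new Map<string, number>();
  private labelToId = new Map<number, string>();
  private nextLabel = 1; // assumption: sequential labels start at 1

  // Return the existing label for an ID, or allocate the next sequential one.
  resolveLabel(id: string): number {
    const existing = this.idToLabel.get(id);
    if (existing !== undefined) return existing;
    const label = this.nextLabel++;
    this.idToLabel.set(id, label);
    this.labelToId.set(label, id);
    return label;
  }

  // Reverse-map a numeric label (e.g. from a query result) to the string ID.
  resolveId(label: number): string | undefined {
    return this.labelToId.get(label);
  }

  // Drop an ID from both maps and return the label the native layer
  // would need for the delete, or undefined if the ID was never stored.
  remove(id: string): number | undefined {
    const label = this.idToLabel.get(id);
    if (label === undefined) return undefined;
    this.idToLabel.delete(id);
    this.labelToId.delete(label);
    return label;
  }
}

const labels = new IdLabelMap();
const uuidLabel = labels.resolveLabel("550e8400-e29b-41d4-a716-446655440000");
const hashLabel = labels.resolveLabel("deadbeef");
const uuidAgain = labels.resolveLabel("550e8400-e29b-41d4-a716-446655440000");
const roundTrip = labels.resolveId(hashLabel);
const removed = labels.remove("deadbeef");
const removedTwice = labels.remove("deadbeef");
```

Resolving the same ID twice returns the same label, so re-ingesting a vector does not allocate a new slot. Returning undefined from remove() for an unknown ID gives the caller something explicit to report, instead of handing NaN to the native layer.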

Test plan

  • 10 new unit tests covering string IDs, numeric IDs, mixed IDs, persistence, restore
  • node tests/test-id-mapping.js — 10/10 pass
  • TypeScript compiles cleanly
  • Backward compatible — numeric string IDs still work
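
A persistence round trip of the kind the test plan covers might look like the sketch below. The functions saveIdMap() and loadIdMap(), the IdMapFile shape, and the store file name are assumptions made for illustration; the PR states only that mappings are saved to and loaded from a {storePath}.idmap.json sidecar, not what the file contains.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical on-disk shape; the real sidecar layout is not shown in the PR.
interface IdMapFile {
  nextLabel: number;
  entries: [string, number][];
}

// Write the mapping next to the store as {storePath}.idmap.json.
function saveIdMap(storePath: string, idToLabel: Map<string, number>, nextLabel: number): void {
  const data: IdMapFile = { nextLabel, entries: Array.from(idToLabel.entries()) };
  fs.writeFileSync(`${storePath}.idmap.json`, JSON.stringify(data));
}

// Load the mapping, or start empty when no sidecar exists yet.
function loadIdMap(storePath: string): { idToLabel: Map<string, number>; nextLabel: number } {
  const file = `${storePath}.idmap.json`;
  if (!fs.existsSync(file)) return { idToLabel: new Map<string, number>(), nextLabel: 1 };
  const data = JSON.parse(fs.readFileSync(file, "utf8")) as IdMapFile;
  return { idToLabel: new Map<string, number>(data.entries), nextLabel: data.nextLabel };
}

// Round trip in a temporary directory: save, reload, then clean up.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "idmap-"));
const storePath = path.join(dir, "vectors.rvf");
const original = new Map<string, number>([
  ["550e8400-e29b-41d4-a716-446655440000", 1],
  ["deadbeef", 2],
]);
saveIdMap(storePath, original, 3);
const restored = loadIdMap(storePath);
const missing = loadIdMap(path.join(dir, "other.rvf"));
fs.rmSync(dir, { recursive: true, force: true });
```

Storing nextLabel alongside the entries keeps label allocation monotonic after a restart, so a label freed by a delete is not handed to a different ID.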

🤖 Generated with claude-flow

NodeBackend.ingestBatch() converted string IDs with Number(e.id) before
passing them to the N-API layer. Number() returns NaN for non-numeric
strings (UUIDs, hex hashes, etc.), and the native Rust HNSW silently
drops entries with NaN IDs, causing data loss with no error signal.

Fix: Add a bidirectional string↔numeric mapping layer to NodeBackend,
following the same pattern used by HNSWLibBackend in AgentDB:
- resolveLabel(): allocates sequential i64 labels for string IDs
- query(): maps numeric labels back to original string IDs
- delete(): resolves string IDs to labels before calling native layer
- Mappings persisted to {path}.idmap.json sidecar file

Also fixes query() returning numeric labels instead of original string
IDs, and delete() silently failing for non-numeric IDs.

Bumps @ruvector/rvf from 0.1.9 → 0.2.0 (breaking fix).

Closes ruvnet/agentic-flow#114

Co-Authored-By: claude-flow <ruv@ruv.net>
@ruvnet ruvnet merged commit 7d410c4 into main Feb 22, 2026
6 checks passed