fix(rvf): add string ID mapping to NodeBackend — silent data loss#200
Merged
NodeBackend.ingestBatch() passed string IDs directly to the N-API layer
via Number(e.id), which returns NaN for non-numeric strings (UUIDs, hex
hashes, etc.). The native Rust HNSW silently drops entries with NaN IDs,
causing silent data loss with no error signal.
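
The coercion behaviour described above can be reproduced in isolation. The snippet below is only an illustration: the sample IDs and the `toLabelOrThrow` helper are hypothetical and do not come from the PR. It shows that `Number()` yields `NaN` for UUID-like and hex-like strings, and how a guard could make that failure loud rather than silent.

```typescript
// Hypothetical sample IDs for illustration; not taken from the PR.
const ids: string[] = [
  "42",                                   // numeric string
  "550e8400-e29b-41d4-a716-446655440000", // UUID
  "deadbeefcafe",                         // hex hash without a 0x prefix
];

// Number() returns NaN for any string that is not a numeric literal.
const converted: number[] = ids.map((id) => Number(id));

// Hypothetical guard: reject IDs that do not convert to a safe integer
// instead of handing NaN to a lower layer.
function toLabelOrThrow(id: string): number {
  const n = Number(id);
  if (!Number.isSafeInteger(n)) {
    throw new Error(`ID "${id}" is not a safe integer`);
  }
  return n;
}

// Logs whether each conversion produced NaN: false, true, true
console.log(converted.map((n) => Number.isNaN(n)));
```

A guard like this only surfaces the problem; it does not let UUIDs be stored, which is why the fix introduces a mapping layer instead.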
Fix: Add a bidirectional string↔numeric mapping layer to NodeBackend,
following the same pattern used by HNSWLibBackend in AgentDB:
- resolveLabel(): allocates sequential i64 labels for string IDs
- query(): maps numeric labels back to original string IDs
- delete(): resolves string IDs to labels before calling native layer
- Mappings persisted to {path}.idmap.json sidecar file
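
A minimal sketch of what such a mapping layer could look like is given below. Only the names `resolveLabel`, `idToLabel` and `labelToId` appear in this PR; the `IdMap` class, the other method names, the starting label of 1, and the JSON shape are assumptions made for illustration, not the actual `@ruvector/rvf` code.

```typescript
// Sketch of a bidirectional string <-> numeric ID map.
// Assumptions: class name, labelFor/idFor/serialize/deserialize, the starting
// label, and the JSON layout are all hypothetical.
class IdMap {
  private idToLabel = new Map<string, number>();
  private labelToId = new Map<number, string>();
  private nextLabel = 1; // arbitrary starting point for this sketch

  // Return the existing label for an ID, or allocate the next sequential one.
  resolveLabel(id: string): number {
    const existing = this.idToLabel.get(id);
    if (existing !== undefined) return existing;
    const label = this.nextLabel++;
    this.idToLabel.set(id, label);
    this.labelToId.set(label, id);
    return label;
  }

  // Forward lookup that never allocates (suitable for a delete path).
  labelFor(id: string): number | undefined {
    return this.idToLabel.get(id);
  }

  // Reverse lookup (suitable for mapping query results back to string IDs).
  idFor(label: number): string | undefined {
    return this.labelToId.get(label);
  }

  // Serialize to a JSON string that could be written to a sidecar file.
  serialize(): string {
    const entries: [string, number][] = [];
    this.idToLabel.forEach((label, id) => entries.push([id, label]));
    return JSON.stringify({ nextLabel: this.nextLabel, entries });
  }

  // Rebuild both directions of the map from a serialized string.
  static deserialize(text: string): IdMap {
    const data = JSON.parse(text) as {
      nextLabel: number;
      entries: [string, number][];
    };
    const rebuilt = new IdMap();
    data.entries.forEach(([id, label]) => {
      rebuilt.idToLabel.set(id, label);
      rebuilt.labelToId.set(label, id);
    });
    rebuilt.nextLabel = data.nextLabel;
    return rebuilt;
  }
}

const map = new IdMap();
const a = map.resolveLabel("550e8400-e29b-41d4-a716-446655440000");
const b = map.resolveLabel("deadbeefcafe");
const restored = IdMap.deserialize(map.serialize());
```

In a Node.js backend, the string returned by `serialize()` could be written with `fs.writeFileSync` and read back with `fs.readFileSync` on startup. Storing `nextLabel` alongside the entries keeps labels from being reused after a restart.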
Also fixes query() returning numeric labels instead of original string
IDs, and delete() silently failing for non-numeric IDs.
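
The query() and delete() behaviour can be illustrated against a stand-in for the native index. Everything in this sketch is hypothetical: `FakeNativeIndex` replaces the real N-API layer (which is not shown here), `MappedBackend` is not the real NodeBackend, and returning `false` for an unknown ID is a design choice of the sketch rather than documented behaviour.

```typescript
// Stand-in for the native index: it stores numeric labels only and returns
// up to k of them in insertion order. Purely hypothetical.
class FakeNativeIndex {
  private labels = new Set<number>();
  add(label: number): void {
    this.labels.add(label);
  }
  remove(label: number): boolean {
    return this.labels.delete(label);
  }
  search(k: number): number[] {
    const out: number[] = [];
    this.labels.forEach((label) => {
      if (out.length < k) out.push(label);
    });
    return out;
  }
}

// Hypothetical wrapper that keeps string IDs on the JS side and only ever
// passes numeric labels to the index.
class MappedBackend {
  private idToLabel = new Map<string, number>();
  private labelToId = new Map<number, string>();
  private nextLabel = 1;
  private native = new FakeNativeIndex();

  ingest(id: string): void {
    let label = this.idToLabel.get(id);
    if (label === undefined) {
      label = this.nextLabel++;
      this.idToLabel.set(id, label);
      this.labelToId.set(label, id);
    }
    this.native.add(label);
  }

  // Map the numeric labels returned by the index back to string IDs.
  query(k: number): string[] {
    const ids: string[] = [];
    for (const label of this.native.search(k)) {
      const id = this.labelToId.get(label);
      if (id !== undefined) ids.push(id);
    }
    return ids;
  }

  // Resolve the string ID first; an unknown ID is reported, not coerced.
  delete(id: string): boolean {
    const label = this.idToLabel.get(id);
    if (label === undefined) return false;
    this.idToLabel.delete(id);
    this.labelToId.delete(label);
    return this.native.remove(label);
  }
}

const backend = new MappedBackend();
backend.ingest("uuid-a");
backend.ingest("uuid-b");
const before = backend.query(10);          // string IDs, not labels
const deleted = backend.delete("uuid-a");  // resolved through the map
const after = backend.query(10);
```

Because the lookup happens before the native call, a delete for an ID that was never ingested returns a clear result instead of passing `NaN` downstream.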
Bumps @ruvector/rvf from 0.1.9 → 0.2.0 (breaking fix).
Closes ruvnet/agentic-flow#114
Co-Authored-By: claude-flow <ruv@ruv.net>
Summary
- RvfBackend silently drops vectors with non-numeric string IDs (UUIDs, hex hashes)
- Adds a bidirectional string↔numeric ID mapping to NodeBackend in @ruvector/rvf
- Mappings persisted to a {path}.idmap.json sidecar file (survives restarts)
- Fixes query() returning numeric labels instead of original string IDs
- Fixes delete() silently failing for non-numeric IDs
- Bumps @ruvector/rvf 0.1.9 → 0.2.0

Root Cause
NodeBackend.ingestBatch() used Number(e.id) to convert string IDs to i64 for the N-API layer. Non-numeric strings → NaN → native HNSW silently drops them. No error is thrown.

Fix
Follows the exact idToLabel/labelToId pattern from HNSWLibBackend:
- resolveLabel(id): allocates sequential numeric labels for string IDs
- query(): reverse-maps numeric labels to original string IDs
- delete(): resolves string IDs to labels before calling native layer
- Mappings are persisted to {storePath}.idmap.json

Test plan
- node tests/test-id-mapping.js: 10/10 pass

🤖 Generated with claude-flow