Bug Report: RVF Backend Silently Drops Vectors with Non-Numeric String IDs
Package: agentdb (alpha) — specifically RvfBackend in agentdb/backends/self-learning
Upstream dependency: @ruvector/rvf / @ruvector/rvf-node (N-API addon)
Severity: Critical — silent data loss
Date: 2026-02-22
Summary
RvfBackend silently drops all vectors whose IDs are non-numeric strings
(e.g. UUIDs, hex hashes). Only vectors with purely numeric string IDs (e.g.
"1", "42") are persisted. No error is thrown — the operation appears to
succeed but the data is lost.
Root Cause
The NodeBackend.ingestBatch() method in @ruvector/rvf/dist/backend.js
converts string IDs to numbers on line 118:
// @ruvector/rvf/dist/backend.js, line 118
const ids = entries.map((e) => Number(e.id));
The N-API layer expects i64[] (numeric labels). When e.id is a
non-numeric string like "da003664_2b0f6ff3747e", Number() returns NaN.
The native Rust HNSW layer silently ignores entries with NaN IDs — no error
is thrown, and ingestBatch returns { accepted: N } as if all entries were
ingested.
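To make the failure mode concrete, the sketch below reproduces the coercion in isolation and shows one way a caller could fail loudly instead of producing NaN. toNumericLabel is a hypothetical helper, not part of @ruvector/rvf; it assumes only that a valid label is a non-negative safe integer written in canonical decimal form.

```javascript
// The same coercion used at line 118, applied to a mix of IDs.
const sampleIds = ["1", "42", "da003664_2b0f6ff3747e", "chunk_0"];
const coerced = sampleIds.map((id) => Number(id));
// coerced is [1, 42, NaN, NaN]

// Hypothetical guard (not a library API): expects a string ID and throws
// unless it is a canonical decimal integer such as "0", "7" or "42".
function toNumericLabel(id) {
  const label = Number(id);
  if (!Number.isSafeInteger(label) || label < 0 || String(label) !== id) {
    throw new TypeError(`ID ${JSON.stringify(id)} is not a numeric label`);
  }
  return label;
}
```

A guard like this at the ingest boundary would turn the silent drop into an immediate TypeError, which is easier to diagnose than a missing vector after reopening the file.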
The same pattern appears in NodeBackend.delete() at line 147:
const numIds = ids.map((id) => Number(id));
And in NodeBackend.query() at line 136, results come back as numeric labels
converted to strings:
return results.map((r) => ({ id: String(r.id), distance: r.distance }));
This means even if vectors were ingested successfully (with numeric IDs),
search results would return the numeric label (e.g. "42") rather than the
original semantic ID the caller passed.
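The mismatch is not limited to IDs that become NaN. The following standalone illustration uses plain JavaScript coercion only (it is not code from the library) and assumes the ingest side applies Number() and the query side applies String(), as quoted above.

```javascript
// Mimics the ingest-side Number() followed by the query-side String().
const roundTrip = (id) => String(Number(id));

const examples = ["42", "007", "1e3", "0x10", " 5 ", ""];
const results = examples.map((id) => [id, roundTrip(id)]);
// Only "42" comes back unchanged. The others return as
// "7", "1000", "16", "5" and "0" respectively.
```

Note that the empty string coerces to 0 rather than NaN, so it would be treated as a valid label instead of being rejected.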
Reproduction
import { SelfLearningRvfBackend } from "agentdb/backends/self-learning";
// --- Non-numeric IDs: FAILS silently ---
const backend = await SelfLearningRvfBackend.create({
  dimension: 4, metric: "cosine",
  storagePath: "/tmp/test_string_ids.rvf",
  learning: false, maxElements: 1000,
});
for (let i = 0; i < 10; i++) {
  const vec = new Float32Array([Math.random(), Math.random(), Math.random(), Math.random()]);
  await backend.insertAsync(`chunk_${i}`, vec, { text: `test ${i}` });
}
await backend.flush();
await backend.save();
await backend.backend.db.close();
// Reopen: expect 10, get 1
const backend2 = await SelfLearningRvfBackend.create({
  dimension: 4, metric: "cosine",
  storagePath: "/tmp/test_string_ids.rvf",
  learning: false, maxElements: 1000,
});
console.log("Count:", backend2.getStats().count); // → 1 (should be 10)
await backend2.backend.db.close();
// --- Numeric IDs: WORKS correctly ---
const backend3 = await SelfLearningRvfBackend.create({
  dimension: 4, metric: "cosine",
  storagePath: "/tmp/test_numeric_ids.rvf",
  learning: false, maxElements: 1000,
});
for (let i = 0; i < 10; i++) {
  const vec = new Float32Array([Math.random(), Math.random(), Math.random(), Math.random()]);
  await backend3.insertAsync(String(i + 1), vec, { text: `test ${i}` });
}
await backend3.flush();
await backend3.save();
await backend3.backend.db.close();
// Reopen: expect 10, get 10 ✓
const backend4 = await SelfLearningRvfBackend.create({
  dimension: 4, metric: "cosine",
  storagePath: "/tmp/test_numeric_ids.rvf",
  learning: false, maxElements: 1000,
});
console.log("Count:", backend4.getStats().count); // → 10 ✓
await backend4.backend.db.close();
Contrast: HNSWLibBackend Handles This Correctly
HNSWLibBackend (agentdb/backends) maintains explicit string↔numeric
mappings:
// HNSWLibBackend.js
idToLabel = new Map(); // string ID → numeric label
labelToId = new Map(); // numeric label → string ID
metadata = new Map();  // string ID → metadata
nextLabel = 0;
insert(id, embedding, metadata) {
  // Assigns a numeric label, stores the mapping
  const label = this.nextLabel++;
  this.idToLabel.set(id, label);
  this.labelToId.set(label, id);
  // ...
}
These mappings are persisted to {path}.mappings.json alongside the HNSW
index file, and restored on load().
RvfBackend has no such mapping layer. It hands caller-supplied string IDs
to NodeBackend unchanged, where Number() coerces them before they reach the
N-API layer, which expects integers.
Affected Components
| Method | File | Line | Issue |
|---|---|---|---|
| ingestBatch() | @ruvector/rvf/dist/backend.js | 118 | Number(e.id) → NaN for non-numeric strings |
| delete() | @ruvector/rvf/dist/backend.js | 147 | Number(id) → NaN for non-numeric strings |
| query() | @ruvector/rvf/dist/backend.js | 136 | Returns numeric labels, not original string IDs |
Impact
- Silent data loss: Vectors are reported as accepted but never persisted
- No error signal: ingestBatch() returns { accepted: N } even though
0 vectors were actually stored
- Metadata loss: Even with numeric IDs, metadata is not round-tripped
through the N-API layer (it's stored in-memory only by AgentDB's
RvfBackend wrapper, not persisted to the .rvf file)
- Search returns wrong IDs: Query results use numeric labels, not the
original semantic IDs
Proposed Fix
Option A: Add ID mapping to RvfBackend (AgentDB layer)
Add idToLabel/labelToId maps to RvfBackend, modeled on those in
HNSWLibBackend. Persist the mappings as a sidecar JSON file
({path}.mappings.json). This is the simplest fix and keeps the N-API layer
unchanged.
// RvfBackend additions
idToLabel = new Map(); // string ID → numeric label
labelToId = new Map(); // numeric label → string ID
metadata = new Map();  // string ID → metadata
nextLabel = 1; // RVF uses 1-based labels
insert(id, embedding, metadata) {
  // Reuse the existing label when the same ID is inserted again
  let label = this.idToLabel.get(id);
  if (label === undefined) {
    label = this.nextLabel++;
    this.idToLabel.set(id, label);
    this.labelToId.set(label, id);
  }
  this.metadata.set(id, metadata);
  // Queue with numeric label
  this.pending.push({ id: String(label), vector: embedding, metadata });
}
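Option A also needs the mappings to survive a restart and the query path to translate labels back. The sketch below shows one possible shape for both. All three function names and the JSON layout are assumptions made for illustration, not the format HNSWLibBackend actually writes; the serialized string is what would be stored in {path}.mappings.json.

```javascript
// Hypothetical helpers: names and JSON layout are illustrative assumptions.

// Serialize the string-ID → label map plus the label counter.
function serializeMappings(idToLabel, nextLabel) {
  return JSON.stringify({ nextLabel, entries: [...idToLabel.entries()] });
}

// Rebuild both directions of the mapping from the serialized form.
function deserializeMappings(json) {
  const { nextLabel, entries } = JSON.parse(json);
  const idToLabel = new Map(entries);
  const labelToId = new Map(entries.map(([id, label]) => [label, id]));
  return { idToLabel, labelToId, nextLabel };
}

// Translate raw query results (labels rendered as strings) back to the
// caller's IDs. Results whose label has no mapping are skipped.
function translateResults(rawResults, labelToId) {
  return rawResults
    .filter((r) => labelToId.has(Number(r.id)))
    .map((r) => ({ id: labelToId.get(Number(r.id)), distance: r.distance }));
}
```

Storing nextLabel alongside the entries avoids reusing a label after deletions, at the cost of a slightly larger sidecar file.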
Option B: Support string IDs natively in @ruvector/rvf-node
Modify the Rust N-API layer to accept string IDs and handle the mapping
internally (e.g. store a string→u64 hash table in the RVF file). This is a
larger change but eliminates the sidecar file.
Option C: Hash string IDs to numeric labels (quick but lossy)
Use a deterministic hash (e.g. FNV-1a or xxHash) to convert string IDs to
numeric labels. Risk: hash collisions cause silent overwrites. Not
recommended for production use.
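To give a sense of the collision risk, here is a minimal sketch of 32-bit FNV-1a together with the standard birthday-bound approximation. Both function names are hypothetical, and the choice of a 32-bit variant is an assumption made for illustration; a real implementation targeting i64 labels would need a 64-bit hash and BigInt arithmetic.

```javascript
// 32-bit FNV-1a over the UTF-8 bytes of the ID (illustrative helper).
function fnv1a32(id) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (const byte of new TextEncoder().encode(id)) {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, mod 2^32
  }
  return hash >>> 0; // reinterpret as unsigned 32-bit
}

// Birthday approximation: probability that at least two of n IDs share a
// label when labels are uniform over 2^bits values.
function collisionProbability(n, bits) {
  const x = (n * (n - 1)) / (2 * 2 ** bits);
  return -Math.expm1(-x); // equals 1 - exp(-x), accurate for small x
}
```

Under this approximation, 100,000 IDs hashed to 32-bit labels collide with probability of roughly 0.69, which is why a hash-only scheme would have to detect collisions rather than assume they do not happen.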
Recommended Fix
Option A — it's what HNSWLibBackend already does successfully, requires
no changes to the Rust N-API layer, and can be implemented entirely in the
AgentDB TypeScript code.
Environment
- agentdb@alpha (npm)
- @ruvector/rvf@0.x (npm, N-API addon)
- Node.js v22.19.0
- Linux 6.17.0-14-generic, x86_64
- Tested on RTX 4090, 24 GB VRAM