This is a sparse vector similarity search engine. Given a collection of documents — each represented as a sparse vector of (term_id, weight) pairs — it answers "which documents have the highest dot-product with this query vector?" efficiently.
It stores everything in MDBX (an embedded key-value database, similar to LMDB). There are two layers:
- Raw document store — the source-of-truth sparse vectors, one row per document.
- Inverted index — a derived structure that maps each term to the list of documents containing it, organized into fixed-size blocks for efficient streaming.
Both layers live in the same MDBX environment and are updated atomically within a single transaction.
| File | What it does |
|---|---|
| sparse_vector.hpp | SparseVector struct: holds (term_id, weight) pairs, packs/unpacks them to a compact binary format |
| sparse_storage.hpp | SparseVectorStorage: public API — open the DB, store/delete/search vectors, manage transactions |
| inverted_index.hpp | InvertedIndex class declaration, on-disk structs (BlockHeader, PostingListHeader), iterator |
| inverted_index.cpp | All the logic — search algorithm, block merge/save/load, quantization, SIMD helpers, pruning |
ndd::idInt is uint32_t — this is the document ID type used throughout.
A sparse vector is just two parallel arrays:
struct SparseVector {
std::vector<uint32_t> indices; // term IDs, sorted ascending
std::vector<float> values; // weights, aligned with indices
};When stored in MDBX, vectors are packed as:
[nr_nonzero : uint16_t] [term_ids : nr_nonzero × uint32_t] [values : nr_nonzero × fp16]
nr_nonzeroisuint16_t, so max 65535 terms per vector.- Values are stored as IEEE FP16 (half-precision float) in the raw document table. Conversion is done inline (
float_to_fp16/fp16_to_float). - Constructor
SparseVector(const uint8_t*, size_t)unpacks;pack()repacks.
This is the class users interact with. It wraps the MDBX environment and exposes:
SparseVectorStorage storage("/path/to/db");
storage.initialize(); // opens MDBX env, creates DBIs, loads term cacheMDBX is opened with flags MDBX_NOSTICKYTHREADS | MDBX_NORDAHEAD | MDBX_LIFORECLAIM, max size 1TB, max 10 named databases.
Two named databases (DBIs) are created:
sparse_docs— raw vector store, keyed bydoc_id(integer key)blocked_term_postings— inverted index blocks, keyed by packed(term_id, block_nr)(integer key)
// Single insert via transaction
auto txn = storage.begin_transaction();
txn->store_vector(doc_id, sparse_vec);
txn->commit();
// Batch insert (preferred — fewer transactions)
storage.store_vectors_batch({{doc_id1, vec1}, {doc_id2, vec2}, ...});Insert order: raw vector is written to sparse_docs first, then the inverted index is updated. Both happen in the same MDBX write transaction.
storage.delete_vector(doc_id);
// or via transaction:
txn->delete_vector(doc_id);Delete order is reversed: read the raw vector, remove its terms from the inverted index, then delete the raw vector row.
auto results = storage.search(query_vec, k);
// returns vector<pair<doc_id, score>> sorted by score descending
// With a filter (only consider docs in the bitmap):
auto results = storage.search(query_vec, k, &roaring_filter);SparseVectorStorage has a shared_mutex:
- Writes (
store_vectors_batch,delete_vector) take an exclusive lock. - Search is delegated directly to
InvertedIndex, which has its ownshared_mutex(shared for search, exclusive for add/remove).
MDBX transactions and cursors are single-threaded — the search loop is not parallelized.
Every row in blocked_term_postings has a uint64_t key:
packed_key = (uint64(term_id) << 32) | uint64(block_nr)
This puts all rows for one term next to each other in MDBX's sorted key order, so you can seek to (term_id, 0) and iterate forward.
Reserved keys:
block_nr = UINT32_MAX→ this row is the metadata row for the term (storesPostingListHeader)(term_id = UINT32_MAX, block_nr = 0)→ the superblock — a single row storingSuperBlock(format version metadata)term_id = UINT32_MAXis otherwise a reserved sentinel, rejected by all code paths
Documents are partitioned into fixed-size blocks:
block_nr = doc_id / 65535
block_offset = doc_id % 65535 (uint16_t)
kBlockCapacity = 65535 (std::numeric_limits<uint16_t>::max()). This means block offsets fit in 16 bits.
Each MDBX row for (term_id, block_nr) stores exactly the postings from that term that fall into that block's doc-id range. This is the fundamental design choice — writes are block-local merges, not whole-list rewrites.
A single metadata row stored at key packPostingKey(UINT32_MAX, 0):
struct SuperBlock { // 1 byte, packed
uint8_t format_version; // must match settings::SPARSE_ONDISK_VERSION
};On initialize(), the inverted index calls validateSuperBlock() which:
- Reads the superblock row.
- If not found and the DB is empty (fresh) → writes a new superblock with
format_version = settings::SPARSE_ONDISK_VERSION. - If not found but the DB has existing rows → throws
std::runtime_error(legacy incompatible database). - If found but
format_version != settings::SPARSE_ONDISK_VERSION→ throwsstd::runtime_error(version mismatch).
This key doesn't interfere with normal iteration: loadTermInfo() filters out term_id == UINT32_MAX, and iterateTermBlocks() seeks to specific term IDs that are never UINT32_MAX.
Stored at key (term_id, UINT32_MAX):
struct PostingListHeader { // 12 bytes, packed
uint32_t nr_entries; // total entries across all blocks (including tombstones)
uint32_t nr_live_entries; // entries with value > 0
float max_value; // global max weight across all blocks
};Stored at key (term_id, block_nr):
[BlockHeader] [doc_offsets: n × uint16_t] [values: n × uint8_t (or float)]
struct BlockHeader { // 8 bytes, packed
uint16_t nr_entries;
uint16_t nr_live_entries;
float max_value; // block-local max weight
};doc_offsets[]are sorteduint16_tvalues — the offset within the block (doc_id % 65535).values[]are the posting weights, eitheruint8_t(default, quantized) orfloat(whenNDD_INV_IDX_STORE_FLOATSis defined).
By default, weights are quantized to uint8_t relative to the block's max value:
quantize(val, max_val) = round(val / max_val * 255)
clamped to [1, 255] ← 0 means deleted (tombstone)
dequantize(val, max_val) = val * (max_val / 255)
The value 0 is reserved as a tombstone marker — it means the entry has been deleted but not yet compacted out of the block.
If NDD_INV_IDX_STORE_FLOATS is defined at compile time, values are stored as raw float and no quantization happens. The value 0.0f (or ≤ 0) is still the tombstone.
std::unordered_map<uint32_t, float> term_info_; // term_id → global max weightPopulated at startup by loadTermInfo(), which scans all metadata rows. Updated incrementally during add/remove. Used by search to:
- Skip query terms that don't exist in the index.
- Compute upper bounds for pruning (
upper_bound = global_max * query_weight).
Given a batch of (doc_id, SparseVector) pairs:
-
Pivot to term-major order. Build a map:
term_id → [(doc_id, value), ...]. -
For each term:
- Sort updates by
doc_id, deduplicate (keep last value for duplicate doc_ids). - Split into sub-ranges by
block_nr. - For each
(term_id, block_nr)slice:loadBlockEntries()— read and decode the existing block (if any) into avector<PostingListEntry>.- Merge the existing entries and incoming updates as two sorted streams (classic merge-sort merge).
- Recompute
new_live_countandnew_block_max. saveBlockEntries()— serialize and write the merged block back to MDBX (or delete the block if empty).
- Update the
PostingListHeader:- Adjust
nr_entriesandnr_live_entriesusing deltas. - If the old global max might have been invalidated (the block that held it now has a lower max), call
recomputeGlobalMaxFromBlocks()— a full scan of that term's block headers.
- Adjust
- Update
term_info_.
- Sort updates by
For each term in the deleted vector:
- Read the term's
PostingListHeader. - Compute which
block_nrthe doc falls in. loadBlockEntries()for that block.- Binary search for the
doc_id. - Set its value to
0.0f(tombstone). - If the tombstone ratio exceeds
INV_IDX_COMPACTION_TOMBSTONE_RATIO(default 10%), compact the block in-place (remove all tombstones). - Recompute block stats, save or delete the block.
- Update the
PostingListHeaderandterm_info_.
Search computes the top-k documents by dot-product score with the query vector. It works by streaming through posting lists in doc-id order, accumulating scores in a dense buffer, and maintaining a min-heap of the best results.
For each query term (term_id, query_weight):
- Skip if
query_weight ≤ 0or term not interm_info_. - Read the term's
PostingListHeader; skip if no live entries. - Open an MDBX cursor.
- Create a
PostingListIteratorand callinit():- Seeks to the first block for this term.
- Positions on the first live (non-tombstone) entry.
- If the iterator is not exhausted, keep it.
Search processes the doc-id space in windows of size INV_IDX_SEARCH_BATCH_SZ (default 10,000, configurable via NDD_INV_IDX_SEARCH_BATCH_SZ env var):
batch_start = min doc_id across all active iterators
batch_end = batch_start + batch_size - 1
A dense float array scores_buf[batch_size] is zeroed. Then for each iterator:
accumulateBatchScores<StoreFloats>() — the hot inner loop:
- Walk through the current block's
doc_offsets[]andvalues[]. - For each live entry within
[batch_start, batch_end]:local = block_base_doc_id - batch_start + offsetscores_buf[local] += dequantized_value * query_weight
- When the current block is exhausted,
loadNextBlock()and continue if still in range. - Track
remaining_entriesfor pruning.
Scan scores_buf. For each non-zero score above the current threshold:
- Reconstruct
doc_id = batch_start + local. - If a RoaringBitmap filter exists, skip docs not in the filter.
- Push into a min-heap of size
k. Update the threshold to the heap's minimum score.
- Remove exhausted iterators.
- If the heap is full and pruning is enabled (more than 1 iterator):
pruneLongest()finds the iterator with the mostremaining_entries.- Computes
upper_bound = global_max * query_weightfor that iterator. - If
upper_bound ≤ current_threshold, advance that iterator forward to the minimum doc_id among the other iterators (skipping a chunk of its posting list that can't contribute winners). - Only one list is pruned at a time.
Close all cursors, abort the read-only transaction. Pop the heap into a vector and reverse it so results are in descending score order.
A cursor-backed streaming iterator over one term's posting list. It never loads the entire list into memory — it reads one block at a time via zero-copy MDBX pointers.
Key methods:
| Method | What it does |
|---|---|
init() |
Seek to first block, position on first live entry |
loadFirstBlock() |
MDBX_SET_RANGE to (term_id, 0), skip empty blocks |
loadNextBlock() |
MDBX_NEXT, stop when term_id changes or metadata row reached |
parseCurrentKV() |
Validate key, parse block payload into BlockView, set up zero-copy pointers |
advanceToNextLive() |
Skip tombstones in current block (uses SIMD for uint8 mode), load next block if needed |
next() |
Move to next live entry |
advance(target_doc_id) |
Block-aware seek — skip whole blocks if target is ahead, lower_bound within a block |
valueAt(idx) |
Dequantize and return the weight at position idx |
upperBound() |
global_max * term_weight — used for pruning decisions |
A zero-copy view into an MDBX value. Pointers are only valid while the cursor stays on the same record and the transaction is alive.
struct BlockView {
const uint16_t* doc_offsets; // sorted block-local offsets
const void* values; // uint8_t* or float* depending on mode
uint32_t count;
uint8_t value_bits; // 8 or 32
float max_value; // block-local max (needed for dequantization)
};Two SIMD-accelerated functions with implementations for AVX-512, AVX2, NEON, and SVE2 (plus scalar fallback):
findNextLiveSIMD(values, size, start_idx)— finds the next non-zero byte in auint8_tarray. Used byadvanceToNextLive()to skip tombstones quickly.findDocIdSIMD(doc_ids, size, start_idx, target)— finds the firstuint32_t≥ target. Currently not used by the main search path but available.
| Flag | Effect |
|---|---|
NDD_INV_IDX_STORE_FLOATS |
Store block values as float instead of uint8_t. No quantization. |
ND_SPARSE_INSTRUMENT |
Enable timing instrumentation for search and update paths. Call printSparseSearchDebugStats() / printSparseUpdateDebugStats() to dump. |
NDD_INV_IDX_PRUNE_DEBUG |
Track how many entries each iterator skipped via pruning. Logged after search. |
| Setting | Default | Description |
|---|---|---|
INV_IDX_SEARCH_BATCH_SZ |
10,000 | Size of the dense scoring window. Configurable via NDD_INV_IDX_SEARCH_BATCH_SZ env var. |
INV_IDX_COMPACTION_TOMBSTONE_RATIO |
0.10 | When this fraction of a block's entries are tombstones during delete, compact in-place. |
NEAR_ZERO |
1e-9 | Epsilon for float comparisons. |
SPARSE_ONDISK_VERSION |
1 | Format version written to the superblock. Checked on load; mismatch throws. |
User code
│
┌──────────▼──────────┐
│ SparseVectorStorage │ ← public API, owns MDBX env
│ shared_mutex │
└──┬──────────────┬───┘
│ │
┌──────────▼──┐ ┌──────▼──────────┐
│ sparse_docs │ │ InvertedIndex │ ← owns blocked_term_postings DBI
│ (MDBX DBI) │ │ shared_mutex │
│ │ │ term_info_ │ ← in-memory cache
│ doc_id → │ └──┬──────────┬───┘
│ packed vec │ │ │
└─────────────┘ write search
│ │
┌─────────▼─┐ ┌─────▼──────────────┐
│ term-major │ │ PostingListIterator │
│ block-local│ │ (cursor-backed, │
│ merge │ │ zero-copy blocks) │
└────────────┘ └─────────────────────┘
Several ideas could further improve performance:
- Allow a configurable fraction of the lowest-weight query terms to be ignored at search time. This reduces the number of posting-list iterators, which can improve latency while preserving most of the recall. Users can tune the fraction on a per-query basis to balance speed and accuracy.