feat: DumpMergeWriter + merge-aware compaction #148
Conversation
…zations

Multi-phase dump correctness:
- DocOp::Merge variant: merges fields into existing docs instead of replacing
- All dump phases use Merge for object-level writes (fixes data loss bug)
- Tags post-pass: bitmap inversion writes one Merge per slot (4.5B→109M ops)
- 10 unit tests for Merge semantics (roundtrip, accumulate, delete+resurrect)

Pipeline performance (StreamingDocWriter fixes):
- BufWriter 256→8192 bytes on new shard creation (2x throughput improvement)
- Hardware CRC32 via crc32fast (replaces software byte-at-a-time table)
- Remove per-shard fsync in finalize (saves 20-80s per phase)
- Background enrichment drop (50s blocking → non-blocking)
- Explicit mmap drop after parse (zombie RSS 83GB→24GB)

DataSilo crate (crates/datasilo/):
- Generic mmap'd key-value store: 35M writes/sec, 23M reads/sec
- ParallelWriter with atomic bump + 1MB thread-local regions
- OpsLog with CRC32 append + replay on startup
- Compaction (replay ops → rewrite data file)
- 6 unit tests passing

Server endpoints:
- POST /time-buckets/rebuild: rebuild from sort field data + cache clear
- GET /dictionaries: reverse maps for LCS/MappedString fields
- GET /ui-config: serves YAML as JSON for config-driven UI

Config-driven UI (static/index.html):
- Dynamic filter/sort controls from engine metadata + YAML overrides
- Card rendering with image URL templates, badges, meta fields
- Detail modal with configurable fields, display types, formats
- URL state sync for bookmarkable/shareable filter states
- Civitai UI config (deploy/configs/civitai/ui-config.yaml)

Design docs:
- docs/design/docop-merge.md (GPT + Gemini reviewed)
- docs/design/datasilo-implementation-plan.md (full migration plan)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
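The DocOp::Merge semantics described above (merge fields into the existing doc instead of replacing it, with delete+resurrect behavior) can be sketched with simplified, hypothetical stand-in types — the real DocOp encodes typed field values, not a flat map:

```rust
use std::collections::HashMap;

// Hypothetical simplified stand-ins for the DocOp variants described above.
#[derive(Clone)]
enum DocOp {
    Put(HashMap<String, i64>),   // replace the whole document
    Merge(HashMap<String, i64>), // merge fields into the existing document
    Delete,
}

fn fields(pairs: &[(&str, i64)]) -> HashMap<String, i64> {
    pairs.iter().map(|(k, v)| (k.to_string(), *v)).collect()
}

// Apply one op to an optional existing document, returning the new state.
fn apply(doc: Option<HashMap<String, i64>>, op: &DocOp) -> Option<HashMap<String, i64>> {
    match op {
        DocOp::Put(f) => Some(f.clone()),
        DocOp::Merge(f) => {
            let mut base = doc.unwrap_or_default();
            for (k, v) in f {
                base.insert(k.clone(), *v); // incoming fields win per key
            }
            Some(base)
        }
        DocOp::Delete => None,
    }
}

fn main() {
    let put = DocOp::Put(fields(&[("a", 1)]));
    let merge = DocOp::Merge(fields(&[("b", 2)]));
    // Merge keeps "a" and adds "b" instead of replacing the document.
    let doc = apply(apply(None, &put), &merge).unwrap();
    assert_eq!(doc.len(), 2);
    // Delete + resurrect: a Merge after Delete starts from an empty doc.
    let resurrected = apply(apply(Some(doc), &DocOp::Delete), &merge).unwrap();
    assert_eq!(resurrected.len(), 1);
}
```

This is the property that fixes the data-loss bug: with Put, a later phase writing only `b` would have dropped `a`.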
Frozen query path (Task #33):
- BitmapSilo frozen accessors: get_frozen_filter(), get_frozen_sort_layer()
- mark_filters_backed() / mark_sorts_backed() — startup marks bitmaps as unloaded placeholders, reads from mmap at query time
- QueryExecutor: get_effective_bitmap() + and_effective_bitmap() helpers with frozen fallback for all filter ops (Eq, In, NotEq, NotIn, Or, Range)
- Sort traversal: bifurcate_frozen(), apply_cursor_filter_frozen(), reconstruct_value_frozen() — frozen layers from BitmapSilo mmap
- ConcurrentEngine holds BitmapSilo behind RwLock, passes it to the executor

Aggressive V2 retirement (~15K lines removed):
- Removed lazy loading: pending_filter_loads, pending_sort_loads, lazy_value_fields, ensure_fields_loaded(), LazyLoad enum
- Removed eviction: eviction_stamps, eviction_total, idle sweep
- Removed existence sets: existing_keys
- Deleted bitmap_memory_cache.rs, bitmap_fs.rs, bound_store.rs, doc_cache.rs, field_handler.rs, preset.rs, shard_store*.rs (4 files)
- Removed FilterField::load_field_complete(), load_values(), clear_bases_and_unload()
- Cleaned up 47 stale TODO comments (49→2)
- Deleted 8 dead test stubs, un-ignored 3 tests (0 ignored remaining)

635 tests passing, 0 failed, 0 ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove dead SortField wrappers: bifurcate(), order_results(), apply_cursor_filter() — only frozen variants remain
- Remove dead FlushCommand fields: skip_lazy, cursors, dictionaries
- Remove dead docstore_root field from ConcurrentEngine
- Clean unused imports across datasilo, concurrent_engine, executor

635 tests passing, 0 failed, 0 ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Arc<RoaringBitmap> on VersionedBitmap.base existed for ArcSwap CoW snapshot publishing. With V3 frozen mmap, published snapshots read bases from the BitmapSilo mmap, making the Arc unnecessary overhead.

- base: Arc<RoaringBitmap> → base: RoaringBitmap
- Removed from_arc() constructor
- Simplified merge(), or_into_base(), load_base() — direct mutation
- Updated all .base().as_ref() call sites to .base()
- diff: Arc<BitmapDiff> stays (still needed for swap_diff)

635 tests passing, 0 failed, 0 ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Temporary get_shard/get_shard_packed shims on DocSiloAdapter to unblock server compilation. These will be replaced with a proper get_document() API that reads from mmap and applies pending ops.

635 tests passing, 0 failed, 0 ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Zero references to ShardStore, DocStoreV3, BitmapFs, doc_cache, bound_store, or field_handler remain in the codebase.

- Removed DocCacheConfigEntry struct + doc_cache config field
- Removed 8 dead doc_cache metrics from metrics.rs
- Removed evict_doc_cache() + doc_cache_stats() stub methods
- Removed doc_cache metric scraping from server.rs
- Updated all comments from V2 system names to V3 (DataSilo/BitmapSilo)
- Updated test assertions from DocStoreV3 to DataSilo

635 tests passing, 0 failed, 0 ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
DocSink and Ingester<B> were V2 abstractions never used in production. Keeps the BitmapSink trait, CoalescerSink, and AccumSink (actively used).

631 tests passing, 0 failed, 0 ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- DataSilo::delete(key) appends a Delete tombstone to the ops log
- get_with_ops() respects delete tombstones (returns None)
- Cold compaction: deleted keys excluded from the output data file
- Hot compaction: deleted keys have their index entry zeroed out
- OpsLog::for_each_ops() yields the full SiloOp (Put + Delete)
- Delete CRC validation in for_each()
- 4 new tests: cold delete, hot delete, get_with_ops delete, delete+reinsert

29 datasilo tests passing, 631 lib tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
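The tombstone behavior above (last op per key wins; a Delete reads back as None; a later Put resurrects the key) can be sketched with a simplified, hypothetical in-memory ops replay — the real OpsLog replays CRC-validated frames from disk:

```rust
use std::collections::HashMap;

// Hypothetical simplified SiloOp: the real log stores framed, CRC-checked ops.
enum SiloOp {
    Put(u64, Vec<u8>),
    Delete(u64), // tombstone
}

// Replay the ops log with last-write-wins semantics, then look up one key.
fn get_with_ops(ops: &[SiloOp], key: u64) -> Option<Vec<u8>> {
    let mut state: HashMap<u64, Option<Vec<u8>>> = HashMap::new();
    for op in ops {
        match op {
            SiloOp::Put(k, v) => { state.insert(*k, Some(v.clone())); }
            SiloOp::Delete(k) => { state.insert(*k, None); } // tombstone wins over earlier Puts
        }
    }
    state.get(&key).cloned().flatten()
}

fn main() {
    let ops = vec![
        SiloOp::Put(7, b"v1".to_vec()),
        SiloOp::Delete(7),
        SiloOp::Put(9, b"v2".to_vec()),
    ];
    assert_eq!(get_with_ops(&ops, 7), None); // deleted key reads as absent
    assert_eq!(get_with_ops(&ops, 9), Some(b"v2".to_vec()));

    // Delete + reinsert resurrects the key (one of the 4 new tests' scenarios).
    let ops2 = vec![
        SiloOp::Put(7, b"a".to_vec()),
        SiloOp::Delete(7),
        SiloOp::Put(7, b"b".to_vec()),
    ];
    assert_eq!(get_with_ops(&ops2, 7), Some(b"b".to_vec()));
}
```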
Merge thread now only does:
1. DataSilo compaction when dirty (apply pending doc ops)
2. RSS-aware memory pressure eviction

Removed: unused inner clone, time_buckets capture, cursors capture, suppress-unused hacks. Named the thread "bitdex-merge".

631 tests passing, 0 failed, 0 ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Added compact_threshold to SiloConfig (default 0.20 = 20%)
- Added dead_bytes counter on DataSilo (AtomicU64)
- Hot compaction tracks dead bytes from deletes (zeroed index entries) and relocating updates (overflows where the old slot becomes dead)
- Cold compaction resets dead_bytes to 0 (full rewrite)
- Added dead_bytes(), dead_ratio(), needs_compaction() accessors
- BitmapSilo uses compact_threshold=0.0 (bitmaps rewritten in full)

29 datasilo tests, 631 lib tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
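The accounting above reduces to a small invariant: dead bytes accumulate from deletes and relocating updates, and compaction triggers once they exceed `compact_threshold` of the file. A minimal sketch with assumed field names (the real counters are atomics on DataSilo):

```rust
// Hypothetical simplified stats struct; the real dead_bytes is an AtomicU64
// on DataSilo, updated by hot compaction and reset to 0 by cold compaction.
struct SiloStats {
    allocated_bytes: u64,
    dead_bytes: u64,
    compact_threshold: f64, // default 0.20 per the commit
}

impl SiloStats {
    fn dead_ratio(&self) -> f64 {
        if self.allocated_bytes == 0 {
            0.0
        } else {
            self.dead_bytes as f64 / self.allocated_bytes as f64
        }
    }

    fn needs_compaction(&self) -> bool {
        self.dead_ratio() > self.compact_threshold
    }
}

fn main() {
    let mut s = SiloStats { allocated_bytes: 1000, dead_bytes: 150, compact_threshold: 0.20 };
    assert!(!s.needs_compaction()); // 15% dead < 20% threshold
    s.dead_bytes = 250;
    assert!(s.needs_compaction()); // 25% dead > 20% threshold
    // compact_threshold = 0.0 (BitmapSilo's original setting) triggers on any dead byte.
    let b = SiloStats { allocated_bytes: 1000, dead_bytes: 1, compact_threshold: 0.0 };
    assert!(b.needs_compaction());
}
```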
Cold compaction now uses mmap for both data and index writes:
1. Compute entry layouts sequentially (offsets are cumulative)
2. Pre-allocate the data file at exact size and mmap it
3. Write entries via pointer copy to pre-computed offsets

Each entry targets a unique non-overlapping region, ready for parallel writes (rayon) when needed. Currently sequential, but the infrastructure is in place — just change .for_each to .par_iter().for_each() when rayon is added to datasilo.

29 datasilo tests, 631 lib tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Deleted pg_sync/backfill.rs entirely (no external callers)
- Deleted pg_sync/csv_ops.rs entirely (no external callers)
- Removed apply_ops_batch_dump + process_wal_dump from ops_processor.rs
- Removed 7 dead parse_*_row functions from copy_queries.rs (kept parse_post_row, parse_model_version_row, parse_model_row)
- Removed associated dead types: CopyImageRow, CopyResourceRow, CopyMetricRow

631 tests passing, 0 failed, 0 ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New src/cache_silo.rs:
- CacheEntryData: serializable subset of UnifiedEntry (bitmap, metadata, sorted_keys)
- Binary format v1: fixed header + variable bitmap + optional sorted_keys
- hash_unified_key(): folds the 64-bit hash to u32 for the DataSilo key
- save_entry/delete_entry: append to ops log
- load_all: scan ops log + data file for last-write-wins restore
- compact: delegates to DataSilo compaction

Wiring in ConcurrentEngine:
- cache_silo field (Arc<RwLock<CacheSilo>>)
- Startup: open + load_all from bitmap_path/cache_silo/
- Flush thread: drain_dirty_for_silo() → save dirty entries after cache maintenance
- Merge thread: compact CacheSilo when dead space exceeds threshold

UnifiedCache additions:
- drain_dirty_for_silo(): collects dirty entries as (key_hash, CacheEntryData)

8 new CacheSilo tests, 639 total tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CacheSilo restore fix:
- Added UnifiedKey serialization to the CacheEntryData binary format (v2)
- Added key field to CacheEntryData (encode/decode round-trips the key)
- Wired the actual restore path: load_all → from_cache_entry_data → insert_restored_entry
- Added UnifiedEntry::from_cache_entry_data() constructor
- begin_restore/finish_restore for batch eviction

Dead enrichment code removal:
- Removed PostEnrichment, MvEnrichment, ModelEnrichment structs
- Removed load_posts_enrichment, load_mv_enrichment, load_model_enrichment
- Removed CopyPostRow, CopyModelVersionRow, CopyModelRow + parse functions
- Removed dead helper functions (is_null, parse_opt_*, parse_bool, parse_i64_fast)

639 tests passing, 0 failed, 0 ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
compact_hot() previously truncated the single ops log after compaction, losing any ops written during the compaction window. Fix: two ops log slots (ops_a.log, ops_b.log) with atomic swap.

Protocol:
1. Freeze the active slot, redirect writes to the other slot (atomic xor)
2. Compact data from the frozen slot
3. Truncate the frozen slot only after data+index are fully flushed

Legacy migration: an existing ops.log is renamed to ops_a.log on first open.

Tests: test_ab_swap_no_ops_lost, test_legacy_ops_log_migration.

31 datasilo tests, 639 lib tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
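The slot-swap step of the protocol can be sketched as follows, with assumed names and in-memory Vecs standing in for the on-disk logs. This shows only the xor-flip redirection; the real code additionally orders the truncate after data+index durability, and must account for writers that observed the old slot index just before the flip:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

// Hypothetical two-slot log: writers append to slots[active]; compaction
// freezes the current slot by xor-flipping `active`, then drains it.
struct AbLog {
    active: AtomicUsize,
    slots: [Mutex<Vec<u32>>; 2],
}

impl AbLog {
    fn append(&self, op: u32) {
        let a = self.active.load(Ordering::Acquire);
        self.slots[a].lock().unwrap().push(op);
    }

    fn compact(&self) -> Vec<u32> {
        // 1. Redirect writers to the other slot (atomic xor); fetch_xor
        //    returns the previous index, i.e. the now-frozen slot.
        let frozen = self.active.fetch_xor(1, Ordering::AcqRel);
        // 2-3. Drain the frozen slot; the real code truncates the file only
        //      after the compacted data and index are flushed.
        let mut slot = self.slots[frozen].lock().unwrap();
        std::mem::take(&mut *slot)
    }
}

fn main() {
    let log = AbLog {
        active: AtomicUsize::new(0),
        slots: [Mutex::new(Vec::new()), Mutex::new(Vec::new())],
    };
    log.append(1);
    log.append(2);
    assert_eq!(log.compact(), vec![1, 2]); // drains the frozen slot A
    log.append(3); // lands in slot B — not lost during the compaction window
    assert_eq!(log.compact(), vec![3]);
}
```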
Two bugs fixed in compact_hot_from():
1. Reader blocking: old code dropped self.data_mmap during compaction,
causing get() to return None. Fix: write to data.bin.tmp while old
mmap stays alive, then rename over data.bin.
2. Data/index interleaving: old code wrote data AND updated index in
same loop body. Crash mid-loop = corrupt state. Fix: three strict
phases — classify (read-only), write data (tmp file), update index
(only after data flushed).
Dead-space accounting also fixed: captures old_allocated during the
read-only classification pass before any mutations.
Tests: test_hot_compact_does_not_drop_read_mmap_early,
test_hot_compact_data_before_index_sequential_rounds
33 datasilo tests, 639 lib tests passing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- BitmapSilo compact_threshold: 0.0 → 0.20 (20% dead space triggers)
- Added compact() and needs_compaction() to BitmapSilo
- Merge thread now round-robins across doc, cache, and bitmap silos
- bitmap_silo_arc created early for sharing with the merge thread

639 tests passing, 0 failed, 0 ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Scarlet audit: previous hot compaction copied the ENTIRE data file to a temp file on every cycle — a 25GB memcpy at 107M docs.

Fix: two-tier approach:
- In-place updates: seek+write to the existing data.bin at allocated offsets
- Overflows: append to the end of the existing data.bin (old slot = dead space)
- Full file rewrite only when dead_ratio > compact_threshold (separate pass)
- Never copy the entire file for routine compaction

No temp file, no rename. data_mmap remaps only when the file grows (overflows). The in-place path doesn't touch the mmap at all — readers stay unblocked throughout.

33 datasilo tests, 639 lib tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
BitmapSilo true silo (Phase 2 foundation):
- Ops encoding: OP_SET_BIT (0x01) and OP_CLEAR_BIT (0x02) for individual bit mutations, alongside the existing full frozen bitmap format
- Mutation methods: filter_set/clear, sort_set/clear, alive_set/clear — append 5-byte ops to the silo ops log
- Ops-on-read: get_filter_with_ops, get_sort_layer_with_ops, get_alive_with_ops — read the frozen base + scan ops for pending set/clear, apply inline
- DataSilo.scan_ops_for_key() — scan both A-B logs for all ops on a key

Dead stubs cleanup (Phase 5 partial):
- Deleted memory_pressure.rs + all references
- Deleted get_rss_bytes() + Windows/Linux FFI from concurrent_engine
- Deleted dead stubs: boundstore_*, preload_*, build_all_from_docstore, rebuild_fields_from_docstore, add_fields_from_docstore, etc.
- Merge thread: removed the RSS eviction loop (no heap data to evict)
- Removed rebuild_on_boot from server.rs

636 tests passing, 0 failed, 0 ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
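The ops-on-read pattern above is: read the frozen base bitmap, then replay pending set/clear ops over it. A minimal sketch with a `HashSet<u32>` standing in for a Roaring bitmap (the tag constants mirror the commit; everything else is hypothetical):

```rust
use std::collections::HashSet;

// Op tags from the commit; each on-disk op is 5 bytes (tag + bit index).
const OP_SET_BIT: u8 = 0x01;
const OP_CLEAR_BIT: u8 = 0x02;

// Ops-on-read sketch: clone the frozen base, then apply pending mutations
// scanned from the ops log, in log order. HashSet stands in for RoaringBitmap.
fn apply_ops(frozen: &HashSet<u32>, ops: &[(u8, u32)]) -> HashSet<u32> {
    let mut live = frozen.clone();
    for &(tag, bit) in ops {
        match tag {
            OP_SET_BIT => { live.insert(bit); }
            OP_CLEAR_BIT => { live.remove(&bit); }
            _ => {} // unknown tag: skip (real code would validate/CRC-check)
        }
    }
    live
}

fn main() {
    let frozen: HashSet<u32> = [1, 2, 3].into_iter().collect();
    let ops = [(OP_SET_BIT, 10), (OP_CLEAR_BIT, 2), (OP_SET_BIT, 2)];
    let live = apply_ops(&frozen, &ops);
    assert!(live.contains(&10)); // set applied over the frozen base
    assert!(live.contains(&2));  // later set overrides the earlier clear
    assert_eq!(live.len(), 4);   // {1, 2, 3, 10}
}
```

Log order matters: a clear followed by a set must leave the bit set, which is why the ops are applied sequentially rather than split into set/clear batches.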
get_effective_bitmap now reads from BitmapSilo first (frozen base + pending silo ops), then merges with in-memory VersionedBitmap diffs for mutations not yet written to the silo. During the Phase 2→4 transition both sources may have data; union combines them.

and_effective_bitmap simplified to delegate to get_effective_bitmap.

636 tests passing, 0 failed, 0 ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Added send_mutation_ops() helper that dual-writes every MutationOp to both the BitmapSilo ops log (V3 path) and the coalescer channel (V2 path, removed in Phase 4). All 6 mutation entry points wired. Filter, sort, AND alive mutations all go to the silo ops log:
- FilterInsert/Remove → silo.filter_set/clear per slot
- SortSet/Clear → silo.sort_set/clear per slot
- AliveInsert/Remove → silo.alive_set/clear per slot

Combined with the executor ops-on-read from the previous commit, this means the silo now has complete mutation data AND reads apply it. The coalescer/ArcSwap path is now redundant (Phase 4 removes it).

636 tests passing, 0 failed, 0 ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Query path now checks CacheSilo before UnifiedCache:
1. Hash UnifiedKey → key_hash
2. If not in UnifiedCache, try cache_silo.get_entry(key_hash)
3. On silo hit: promote to UnifiedCache via from_cache_entry_data
4. Downstream logic (sorted_keys, radix, bucket diffs) works unchanged

New: CacheSilo.get_entry(key_hash) — single-key read via get_with_ops
New: silo_hits metric for tracking cross-restart cache effectiveness

Write path unchanged: the flush thread's drain_dirty_for_silo still handles persistence.

4 new get_entry tests. 640 total tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
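The lookup order above is a read-through cache with promotion. A sketch with hypothetical names and plain HashMaps standing in for UnifiedCache and CacheSilo:

```rust
use std::collections::HashMap;

// Read-through sketch: check the hot in-memory cache first, fall back to the
// persistent silo, and promote silo hits so later reads stay in memory.
// `hot`, `silo`, and `silo_hits` are simplified stand-ins for the real types.
fn get_or_promote(
    hot: &mut HashMap<u64, String>,
    silo: &HashMap<u64, String>,
    key_hash: u64,
    silo_hits: &mut u64,
) -> Option<String> {
    if let Some(v) = hot.get(&key_hash) {
        return Some(v.clone()); // in-memory hit, no silo read
    }
    let v = silo.get(&key_hash)?.clone(); // miss in both → None
    *silo_hits += 1;                 // cross-restart effectiveness metric
    hot.insert(key_hash, v.clone()); // promote so the next read is hot
    Some(v)
}

fn main() {
    let mut hot = HashMap::new();
    let silo = HashMap::from([(42u64, "entry".to_string())]);
    let mut silo_hits = 0u64;

    assert_eq!(get_or_promote(&mut hot, &silo, 42, &mut silo_hits).as_deref(), Some("entry"));
    assert_eq!(silo_hits, 1); // first read came from the silo
    assert_eq!(get_or_promote(&mut hot, &silo, 42, &mut silo_hits).as_deref(), Some("entry"));
    assert_eq!(silo_hits, 1); // second read was a hot hit — no new silo hit
    assert!(get_or_promote(&mut hot, &silo, 99, &mut silo_hits).is_none());
}
```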
- BitmapSilo: RwLock on name_to_key for concurrent key auto-creation (new bitmap values auto-assign silo keys instead of silently skipping)
- send_mutation_ops(): skip the coalescer when bitmap_silo exists (mutations go ONLY to the silo ops log for engines with a silo)
- get_effective_bitmap(): simplified to silo-first, VB fallback for tests
- Removed a V2 lazy-load test (tested flush thread mechanics, N/A with a silo)

Phase 4 foundation: with silo-only mutations, the coalescer path is now dead for production engines. Tests without silos still use it.

640 tests passing, 0 failed, 0 ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
MutationOp and MutationSender now live in mutation.rs (their natural home) instead of write_coalescer.rs. Updated all imports across concurrent_engine, ingester, ops_processor. write_coalescer.rs now imports from mutation.rs — preparation for deleting the coalescer.

640 tests passing, 0 failed, 0 ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
WriteCoalescer batching system replaced by direct silo ops log writes. The flush thread uses a local FlushBatch struct for the remaining staging updates. MutationOp + MutationSender already moved to mutation.rs. FilterGroupKey moved to unified_cache.rs.

615 tests passing, 0 failed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
UnifiedCache replaced entirely by CacheSilo:
- Query path reads the cache via CacheSilo.get_entry() only
- No in-memory HashMap, no radix sort index, no LRU tracking
- UnifiedKey moved to cache_silo.rs
- Flush thread live maintenance removed (~1,800 lines from concurrent_engine)
- Prefetch worker removed
- Cache stats/metrics simplified to CacheSilo-only

Total removed this commit: ~5,200 lines (unified_cache.rs + flush thread code)

561 tests passing, 0 failed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
FlushCommand (ForcePublish, SyncUnloaded, ExitLoadingSaveUnload) and the cmd_tx/cmd_rx command channel are gone. The loading_mode AtomicBool and all enter/exit methods are removed.

- enter_loading_mode() / exit_loading_mode() → no-ops
- exit_loading_mode_and_save_unload() → just calls save_snapshot()
- save_and_unload() → calls publish_staging directly
- Flush thread simplified: no command handling, no loading mode checks
- 2 V2 tests deleted (loading mode timing tests)

559 tests passing, 0 failed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ArcSwap<InnerEngine> replaced with direct RwLock fields:
- slots: Arc<RwLock<SlotAllocator>>
- filters: Arc<RwLock<FilterIndex>>
- sorts: Arc<RwLock<SortIndex>>

Queries hold read locks. The flush thread holds write locks for mutation application only. No more staging clone, no snapshot publishing. Bulk-load paths (clone_staging/publish_staging) still work via read-lock clone → offline build → write-lock swap.

559 tests passing, 0 failed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Removes the following methods that are no longer part of the engine API:
- put_via_wal, patch_document_via_wal (WAL write path — superseded by ops pipeline)
- put_inner (inlined into put())
- patch, patch_document (PATCH semantics — use PUT for all writes)
- sync_filter_values (filter_only sync — use PUT for all writes)
- put_many, put_bulk, put_bulk_loading, put_bulk_into (bulk loading)
- spawn_docstore_writer, write_docs_to_docstore (docstore helpers)
- apply_accum (BitmapAccum apply — superseded by apply_bitmap_maps)
- wal_writer field and set_wal_writer (WAL path removed)

Keeps: put(), delete(), clone_staging(), publish_staging(), apply_bitmap_maps() — these are still used by dump_processor, loader, and remove_fields.

Server PATCH and filter_sync handlers now return 501 Not Implemented. Removes 7 tests that covered the deleted methods. The benchmark "bulk" stage is replaced with a no-op placeholder.

cargo check --lib: 0 errors
cargo check --features server,pg-sync: 0 errors
cargo test --lib: 548 passed, 0 failed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Converted concurrent_engine.rs to a directory module. Extracted 6 query methods to src/concurrent_engine/query.rs:
- query(), execute_query(), execute_query_impl()
- execute_query_traced(), execute_query_with_collector()
- resolve_filters(), post_validate()

concurrent_engine/mod.rs is now the engine struct + construction + mutations. concurrent_engine/query.rs is the query execution path.

548 tests passing, 0 failed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When has_silo is true, the flush thread skips acquiring write locks on FilterIndex/SortIndex/SlotAllocator and applying mutations — those go directly to BitmapSilo instead.

Also removes the dead stale_fields collection (collected then immediately cleared with no reads between) and the flush_cycle counter + COMPACTION_INTERVAL constant, which were only used for dead compaction logic.

The shutdown/final-drain path mirrors this: apply only when !has_silo.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
refactor: skip in-memory bitmap apply when BitmapSilo present
feat: add FieldRegistry + deterministic u64 key encoding for BitmapSilo (#140)
…ewrite save_snapshot to use live silo

- Remove the pending_bucket_diffs field from ConcurrentEngine — it was only passed to the flush thread on startup and never read back. The flush thread still owns and updates the ArcSwap<PendingBucketDiffs> locally.
- Rewrite save_snapshot to use the live BitmapSilo when present. Instead of reading all in-memory bitmaps and calling save_all_parallel (which re-serialized everything), the ops-on-read path only needs to flush metadata (slot_counter, cursors) and compact the ops log. Added BitmapSilo::save_meta() for this purpose.
- save_and_unload now calls save_snapshot first, ensuring metadata is persisted before the in-memory state is reset.
- Mark BitmapSilo::save_all as #[cfg(test)] — it's only used in test fixtures to create frozen silo snapshots. The production path is write_dump_maps or the ops-on-read individual set/clear ops.
- Delete snapshot_public from ConcurrentEngine — it was only used by the benchmark to get a sort field handle. Replaced with a public reconstruct_sort_value method on ConcurrentEngine.
- Fix benchmark.rs import path: bitdex_v2::concurrent_engine -> bitdex_v2::engine.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ap_maps

Delete the InnerEngine struct, clone_staging(), publish_staging(), and apply_bitmap_maps() — the staging clone-then-swap pattern required cloning all three live indexes (filters, sorts, slots) just to build bitmaps offline, adding memory pressure during bulk loads.

Replace with engine.merge_bitmap_maps(filter_maps, sort_maps, alive), which applies pre-built bitmaps directly to the live state under brief write locks. This is equivalent to the old staging pattern but without the full InnerEngine clone.

Update loader.rs Stage 3 to use merge_bitmap_maps directly instead of accumulating into a staging snapshot then publishing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove cache_silo from FlushArgs — the flush thread no longer does any cache maintenance (that moved to the CacheSilo/ops-on-read path). CacheSilo is still owned by ConcurrentEngine and used by the query path.
- Remove the flush_cycle counter and COMPACTION_INTERVAL constant from the flush thread — filter diff compaction was removed in a prior session and these variables were just incrementing with no effect.
- Remove has_alive_mutations() and mutated_filter_fields() from FlushBatch — neither was called anywhere in the codebase.
- Fix the save_snapshot fallback: when no live silo exists (engine started fresh with no prior snapshot), fall back to save_all_parallel to serialize the in-memory state to a new silo file. This restores the test_save_snapshot_* test behavior.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Delete copy_field_arc_from() from both FilterIndex and SortIndex — these were used by the old save_and_unload path when preserving field Arcs across the staging swap. No callers remain in the codebase.

Delete total_bitmap_count() from FilterIndex — never called outside the dead snapshot V2 code path.

Suppress the dead_code warning on ConcurrentEngine::doc_tx — the field is a channel sender used by the test put() helper to drive docstore writes through the flush thread, so it is a real field. The linter can't see the test usage due to cfg(test) scoping.

Also remove the stale ConcurrentEngine ArcSwap doc-comment banner — we no longer publish immutable snapshots via ArcSwap (removed in prior sessions).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
refactor: delete V2 in-memory infrastructure (InnerEngine, staging, dead methods)
…poch gaps)

Fix 1: alive_count() and reconstruct_sort_value() in ConcurrentEngine now read from BitmapSilo when present instead of the stale in-memory SlotAllocator/SortIndex.

Fix 2: bump_field_epochs() now tracks AliveInsert/AliveRemove under the "__alive__" field key. Cache seeding in query.rs now includes the __alive__ epoch so inserts/deletes correctly invalidate cached results.

Fix 3: QueryExecutor::alive_count() now derives from alive_bitmap() via the OnceCell, ensuring per-query consistency with the cached alive set.

Fix 4: merge_bitmap_maps() now routes writes to the silo via write_dump_maps() when a BitmapSilo is present (skipping stale in-memory writes), and falls back to the in-memory path only when no silo is configured.

Fix 5: Flush thread time bucket maintenance now uses frozen_reconstruct_value() from the silo when has_silo is true, instead of reading from the in-memory SortIndex that is not updated on the silo path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
fix: 5 correctness bugs from external review (stale reads, cache epochs, silo bypass)
Part 1 — DataSilo u32 → u64 key upgrade:
- ops log frame format: [tag:1][key:4][len:4][value][crc32:4] →
[tag:1][key:8][len:4][value][crc32:4] (breaking format change)
- Replace flat array index with HashIndex (open-addressed mmap hash table)
that already uses u64 keys — eliminates the 2^32 slot limit
- All public API signatures updated: append_op, append_ops_batch, delete,
get, get_with_ops, write_batch_parallel, scan_ops_for_key
- compact_cold_from / compact_hot_from rewritten for HashIndex
- CacheSilo: hash_unified_key returns u64 (no fold), load_all uses
iter_index_keys instead of 0..index_cap iteration
- DocSiloAdapter: slot as u64 casts at call sites
Part 2 — FieldRegistry + deterministic u64 keys in BitmapSilo:
- Add field_registry: parking_lot::Mutex<FieldRegistry> to BitmapSilo
- Load/create FieldRegistry in open(); save alongside manifest
- Replace format!("filter:{}:{}") + ensure_key() with
ensure_field_id() + encode_filter_key(field_id, value) in all
mutation methods (filter_set/clear, sort_set/clear, save_all,
save_all_parallel, write_dump_maps, ParallelBitmapWriter)
- Frozen accessors (get_frozen_filter, get_frozen_sort_layer) and
ops-on-read (get_filter_with_ops, get_sort_layer_with_ops) now use
FieldRegistry encoding for point lookups
- Enumeration paths (load_filters, load_sorts, filter_entries,
filter_values_for_field, mark_filters_backed, mark_sorts_backed)
retain legacy name_to_key manifest — populated by all write paths
to maintain backward-compat enumeration
- All 450 lib tests pass, zero warnings in main crate
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
feat: DataSilo u64 keys + FieldRegistry wiring into BitmapSilo
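The u64-key frame format above — `[tag:1][key:8][len:4][value][crc32:4]` — can be sketched as a dependency-free encoder/decoder. Assumptions not stated in the commit: little-endian integer encoding, and the CRC covering everything before the trailer; the real code uses the crc32fast crate rather than this bitwise CRC32 (IEEE polynomial):

```rust
// Minimal bitwise CRC32 (IEEE, reflected polynomial 0xEDB88320) so the
// sketch is self-contained; production code uses crc32fast.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &b in data {
        crc ^= b as u32;
        for _ in 0..8 {
            crc = if crc & 1 != 0 { (crc >> 1) ^ 0xEDB8_8320 } else { crc >> 1 };
        }
    }
    !crc
}

// Encode one ops-log frame: [tag:1][key:8][len:4][value][crc32:4].
// Endianness is an assumption; CRC is computed over tag+key+len+value.
fn encode_frame(tag: u8, key: u64, value: &[u8]) -> Vec<u8> {
    let mut frame = vec![tag];
    frame.extend_from_slice(&key.to_le_bytes());
    frame.extend_from_slice(&(value.len() as u32).to_le_bytes());
    frame.extend_from_slice(value);
    let crc = crc32(&frame);
    frame.extend_from_slice(&crc.to_le_bytes());
    frame
}

// Decode and CRC-validate a frame; returns None on a corrupt/short frame.
fn decode_frame(frame: &[u8]) -> Option<(u8, u64, Vec<u8>)> {
    if frame.len() < 17 { return None; } // 1 + 8 + 4 + 0 + 4 minimum
    let body = &frame[..frame.len() - 4];
    let stored = u32::from_le_bytes(frame[frame.len() - 4..].try_into().ok()?);
    if crc32(body) != stored { return None; } // corrupt frame rejected
    let key = u64::from_le_bytes(frame[1..9].try_into().ok()?);
    let len = u32::from_le_bytes(frame[9..13].try_into().ok()?) as usize;
    Some((frame[0], key, frame[13..13 + len].to_vec()))
}

fn main() {
    let f = encode_frame(0x01, 42, b"hello");
    assert_eq!(f.len(), 1 + 8 + 4 + 5 + 4); // key widened from 4 to 8 bytes
    assert_eq!(decode_frame(&f), Some((0x01, 42, b"hello".to_vec())));
    let mut corrupt = f.clone();
    corrupt[10] ^= 0xFF; // flip a bit in the len field
    assert_eq!(decode_frame(&corrupt), None);
}
```

The 4-extra-bytes-per-frame cost is what buys the removal of the 2^32 slot limit noted in the commit.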
…Silo

Cherry-pick of 4f8d4da from feat/datasilo-u64-wiring (PR #144).

Part 1 — DataSilo u32 → u64 key upgrade:
- ops log frame format: [tag:1][key:4][len:4][value][crc32:4] → [tag:1][key:8][len:4][value][crc32:4]
- Replace the flat array index with HashIndex (open-addressed mmap hash table) that already uses u64 keys — eliminates the 2^32 slot limit
- All public API signatures updated: append_op, append_ops_batch, delete, get, get_with_ops, write_batch_parallel, scan_ops_for_key
- CacheSilo: hash_unified_key returns u64 (no fold), load_all uses iter_index_keys
- DocSiloAdapter: slot as u64 casts at call sites

Part 2 — FieldRegistry + deterministic u64 keys in BitmapSilo:
- Add field_registry: parking_lot::Mutex<FieldRegistry> to BitmapSilo
- Load/create the FieldRegistry in open(); save alongside the manifest
- Replace format!("filter:{}:{}") + ensure_key() with ensure_field_id() + encode_filter_key(field_id, value) in all mutation methods (filter_set/clear, sort_set/clear, save_all, save_all_parallel, write_dump_maps, ParallelBitmapWriter)
- Enumeration paths retain the legacy name_to_key manifest (populated by all write paths) for backward-compat enumeration
- Fix dead_code warning on REGION_SIZE (test-only constant)

Dump processor flow verified: dump builds bitmaps → passes String field names to write_dump_maps → write_dump_maps calls ensure_field_id() (FieldRegistry.ensure()) to get a u16 field_id → encode_filter_key(field_id, value) / encode_sort_key(field_id, bit_idx) → u64 silo key → DataSilo. No changes needed in dump_processor.rs or loader.rs — the API surface (HashMap<String, ...> inputs) is unchanged; key encoding moved inside write_dump_maps.

cargo check --lib: clean (0 errors, 0 warnings)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
feat: DataSilo u64 keys + FieldRegistry wiring + dump processor V3 verified
DataSilo's write_put, write_put_reuse, and append_ops_batch APIs were upgraded to u64 keys, but dump_processor.rs still passed u32 slot values. Fixes all 8 sites: ThreadResult type alias, doc_ops/all_doc_ops/mv_ops Vec types, write_put_reuse/write_put call sites, and test Vec declarations. Resolves all 5 E0308 compile errors under `--features "server,pg-sync"`. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
fix: u32→u64 casts for DataSilo callers in dump_processor (server+pg-sync)
- filter_field_to_idx and filter_tuples use u16 instead of u8, removing the silent 255-field cap
- serialize_frozen_into errors are now logged instead of silently discarded

From FOLLOWUP.md review items.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…_entries=0

- bitdex.default.toml: rayon_threads = 24 (optimal for 16-core, avoids HT contention)
- server.rs: reads rayon_threads from TOML, sets the RAYON_NUM_THREADS env var at startup
- query.rs: max_entries=0 or max_bytes=0 disables caching entirely

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Multi-value fields (tagIds, toolIds, techniqueIds) now use the same per-row Merge doc op path as standard fields. Each row emits a single-element Mi([value]) array; compaction concatenates arrays for the same slot via the existing Mi merge semantics.

Eliminates:
- is_multi_value_only detection (hardcoded field name list)
- Bitmap inversion post-pass (~50 lines) that rebuilt per-slot arrays from the merged filter bitmaps after parse
- Separate parallel/sequential write paths for multi-value doc ops

Adds:
- DumpFieldValue::MultiInt variant + write_field_multi_int encoding
- DocValueType::MultiInt in the compiled doc field plan
- multi_value_fields HashSet built from config FilterFieldType::MultiValue

450 tests pass. Net -35 lines.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…phase

The unified per-row Merge doc op path (f842a72) caused 4.5B doc ops for the tags phase, hitting 70GB RSS in 25 seconds. Multi-value-only phases (tags, tools, techniques) must skip per-row doc ops and use the bitmap inversion post-pass instead (~108M ops vs 4.5B).

Detection now uses the config-driven multi_value_fields HashSet instead of the old hardcoded field name list. The DumpFieldValue::MultiInt variant and DocValueType::MultiInt are kept for future use when per-slot accumulation during parse becomes feasible.

450 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…post-pass

All dump phases now stream doc ops directly to the DocSilo mmap via ParallelOpsWriter during parse. No in-memory Vec accumulation. Multi-value fields emit Mi([value]) per row; compaction concatenates per slot. The is_multi_value_only flag and bitmap inversion post-pass are fully removed. This eliminates ~50 lines of special-case code.

Fix for the prior OOM: the parallel ops writer mmap size estimate now uses 40 bytes/row for single-field multi-value phases (tags: ~30 bytes actual) vs 400 bytes/row for standard phases. This prevents over-allocation while ensuring the mmap is large enough for streaming writes.

450 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bug 1: Slot ID 0 mapped to DataSilo key 0, which is the HashIndex empty sentinel. Fix: slot_to_key(slot) = slot + 1. Applied in DocSiloAdapter and all dump_processor parallel write sites.

Bug 2: The parallel ops writer mmap size was estimated as body.len()/100 rows, but the tags CSV has ~14-byte lines (not 100), so 63GB/100 = 630M estimated vs 4.5B actual. Fix: sample the first 4KB to compute the actual average line length. Tags now estimates ~4.5B rows × 40 bytes = ~180GB.

450 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
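The Bug 2 fix — derive bytes-per-row from a sample instead of assuming 100 — can be sketched as follows. The function name and 4KB sample size follow the commit; the exact sampling logic is an assumption:

```rust
// Sketch of the row-count estimate fix: sample the head of the CSV body,
// count newlines to get the average line length, and scale up. The old
// code's fixed body.len()/100 underestimated short-line files like tags.
fn estimate_rows(body: &[u8], sample_bytes: usize) -> usize {
    let sample = &body[..body.len().min(sample_bytes)];
    // At least 1 to avoid division by zero on a newline-free sample.
    let newlines = sample.iter().filter(|&&b| b == b'\n').count().max(1);
    let avg_line = (sample.len().max(1) / newlines).max(1); // integer bytes/row
    body.len() / avg_line
}

fn main() {
    // 14-byte lines, like the tags CSV described in the commit.
    let body: Vec<u8> = b"aaaaaaaaaaaaa\n".repeat(1000);
    assert_eq!(estimate_rows(&body, 4096), 1000);
    // The old fixed divisor would have estimated body.len()/100 = 140 rows —
    // a 7x undercount for this line length.
    assert_eq!(body.len() / 100, 140);
}
```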
Multi-value CSV rows (tags, tools, techniques) are sorted by imageId. Instead of writing one Merge(Mi([tagId])) per row (4.5B ops for tags), accumulate values while the slot stays the same and flush one Merge(Mi([all_tags])) when the slot changes.

4.5B per-row ops → ~109M per-slot ops. The mmap estimate drops from ~180GB to ~5GB. Compaction RSS drops from ~104GB to manageable.

Standard phases (images) are unaffected — they use the direct per-row path since each row has a unique slot.

450 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
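The accumulation step relies on the sorted-by-imageId input: buffer values while the slot repeats, emit one merged op when the slot changes. A minimal sketch with hypothetical names:

```rust
// Per-slot accumulation sketch: rows arrive sorted by slot, so one pass
// with a current-group buffer turns N per-row ops into one op per slot.
fn accumulate_per_slot(rows: &[(u64, i64)]) -> Vec<(u64, Vec<i64>)> {
    let mut out: Vec<(u64, Vec<i64>)> = Vec::new();
    for &(slot, value) in rows {
        match out.last_mut() {
            // Same slot as the previous row: extend the pending Mi array.
            Some((last_slot, vals)) if *last_slot == slot => vals.push(value),
            // Slot changed (or first row): flush implicitly, start a new group.
            _ => out.push((slot, vec![value])),
        }
    }
    out
}

fn main() {
    // Three CSV rows for slot 1, one for slot 2 — sorted by slot, as the
    // commit notes the multi-value dumps are sorted by imageId.
    let rows = [(1, 10), (1, 11), (1, 12), (2, 20)];
    let ops = accumulate_per_slot(&rows);
    assert_eq!(ops, vec![(1, vec![10, 11, 12]), (2, vec![20])]);
    // 4 per-row ops collapsed into 2 per-slot ops.
    assert_eq!(ops.len(), 2);
}
```

Note the correctness dependency: if the input were not sorted by slot, a slot revisited later would produce a second op for it, and the downstream Mi merge semantics (concatenation on compaction) would still have to fuse them.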
…action Eliminates the ops log + compaction bottleneck for doc writes during dump. After the images phase creates data.bin via write_batch_parallel, subsequent phases (tags, tools, techniques, resources) use DumpMergeWriter to read existing doc records, merge Mi arrays in-place, and write back: no ops log, no cold compact needed. Also fixes a correctness bug where compaction used last-write-wins for duplicate keys, silently dropping earlier Merge ops. DocSiloAdapter now sets a merge function that concatenates Mi fields during compaction.

Changes:
- DumpMergeWriter: striped-lock (1024 buckets) concurrent read-modify-write
- HashIndex::update_existing_concurrent: thread-safe entry update
- merge_encoded_docs: decode two doc records, fuse Mi arrays, re-encode
- DataSilo::set_merge_fn: optional merge callback for compaction
- compact_cold_merge: merge-aware cold compaction path
- compact_hot_from: merge duplicate ops + merge with existing data file
- dump_processor: auto-detects DumpMergeWriter availability per phase

11 new tests (3 DumpMergeWriter, 2 merge_encoded_docs, 3 compaction, 3 compat)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
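The striped-lock scheme from the changes above can be sketched as a bank of mutexes indexed by key modulo the stripe count. This is an illustrative stand-in: the real DumpMergeWriter guards mmap read-modify-write under each stripe, while this sketch only shows the key-to-stripe routing.

```rust
use std::sync::{Mutex, MutexGuard};

/// Minimal striped lock: same key always maps to the same stripe,
/// so two threads touching the same key serialize; different keys
/// usually land on different stripes and proceed in parallel.
struct StripedLock {
    stripes: Vec<Mutex<()>>,
}

impl StripedLock {
    fn new(n: usize) -> Self {
        StripedLock { stripes: (0..n).map(|_| Mutex::new(())).collect() }
    }

    fn stripe_index(&self, key: u64) -> usize {
        (key % self.stripes.len() as u64) as usize
    }

    fn lock(&self, key: u64) -> MutexGuard<'_, ()> {
        self.stripes[self.stripe_index(key)].lock().unwrap()
    }
}
```

With 1024 stripes and rayon-sized thread pools, contention is rare unless many threads hit keys that collide modulo 1024.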
JustMaier
left a comment
PR #148 Review: DumpMergeWriter + merge-aware compaction
Overall this is well-structured -- the dump merge writer is a sound optimization (bypass ops log for subsequent phases), and the merge-aware compaction fixes a real correctness bug. The test coverage is solid for the happy paths. I found several issues that need attention before merge, ranging from potential data corruption to silent data loss.
1. Critical: Read/write mmap aliasing on the same file (UB risk)
prepare_dump_merge creates a second MmapMut (the write mmap) on the same data.bin file that self.data_mmap (the read mmap) already maps. The DumpMergeWriter then reads from read_ptr (the DataSilo read-only mmap) and writes to write_ptr (the new writable mmap).
Two mmaps on the same file from the same process is not undefined behavior per se, but the OS makes no guarantees about visibility timing between them. On Linux, both mappings typically share the same page cache, so writes through write_ptr are visible through read_ptr immediately. On Windows, this is less guaranteed -- separate MapViewOfFile calls may or may not share pages.
This matters because merge_put reads existing data from read_ptr, merges, and writes to write_ptr. If a prior merge_put wrote to the same slot (which should not happen in normal dump flow but could happen with retry logic or duplicate keys in CSV), the read would see stale data from the read mmap.
Recommendation: Document the single-write-per-key invariant explicitly in the safety contract. Consider whether the read path should use write_ptr instead of read_ptr to guarantee read-your-own-writes, which would eliminate the aliasing concern entirely -- the read mmap would only be needed for its length.
2. High: update_existing_concurrent writes partial fields without atomicity
The update_existing_concurrent method writes offset, length, and allocated as three separate copy_nonoverlapping calls. If a concurrent reader (e.g., a get() call from another thread) reads the entry between writes, it could see an inconsistent state -- new length with old offset, or vice versa.
During dump phases this is likely safe because queries are not running, but this method is pub and its safety contract only says "no two threads call this with the same key simultaneously" -- it says nothing about concurrent readers. A get() on a key being updated could observe torn reads.
Recommendation: Either (a) write all three fields in a single copy_nonoverlapping of 16 bytes (offset+length+allocated are contiguous at +8..+24), or (b) document in the safety contract that concurrent reads of the same key are also unsafe.
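Option (a) can be illustrated with safe slices. The real method works through raw pointers with copy_nonoverlapping, and the field widths (u64 offset, u32 length, u32 allocated) and the +8..+24 layout are taken from the review's description; treat them as assumptions.

```rust
/// Build offset + length + allocated in a scratch buffer and publish
/// all 16 bytes with one copy, instead of three separate writes that
/// open a torn-read window for concurrent readers.
fn write_entry_fields(entry: &mut [u8], offset: u64, length: u32, allocated: u32) {
    let mut packed = [0u8; 16];
    packed[0..8].copy_from_slice(&offset.to_le_bytes());
    packed[8..12].copy_from_slice(&length.to_le_bytes());
    packed[12..16].copy_from_slice(&allocated.to_le_bytes());
    // One contiguous store covering all three fields at +8..+24.
    entry[8..24].copy_from_slice(&packed);
}
```

A single memcpy is still not a hardware-atomic store, so this narrows rather than eliminates the window; the safety contract should say so either way.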
3. Medium: merge_encoded_docs error silently falls back to new.to_vec()
Throughout dump_processor.rs, the merge callback is:
```rust
|existing, new| {
    merge_encoded_docs(existing, new)
        .unwrap_or_else(|_| new.to_vec())
}
```
If decoding fails (corrupted existing data, truncated write, etc.), this silently drops the existing doc and replaces it with just the new fragment. For a tags phase, this means the base image doc fields (userId, sortAt, etc.) would be silently lost and replaced with just Mi([tagId]).
Same pattern in doc_silo_adapter.rs line ~798 for the compaction merge function.
Recommendation: At minimum, log the error with the key so it is diagnosable. Better: count decode failures in an atomic counter on DumpMergeWriter (like overflow_count) so the post-phase summary reports them.
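The suggested counter pattern can be sketched like so. The counter name mirrors the hypothetical decode_error_count from the recommendation; the wrapper and its signature are illustrative, not the actual dump_processor API.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Counts decode failures so the post-phase summary can report them,
/// in the same spirit as DumpMergeWriter's overflow_count.
static DECODE_FAILURES: AtomicU64 = AtomicU64::new(0);

fn merge_or_fallback(
    existing: &[u8],
    new: &[u8],
    merge: impl Fn(&[u8], &[u8]) -> Result<Vec<u8>, String>,
) -> Vec<u8> {
    merge(existing, new).unwrap_or_else(|err| {
        // Count and log instead of silently dropping the existing doc.
        DECODE_FAILURES.fetch_add(1, Ordering::Relaxed);
        eprintln!("merge decode failed: {err}; keeping new fragment only");
        new.to_vec()
    })
}
```

The fallback behavior is unchanged; the difference is that a nonzero counter after a phase makes the data loss diagnosable instead of invisible.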
4. Medium: Hot compaction merge order -- correct but needs a comment
In compact_hot_from (around line 1024-1033), after merging ops together, the code merges with existing data as:
```rust
*new_value = merge_fn(existing_bytes, new_value);
```
This calls merge_fn(existing_from_data_file, merged_ops_value). In merge_encoded_docs, new_data fields replace existing fields (except Mi which concatenates). So the argument order matters: the "new" ops should be in the new_data position, and the data file entry should be in the existing position. This is correct.
However, the ops-internal merge also uses the same argument convention: first op is "existing", second op is "new". If both ops set the same non-Mi field, the second op wins (correct LWW within an ops batch). Then the full merge treats the data file as "existing" and the ops-merged value as "new", so ops correctly override data file values. This is correct but subtle -- a comment clarifying the argument order semantics in compact_hot_from would prevent future mistakes.
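The argument-order semantics can be demonstrated with a toy field map. The LWW-for-scalars / concatenate-for-Mi rule follows the review's description of merge_encoded_docs; the types and field indices here are purely illustrative.

```rust
use std::collections::{btree_map::Entry, BTreeMap};

#[derive(Clone, Debug, PartialEq)]
enum Val {
    I(i64),       // scalar field: "new" replaces "existing" (LWW)
    Mi(Vec<i64>), // multi-value field: "new" appends to "existing"
}

/// Toy model of the merge convention: first argument is "existing",
/// second is "new"; new scalars win, Mi arrays concatenate.
fn merge(existing: &BTreeMap<u16, Val>, new: &BTreeMap<u16, Val>) -> BTreeMap<u16, Val> {
    let mut out = existing.clone();
    for (field, val) in new {
        match out.entry(*field) {
            Entry::Occupied(mut slot) => match (slot.get_mut(), val) {
                (Val::Mi(acc), Val::Mi(more)) => acc.extend(more.iter().copied()),
                (scalar, _) => *scalar = val.clone(),
            },
            Entry::Vacant(slot) => {
                slot.insert(val.clone());
            }
        }
    }
    out
}
```

Folding ops left-to-right and then merging with the data file as "existing" gives exactly the behavior described above: ops override data-file scalars, and Mi arrays come out in chronological order.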
5. Medium: compact_cold_merge calls write_batch_parallel which replaces the entire data file
compact_cold_merge collects all merged ops into a HashMap, then calls write_batch_parallel to write them. write_batch_parallel creates a new data.bin and index.bin from scratch. This is correct for cold compaction (no existing data file), but if called when there IS an existing data file (which should not happen because compact() routes to compact_hot_from when data exists), it would overwrite the data file with only the ops entries, losing all other keys.
This is protected by the routing logic in compact(), but it is a latent footgun. Consider adding a debug assertion in compact_cold_merge that self.data_mmap.is_none().
6. Low: flush() is a no-op
DumpMergeWriter::flush() returns Ok(()) with a comment saying "the mmap will flush on drop." This is misleading -- callers might expect flush() to actually persist data. The MmapMut is stored in _write_mmap but flush() cannot access it through the raw pointer.
Recommendation: Either make flush() actually call self._write_mmap.flush() (change _write_mmap to not be prefixed with _ and access it directly), or remove the method and document that data is persisted on drop.
7. Low: Linear scan in merge_encoded_docs field lookup
```rust
if let Some(entry) = fields.iter_mut().find(|(f, _)| *f == field_idx) {
```
This is O(n) per field in the new record, making the merge O(n*m) where n = existing fields and m = new fields. For doc records with ~15 fields this is negligible. Fine as-is, just noting it.
8. Nit: merge_encoded_docs_into is declared but never called
The zero-allocation variant merge_encoded_docs_into that writes into a caller-provided buffer is defined but all call sites use merge_encoded_docs (which allocates). Was this intended for future use from the DumpMergeWriter hot path?
Summary
The architecture is sound -- DumpMergeWriter for direct in-place writes during dump phases is a clear win over the ops log + compaction path. The merge-aware compaction fix is necessary for correctness. Main concerns:
- Use write_ptr for reads too (or document the aliasing invariant)
- Single 16-byte write in update_existing_concurrent (torn read risk)
- Do not silently swallow decode errors in the merge callback
- flush() should actually flush or be removed
Tests are thorough for the new code paths. The fallback to ops log when DumpMergeWriter is None looks correct -- all existing collect_doc_op call sites pass None for the new parameter in the non-merge path.
JustMaier
left a comment
Code Review: DumpMergeWriter + merge-aware compaction
Good structural approach — bypassing the ops log for subsequent dump phases is the right move for throughput, and the striped-lock design is straightforward. Several issues below, ranging from a potential soundness problem to minor nits.
1. Dual-mmap aliasing in DumpMergeWriter (soundness concern)
prepare_dump_merge() opens a second MmapMut on the same data.bin file that the DataSilo already holds a read-only Mmap for (self.data_mmap). The DumpMergeWriter then:
- Reads from read_ptr (the DataSilo's existing read mmap)
- Writes to write_ptr (the new writable mmap)
Both point to the same file. On Linux, the kernel maps the same physical pages for both, so writes through write_ptr are visible through read_ptr — this works. On Windows, behavior is less guaranteed with independent mmap handles on the same file.
More importantly: there is no memory barrier between the write via write_ptr and the subsequent read via read_ptr in merge_put. If thread A writes slot X, then thread B reads slot X (different stripe bucket because slot X hashes differently for B's merge), the write may not be visible through the read mmap. In practice the stripe lock provides ordering for same-key access, and different-key access reads different offsets, so this is likely safe — but the aliasing of two separate mmaps over the same file is technically UB under Rust's aliasing rules.
Suggestion: Consider using a single MmapMut and reading from the same pointer (cast to *const u8 for reads). This eliminates the aliasing question entirely. The read mmap in DataSilo isn't needed during dump since no queries run concurrently.
2. merge_encoded_docs slot extraction assumes Merge op layout
```rust
let slot = if existing.len() >= 5 {
    u32::from_le_bytes(existing[1..5].try_into().unwrap())
} else {
    0
};
```
This hard-codes that byte 0 is the op tag and bytes 1..5 are the slot. It works for OP_TAG_MERGE records, but:
- No validation of byte 0: if existing is a Create op or corrupted data, slot extraction silently returns a wrong value.
- Fallback to slot=0: if existing.len() < 5, slot becomes 0. slot_to_key maps slot 0 to key 1 (via SLOT_KEY_OFFSET), so this wouldn't cause a key=0 sentinel collision; however, a record claiming to be slot 0 when the real slot is something else is still silent data corruption.
Suggestion: Either assert/validate that existing[0] == OP_TAG_MERGE, or extract slot from the decode_doc_fields path (which already parses the op). The perf cost of one extra byte check is negligible.
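The validate-then-extract variant is a few lines. The tag constants here are placeholder values for illustration; the real constants live in doc_format.rs and may differ.

```rust
// Assumed tag values for illustration only.
const OP_TAG_CREATE: u8 = 0;
const OP_TAG_MERGE: u8 = 1;

/// Extract the slot only after validating length and op tag, instead
/// of silently falling back to slot 0 on short or foreign records.
fn extract_slot(record: &[u8]) -> Result<u32, String> {
    if record.len() < 5 {
        return Err(format!("record too short: {} bytes", record.len()));
    }
    match record[0] {
        OP_TAG_CREATE | OP_TAG_MERGE => {
            Ok(u32::from_le_bytes(record[1..5].try_into().unwrap()))
        }
        tag => Err(format!("unexpected op tag {tag}")),
    }
}
```

The extra byte comparison is negligible next to the decode that follows, and a corrupt record now surfaces as an error instead of a wrong slot.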
3. merge_encoded_docs_into allocates despite the name
The doc comment says "Zero allocation except for the field Vec decode" — but decode_doc_fields calls decode_doc_op which allocates Vec<(u16, PackedValue)> and potentially allocates for string/Mi values. Then encode_merge_fields_into allocates the encoded output. In the hot path (dump_processor calling merge_put), the merge_fn closure calls merge_encoded_docs (not _into), which also allocates the return Vec<u8>.
This is not a bug, but the naming is misleading. Consider either:
- Actually make _into zero-alloc (reuse a thread-local field vec), or
- Drop the "zero allocation" claim from the doc
Also: merge_encoded_docs_into is defined but never called — all callsites use merge_encoded_docs. Dead code?
4. Hot compact merge ordering — verified correct
In compact_hot_from, ops are accumulated left-to-right (merge_fn(accumulated, new_op)), then the accumulated result is merged with the data file (merge_fn(data_file, accumulated)). For Mi concatenation this yields [data_file_values, op1_values, op2_values, ...] — chronologically correct. The test confirms: "base+add1+add2". Good.
5. flush() is a no-op (misleading API)
```rust
pub fn flush(&self) -> io::Result<()> {
    Ok(())
}
```
This silently succeeds without flushing anything. Callers who call flush() expecting data to be durable will be misled. The mmap is stored as _write_mmap; you could expose a method that calls self._write_mmap.flush() instead.
Suggestion: Either make it actually flush, or remove the method and document that callers must drop the writer to trigger the mmap flush.
6. Overflow handling: silent data loss in dump_processor
When merge_put returns false (overflow — merged data exceeds allocated buffer), the dump_processor silently continues:
```rust
mw.merge_put(key, &doc_encode_buf, |existing, new| {
    crate::silos::doc_format::merge_encoded_docs(existing, new)
        .unwrap_or_else(|_| new.to_vec())
});
```
The return value is discarded at every callsite. The overflow count is logged at the end of the phase, but by then the data is lost; those slots will have incomplete Mi arrays.
Suggestion: At minimum, collect overflow keys into a Vec and fall back to ops-log writes for those entries. Or, if the overflow rate is expected to be zero in practice (because buffer_ratio provides enough headroom), add an assertion or at least a per-row warning log so the first overflow is immediately visible.
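The first suggestion can be sketched as a small collector threaded through the phase. merge_put here is represented only by its boolean result; the type and method names are illustrative, not the actual DumpMergeWriter API.

```rust
use std::sync::Mutex;

/// Remembers keys whose merged record overflowed the allocated
/// buffer, so the phase can retry them through the ops-log path
/// instead of silently losing the data.
struct OverflowFallback {
    keys: Mutex<Vec<u64>>,
}

impl OverflowFallback {
    fn new() -> Self {
        OverflowFallback { keys: Mutex::new(Vec::new()) }
    }

    /// `fitted` is the (currently discarded) return value of merge_put.
    fn record_if_overflowed(&self, key: u64, fitted: bool) {
        if !fitted {
            self.keys.lock().unwrap().push(key);
        }
    }

    /// Drain the overflowed keys for the ops-log retry pass.
    fn take(&self) -> Vec<u64> {
        std::mem::take(&mut *self.keys.lock().unwrap())
    }
}
```

If overflow is expected to be zero in practice, the same hook is a natural place for a first-occurrence warning log instead of a retry list.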
7. unwrap_or_else in merge_fn swallows decode errors
This pattern appears 5+ times:
```rust
doc_format::merge_encoded_docs(existing, new)
    .unwrap_or_else(|_| new.to_vec())
```
If decode_doc_fields fails on the existing record (corrupt data, truncated write), the merge silently replaces the entire existing record with just the new data. This loses all previously accumulated fields for that slot.
For dump phases where data integrity matters, consider at least logging when this fallback triggers. A decode failure on an existing record in data.bin suggests something went wrong in a prior phase.
8. collect_doc_op allocates on the merge_writer path
The merge_writer branch in collect_doc_op uses encode_merge_fields (allocating) instead of encode_merge_fields_into (reusing doc_buf):
```rust
if let Some(mw) = merge_writer {
    let bytes = crate::silos::doc_format::encode_merge_fields(slot, &fields);
    mw.merge_put(key, &bytes, ...);
}
```
The parallel-ops-writer path reuses doc_buf via encode_merge_fields_into. For consistency and throughput, the merge writer path should do the same, especially since this runs in a rayon hot loop at 100M+ rows.
9. Minor: compact_cold_merge uses write_batch_parallel
```rust
let batch: Vec<(u64, Vec<u8>)> = entries.into_iter().collect();
let count = self.write_batch_parallel(&batch)?;
```
write_batch_parallel rebuilds the entire data file and index from scratch. For cold compaction that is fine (there is no existing data file, or it is being replaced). But the doc comment doesn't make it clear that cold-merge is a full rewrite. Worth a one-line comment.
Also: the entries HashMap iteration order is nondeterministic. If write_batch_parallel cares about key ordering (e.g., for sequential I/O during reads), this could cause suboptimal layout. Probably fine for dumps, but worth noting.
10. Things done well
- Striped lock design is clean and well-documented. 1024 stripes with key % N is a solid choice for rayon workloads.
- Test coverage is good: basic, overflow, 1000-entry concurrent, cold merge, hot merge, and LWW backwards compat. The hot merge test validates the non-obvious three-way merge ordering.
- Fallback strategy (merge writer when data.bin exists, ops log otherwise) handles the images-phase bootstrap correctly.
- Code organization: keeping merge_encoded_docs in doc_format.rs and the merge plumbing in datasilo is the right separation of concerns.
- The encode_dump_merge hoisting in dump_processor (encoding once, then choosing write path) simplifies the control flow vs the old duplicated encode calls.
Summary
Must fix before merge:
- Issue #6 (overflow return value silently discarded — data loss)
- Issue #5 (flush() is a no-op — either implement or remove)
Should fix:
- Issue #2 (validate op tag before slot extraction)
- Issue #7 (log when decode fallback triggers)
- Issue #8 (use encode_merge_fields_into for merge writer path)
Consider:
- Eliminate dual-mmap aliasing: single MmapMut for both reads and writes
- Single 16-byte copy in update_existing_concurrent (prevent torn reads)
- Log decode errors in merge callbacks instead of silent fallback
- Add decode_error_count to DumpMergeWriter stats
- Make flush() actually call mmap.flush()
- Remove unused merge_encoded_docs_into

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ollect_doc_op

- merge_encoded_docs now validates OP_TAG_MERGE/CREATE before slot extraction (returns an error on corrupt data instead of silently using the wrong slot)
- collect_doc_op merge_writer path reuses a scratch buffer via encode_merge_fields_into (avoids per-call allocation, matches the parallel-ops-writer path)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The merge_fn caused ALL cold compacts to take the slow merge path (copies 5.7GB of ops data into HashMap). For the images phase dump (14M+ unique keys, no duplicates), the merge function is never called — we were just paying the copy cost for nothing. Now: zero-copy scan always runs first, detects duplicate keys, and only falls back to the merge-aware path when merging is actually needed. Images phase cold compact should be fast again. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
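The gating check reduces to a duplicate-key scan. A minimal sketch, assuming the keys can be read without copying the ops payloads; the function name and signature are illustrative:

```rust
use std::collections::HashSet;

/// First pass over the ops-log keys: the merge-aware (copying)
/// compaction path is only needed when some key appears twice.
/// With all-unique keys (the images phase), the zero-copy scan wins.
fn needs_merge_compact(keys: &[u64]) -> bool {
    let mut seen = HashSet::with_capacity(keys.len());
    keys.iter().any(|k| !seen.insert(*k))
}
```

For 14M+ unique keys this trades one hash-set pass for skipping the 5.7GB HashMap copy entirely.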
Summary
How it works
- compact_after_dumps creates data.bin + index.bin
- prepare_dump_merge() returns DumpMergeWriter
- merge_put() writes directly into data.bin (read existing → decode → concat Mi → re-encode → write in-place)

Files changed
- crates/datasilo/src/lib.rs
- crates/datasilo/src/hash_index.rs
- src/silos/doc_format.rs
- src/silos/doc_silo_adapter.rs
- src/sync/dump_processor.rs

Tests
Test plan
🤖 Generated with Claude Code