Conversation
ROOT CAUSE: After lazy-loading a filter/sort value from disk, the query thread sent a ForcePublish command to the flush thread and blocked via done_rx.recv_timeout(5s) waiting for the response. With ~431ms flush cycles, every lazy-load query blocked for the entire flush duration. This was the hidden serialization point causing ALL trace phases to appear slow — lazy_load, pre_cache, docs, sort all inflated because queries piled up behind the flush thread's cycle.

Fix: Apply loaded bitmaps directly to the published snapshot via ArcSwap::store(), then send to the flush thread non-blocking (fire-and-forget via lazy_tx). The query continues immediately with the updated snapshot instead of waiting for the flush thread.

Safety: ArcSwap::store is atomic. The flush thread will also receive and apply the data on its next cycle. Two concurrent publishers are safe — the flush thread's next publish will include all lazy-loaded data from the lazy_tx channel.

Expected: lazy_load_us should drop from 1-5s to the actual disk read time (10-100ms). pre_cache_us should drop to near-zero.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
JustMaier added a commit that referenced this pull request on Apr 13, 2026
This reverts commit 03075cd.
JustMaier added a commit that referenced this pull request on Apr 13, 2026
… internal RwLock

Previous attempt (PR #197, reverted) tried to clone + publish the entire InnerEngine from the query thread, which was O(all fields) and caused a 1.85s flush regression.

New approach: FilterField::load_values() and load_field_complete() take &self and use an internal RwLock — they can be called directly on the published snapshot's fields without cloning the engine or publishing. Loaded bitmaps become visible to all readers immediately through the shared Arc<FilterField>.

Key changes:
- Filter loads (load_field_complete, load_values): apply directly to the current snapshot, skip ForcePublish entirely. Fire-and-forget send to the flush thread for its staging copy.
- Sort loads (load_layers needs &mut self): still use the ForcePublish round-trip to the flush thread.

This is safe because:
- FilterField::bitmaps is RwLock<HashMap> — internal mutation is sound
- The flush thread also applies the same data from lazy_tx (idempotent)
- No engine clone, no ArcSwap::store race

Expected: filter lazy_load_us drops from 1-5s (ForcePublish block) to 10-100ms (actual disk read time). Sort lazy loads still block but are much rarer than filter loads.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
JustMaier added a commit that referenced this pull request on Apr 13, 2026
… internal RwLock (#198)
Summary
Root cause found: After lazy-loading a filter/sort bitmap from disk, the query thread blocked on `done_rx.recv_timeout(5s)` waiting for the flush thread to process a `ForcePublish` command. With 431ms flush cycles, every lazy-load query blocked for the full cycle duration. This was the hidden serialization point Justin identified — it explains why all trace phases appeared slow (lazy_load, pre_cache, docs, sort). Queries queued behind the flush thread, inflating every metric.
Fix
Apply loaded bitmaps directly to the published snapshot via `ArcSwap::store()`, then send to the flush thread non-blocking (fire-and-forget). The query continues immediately with the updated snapshot.

Before: lazy_load → disk read (10ms) → send ForcePublish → block 431ms → resume
After: lazy_load → disk read (10ms) → direct publish (1ms) → resume immediately
Safety
- `ArcSwap::store` is atomic — safe for concurrent publishers
- The flush thread also applies the same data from the `lazy_tx` channel on its next cycle

Evidence
Traces show lazy_load_us at 1-5s when actual disk reads are 10-100ms. The 431ms flush_last_duration_nanos matches the unexplained gap. Queries with `pre_cache_us: 1800ms, filter_us: 0, sort_us: 0` — all the time is between function entry and cache lookup, exactly where ForcePublish blocks.

Test plan
`cargo check --features server` passes

🤖 Generated with Claude Code