perf(p99): eliminate ForcePublish blocking — lazy load 1-5s→actual disk time by JustMaier · Pull Request #197 · civitai/bitdex

JustMaier · 2026-04-13T06:28:29Z

Summary

Root cause found: After lazy-loading a filter/sort bitmap from disk, the query thread blocked on done_rx.recv_timeout(5s) waiting for the flush thread to process a ForcePublish command. With 431ms flush cycles, every lazy-load query blocked for the full cycle duration.

This was the hidden serialization point Justin identified — it explains why all trace phases appeared slow (lazy_load, pre_cache, docs, sort). Queries queued behind the flush thread, inflating every metric.

Fix

Apply loaded bitmaps directly to the published snapshot via ArcSwap::store(), then send to flush thread non-blocking (fire-and-forget). Query continues immediately with the updated snapshot.

Before: lazy_load → disk read (10ms) → send ForcePublish → block 431ms → resume
After: lazy_load → disk read (10ms) → direct publish (1ms) → resume immediately

Safety

ArcSwap::store is atomic — safe for concurrent publishers
Flush thread still receives data via lazy_tx channel on its next cycle
No data loss — flush thread applies it redundantly (idempotent)

Evidence

Traces show lazy_load_us at 1-5s when actual disk reads are 10-100ms. The 431ms flush_last_duration_nanos matches the unexplained gap. Queries with pre_cache_us: 1800ms, filter_us: 0, sort_us: 0 — all the time is between function entry and cache lookup, exactly where ForcePublish blocks.

Test plan

cargo check --features server passes
Deploy, check traces: lazy_load_us should drop from 1-5s to 10-100ms
Verify P99 improvement
Watch for any bitmap consistency issues (shouldn't happen — ArcSwap is atomic)

🤖 Generated with Claude Code

ROOT CAUSE: After lazy-loading a filter/sort value from disk, the query thread sent a ForcePublish command to the flush thread and blocked via done_rx.recv_timeout(5s) waiting for the response. With ~431ms flush cycles, every lazy-load query blocked for the entire flush duration. This was the hidden serialization point causing ALL trace phases to appear slow — lazy_load, pre_cache, docs, sort all inflated because queries piled up behind the flush thread's cycle. Fix: Apply loaded bitmaps directly to the published snapshot via ArcSwap::store(), then send to the flush thread non-blocking (fire-and-forget via lazy_tx). The query continues immediately with the updated snapshot instead of waiting for the flush thread. Safety: ArcSwap::store is atomic. The flush thread will also receive and apply the data on its next cycle. Two concurrent publishers is safe — the flush thread's next publish will include all lazy-loaded data from the lazy_tx channel. Expected: lazy_load_us should drop from 1-5s to the actual disk read time (10-100ms). pre_cache_us should drop to near-zero. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

This reverts commit 03075cd.

… internal RwLock Previous attempt (PR #197, reverted) tried to clone + publish the entire InnerEngine from the query thread, which was O(all fields) and caused a 1.85s flush regression. New approach: FilterField::load_values() and load_field_complete() take &self and use internal RwLock — they can be called directly on the published snapshot's fields without cloning the engine or publishing. Loaded bitmaps become visible to all readers immediately through the shared Arc<FilterField>. Key changes: - Filter loads (load_field_complete, load_values): apply directly to the current snapshot, skip ForcePublish entirely. Fire-and-forget send to flush thread for its staging copy. - Sort loads (load_layers needs &mut self): still use ForcePublish round-trip to flush thread. This is safe because: - FilterField::bitmaps is RwLock<HashMap> — internal mutation is sound - The flush thread also applies the same data from lazy_tx (idempotent) - No engine clone, no ArcSwap::store race Expected: filter lazy_load_us drops from 1-5s (ForcePublish block) to 10-100ms (actual disk read time). Sort lazy loads still block but are much rarer than filter loads. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… internal RwLock (#198) Previous attempt (PR #197, reverted) tried to clone + publish the entire InnerEngine from the query thread, which was O(all fields) and caused a 1.85s flush regression. New approach: FilterField::load_values() and load_field_complete() take &self and use internal RwLock — they can be called directly on the published snapshot's fields without cloning the engine or publishing. Loaded bitmaps become visible to all readers immediately through the shared Arc<FilterField>. Key changes: - Filter loads (load_field_complete, load_values): apply directly to the current snapshot, skip ForcePublish entirely. Fire-and-forget send to flush thread for its staging copy. - Sort loads (load_layers needs &mut self): still use ForcePublish round-trip to flush thread. This is safe because: - FilterField::bitmaps is RwLock<HashMap> — internal mutation is sound - The flush thread also applies the same data from lazy_tx (idempotent) - No engine clone, no ArcSwap::store race Expected: filter lazy_load_us drops from 1-5s (ForcePublish block) to 10-100ms (actual disk read time). Sort lazy loads still block but are much rarer than filter loads. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

JustMaier merged commit 03075cd into main Apr 13, 2026
1 check failed

JustMaier added a commit that referenced this pull request Apr 13, 2026

Revert "perf(p99): eliminate ForcePublish blocking on lazy load (#197)"

1ff7e8e

This reverts commit 03075cd.

JustMaier mentioned this pull request Apr 13, 2026

perf(p99): skip ForcePublish for filter lazy loads (v2, no engine clone) #198

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(p99): eliminate ForcePublish blocking — lazy load 1-5s→actual disk time#197

perf(p99): eliminate ForcePublish blocking — lazy load 1-5s→actual disk time#197
JustMaier merged 1 commit intomainfrom
ivy/eliminate-force-publish-block

JustMaier commented Apr 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

JustMaier commented Apr 13, 2026

Summary

Fix

Safety

Evidence

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant