perf(p99): skip ForcePublish for filter lazy loads (v2, no engine clone) by JustMaier · Pull Request #198 · civitai/bitdex

JustMaier · 2026-04-13T06:51:02Z

Summary

Replaces reverted PR #197 with a safe approach that doesn't clone InnerEngine.

Filter lazy loads are the most common per_value_lazy path (postId, modelVersionIds, tagIds). The old path blocked on ForcePublish (~431ms) while waiting for the flush thread. PR #197 tried to clone and publish the engine from the query thread instead, which caused a 1.85s flush regression.

New approach: FilterField::load_values() takes &self with internal RwLock — call it directly on the published snapshot's fields. No engine clone, no ArcSwap::store, no ForcePublish. Loaded bitmaps are immediately visible to all readers.

Sort loads still use ForcePublish (rare — load_layers needs &mut self).
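A minimal sketch of the `&self` + internal `RwLock` pattern described above, using only the standard library. The `Bitmap` alias (a `BTreeSet<u32>`), the `cardinality` helper, and the `load_values` signature are assumptions made for illustration. The real `FilterField` stores `VersionedBitmap` values and may differ in every detail.

```rust
use std::collections::{BTreeSet, HashMap};
use std::sync::{Arc, RwLock};

/// Stand-in for the real bitmap type (assumption): a sorted set of record ids.
type Bitmap = BTreeSet<u32>;

/// Hypothetical, simplified filter field with interior mutability.
pub struct FilterField {
    bitmaps: RwLock<HashMap<u64, Bitmap>>,
}

impl FilterField {
    pub fn new() -> Self {
        Self { bitmaps: RwLock::new(HashMap::new()) }
    }

    /// Takes `&self`, so it can be called through a shared `Arc`
    /// without cloning or replacing the owner.
    pub fn load_values(&self, loaded: Vec<(u64, Bitmap)>) {
        let mut map = self.bitmaps.write().unwrap();
        for (value, bitmap) in loaded {
            map.insert(value, bitmap);
        }
    }

    /// Hypothetical read helper: number of records for a filter value.
    pub fn cardinality(&self, value: u64) -> Option<usize> {
        self.bitmaps.read().unwrap().get(&value).map(|b| b.len())
    }
}

fn main() {
    let field = Arc::new(FilterField::new());
    let reader = Arc::clone(&field); // another holder of the same snapshot field

    assert_eq!(reader.cardinality(42), None);
    // The "query thread" loads directly into the shared field.
    field.load_values(vec![(42, Bitmap::from([1, 2, 3]))]);
    // The other holder sees the data with no publish step in between.
    assert_eq!(reader.cardinality(42), Some(3));
}
```

Because both handles point at the same allocation, the write is visible to the reader as soon as the write lock is released, which is the property this PR relies on.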

Why this is safe

FilterField::bitmaps is RwLock<HashMap<u64, VersionedBitmap>> — designed for concurrent access
Flush thread also gets the data via lazy_tx (idempotent, applies on next cycle)
No new publishers, no engine clone, no race with flush thread's store()
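The idempotence claim can be illustrated with a small sketch. It assumes that applying a loaded payload is an insert-or-overwrite keyed by filter value; the real `VersionedBitmap` handling may differ. `Bitmap`, `Loaded`, and `apply` are hypothetical names, and `lazy_tx` here is a plain `std::sync::mpsc` sender standing in for the real channel.

```rust
use std::collections::{BTreeSet, HashMap};
use std::sync::mpsc;

/// Stand-in bitmap type (assumption).
type Bitmap = BTreeSet<u32>;
/// Hypothetical payload: (filter value, bitmap) pairs read from disk.
type Loaded = Vec<(u64, Bitmap)>;

/// Insert-or-overwrite: applying the same payload twice leaves the same map.
fn apply(map: &mut HashMap<u64, Bitmap>, loaded: &Loaded) {
    for (value, bitmap) in loaded {
        map.insert(*value, bitmap.clone());
    }
}

fn main() {
    let (lazy_tx, lazy_rx) = mpsc::channel::<Loaded>();
    let payload: Loaded = vec![(7, Bitmap::from([10, 11]))];

    // Query side: apply to the published snapshot, then fire-and-forget.
    let mut snapshot = HashMap::new();
    apply(&mut snapshot, &payload);
    let _ = lazy_tx.send(payload.clone()); // result ignored on purpose

    // Flush side: drain the channel into the staging copy on its next cycle.
    let mut staging = HashMap::new();
    for msg in lazy_rx.try_iter() {
        apply(&mut staging, &msg);
        apply(&mut staging, &msg); // deliberately applied twice
    }

    // Both copies converge regardless of how often the payload is applied.
    assert_eq!(snapshot, staging);
}
```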

What failed in PR #197

(*current).clone() deep-cloned InnerEngine on the query thread — O(all fields), 1.85s on 109M records. Caused flush to spike from 431ms to 1.85s.
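For readers less familiar with the distinction, the toy below contrasts `Arc::clone` (a reference-count bump) with `(*current).clone()` (a walk over every owned field). `ToyEngine` and `build` are hypothetical and unrelated to the real `InnerEngine` layout; the sketch only shows why the second form scales with total data size.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical toy engine: every field is owned, so Clone copies every entry.
#[derive(Clone)]
struct ToyEngine {
    // field name -> (filter value -> record ids)
    fields: HashMap<String, HashMap<u64, Vec<u32>>>,
}

fn build(n_fields: usize, n_values: u64) -> ToyEngine {
    let mut fields = HashMap::new();
    for f in 0..n_fields {
        let values: HashMap<u64, Vec<u32>> =
            (0..n_values).map(|v| (v, vec![v as u32])).collect();
        fields.insert(format!("field{f}"), values);
    }
    ToyEngine { fields }
}

fn main() {
    let current = Arc::new(build(3, 1_000));

    // Cheap: shares the same allocation, independent of data size.
    let shared = Arc::clone(&current);
    assert!(Arc::ptr_eq(&current, &shared));

    // Expensive: copies every field and every value, O(all fields).
    let deep: ToyEngine = (*current).clone();
    let copied: usize = deep.fields.values().map(|m| m.len()).sum();
    assert_eq!(copied, 3_000);
}
```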

Expected impact

Filter lazy_load_us should drop from 1-5s (time blocked on ForcePublish) to the actual disk read time (10-100ms).

Test plan

cargo check --features server passes
Deploy, check traces: lazy_load_us on filter loads should drop dramatically
Verify flush_last_duration_nanos stays ~431ms (no regression)
Verify filter bitmaps are correct (queries return same results)

🤖 Generated with Claude Code

… internal RwLock

Previous attempt (PR #197, reverted) tried to clone + publish the entire InnerEngine from the query thread, which was O(all fields) and caused a 1.85s flush regression.

New approach: FilterField::load_values() and load_field_complete() take &self and use internal RwLock, so they can be called directly on the published snapshot's fields without cloning the engine or publishing. Loaded bitmaps become visible to all readers immediately through the shared Arc<FilterField>.

Key changes:
- Filter loads (load_field_complete, load_values): apply directly to the current snapshot, skip ForcePublish entirely. Fire-and-forget send to flush thread for its staging copy.
- Sort loads (load_layers needs &mut self): still use ForcePublish round-trip to flush thread.

This is safe because:
- FilterField::bitmaps is RwLock<HashMap>, so internal mutation is sound
- The flush thread also applies the same data from lazy_tx (idempotent)
- No engine clone, no ArcSwap::store race

Expected: filter lazy_load_us drops from 1-5s (ForcePublish block) to 10-100ms (actual disk read time). Sort lazy loads still block but are much rarer than filter loads.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

JustMaier merged commit 38cee84 into main Apr 13, 2026
1 check failed

JustMaier merged 1 commit into main from ivy/direct-filter-load-v2
