Skip to content

perf(p99): skip ForcePublish for filter lazy loads (v2, no engine clone)#198

Merged
JustMaier merged 1 commit intomainfrom
ivy/direct-filter-load-v2
Apr 13, 2026
Merged

perf(p99): skip ForcePublish for filter lazy loads (v2, no engine clone)#198
JustMaier merged 1 commit intomainfrom
ivy/direct-filter-load-v2

Conversation

@JustMaier
Copy link
Copy Markdown
Contributor

Summary

Replaces reverted PR #197 with a safe approach that doesn't clone InnerEngine.

Filter lazy loads are the most common per_value_lazy path (postId, modelVersionIds, tagIds). The old path blocked on ForcePublish (~431ms) waiting for the flush thread. PR #197 tried to clone + publish the engine (caused 1.85s regression).

New approach: FilterField::load_values() takes &self with internal RwLock — call it directly on the published snapshot's fields. No engine clone, no ArcSwap::store, no ForcePublish. Loaded bitmaps are immediately visible to all readers.

Sort loads still use ForcePublish (rare — load_layers needs &mut self).

Why this is safe

  • FilterField::bitmaps is RwLock<HashMap<u64, VersionedBitmap>> — designed for concurrent access
  • Flush thread also gets the data via lazy_tx (idempotent, applies on next cycle)
  • No new publishers, no engine clone, no race with flush thread's store()

What failed in PR #197

(*current).clone() deep-cloned InnerEngine on the query thread — O(all fields), 1.85s on 109M records. Caused flush to spike from 431ms to 1.85s.

Expected impact

Filter lazy_load_us drops from 1-5s → actual disk read time (10-100ms).

Test plan

  • cargo check --features server passes
  • Deploy, check traces: lazy_load_us on filter loads should drop dramatically
  • Verify flush_last_duration_nanos stays ~431ms (no regression)
  • Verify filter bitmaps are correct (queries return same results)

🤖 Generated with Claude Code

… internal RwLock

Previous attempt (PR #197, reverted) tried to clone + publish the entire
InnerEngine from the query thread, which was O(all fields) and caused a
1.85s flush regression.

New approach: FilterField::load_values() and load_field_complete() take
&self and use internal RwLock — they can be called directly on the
published snapshot's fields without cloning the engine or publishing.
Loaded bitmaps become visible to all readers immediately through the
shared Arc<FilterField>.

Key changes:
- Filter loads (load_field_complete, load_values): apply directly to
  the current snapshot, skip ForcePublish entirely. Fire-and-forget
  send to flush thread for its staging copy.
- Sort loads (load_layers needs &mut self): still use ForcePublish
  round-trip to flush thread.

This is safe because:
- FilterField::bitmaps is RwLock<HashMap> — internal mutation is sound
- The flush thread also applies the same data from lazy_tx (idempotent)
- No engine clone, no ArcSwap::store race

Expected: filter lazy_load_us drops from 1-5s (ForcePublish block) to
10-100ms (actual disk read time). Sort lazy loads still block but are
much rarer than filter loads.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@JustMaier JustMaier merged commit 38cee84 into main Apr 13, 2026
1 check failed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant