perf(zql): fuse fetch pipeline, add PK fast path, reduce allocations (#5612)
Draft
Karavil wants to merge 6 commits into rocicorp:main
Added 6 commits on February 25, 2026 06:10
…euse
Two allocation reduction optimizations for the IVM push hot path:
1. Shared EMPTY_RELATIONSHIPS sentinel: Replace per-node {} allocation
with a frozen shared object, reducing GC pressure during fetch and push.
2. Reuse outputChange objects in genPush: Pre-allocate reusable objects
and mutate row fields before yielding, instead of creating new objects
per connection.
Object reuse is safe because filterPush consumers are synchronous within
the generator chain.
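The two patterns described in this commit message can be sketched as follows. This is an illustrative sketch, not zql's real code: `FetchNode`, `makeNode`, and `pushAll` are stand-in names, and the types are simplified.

```typescript
type Row = Record<string, unknown>;
type FetchNode = {row: Row; relationships: Record<string, unknown>};

// 1. One frozen, shared sentinel instead of a fresh `{}` per node.
const EMPTY_RELATIONSHIPS: Record<string, unknown> = Object.freeze({});

function makeNode(row: Row): FetchNode {
  // Safe to share: the sentinel is frozen, so no consumer can mutate it.
  return {row, relationships: EMPTY_RELATIONSHIPS};
}

// 2. Reuse one output object across yields instead of allocating per row.
// Safe only because consumers are synchronous and never retain the object
// past the yield.
function* pushAll(rows: Row[]): Generator<{type: 'add'; row: Row}> {
  const outputChange = {type: 'add' as const, row: {} as Row};
  for (const row of rows) {
    outputChange.row = row; // mutate in place, then yield
    yield outputChange;
  }
}
```

Every `makeNode` result shares one relationships object, and `pushAll` yields the same object each time with its `row` field swapped, so per-row allocations drop to zero in both paths.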
Cache frequently recomputed values to avoid repeated JSON.stringify and map lookups on hot paths:

- Cache #primaryIndexKey in constructor (avoid JSON.stringify per call)
- Cache pkConstraint on Connection (avoid recomputing from filters)
- Cache #getOrCreateIndex results per connection (avoid repeated lookups)

Part of IVM pipeline perf optimizations that reduced page freeze from ~7.7s to <1s in a production app.
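The constructor-caching idea can be sketched as below. `MemorySourceSketch` is a hypothetical stand-in (the real code uses a `#`-private field; a TS `private` modifier stands in here).

```typescript
class MemorySourceSketch {
  private readonly primaryIndexKey: string; // cached once, reused forever

  constructor(primaryKey: readonly string[]) {
    // Before: JSON.stringify(primaryKey) ran on every call that needed the
    // index key. After: computed once here, off the hot path.
    this.primaryIndexKey = JSON.stringify(primaryKey);
  }

  indexKey(): string {
    return this.primaryIndexKey; // O(1), no re-serialization per call
  }
}
```

The same shape applies to the other two caches: compute once where the inputs become fixed (construction, connection setup), then read the stored value on every fetch.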
…imization

Optimize hot comparison paths in the IVM pipeline:

* Add compareStringUTF8Fast for ASCII-fast string comparison with UTF-8 fallback
* Reorder compareValues to check strings before nulls (most common type)
* Add single-key fast path in makeComparator avoiding loop overhead
* Add single-key fast path in makeBoundComparator with fully inlined comparison
* Fix compareBounds null handling for nullable database columns
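The ASCII-fast comparison can be sketched in the spirit of the `compareStringUTF8Fast` described above; the actual implementation may differ. Pure-ASCII code units order identically in UTF-16 and UTF-8, so they can be compared directly; anything non-ASCII falls back to comparing encoded bytes.

```typescript
const encoder = new TextEncoder();

// Fallback: compare full UTF-8 byte sequences (correct for all strings).
function compareUTF8Bytes(a: string, b: string): number {
  const ab = encoder.encode(a);
  const bb = encoder.encode(b);
  const n = Math.min(ab.length, bb.length);
  for (let i = 0; i < n; i++) {
    if (ab[i] !== bb[i]) return ab[i] - bb[i];
  }
  return ab.length - bb.length;
}

function compareStringUTF8Fast(a: string, b: string): number {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const ca = a.charCodeAt(i);
    const cb = b.charCodeAt(i);
    // Any non-ASCII code unit: bail to the byte-accurate fallback.
    // (The prefix seen so far was identical ASCII, so re-encoding from
    // the start is still correct.)
    if (ca >= 0x80 || cb >= 0x80) return compareUTF8Bytes(a, b);
    if (ca !== cb) return ca - cb;
  }
  return a.length - b.length;
}
```

The fast path touches no allocations at all for the common all-ASCII case; only strings containing non-ASCII characters pay for the `TextEncoder` round trip.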
Summary
Fuse the `MemorySource#fetch` generator pipeline, add a direct PK lookup fast path, and add overlay fast paths.

Motivation
`MemorySource#fetch` is the entry point for every data scan in the IVM pipeline. The current implementation chains 5 generators together: `generateRows` -> `generateWithOverlay` -> `generateWithStart` -> `generateWithConstraint` -> `generateWithFilter`. Each generator adds a suspend/resume frame per row.

In a workload with 135 IVM pipelines, each fetching ~200 rows, the generator frame overhead compounds: 5 frames x 200 rows x 135 pipelines = 135,000 generator suspend/resume cycles per page render. This was the single largest contributor to CPU time in our profiling.
Additionally, many fetches are single-row PK lookups (e.g., fetching a specific assignment by ID) that still go through the full 5-generator pipeline despite only ever returning 0 or 1 rows.
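The difference between the chained and fused shapes can be sketched as follows. This is a simplified illustration of the approach, not zql's real pipeline; the stage names and predicate shapes are stand-ins.

```typescript
type Row = Record<string, unknown>;
type Pred = (row: Row) => boolean;

// Before: each pipeline stage is its own generator, so every row pays a
// suspend/resume per stage it passes through.
function* withConstraint(src: Iterable<Row>, matches: Pred): Generator<Row> {
  for (const row of src) if (matches(row)) yield row;
}
function* withFilter(src: Iterable<Row>, filter: Pred): Generator<Row> {
  for (const row of src) if (filter(row)) yield row;
}

// After: one fused generator applies constraint + filter in a single loop,
// eliminating the intermediate frames per row.
function* fetchDirectSketch(
  src: Iterable<Row>,
  matches: Pred,
  filter: Pred,
): Generator<Row> {
  for (const row of src) {
    if (matches(row) && filter(row)) yield row;
  }
}
```

Both shapes yield identical rows in identical order; the fused version just crosses one generator boundary per row instead of one per stage.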
Changes
Generator fusion
- `generateFetchDirect`: replaces the 5-generator chain with a single generator that handles start position, constraint matching, and filter predicate in one loop. Eliminates 4 generator frame suspend/resume costs per row.
- `generatePostOverlayFused`: after overlay interleaving, fuses start + constraint + filter into a single generator. Reduces from 4 post-overlay generators to 1.

PK fast path
- Use a `BTree.get()` O(log n) lookup instead of scanning the full index. Returns a single-element array or empty array, bypassing the generator pipeline entirely.

Non-generator `#fetch`

- Convert `#fetch()` from a generator function to a regular function returning `Iterable<Node | 'yield'>`. The callers already consume it via `for...of`, so this removes one generator frame with no behavioral change.

Overlay fast paths
- `connectionComparator`: use `compareRows` directly for the non-reverse case, avoiding closure wrapping

Code cleanup
- Cast once (`as Value`, `as string`) instead of repeated `as` casts in comparator functions
- Remove generators (`generateWithConstraint`, `generateWithFilter`, `generateRows`) superseded by the fused generators

Expected Performance Impact
The generator fusion is the single biggest win. For 135 IVM pipelines, each fetch previously went through 5 generator frames with suspend/resume overhead per row. The fused paths reduce this to 1 generator (no overlay) or 2 generators (with overlay), eliminating ~80% of generator frame overhead.
The PK fast path provides O(log n) direct lookup for single-row fetches, avoiding the entire generator pipeline. This is particularly impactful for join child fetches that look up individual rows by foreign key.
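The shape of the fast path can be sketched as below. A `Map` stands in for the real BTree here (`Map.get` is O(1) while `BTree.get` is O(log n)), and `fetchByPrimaryKey` with its serialized-key scheme is a hypothetical illustration, not zql's actual API.

```typescript
type Row = Record<string, unknown>;

function fetchByPrimaryKey(
  index: Map<string, Row>, // stand-in for a BTree keyed by serialized PK
  pk: unknown,
): Row[] {
  const row = index.get(JSON.stringify([pk]));
  // 0 or 1 rows, returned directly; no generator pipeline involved.
  return row ? [row] : [];
}
```

When the constraint does not pin the full primary key, the caller would fall through to the normal (fused) scan path instead.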
Combined with the full optimization series, these changes contributed to reducing page freeze from ~7.7s to <1s in a production scenario (45 parent rows x ~200 related rows, 135 IVM pipelines).
Testing
Stack Order
This PR is part of a stacked series of IVM performance optimizations. Merge in order:
Independent PRs (no conflicts): #5607 (BTree iterators), #5608 (Join optimizations)