perf(zql): O(N+M) batch merge path for view change application#5604
Draft
Karavil wants to merge 1 commit intorocicorp:mainfrom
Draft
perf(zql): O(N+M) batch merge path for view change application#5604Karavil wants to merge 1 commit intorocicorp:mainfrom
Karavil wants to merge 1 commit intorocicorp:mainfrom
Conversation
|
Someone is attempting to deploy a commit to the Rocicorp Team on Vercel. A member of the Team first needs to authorize it. |
d99c12c to
e921d15
Compare
Problem: applyChanges loops sequentially, calling applyChange per row. Each applyChange does a binary search + splice, making bulk updates O(N*M) where N is view size and M is change count. At 50K changes this takes >1s. Solution: O(N+M) merge-sort batch path that walks the existing sorted view array and sorted changes simultaneously, producing a new merged array in a single pass. Always attempted first (no threshold gate); falls back to sequential for unsupported change types (child changes, sort-key-changing edits, singular format, hidden schemas). Also batches ArrayView hydration: collects all initial adds into a single applyChanges call instead of per-row applyChange. Works with the immutable applyChange API (returns new Entry, respects mutate parameter). The batch path always builds a fresh array since we construct it from scratch via merge, so immutability is naturally handled. Benchmarks (batch vs sequential): 5 changes: 1.3x faster 10 changes: 1.7x faster 50 changes: 2.2x faster 100 changes: 2.5x faster 500 changes: 3.5x faster 1,000 changes: 4.5x faster 5,000 changes: 13x faster 10,000 changes: 21x faster 50,000 changes: 433x faster Stack overflow prevention: the old sequential path used array.splice() which could blow the call stack at >125K elements. The batch path builds a flat array with push(), avoiding this entirely.
e90b29e to
8db97d4
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
applyChangesloops sequentially, callingapplyChangeper row. Each call does a binary search + array copy (immutable) or splice (mutable), making bulk updates O(N*M) where N is view size and M is change count. At 50K changes this takes over 1 second.Additionally,
ArrayView#hydrate()applies each initial row one at a time, creating N intermediate array copies during initial load.Solution
O(N+M) merge-sort batch path in
applyChanges:Falls back to sequential
applyChangeloop for:ArrayView hydration batching: collects all initial adds into a single
applyChangescall instead of per-rowapplyChange.Works with the immutable
applyChangeAPI: the batch path always builds a fresh array since we construct it from scratch via merge, so immutability is naturally handled.No threshold gate
Benchmarks show the batch path is faster at every batch size, even 2 changes. There is no threshold: we always attempt the batch path first and fall back only if the change types are unsupported.
Stack overflow prevention
The sequential path with immutable mode creates intermediate arrays via
toSpliced()for each change. The batch path builds a flat array withpush(), avoiding this entirely. This matters at >125K elements wherearray.push(...spread)would blow the stack.Benchmarks
Test plan
applyChangetests pass (both zql and zqlite-zql-test)ArrayViewtests passapplyChangesbatch tests covering: