sparse strips: Lazily push wide tile layer buffers by tomcur · Pull Request #1414 · linebender/vello

tomcur · 2026-02-03T21:59:13Z

This continues on #1403, and is based on/closes #1383.

In `coarse.rs` we are doing `O(viewport_width x viewport_height)` work for pushing and popping buffers for each layer needing buffers. PR #1403 already ensures we no longer send buffer-blending commands through to fine rendering for empty buffers. This PR makes command generation lazy so that we also no longer do the `push_buf` work for the entire viewport, but only for the wide tiles that actually get drawn on.

Some operations, like filter layers and destructive blends, still fall back to pushing buffers for the full viewport. We may be able to do away with some of that in the future, but for now those operations require buffers even if they don't have content (removing or tightening the condition causes various tests to fail or asserts to be hit).
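To illustrate the idea of lazy buffer pushing, here is a minimal sketch. It is not the code in this PR: `Cmd`, `WideTile`, `Coarse` and all method names are hypothetical stand-ins, and the fallback for filter layers and destructive blends is left out. Opening a layer does no per-tile work; a tile only catches up with the layer stack when something is drawn into it, and closing a layer only visits the tiles that pushed a buffer for it.

```rust
/// Hypothetical stand-in for a coarse command; not vello's actual command type.
#[derive(Debug, Clone, PartialEq)]
enum Cmd {
    PushBuf,
    Fill,
    PopBuf,
}

/// Hypothetical wide tile: its command list plus the number of layer
/// buffers that have actually been pushed for this tile so far.
#[derive(Default)]
struct WideTile {
    cmds: Vec<Cmd>,
    n_bufs: usize,
}

/// Hypothetical coarse rasterizer state.
struct Coarse {
    tiles: Vec<WideTile>,
    /// One entry per open layer, listing the tiles that pushed a buffer for it.
    layers: Vec<Vec<usize>>,
}

impl Coarse {
    fn new(n_tiles: usize) -> Self {
        Self {
            tiles: (0..n_tiles).map(|_| WideTile::default()).collect(),
            layers: Vec::new(),
        }
    }

    /// Opening a layer is O(1): no tile is touched here.
    fn push_layer(&mut self) {
        self.layers.push(Vec::new());
    }

    /// Drawing into a tile first pushes any buffers the tile is still missing.
    fn fill(&mut self, tile_idx: usize) {
        let tile = &mut self.tiles[tile_idx];
        while tile.n_bufs < self.layers.len() {
            self.layers[tile.n_bufs].push(tile_idx);
            tile.cmds.push(Cmd::PushBuf);
            tile.n_bufs += 1;
        }
        tile.cmds.push(Cmd::Fill);
    }

    /// Closing a layer pops buffers only on tiles that pushed one for it.
    fn pop_layer(&mut self) {
        let touched = self.layers.pop().expect("pop_layer without push_layer");
        for idx in touched {
            let tile = &mut self.tiles[idx];
            tile.cmds.push(Cmd::PopBuf);
            tile.n_bufs -= 1;
        }
    }
}
```

In this sketch the catch-up check runs on every draw, which is the per-call overhead that matters for dense content. Untouched tiles never receive a push or pop command.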


tomcur

I've done a bit of micro-optimizing to keep the cost low in the worst case, where we're actually drawing a (nearly) dense layer.

Benchmarks in the case of layers look as follows. The first group times wide when drawing the content three layers deep at the content's natural size. Ghostscript Tiger is nearly completely dense, whereas Paris-30k is quite sparse. Tiger regresses because we now run `ensure_layer_stack_bufs` and end up doing the work for almost all tiles anyway, so it would have been faster to just push all the buffers beforehand. Paris-30k is much improved.

The second group does the same but draws at 4k. That's much bigger than Tiger's natural size, so this case also improves a lot. (Paris-30k still improves, but less so, as it becomes relatively denser than at its natural, larger size.)

coarse_with_layer/Ghostscript_Tiger
                        time:   [5.8025 µs 5.8357 µs 5.8685 µs]
                        change: [+5.3922% +5.8851% +6.4095%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 50 measurements (4.00%)
  2 (4.00%) high mild
coarse_with_layer/paris-30k
                        time:   [620.76 µs 623.32 µs 626.41 µs]
                        change: [-54.967% -52.845% -50.795%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 50 measurements (8.00%)
  1 (2.00%) high mild
  3 (6.00%) high severe

coarse_with_layer_4k/Ghostscript_Tiger
                        time:   [107.11 µs 107.33 µs 107.58 µs]
                        change: [-73.934% -73.739% -73.575%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 50 measurements (12.00%)
  5 (10.00%) high mild
  1 (2.00%) high severe
coarse_with_layer_4k/paris-30k
                        time:   [340.78 µs 342.86 µs 345.49 µs]
                        change: [-40.644% -39.706% -38.685%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 50 measurements (8.00%)
  1 (2.00%) high mild
  3 (6.00%) high severe
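As a reading aid for the `change` lines above, a relative change can be converted into a speedup factor and an estimated baseline time. This is a small sketch with hypothetical helper names, and it only approximates what Criterion computes, since it works from the reported point estimates. Using the middle values above, -73.739% corresponds to roughly a 3.8x speedup (Tiger at 4k), -52.845% to roughly 2.1x (Paris-30k), and +5.8851% to roughly 0.94x (Tiger at natural size).

```rust
/// Convert a relative change in percent into a speedup factor.
/// A change of -50% means the new time is half the old one, i.e. 2x.
fn speedup_from_change(change_percent: f64) -> f64 {
    1.0 / (1.0 + change_percent / 100.0)
}

/// Estimate the baseline time from the new time and the relative change.
/// The result is in the same unit as `new_time`.
fn baseline_time(new_time: f64, change_percent: f64) -> f64 {
    new_time / (1.0 + change_percent / 100.0)
}
```

For example, `baseline_time(623.32, -52.845)` puts the previous Paris-30k time at roughly 1.3 ms.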

When there are no layers at all, we never have to push buffers, so in that case we needlessly pay for the check on every call. In the sparse Paris-30k case the overhead is small.

coarse/Ghostscript_Tiger
                        time:   [3.2046 µs 3.2230 µs 3.2462 µs]
                        change: [+3.2502% +4.2128% +5.1102%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 50 measurements (6.00%)
  2 (4.00%) high mild
  1 (2.00%) high severe
coarse/paris-30k        time:   [356.05 µs 357.00 µs 358.15 µs]
                        change: [+0.5005% +1.5403% +2.4935%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 50 measurements (6.00%)
  2 (4.00%) high mild
  1 (2.00%) high severe

tomcur · 2026-02-19T23:00:41Z

(Getting rid of the subtraction (n_bufs - n_clip) makes no substantial difference.)

LaurenzV · 2026-02-20T06:56:39Z

Hmm, I guess some of those gains do look pretty nice. If we decide to merge some variant of #1454, the "no layer" case would also become irrelevant since we would completely skip coarse rasterization (at least for vello_hybrid).

However, I would like to hear what @taj-p or @grebmeg think before reviewing this PR more deeply, as it does add some more complexity.

tomcur added the C-sparse-strips label (Applies to sparse strips variants of vello in general) Feb 3, 2026

tomcur force-pushed the lazy-push-buf branch from 4810356 to 300168a February 3, 2026 22:04

tomcur force-pushed the lazy-push-buf branch from 300168a to 684b673 February 19, 2026 20:35

tomcur added 5 commits February 19, 2026 22:57

Fix test

8fea696

Micro-optimize

9fea19b

Add some basic coarse rasterization benchmarks

8063077

Fix comment

0ec8020

Clippy

874a575

Remove comment

6746f6d

tomcur wants to merge 7 commits into linebender:main from tomcur:lazy-push-buf
