perf(vello_common): Cull Bézier path elements during flattening by tomcur · Pull Request #1341 · linebender/vello

tomcur · 2026-01-02T22:59:06Z

On top of #1340.

TL;DR: if only some of the Tiger's whiskers are visible, flattening and tiling times drop by about 90% and 60% respectively.

This conservatively checks whether Bézier path elements we're about to flatten are outside the viewport. If they are fully to the right, top, or bottom of the viewport, the Bézier does not impact pixel coverage or coarse winding at all, and can be ignored.

If it is fully to the left, it does impact pixel coverage and coarse winding, but only the element's start and endpoint y-values matter, not the exact shape, meaning we can just yield a line rather than finely flattening.
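
As a rough illustration of the classification described above (a sketch, not the code in this PR): a Bézier lies inside the convex hull of its control points, so testing the control points against each viewport edge is a conservative test for the whole curve. The `Cull` enum, the `Point` struct, the `classify` function and the assumption that the viewport spans `[0, width] x [0, height]` are all hypothetical names and conventions chosen for this example.

```rust
/// Hypothetical result of the conservative viewport test.
#[derive(Debug, PartialEq)]
enum Cull {
    /// Fully right of, above, or below the viewport: no effect on coverage or winding.
    Skip,
    /// Fully left of the viewport: only the endpoint y-values matter, so one line suffices.
    Line,
    /// Possibly visible: needs regular flattening.
    Flatten,
}

/// Minimal stand-in for a 2D point type (assumption, not the crate's type).
#[derive(Clone, Copy)]
struct Point {
    x: f64,
    y: f64,
}

/// Classify a Bézier by its control points against a viewport assumed to be
/// `[0, width] x [0, height]`. Because the curve stays inside the convex hull
/// of its control points, "all control points beyond an edge" implies
/// "the whole curve is beyond that edge".
fn classify(ctrl: &[Point], width: f64, height: f64) -> Cull {
    let right = ctrl.iter().all(|p| p.x > width);
    let above = ctrl.iter().all(|p| p.y < 0.0);
    let below = ctrl.iter().all(|p| p.y > height);
    if right || above || below {
        Cull::Skip
    } else if ctrl.iter().all(|p| p.x < 0.0) {
        Cull::Line
    } else {
        Cull::Flatten
    }
}
```

The sketch only classifies an element. How the flattener keeps the subpath's current point consistent after skipping an element is not shown here.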

The following two Ghostscript Tigers have their viewboxes reduced to 50 50 100 100 and 90 90 20 20, down from 0 0 200 200. Their flattening time is reduced by 52% and 90% respectively, and their tiling time by 22% and 60%.

Expand for the Tiger files

Ghostscript_Tiger-viewboxed.svg

Ghostscript_Tiger-viewboxed-extreme.svg

If more or less everything ends up being in the viewport, the additional calculation is wasted and increases flattening time by ~3%. This is of course workload-dependent.

Flattening timings:

flatten/Ghostscript_Tiger
                        time:   [209.94 µs 210.21 µs 210.51 µs]
                        change: [+2.6850% +3.1753% +3.6309%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
flatten/Ghostscript_Tiger-viewboxed
                        time:   [97.189 µs 97.287 µs 97.399 µs]
                        change: [-52.787% -52.650% -52.514%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe
flatten/Ghostscript_Tiger-viewboxed-extreme
                        time:   [19.722 µs 19.741 µs 19.761 µs]
                        change: [-90.311% -90.280% -90.255%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  7 (7.00%) high mild
  1 (1.00%) high severe
flatten/paris-30k       time:   [12.740 ms 12.764 ms 12.788 ms]
                        change: [+2.6014% +3.3631% +4.0837%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

Tiling timings:

tile/Ghostscript_Tiger  time:   [175.39 µs 175.79 µs 176.28 µs]
                        change: [-0.4403% -0.0016% +0.4400%] (p = 1.00 > 0.05)
                        No change in performance detected.
Found 1 outliers among 50 measurements (2.00%)
  1 (2.00%) high mild
tile/Ghostscript_Tiger-viewboxed
                        time:   [78.932 µs 79.147 µs 79.409 µs]
                        change: [-23.209% -22.803% -22.369%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 50 measurements (10.00%)
  1 (2.00%) high mild
  4 (8.00%) high severe
tile/Ghostscript_Tiger-viewboxed-extreme
                        time:   [13.378 µs 13.390 µs 13.405 µs]
                        change: [-60.417% -60.306% -60.199%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 50 measurements (12.00%)
  2 (4.00%) high mild
  4 (8.00%) high severe
tile/paris-30k          time:   [20.970 ms 21.001 ms 21.034 ms]
                        change: [-0.4881% -0.2397% +0.0108%] (p = 0.07 > 0.05)
                        No change in performance detected.
Found 1 outliers among 50 measurements (2.00%)
  1 (2.00%) high mild

tomcur

I wonder whether we should track the axis-aligned bounding box of the active clip (and separately, we may want to provide a fast-path for rectangular clips). If we have that bounding box, we can generalize this viewport culling to clip culling during tiling as well as here. For some workloads that may provide quite a few more culling opportunities.

Second, a small point: the culling we do here, and already did in tiling, means a path's cached strips cannot be translated if any culling occurred. I believe there was already some discussion that we may want to move tiles and strips to i16 coordinates to support translation, and additionally, if paths are to be cached, we may not want to do any culling. Otherwise work will have to be done to rerender if a path is translated.

tomcur · 2026-01-02T23:21:36Z

sparse_strips/vello_common/src/flatten_simd.rs

+                // The following checks two things. First, if the quadratic Bézier is fully to the
+                // left of the viewport, it may affect pixel coverage and winding, but its exact
+                // shape does not matter. It can be emitted as a line segment [p0, p2].
+                //
+                // Second, an upper bound on the shortest distance of any point on the quadratic
+                // Bézier curve to the line segment [p0, p2] is 1/2 of the maximum of the
+                // endpoint-to-control-point distances.
+                //
+                // The derivation is similar to that for the cubic Bézier (see below). In
+                // short:
+                //
+                // q(t) = B0(t) p0 + B1(t) p1 + B2(t) p2
+                // dist(q(t), [p0, p2]) <= B1(t) dist(p1, [p0, p2])
+                //                       = 2 (1-t)t dist(p1, [p0, p2]).
+                //
+                // The maximum occurs at t=1/2, hence
+                // max(dist(q(t), [p0, p2])) <= 1/2 dist(p1, [p0, p2]).
+                //
+                // A cheap upper bound for dist(p1, [p0, p2]) is max(dist(p1, p0), dist(p1, p2)).
+                //
+                // The following takes the square to elide the square root of the Euclidean
+                // distance.
+                else if [p0, p1, p2].into_iter().all(|p| p.x < 0.)
+                    || f64::max((p1 - p0).hypot2(), (p1 - p2).hypot2()) <= 4. * TOL_2
+                {
+                    callback.callback(LinePathEl::LineTo(p2));
+                } else {


A small observation: for the left-of-viewport case, this could in fact just return a vertical line, as it does not have to be watertight at all, but in terms of performance that isn't likely to buy us much. It's an additional branch here for a slight performance improvement during tiling.
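
To make the bound in the code comment concrete, here is a small numerical check. It is a sketch under stated assumptions, not the crate's code: `Point`, `quad_eval`, `dist_to_segment`, `max_deviation`, `cheap_bound` and `is_flat_enough` are hypothetical helpers, and `tol` is assumed to play the role of the flattening tolerance whose square appears as `TOL_2` in the diff.

```rust
/// Minimal stand-in for a 2D point type (assumption, not the crate's type).
#[derive(Clone, Copy)]
struct Point {
    x: f64,
    y: f64,
}

/// Euclidean distance between two points.
fn dist(a: Point, b: Point) -> f64 {
    (a.x - b.x).hypot(a.y - b.y)
}

/// Evaluate q(t) = B0(t) p0 + B1(t) p1 + B2(t) p2.
fn quad_eval(p0: Point, p1: Point, p2: Point, t: f64) -> Point {
    let mt = 1.0 - t;
    let (b0, b1, b2) = (mt * mt, 2.0 * mt * t, t * t);
    Point {
        x: b0 * p0.x + b1 * p1.x + b2 * p2.x,
        y: b0 * p0.y + b1 * p1.y + b2 * p2.y,
    }
}

/// Shortest distance from `p` to the segment [a, b].
fn dist_to_segment(p: Point, a: Point, b: Point) -> f64 {
    let (dx, dy) = (b.x - a.x, b.y - a.y);
    let len2 = dx * dx + dy * dy;
    // Parameter of the closest point on the segment, clamped to [0, 1].
    let s = if len2 == 0.0 {
        0.0
    } else {
        (((p.x - a.x) * dx + (p.y - a.y) * dy) / len2).clamp(0.0, 1.0)
    };
    dist(p, Point { x: a.x + s * dx, y: a.y + s * dy })
}

/// Sampled maximum distance of the curve from the chord [p0, p2].
fn max_deviation(p0: Point, p1: Point, p2: Point, samples: usize) -> f64 {
    (0..=samples)
        .map(|i| {
            let t = i as f64 / samples as f64;
            dist_to_segment(quad_eval(p0, p1, p2, t), p0, p2)
        })
        .fold(0.0, f64::max)
}

/// The cheap upper bound: 1/2 * max(dist(p1, p0), dist(p1, p2)).
fn cheap_bound(p0: Point, p1: Point, p2: Point) -> f64 {
    0.5 * f64::max(dist(p1, p0), dist(p1, p2))
}

/// The same test in squared form, which avoids the square root:
/// max(|p1 - p0|^2, |p1 - p2|^2) <= 4 * tol^2.
fn is_flat_enough(p0: Point, p1: Point, p2: Point, tol: f64) -> bool {
    let d0 = (p1.x - p0.x).powi(2) + (p1.y - p0.y).powi(2);
    let d2 = (p1.x - p2.x).powi(2) + (p1.y - p2.y).powi(2);
    f64::max(d0, d2) <= 4.0 * tol * tol
}
```

For `p0 = (0, 0)`, `p1 = (1, 2)`, `p2 = (3, 0)` the curve deviates from the chord by at most 1 (at t = 1/2), while the cheap bound is about 1.41. The squared test is therefore conservative: it can reject a curve that is in fact within tolerance, but it does not accept one that is not.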

This is just a logic change and slight simplification of the lines representation. By moving the burden of closing input subpaths to flattening itself, this will allow, e.g., culling geometry at the level of Béziers. Before, the closing of subpaths lived as a post-processing step after flattening.

The actual change that starts using this for something more exciting is in #1341.

This by itself does not really bring down timings (plus noise makes it hard to measure).

```
flatten/Ghostscript_Tiger
                        time:   [203.93 µs 204.28 µs 204.68 µs]
                        change: [-2.1022% -1.8481% -1.5684%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  10 (10.00%) high mild
  2 (2.00%) high severe
flatten/paris-30k       time:   [12.222 ms 12.298 ms 12.384 ms]
                        change: [+0.7747% +1.3966% +2.2290%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
  13 (13.00%) high severe
```

LaurenzV · 2026-02-20T12:44:41Z

> I wonder whether we should track the axis-aligned bounding box of the active clip (and separately, we may want to provide a fast-path for rectangular clips). If we have that bounding box, we can generalize this viewport culling to clip culling during tiling as well as here. For some workloads that may provide quite a few more culling opportunities.

yup, this would probably be possible!

LaurenzV

Passes my PDF test suite with flying colors. 🎉

LaurenzV · 2026-02-20T12:49:37Z

sparse_strips/vello_common/src/flatten.rs

-    let iter = path.into_iter().map(
-        #[inline(always)]
-        |el| affine * el,
-    );


I'm wondering whether there is a specific reason this had to be moved into the method?

It's been a while so I don't remember, probably just something left over from an early iteration on this. I've reverted that part. (As a sanity check, benches show no difference, as expected.)

(Well-spotted btw!)

tomcur force-pushed the cull-beziers branch 2 times, most recently from 78108a4 to 7418a92 Compare January 2, 2026 23:09

tomcur commented Jan 2, 2026

tomcur changed the title ~~perf(vello_common): Cull Bèziers path elements during flattening~~ perf(vello_common): Cull Béziers path elements during flattening Jan 2, 2026

tomcur force-pushed the cull-beziers branch from 7418a92 to 4ac3862 Compare January 2, 2026 23:34

tomcur mentioned this pull request Jan 2, 2026

vello_common: move subpath closing logic into flatten #1340

Merged

tomcur changed the title ~~perf(vello_common): Cull Béziers path elements during flattening~~ perf(vello_common): Cull Bézier path elements during flattening Jan 3, 2026

tomcur force-pushed the cull-beziers branch from 4ac3862 to e1989fb Compare January 5, 2026 09:44

tomcur force-pushed the cull-beziers branch from e1989fb to 579c9fd Compare January 5, 2026 13:36

tomcur added the C-sparse-strips Applies to sparse strips variants of vello in general label Jan 13, 2026

tomcur force-pushed the cull-beziers branch from b178be5 to 1f2e24d Compare January 15, 2026 14:31

tomcur enabled auto-merge January 15, 2026 14:34

tomcur disabled auto-merge January 15, 2026 15:55

tomcur added 3 commits February 20, 2026 01:14

Add docs about culling

12608f5

Changelog

9d47ff0

tomcur force-pushed the cull-beziers branch from 1f2e24d to 9d47ff0 Compare February 20, 2026 00:16

tomcur added 2 commits February 20, 2026 01:25

Clippy

2d1c672

Fix missing cubic endpoint

8e16391

LaurenzV approved these changes Feb 20, 2026

Undo iter change

ee66818

tomcur enabled auto-merge February 20, 2026 13:02

tomcur added this pull request to the merge queue Feb 20, 2026

Merged via the queue into linebender:main with commit 9b6fb6c Feb 20, 2026
17 checks passed

tomcur deleted the cull-beziers branch February 20, 2026 13:21

tomcur merged 6 commits into linebender:main from tomcur:cull-beziers