perf(vello_common): Cull Bézier path elements during flattening#1341

Merged
tomcur merged 6 commits into linebender:main from tomcur:cull-beziers
Feb 20, 2026

@tomcur
Member

@tomcur tomcur commented Jan 2, 2026

On top of #1340.

tl;dr: If only some of the Tiger's whiskers are visible, flattening and tiling times drop by about 90% and 60% respectively.

This conservatively checks whether Bézier path elements we're about to flatten are outside the viewport. If they are fully to the right of, above, or below the viewport, the Bézier does not affect pixel coverage or coarse winding at all and can be ignored.

If it is fully to the left, it does impact pixel coverage and coarse winding, but only the element's start and endpoint y-values matter, not the exact shape, meaning we can just yield a line rather than finely flattening.
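As a rough illustration of the classification described above, here is a minimal sketch. It is not the code from this PR: `Point`, `Cull`, and `classify` are hypothetical names, the viewport is assumed to span `[0, width] x [0, height]`, and the bookkeeping needed to keep subpaths connected after an element is dropped is left out. The check is conservative because a Bézier lies within the convex hull of its control points.

```rust
/// Hypothetical stand-in for the real point type.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

/// What to do with a Bézier element before flattening it.
#[derive(Debug, PartialEq)]
enum Cull {
    /// Fully right of, above, or below the viewport: no effect on coverage or winding.
    Discard,
    /// Fully left of the viewport: only the start and end y-values matter,
    /// so a single line to the end point is enough.
    LineToEnd,
    /// Possibly visible: flatten as usual.
    Flatten,
}

/// Classify a Bézier by its control points against the assumed viewport
/// `[0, width] x [0, height]`. If all control points lie on one side of the
/// viewport, so does the whole curve.
fn classify(control_points: &[Point], width: f64, height: f64) -> Cull {
    let all = |pred: fn(&Point, f64) -> bool, limit: f64| {
        control_points.iter().all(|p| pred(p, limit))
    };
    if all(|p, w| p.x > w, width) || all(|p, _| p.y < 0.0, 0.0) || all(|p, h| p.y > h, height) {
        Cull::Discard
    } else if all(|p, _| p.x < 0.0, 0.0) {
        Cull::LineToEnd
    } else {
        Cull::Flatten
    }
}
```

Checking the discard cases first means an element that is both left of and above the viewport is dropped entirely rather than turned into a line.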

The following two Ghostscript Tigers have their viewboxes reduced to `50 50 100 100` and `90 90 20 20`, down from `0 0 200 200`. Their flattening time is reduced by 52% and 90% respectively, and their tiling time by 22% and 60%.

Tiger files:

Ghostscript_Tiger-viewboxed.svg

Ghostscript_Tiger-viewboxed-extreme.svg

If more or less everything ends up being in the viewport, the additional calculation is wasted and increases flattening time by ~3%. This is of course workload-dependent.

Flattening timings:

flatten/Ghostscript_Tiger
                        time:   [209.94 µs 210.21 µs 210.51 µs]
                        change: [+2.6850% +3.1753% +3.6309%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
flatten/Ghostscript_Tiger-viewboxed
                        time:   [97.189 µs 97.287 µs 97.399 µs]
                        change: [-52.787% -52.650% -52.514%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe
flatten/Ghostscript_Tiger-viewboxed-extreme
                        time:   [19.722 µs 19.741 µs 19.761 µs]
                        change: [-90.311% -90.280% -90.255%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  7 (7.00%) high mild
  1 (1.00%) high severe
flatten/paris-30k       time:   [12.740 ms 12.764 ms 12.788 ms]
                        change: [+2.6014% +3.3631% +4.0837%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

Tiling timings:

tile/Ghostscript_Tiger  time:   [175.39 µs 175.79 µs 176.28 µs]
                        change: [-0.4403% -0.0016% +0.4400%] (p = 1.00 > 0.05)
                        No change in performance detected.
Found 1 outliers among 50 measurements (2.00%)
  1 (2.00%) high mild
tile/Ghostscript_Tiger-viewboxed
                        time:   [78.932 µs 79.147 µs 79.409 µs]
                        change: [-23.209% -22.803% -22.369%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 50 measurements (10.00%)
  1 (2.00%) high mild
  4 (8.00%) high severe
tile/Ghostscript_Tiger-viewboxed-extreme
                        time:   [13.378 µs 13.390 µs 13.405 µs]
                        change: [-60.417% -60.306% -60.199%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 50 measurements (12.00%)
  2 (4.00%) high mild
  4 (8.00%) high severe
tile/paris-30k          time:   [20.970 ms 21.001 ms 21.034 ms]
                        change: [-0.4881% -0.2397% +0.0108%] (p = 0.07 > 0.05)
                        No change in performance detected.
Found 1 outliers among 50 measurements (2.00%)
  1 (2.00%) high mild

@tomcur tomcur force-pushed the cull-beziers branch 2 times, most recently from 78108a4 to 7418a92 Compare January 2, 2026 23:09
Member Author

@tomcur tomcur left a comment

I wonder whether we should track the axis-aligned bounding box of the active clip (and separately, we may want to provide a fast-path for rectangular clips). If we have that bounding box, we can generalize this viewport culling to clip culling during tiling as well as here. For some workloads that may provide quite a few more culling opportunities.

Second, a small point: the culling we do here, and already did in tiling, means a path's cached strips cannot be translated if any culling occurred. I believe there was already some discussion that we may want to move tiles and strips to i16 coordinates to support translation and, additionally, that if paths are to be cached, we may not want to do any culling. Otherwise work will have to be done to rerender if a path is translated.
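To make the clip bounding-box idea from the first paragraph of this comment concrete, here is a minimal sketch of one way such tracking could look. It is purely illustrative: `Aabb` and `ClipBounds` are hypothetical names, not existing types in the codebase, and the sketch assumes nested clips are pushed and popped in stack order, with the viewport as the outermost bound.

```rust
/// Hypothetical axis-aligned bounding box, `[x0, x1] x [y0, y1]`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Aabb {
    x0: f64,
    y0: f64,
    x1: f64,
    y1: f64,
}

impl Aabb {
    /// Intersection of two boxes; may be empty (see `is_empty`).
    fn intersect(self, other: Aabb) -> Aabb {
        Aabb {
            x0: self.x0.max(other.x0),
            y0: self.y0.max(other.y0),
            x1: self.x1.min(other.x1),
            y1: self.y1.min(other.y1),
        }
    }

    /// An empty box means nothing drawn under this clip can be visible.
    fn is_empty(self) -> bool {
        self.x0 >= self.x1 || self.y0 >= self.y1
    }
}

/// Hypothetical tracker: the top of the stack is the bounding box of the
/// active clip, already intersected with all enclosing clips and the viewport.
struct ClipBounds {
    stack: Vec<Aabb>,
}

impl ClipBounds {
    fn new(viewport: Aabb) -> Self {
        Self { stack: vec![viewport] }
    }

    /// Bounding box that geometry can be culled against right now.
    fn current(&self) -> Aabb {
        *self.stack.last().expect("the viewport entry is never popped")
    }

    /// Entering a clip: narrow the active bounds by the clip path's box.
    fn push(&mut self, clip_path_bounds: Aabb) {
        let next = self.current().intersect(clip_path_bounds);
        self.stack.push(next);
    }

    /// Leaving a clip: restore the enclosing bounds, keeping the viewport.
    fn pop(&mut self) {
        if self.stack.len() > 1 {
            self.stack.pop();
        }
    }
}
```

With something like this in place, the viewport test during flattening could compare against `current()` instead of the fixed viewport rectangle, and an empty box would allow skipping a path outright.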

Comment on lines 95 to 121
// The following checks two things. First, if the quadratic Bézier is fully to the
// left of the viewport, it may affect pixel coverage and winding, but its exact
// shape does not matter. It can be emitted as a line segment [p0, p2].
//
// Second, an upper bound on the shortest distance of any point on the quadratic
// Bézier curve to the line segment [p0, p2] is 1/2 of the maximum of the
// endpoint-to-control-point distances.
//
// The derivation is similar to that for the cubic Bézier (see below). In
// short:
//
// q(t) = B0(t) p0 + B1(t) p1 + B2(t) p2
// dist(q(t), [p0, p2]) <= B1(t) dist(p1, [p0, p2])
// = 2 (1-t)t dist(p1, [p0, p2]).
//
// The maximum occurs at t=1/2, hence
// max(dist(q(t), [p0, p2])) <= 1/2 dist(p1, [p0, p2]).
//
// A cheap upper bound for dist(p1, [p0, p2]) is max(dist(p1, p0), dist(p1, p2)).
//
// The following takes the square to elide the square root of the Euclidean
// distance.
else if [p0, p1, p2].into_iter().all(|p| p.x < 0.)
|| f64::max((p1 - p0).hypot2(), (p1 - p2).hypot2()) <= 4. * TOL_2
{
callback.callback(LinePathEl::LineTo(p2));
} else {
Member Author

@tomcur tomcur Jan 2, 2026

A small observation: for the left-of-viewport case, this could in fact just return a vertical line, as it does not have to be watertight at all, but in terms of performance that isn't likely to buy us much. It's an additional branch here for a slight performance improvement during tiling.
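The distance bound stated in the quoted code comment can be sanity-checked numerically. The sketch below is a standalone check, not part of the PR: `Point`, `max_deviation`, `deviation_bound`, and `is_flat_enough` are hypothetical helpers, and the tolerance is passed as a parameter because the value behind `TOL_2` is not shown here. It samples the quadratic, measures the distance to the segment [p0, p2], and compares it with half the larger endpoint-to-control-point distance.

```rust
/// Hypothetical stand-in for the real point type.
#[derive(Clone, Copy)]
struct Point {
    x: f64,
    y: f64,
}

/// Evaluate the quadratic Bézier q(t) = B0(t) p0 + B1(t) p1 + B2(t) p2.
fn quad_eval(p0: Point, p1: Point, p2: Point, t: f64) -> Point {
    let (b0, b1, b2) = ((1.0 - t) * (1.0 - t), 2.0 * (1.0 - t) * t, t * t);
    Point {
        x: b0 * p0.x + b1 * p1.x + b2 * p2.x,
        y: b0 * p0.y + b1 * p1.y + b2 * p2.y,
    }
}

/// Euclidean distance from `p` to the closed segment [a, b].
fn dist_to_segment(p: Point, a: Point, b: Point) -> f64 {
    let (dx, dy) = (b.x - a.x, b.y - a.y);
    let len2 = dx * dx + dy * dy;
    // Parameter of the closest point on the segment, clamped to its ends.
    let t = if len2 == 0.0 {
        0.0
    } else {
        (((p.x - a.x) * dx + (p.y - a.y) * dy) / len2).clamp(0.0, 1.0)
    };
    (p.x - (a.x + t * dx)).hypot(p.y - (a.y + t * dy))
}

/// Largest sampled distance between the curve and the segment [p0, p2].
fn max_deviation(p0: Point, p1: Point, p2: Point, samples: usize) -> f64 {
    (0..=samples)
        .map(|i| i as f64 / samples as f64)
        .map(|t| dist_to_segment(quad_eval(p0, p1, p2, t), p0, p2))
        .fold(0.0, f64::max)
}

/// The cheap bound from the comment: 1/2 max(dist(p1, p0), dist(p1, p2)).
fn deviation_bound(p0: Point, p1: Point, p2: Point) -> f64 {
    0.5 * f64::max((p1.x - p0.x).hypot(p1.y - p0.y), (p1.x - p2.x).hypot(p1.y - p2.y))
}

/// Squared form of the same test, avoiding the square root:
/// max(|p1 - p0|^2, |p1 - p2|^2) <= 4 tol^2.
fn is_flat_enough(p0: Point, p1: Point, p2: Point, tol: f64) -> bool {
    let d0 = (p1.x - p0.x).powi(2) + (p1.y - p0.y).powi(2);
    let d2 = (p1.x - p2.x).powi(2) + (p1.y - p2.y).powi(2);
    f64::max(d0, d2) <= 4.0 * tol * tol
}
```

For example, with p0 = (0, 0), p1 = (1, 2), p2 = (3, 0), the curve reaches a distance of 1 from the segment at t = 1/2, while the bound evaluates to roughly 1.41, so the bound holds but is not tight.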

@tomcur tomcur changed the title perf(vello_common): Cull Bèziers path elements during flattening perf(vello_common): Cull Béziers path elements during flattening Jan 2, 2026
@tomcur tomcur changed the title perf(vello_common): Cull Béziers path elements during flattening perf(vello_common): Cull Bézier path elements during flattening Jan 3, 2026
github-merge-queue bot pushed a commit that referenced this pull request Jan 5, 2026
This is just a logic change and slight simplification of the lines
representation. By moving the burden of closing input subpaths to
flattening itself, this will allow, e.g., culling geometry at the level
of Béziers. Before, the closing of subpaths lived as a post-processing
step after flattening.

The actual change that starts using this for something more exciting is
in #1341.

This by itself does not really bring down timings (plus noise makes it
hard to measure).

```
flatten/Ghostscript_Tiger
                        time:   [203.93 µs 204.28 µs 204.68 µs]
                        change: [-2.1022% -1.8481% -1.5684%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  10 (10.00%) high mild
  2 (2.00%) high severe
flatten/paris-30k       time:   [12.222 ms 12.298 ms 12.384 ms]
                        change: [+0.7747% +1.3966% +2.2290%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
  13 (13.00%) high severe
```
@tomcur tomcur added the C-sparse-strips Applies to sparse strips variants of vello in general label Jan 13, 2026
@tomcur tomcur enabled auto-merge January 15, 2026 14:34
@tomcur tomcur disabled auto-merge January 15, 2026 15:55
This conservatively checks whether Bézier path elements we're about to
flatten are outside the viewport. If they are fully to the right, top,
or bottom of the viewport, the Bézier does not impact pixel coverage or
coarse winding at all, and can be ignored.

If it is fully to the left, it does impact pixel coverage and coarse
winding, but only the element's start and endpoint y-values matter, not
the exact shape, meaning we can just yield a line rather than finely
flattening.

If more or less everything ends up being in the viewport, the additional
calculation is wasted and increases flattening time by ~3%.

If geometry ends up culled, flattening and tiling times can be reduced
significantly, but this is of course workload-dependent.

The following two Ghostscript Tigers have their viewboxes reduced to
`50 50 100 100` and `90 90 20 20`, down from `0 0 200 200`. Their
flattening time is reduced by 52% and 90% respectively, and their tiling
time by 22% and 60%.

Flattening timings:

```
flatten/Ghostscript_Tiger
                        time:   [209.94 µs 210.21 µs 210.51 µs]
                        change: [+2.6850% +3.1753% +3.6309%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
flatten/Ghostscript_Tiger-viewboxed
                        time:   [97.189 µs 97.287 µs 97.399 µs]
                        change: [-52.787% -52.650% -52.514%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe
flatten/Ghostscript_Tiger-viewboxed-extreme
                        time:   [19.722 µs 19.741 µs 19.761 µs]
                        change: [-90.311% -90.280% -90.255%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  7 (7.00%) high mild
  1 (1.00%) high severe
flatten/paris-30k       time:   [12.740 ms 12.764 ms 12.788 ms]
                        change: [+2.6014% +3.3631% +4.0837%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
```

Tiling timings:

```
tile/Ghostscript_Tiger  time:   [175.39 µs 175.79 µs 176.28 µs]
                        change: [-0.4403% -0.0016% +0.4400%] (p = 1.00 > 0.05)
                        No change in performance detected.
Found 1 outliers among 50 measurements (2.00%)
  1 (2.00%) high mild
tile/Ghostscript_Tiger-viewboxed
                        time:   [78.932 µs 79.147 µs 79.409 µs]
                        change: [-23.209% -22.803% -22.369%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 50 measurements (10.00%)
  1 (2.00%) high mild
  4 (8.00%) high severe
tile/Ghostscript_Tiger-viewboxed-extreme
                        time:   [13.378 µs 13.390 µs 13.405 µs]
                        change: [-60.417% -60.306% -60.199%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 50 measurements (12.00%)
  2 (4.00%) high mild
  4 (8.00%) high severe
tile/paris-30k          time:   [20.970 ms 21.001 ms 21.034 ms]
                        change: [-0.4881% -0.2397% +0.0108%] (p = 0.07 > 0.05)
                        No change in performance detected.
Found 1 outliers among 50 measurements (2.00%)
  1 (2.00%) high mild
```
@LaurenzV
Collaborator

> I wonder whether we should track the axis-aligned bounding box of the active clip (and separately, we may want to provide a fast-path for rectangular clips). If we have that bounding box, we can generalize this viewport culling to clip culling during tiling as well as here. For some workloads that may provide quite a few more culling opportunities.

yup, this would probably be possible!

Collaborator

@LaurenzV LaurenzV left a comment

Passes my PDF test suite with flying colors. 🎉

Comment on lines 100 to 103
let iter = path.into_iter().map(
#[inline(always)]
|el| affine * el,
);
Collaborator

@LaurenzV LaurenzV Feb 20, 2026

I'm wondering whether there is a specific reason this had to be moved into the method?

Member Author

It's been a while, so I don't remember; it's probably just something left over from an early iteration on this. I've reverted that part. (As a sanity check, benches show no difference, as expected.)

Member Author

(Well-spotted btw!)

@tomcur tomcur enabled auto-merge February 20, 2026 13:02
@tomcur tomcur added this pull request to the merge queue Feb 20, 2026
Merged via the queue into linebender:main with commit 9b6fb6c Feb 20, 2026
17 checks passed
@tomcur tomcur deleted the cull-beziers branch February 20, 2026 13:21
Labels

C-sparse-strips Applies to sparse strips variants of vello in general

2 participants