perf(vello_common): Cull Bézier path elements during flattening #1341
tomcur merged 6 commits into linebender:main
I wonder whether we should track the axis-aligned bounding box of the active clip (and, separately, we may want to provide a fast path for rectangular clips). With that bounding box, we can generalize this viewport culling to clip culling, both during tiling and here. For some workloads that may provide quite a few more culling opportunities.
Second, a small point: the culling we do here, and already did in tiling, means a path's cached strips cannot be translated if any culling occurred. I believe there has already been some discussion that we may want to move tiles and strips to i16 coordinates to support translation; additionally, if paths are to be cached, we may not want to do any culling. Otherwise, work will have to be done to rerender if a path is translated.
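To make the clip bounding-box idea concrete, here is a minimal sketch of how the culling rectangle could be tracked as clips are pushed and popped. `Rect`, `ClipBounds`, `push_clip`, and `pop_clip` are hypothetical names for illustration only; they are not taken from `vello_common`, and the sketch assumes the bounding box of each clip path is already known.

```rust
/// Axis-aligned rectangle; a hypothetical stand-in, not a `vello_common` type.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Rect {
    pub x0: f64,
    pub y0: f64,
    pub x1: f64,
    pub y1: f64,
}

impl Rect {
    /// Intersection of two rectangles; the result may be empty.
    pub fn intersect(self, other: Rect) -> Rect {
        Rect {
            x0: self.x0.max(other.x0),
            y0: self.y0.max(other.y0),
            x1: self.x1.min(other.x1),
            y1: self.y1.min(other.y1),
        }
    }

    /// An empty rectangle means nothing inside it can be visible.
    pub fn is_empty(self) -> bool {
        self.x0 >= self.x1 || self.y0 >= self.y1
    }
}

/// Tracks the viewport intersected with the bounding boxes of all
/// currently active clips.
pub struct ClipBounds {
    stack: Vec<Rect>,
}

impl ClipBounds {
    pub fn new(viewport: Rect) -> Self {
        Self { stack: vec![viewport] }
    }

    /// Pushes a clip, given the bounding box of its path.
    pub fn push_clip(&mut self, clip_bbox: Rect) {
        let current = self.current();
        self.stack.push(current.intersect(clip_bbox));
    }

    /// Pops the most recent clip; the viewport itself is never popped.
    pub fn pop_clip(&mut self) {
        if self.stack.len() > 1 {
            self.stack.pop();
        }
    }

    /// The rectangle outside of which geometry cannot be visible.
    pub fn current(&self) -> Rect {
        *self.stack.last().expect("stack always holds the viewport")
    }
}
```

With something like this in place, the culling checks could test against `current()` rather than the viewport alone. Whether that pays off depends on how often clips are much smaller than the viewport.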
// The following checks two things. First, if the quadratic Bézier is fully to the
// left of the viewport, it may affect pixel coverage and winding, but its exact
// shape does not matter. It can be emitted as a line segment [p0, p2].
//
// Second, an upper bound on the shortest distance of any point on the quadratic
// Bézier curve to the line segment [p0, p2] is 1/2 of the maximum of the
// endpoint-to-control-point distances.
//
// The derivation is similar to that for the cubic Bézier (see below). In
// short:
//
// q(t) = B0(t) p0 + B1(t) p1 + B2(t) p2
// dist(q(t), [p0, p2]) <= B1(t) dist(p1, [p0, p2])
//                       = 2 (1-t)t dist(p1, [p0, p2]).
//
// The maximum occurs at t=1/2, hence
// max(dist(q(t), [p0, p2])) <= 1/2 dist(p1, [p0, p2]).
//
// A cheap upper bound for dist(p1, [p0, p2]) is max(dist(p1, p0), dist(p1, p2)).
//
// The following takes the square to elide the square root of the Euclidean
// distance.
else if [p0, p1, p2].into_iter().all(|p| p.x < 0.)
    || f64::max((p1 - p0).hypot2(), (p1 - p2).hypot2()) <= 4. * TOL_2
{
    callback.callback(LinePathEl::LineTo(p2));
} else {
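The flatness bound from the comment above can be tried out in isolation. The following is a sketch only: `Point`, `dist2`, `quad_is_flat`, and `quad_eval` are hypothetical helpers rather than the crate's API, and it assumes `TOL_2` in the diff is the squared flattening tolerance, so the sketch takes the unsquared tolerance `tol` as a parameter.

```rust
/// A 2D point; a hypothetical stand-in for the crate's point type.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Point {
    pub x: f64,
    pub y: f64,
}

/// Squared Euclidean distance between two points.
fn dist2(a: Point, b: Point) -> f64 {
    let (dx, dy) = (a.x - b.x, a.y - b.y);
    dx * dx + dy * dy
}

/// Returns `true` if the quadratic Bézier (p0, p1, p2) stays within `tol`
/// of the chord [p0, p2].
///
/// Uses the bound
///   dist(q(t), [p0, p2]) <= 1/2 * max(dist(p1, p0), dist(p1, p2)),
/// compared in squared form to avoid the square root:
///   max(dist2(p1, p0), dist2(p1, p2)) <= 4 * tol^2.
pub fn quad_is_flat(p0: Point, p1: Point, p2: Point, tol: f64) -> bool {
    f64::max(dist2(p1, p0), dist2(p1, p2)) <= 4.0 * tol * tol
}

/// Evaluates the quadratic Bézier at parameter `t` via its Bernstein basis.
pub fn quad_eval(p0: Point, p1: Point, p2: Point, t: f64) -> Point {
    let mt = 1.0 - t;
    let (b0, b1, b2) = (mt * mt, 2.0 * mt * t, t * t);
    Point {
        x: b0 * p0.x + b1 * p1.x + b2 * p2.x,
        y: b0 * p0.y + b1 * p1.y + b2 * p2.y,
    }
}
```

The test is conservative: a curve it rejects may still be flat enough, in which case it simply goes through regular flattening.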
A small observation: for the left-of-viewport case, this could in fact just return a vertical line, as it does not have to be watertight at all, but in terms of performance that isn't likely to buy us much. It's an additional branch here for a slight performance improvement during tiling.
This is just a logic change and slight simplification of the lines representation. By moving the burden of closing input subpaths to flattening itself, this will allow, e.g., culling geometry at the level of Béziers. Before, the closing of subpaths lived as a post-processing step after flattening.
The actual change that starts using this for something more exciting is in #1341.
This by itself does not really bring down timings (plus noise makes it hard to measure).
```
flatten/Ghostscript_Tiger time: [203.93 µs 204.28 µs 204.68 µs]
change: [-2.1022% -1.8481% -1.5684%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
10 (10.00%) high mild
2 (2.00%) high severe
flatten/paris-30k time: [12.222 ms 12.298 ms 12.384 ms]
change: [+0.7747% +1.3966% +2.2290%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
13 (13.00%) high severe
```
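As a rough illustration of closing subpaths inside flattening rather than in a later pass, the sketch below handles only straight-line input. `PathEl`, `LinePathEl`, `flatten_closed`, and `close_subpath` are simplified, hypothetical versions written for this example and do not reproduce the actual types or code in the PR.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Point {
    pub x: f64,
    pub y: f64,
}

/// Simplified input path element (curves omitted for brevity).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum PathEl {
    MoveTo(Point),
    LineTo(Point),
    ClosePath,
}

/// Simplified output element: only moves and lines, no explicit close.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum LinePathEl {
    MoveTo(Point),
    LineTo(Point),
}

/// Emits a closing line if the current subpath does not end at its start.
fn close_subpath(out: &mut Vec<LinePathEl>, start: Option<Point>, last: Option<Point>) {
    if let (Some(s), Some(l)) = (start, last) {
        if s != l {
            out.push(LinePathEl::LineTo(s));
        }
    }
}

/// Converts a path to lines, closing every subpath as it goes.
pub fn flatten_closed(path: &[PathEl]) -> Vec<LinePathEl> {
    let mut out = Vec::new();
    let mut start: Option<Point> = None; // start of the current subpath
    let mut last: Option<Point> = None; // current point
    for &el in path {
        match el {
            PathEl::MoveTo(p) => {
                // A new subpath implicitly closes the previous one.
                close_subpath(&mut out, start, last);
                out.push(LinePathEl::MoveTo(p));
                start = Some(p);
                last = Some(p);
            }
            PathEl::LineTo(p) => {
                out.push(LinePathEl::LineTo(p));
                last = Some(p);
            }
            PathEl::ClosePath => {
                close_subpath(&mut out, start, last);
                last = start;
            }
        }
    }
    // The final subpath is closed as well.
    close_subpath(&mut out, start, last);
    out
}
```

Because the flattener already knows where each subpath starts and ends, it can decide per element what to emit, which is what makes element-level culling possible.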
This conservatively checks whether Bézier path elements we're about to
flatten are outside the viewport. If they are fully to the right, top,
or bottom of the viewport, the Bézier does not impact pixel coverage or
coarse winding at all, and can be ignored.
If it is fully to the left, it does impact pixel coverage and coarse
winding, but only the element's start and endpoint y-values matter, not
the exact shape, meaning we can just yield a line rather than finely
flattening.
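The classification described above can be sketched as follows. It relies on the fact that a Bézier curve lies inside the convex hull of its control points, so testing the control points is sufficient (and conservative). `Point`, `Cull`, and `classify` are hypothetical names for this example, the viewport is assumed to span `[0, width] x [0, height]`, and how a skipped element is represented in the emitted line stream is not shown here.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Point {
    pub x: f64,
    pub y: f64,
}

/// What to do with a Bézier element before flattening it.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Cull {
    /// Fully right of, above, or below the viewport: no effect on coverage
    /// or coarse winding.
    Skip,
    /// Fully left of the viewport: only the endpoints' y-values matter, so a
    /// single line from start to end suffices.
    Line,
    /// Possibly visible: flatten as usual.
    Flatten,
}

/// Classifies a Bézier by its control points against the viewport.
pub fn classify(points: &[Point], width: f64, height: f64) -> Cull {
    let right = points.iter().all(|p| p.x > width);
    let above = points.iter().all(|p| p.y < 0.0);
    let below = points.iter().all(|p| p.y > height);
    if right || above || below {
        // Checked first: an element that is above or below the viewport has
        // no effect even if it is also to the left.
        Cull::Skip
    } else if points.iter().all(|p| p.x < 0.0) {
        Cull::Line
    } else {
        Cull::Flatten
    }
}
```

The same function works for quadratics (three points) and cubics (four points), since it only inspects the control points.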
If more or less everything ends up being in the viewport, the additional
calculation is wasted and increases flattening time by ~3%.
If geometry ends up culled, flattening and tiling times can be reduced
significantly, but this is of course workload-dependent.
The following two Ghostscript Tigers have their viewboxes reduced to
`50 50 100 100` and `90 90 20 20`, down from `0 0 200 200`. Their
flattening time is reduced by 52% and 90% respectively, and their tiling
time by 22% and 60%.
Flattening timings:
```
flatten/Ghostscript_Tiger
time: [209.94 µs 210.21 µs 210.51 µs]
change: [+2.6850% +3.1753% +3.6309%] (p = 0.00 < 0.05)
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe
flatten/Ghostscript_Tiger-viewboxed
time: [97.189 µs 97.287 µs 97.399 µs]
change: [-52.787% -52.650% -52.514%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe
flatten/Ghostscript_Tiger-viewboxed-extreme
time: [19.722 µs 19.741 µs 19.761 µs]
change: [-90.311% -90.280% -90.255%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
7 (7.00%) high mild
1 (1.00%) high severe
flatten/paris-30k time: [12.740 ms 12.764 ms 12.788 ms]
change: [+2.6014% +3.3631% +4.0837%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
```
Tiling timings:
```
tile/Ghostscript_Tiger time: [175.39 µs 175.79 µs 176.28 µs]
change: [-0.4403% -0.0016% +0.4400%] (p = 1.00 > 0.05)
No change in performance detected.
Found 1 outliers among 50 measurements (2.00%)
1 (2.00%) high mild
tile/Ghostscript_Tiger-viewboxed
time: [78.932 µs 79.147 µs 79.409 µs]
change: [-23.209% -22.803% -22.369%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 50 measurements (10.00%)
1 (2.00%) high mild
4 (8.00%) high severe
tile/Ghostscript_Tiger-viewboxed-extreme
time: [13.378 µs 13.390 µs 13.405 µs]
change: [-60.417% -60.306% -60.199%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 50 measurements (12.00%)
2 (4.00%) high mild
4 (8.00%) high severe
tile/paris-30k time: [20.970 ms 21.001 ms 21.034 ms]
change: [-0.4881% -0.2397% +0.0108%] (p = 0.07 > 0.05)
No change in performance detected.
Found 1 outliers among 50 measurements (2.00%)
1 (2.00%) high mild
```
yup, this would probably be possible!
LaurenzV
left a comment
Passes my PDF test suite with flying colors. 🎉
let iter = path.into_iter().map(
    #[inline(always)]
    |el| affine * el,
);
I'm wondering whether there is a specific reason this had to be moved into the method?
It's been a while so I don't remember, probably just something left over from an early iteration on this. I've reverted that part. (As a sanity check, benches show no difference, as expected.)
On top of #1340.
tldr: If only some of Tiger's whiskers are visible, this results in -90% and -60% timings on flattening and tiling.