Add early culling for lines fully left of the viewport.#1368
Add early culling for lines fully left of the viewport.#1368
Conversation
There was a problem hiding this comment.
Very cool, and I think this is roughly what it should look like. I do have a few observations.
Early culling is conditionally enabled based on whether a culling opportunity was detected in the previous strip generation pass.
I'm not sure culling_opportunity are the best semantics. A path having had a culling opportunity may not be a good predictor of culling being possible the next time generate_with_clip is called, and same if the path didn't have a culling opportunity, the next path may in actuality well have one. These semantics will of course mean that we're always going to miss the performance benefit of culling at least during the first path in a sequence of cullable paths.
Perhaps, instead we should be conditional only on whether we want culling at all -- for example, if we're caching strips, perhaps all types of culling should be disabled (including culling where we don't account for winding, i.e., above, to the right, and below the viewport). In that case it may still make sense sense to have the dispatch be const-generic over the culling, as you have now, but perhaps the difference in performance can be so minimal that we don't have to bother.
E.g., for tiling, with the current implementation it doesn't seem to matter if we always cull or not:
tile/Ghostscript_Tiger time: [184.92 µs 185.14 µs 185.41 µs]
tile/Ghostscript_Tiger-cull time: [181.96 µs 182.34 µs 182.84 µs]
That of course doesn't preclude dispatching to a strip rendering implementation that does or does not use culling, depending on whether we've culled something. That may well be important for performance.
Should also benchmark strip rendering, but that requires some bigger changes to the benchmark code...
It does seem both tiling and strip rendering are about 6% slower with these changes, but there is a little bit of noise so that has to be measured a bit more carefully (and this can perhaps be optimized more).
Currently this does not use the bitvec, and instead traverses the histogram.
We have to carefully check what the performance impact is of this where rendering a bunch of small geometry (like glyphs) which will usually just occupy a few rows.
These are good points! For more context, we made the decision to make the culling conditional on the previous So here's a thought: WDYT about culling always enabled in Bitvec to come! |
Yes! In fact, I edited my message (probably as you were typing) to say this:
|
|
@LaurenzV do you know if it's safe to early cull when the storage generation mode is not For safety, I make the assumption no, but after thinking about it, I think it's fine? Once the strips and alpha are generated (and cached), there's no need to hold onto the culled winding contributions (similarly to how tiles are also cleared regardless of the storage mode)? |
|
Otherwise, ready for review. |
|
I don't believe it should make a difference for the generation mode, as long as the strips are there it should be fine! |
|
Early culling is now always enabled. I've retained the |
|
I have not forgotten about this, but haven't had a chance for a proper look yet! |
Apologies this took an embarrassingly long time due to a funny bug. Adds changes as discussed.
Note: The old code path still exists in strip.rs:
vello/sparse_strips/vello_common/src/strip.rs
Lines 446 to 477 in a2f2823
This is because
make_tiles,this produces no savings in tiles generated. I.e. the line will still produce a tile atx == 0. Leaving in the code path in strip saves extra logic inmake_tiles.Currently this does not use the bitvec, and instead traverses the histogram.Bitvec-like behavior added