Add early culling for lines fully left of the viewport. #1368

Open
b0nes164 wants to merge 18 commits into main from thomas/AAearlyCull

Conversation

@b0nes164
Member

@b0nes164 b0nes164 commented Jan 20, 2026

Apologies, this took an embarrassingly long time due to a funny bug. Adds changes as discussed.

Note: The old code path still exists in strip.rs:

if tile.x == 0 && line_left_x < 0. {
    let (ymin, ymax) = if line.p0.x == line.p1.x {
        (line_top_y, line_bottom_y)
    } else {
        let line_viewport_left_y = (line_top_y - line_top_x * y_slope)
            .max(line_top_y)
            .min(line_bottom_y);
        (
            f32::min(line_left_y, line_viewport_left_y),
            f32::max(line_left_y, line_viewport_left_y),
        )
    };
    let ymin: f32x4<_> = ymin.simd_into(s);
    let ymax: f32x4<_> = ymax.simd_into(s);
    let px_top_y: f32x4<_> = [0.0, 1.0, 2.0, 3.0].simd_into(s);
    let px_bottom_y = 1.0 + px_top_y;
    let ymin = px_top_y.max(ymin);
    let ymax = px_bottom_y.min(ymax);
    let h = (ymax - ymin).max(0.0);
    accumulated_winding = h.madd(sign, accumulated_winding);
    for x_idx in 0..Tile::WIDTH {
        location_winding[x_idx as usize] = h.madd(sign, location_winding[x_idx as usize]);
    }
    if line_right_x < 0. {
        // Early exit, as no part of the line is inside the tile.
        continue;
    }
}

This is because

  1. Early culling is conditionally enabled based on whether a culling opportunity was detected in the previous strip generation pass.
  2. Lines crossing the left edge are not culled. Although the winding contribution could be calculated in make_tiles, this produces no savings in tiles generated, i.e., the line will still produce a tile at x == 0. Leaving the code path in strip.rs saves extra logic in make_tiles.
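
For reference, the per-row winding contribution that the snippet above accumulates can be written in scalar form. This is a minimal sketch, not the code in strip.rs: `row_coverage` and its parameters are hypothetical names, and it assumes a tile height of 4 pixel rows, as suggested by the `[0.0, 1.0, 2.0, 3.0]` constant.

```rust
/// Hypothetical scalar version of the per-row height computation.
/// `ymin`/`ymax` are the vertical extent (in tile-local pixels) of the part
/// of the line that lies left of the viewport; `sign` is +1.0 or -1.0
/// depending on the line's direction.
fn row_coverage(ymin: f32, ymax: f32, sign: f32) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    for (row, slot) in out.iter_mut().enumerate() {
        let px_top = row as f32;
        let px_bottom = px_top + 1.0;
        // Overlap of [ymin, ymax] with this pixel row, clamped to zero.
        let h = (px_bottom.min(ymax) - px_top.max(ymin)).max(0.0);
        *slot = sign * h;
    }
    out
}
```

With `row_coverage(0.5, 2.25, 1.0)`, the line covers half of row 0, all of row 1, a quarter of row 2, and none of row 3.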

Currently this does not use the bitvec, and instead traverses the histogram.
Update: bitvec-like behavior has since been added.

@b0nes164 b0nes164 requested a review from tomcur January 20, 2026 01:28
Member

@tomcur tomcur left a comment

Very cool, and I think this is roughly what it should look like. I do have a few observations.

> Early culling is conditionally enabled based on whether a culling opportunity was detected in the previous strip generation pass.

I'm not sure culling_opportunity has the best semantics. A path having had a culling opportunity may not be a good predictor of culling being possible the next time generate_with_clip is called, and same if the path didn't have a culling opportunity, the next path may in actuality well have one. These semantics will of course mean that we're always going to miss the performance benefit of culling at least during the first path in a sequence of cullable paths.

Perhaps we should instead be conditional only on whether we want culling at all -- for example, if we're caching strips, perhaps all types of culling should be disabled (including culling where we don't account for winding, i.e., above, to the right, and below the viewport). In that case it may still make sense to have the dispatch be const-generic over the culling, as you have now, but perhaps the difference in performance can be so minimal that we don't have to bother.

E.g., for tiling, with the current implementation it doesn't seem to matter if we always cull or not:

tile/Ghostscript_Tiger  time:       [184.92 µs 185.14 µs 185.41 µs]
tile/Ghostscript_Tiger-cull time:   [181.96 µs 182.34 µs 182.84 µs]

That of course doesn't preclude dispatching to a strip rendering implementation that does or does not use culling, depending on whether we've culled something. That may well be important for performance.

Should also benchmark strip rendering, but that requires some bigger changes to the benchmark code...

It does seem both tiling and strip rendering are about 6% slower with these changes, but there is a little bit of noise so that has to be measured a bit more carefully (and this can perhaps be optimized more).

> Currently this does not use the bitvec, and instead traverses the histogram.

We have to carefully check what the performance impact of this is when rendering a bunch of small geometry (like glyphs), which will usually occupy just a few rows.
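
To make the concern concrete: if culled winding is stored per tile row, a dense traversal visits every row even when a glyph only touched a few, whereas a bitset of touched rows lets the strip pass jump straight to them. The sketch below is an illustration only. `CulledRows`, its fields, and the row-indexed `f32` histogram are assumptions, not the PR's actual data structures.

```rust
/// Hypothetical per-row record of winding culled left of the viewport.
struct CulledRows {
    /// Accumulated winding per tile row (assumed layout).
    winding: Vec<f32>,
    /// One bit per row, set when that row received a culled contribution.
    touched: Vec<u64>,
}

impl CulledRows {
    fn new(rows: usize) -> Self {
        Self {
            winding: vec![0.0; rows],
            touched: vec![0; (rows + 63) / 64],
        }
    }

    /// Record a culled contribution for `row` and mark the row as touched.
    fn add(&mut self, row: usize, delta: f32) {
        self.winding[row] += delta;
        self.touched[row / 64] |= 1u64 << (row % 64);
    }

    /// Visit only touched rows, clearing them as we go, so the reset cost is
    /// proportional to the rows used rather than to the viewport height.
    fn drain(&mut self) -> Vec<(usize, f32)> {
        let mut out = Vec::new();
        for (word_idx, word) in self.touched.iter_mut().enumerate() {
            while *word != 0 {
                let bit = word.trailing_zeros() as usize;
                let row = word_idx * 64 + bit;
                out.push((row, self.winding[row]));
                self.winding[row] = 0.0;
                *word &= *word - 1; // clear the lowest set bit
            }
        }
        out
    }
}
```

For small geometry, `drain` does work proportional to the number of touched rows plus one word per 64 rows of viewport, which is the kind of trade-off that would need benchmarking against a plain histogram walk.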

@b0nes164
Member Author

b0nes164 commented Jan 28, 2026

> A path having had a culling opportunity may not be a good predictor of culling being possible the next time generate_with_clip is called, and same if the path didn't have a culling opportunity, the next path may in actuality well have one.

These are good points! For more context, we made the decision to make the culling conditional on the previous make_tiles call because we were concerned about the cost of clearing the histograms. However, now that we have the data, the histogram cost seems to be negligible. But, I suspect the cost of traversing the rows inside strip.rs is not negligible.

So here's a thought: WDYT about having culling always enabled in make_tiles and then conditionally dispatching the culling logic in strips? I think that gives us the best of both (and we would also only clear the histograms if the previous make_tiles call culled).
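
The proposal can be sketched roughly as follows. This is a hedged illustration under stated assumptions, not the PR's code: the `Tiles` struct, its `culled_any` and `culled_winding` fields, and the functions `row_start_winding` and `generate` are all hypothetical. Only the `USE_EARLY_CULL` name is taken from this PR.

```rust
/// Hypothetical stand-in for the tiler's output.
struct Tiles {
    /// Set by tiling when at least one line was culled left of the viewport.
    culled_any: bool,
    /// Per-row winding from culled lines (assumed layout).
    culled_winding: Vec<f32>,
}

/// Strip-side logic, specialized at compile time on whether culled winding
/// has to be folded in. Returns the starting winding for each row.
fn row_start_winding<const USE_EARLY_CULL: bool>(tiles: &Tiles, rows: usize) -> Vec<f32> {
    (0..rows)
        .map(|row| {
            if USE_EARLY_CULL {
                tiles.culled_winding[row]
            } else {
                0.0
            }
        })
        .collect()
}

/// Runtime dispatch: pay for the culling path only if tiling culled something.
fn generate(tiles: &mut Tiles, rows: usize) -> Vec<f32> {
    let out = if tiles.culled_any {
        row_start_winding::<true>(tiles, rows)
    } else {
        row_start_winding::<false>(tiles, rows)
    };
    // Only clear the per-row storage if the previous tiling pass wrote to it.
    if tiles.culled_any {
        tiles.culled_winding.iter_mut().for_each(|w| *w = 0.0);
        tiles.culled_any = false;
    }
    out
}
```

Because the branch on `USE_EARLY_CULL` is a compile-time constant, each instantiation contains only one of the two paths, and the runtime check happens once per call rather than once per row.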

Bitvec to come!

@tomcur
Member

tomcur commented Jan 28, 2026

> So here's a thought: WDYT about culling always enabled in make_tiles and then conditionally dispatching the culling logic in strips. I think that gives us the best of both? (and also only clearing the histograms if the previous make_tiles culled?).

Yes! In fact, I edited my message (probably as you were typing) to say this:

> That of course doesn't preclude dispatching to a strip rendering implementation that does or does not use culling, depending on whether we've culled something. That may well be important for performance.

@b0nes164
Member Author

b0nes164 commented Feb 4, 2026

@LaurenzV do you know if it's safe to early cull when the storage generation mode is not GenerationMode::Replace?

For safety, I assumed it is not, but after thinking about it, I think it's fine? Once the strips and alpha are generated (and cached), there's no need to hold onto the culled winding contributions (similar to how tiles are also cleared regardless of the storage mode)?

@b0nes164
Member Author

b0nes164 commented Feb 4, 2026

Otherwise, ready for review.

@LaurenzV
Collaborator

LaurenzV commented Feb 4, 2026

I don't believe the generation mode should make a difference; as long as the strips are there, it should be fine!

@b0nes164
Member Author

b0nes164 commented Feb 4, 2026

Early culling is now always enabled. I've retained the USE_EARLY_CULL generic, as it's useful to manually enable/disable culling for tests and benchmarking.

@b0nes164 b0nes164 requested a review from tomcur February 5, 2026 22:07
@tomcur
Member

tomcur commented Feb 11, 2026

I have not forgotten about this, but haven't had a chance for a proper look yet!

3 participants

Comments