vello_sparse_strips: Optimize pixel-aligned rect filling by bypassing path pipeline#1453
vello_sparse_strips: Optimize pixel-aligned rect filling by bypassing path pipeline#1453
Conversation
| accumulated_winding += acc; | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Do we maybe want to move all of this into a separate rect.rs? Up to you!
| #[inline] | ||
| pub fn is_integer_translation(transform: &Affine) -> bool { | ||
| let [a, b, c, d, e, f] = transform.as_coeffs(); | ||
| (a - 1.0).abs() < 1e-9 |
There was a problem hiding this comment.
Can we reuse SCALAR_NEARLY_ZERO here?
There was a problem hiding this comment.
There's also a is_nearly_zero method, although it's only defined for f32. Maybe we can define it for f64 as well.
| let path = rect.to_path(0.0); | ||
| let start = self.allocation_group.path.len() as u32; | ||
| self.allocation_group.path.extend(&path); | ||
| let end = self.allocation_group.path.len() as u32; |
There was a problem hiding this comment.
Is there any difficulty to implementing this? If so, could we at least render the rectangle directly into allocation_group.path instead of allocating a new path using to_path?
| self.clip_context.get(), | ||
| ); | ||
|
|
||
| // Generate coarse-level commands from strips (layer_id 0 = root layer). |
There was a problem hiding this comment.
Isn't 0 the thread index? Or why we can just assume the root layer here?
| if let Some(clip_data) = clip_path { | ||
| self.temp_storage.clear(); | ||
|
|
||
| strip::render_rect_fast( | ||
| self.level, | ||
| clamped, | ||
| &mut self.temp_storage.strips, | ||
| &mut self.temp_storage.alphas, | ||
| ); | ||
| let rect_data = PathDataRef { | ||
| strips: &self.temp_storage.strips, | ||
| alphas: &self.temp_storage.alphas, | ||
| }; | ||
|
|
||
| intersect(self.level, clip_data, rect_data, strip_storage); | ||
| } else { | ||
| strip::render_rect_fast( | ||
| self.level, | ||
| clamped, | ||
| &mut strip_storage.strips, | ||
| &mut strip_storage.alphas, | ||
| ); | ||
| } | ||
| } |
There was a problem hiding this comment.
Is it possible to deduplicate this part with normal case? I think it should be if you create a new function that takes a closure for the strip rendering part.
| assert!(generator.line_buf.is_empty()); | ||
| assert!(storage.is_empty()); | ||
| } | ||
|
|
There was a problem hiding this comment.
It seems to me like all these method could also be made shorter by creating a new function (or just repurposing assert_strips_equal that takes a Rect and test name as input and does the whole logic.
| if x_end <= left_tile_x || y1 <= y0 { | ||
| return; | ||
| } |
There was a problem hiding this comment.
This means negative rectangles aren't handled correctly, right? We can probably ignore that case for now, but might be good to fix in the future.
There was a problem hiding this comment.
(by that I mean maybe adding a TODO)
| let (max_strips, max_alphas) = | ||
| calc_rect_reserve_capacity(x_end - left_tile_x, tile_end_y - tile_start_y); | ||
| strip_buf.reserve(max_strips); | ||
| alpha_buf.reserve(max_alphas); |
There was a problem hiding this comment.
I'm wondering whether this is worth the additional "complexity" given that the strip and alpha buffers are reused across each frame for each path, so I think it won't have any effect in 99%+ of the cases. But up to you!
This PR adds a fast path for filling pixel-aligned rectangles that skips the full path processing pipeline (flatten → tiles → strips) and instead generates strip coverage data directly from the rectangle geometry. When
fill_rectis called and both the transform and rect coordinates are pixel-aligned (integer translation + integer rect bounds), the renderer callsfill_rect_fastwhich produces strips directly, avoiding the overhead of path flattening and tiling. When these conditions aren't met, the renderer falls back to the standard pipeline, ensuring pixel-identical output in all cases.cargo bench -p vello_bench -- render_rect Finished `bench` profile [optimized] target(s) in 0.12s Running unittests src/lib.rs (target/release/deps/vello_bench-71be2072ddfcacb3) running 0 tests test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s Running benches/main.rs (target/release/deps/main-706649511d898baa) render_rect/14x14_via_path time: [278.46 ns 279.32 ns 280.08 ns] change: [-2.5266% +0.5821% +2.7794%] (p = 0.75 > 0.05) No change in performance detected. Found 1 outliers among 50 measurements (2.00%) 1 (2.00%) high mild render_rect/14x14_via_rect time: [33.874 ns 33.977 ns 34.075 ns] change: [-0.4290% -0.0494% +0.2754%] (p = 0.78 > 0.05) No change in performance detected. Found 2 outliers among 50 measurements (4.00%) 2 (4.00%) high mild