
vello_sparse_strips: Optimize pixel-aligned rect filling by bypassing path pipeline#1453

Open
grebmeg wants to merge 1 commit into main from gemberg/glyph-cache-pixel-aligned-rect

Conversation


@grebmeg grebmeg commented Feb 18, 2026

This PR adds a fast path for filling pixel-aligned rectangles that skips the full path processing pipeline (flatten → tiles → strips) and instead generates strip coverage data directly from the rectangle geometry. When fill_rect is called and both the transform and rect coordinates are pixel-aligned (integer translation + integer rect bounds), the renderer calls fill_rect_fast which produces strips directly, avoiding the overhead of path flattening and tiling. When these conditions aren't met, the renderer falls back to the standard pipeline, ensuring pixel-identical output in all cases.
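The eligibility check described above (integer translation plus integer rect bounds) can be sketched roughly as follows. This is a hypothetical illustration of the dispatch condition, not the crate's actual code; the function names, coefficient layout, and the `1e-9` tolerance are assumptions.

```rust
const EPS: f64 = 1e-9; // assumed tolerance for "pixel-aligned"

/// A transform qualifies if its linear part is the identity and its
/// translation components are (near-)integers. Coefficient order follows
/// kurbo's `Affine::as_coeffs`: [a, b, c, d, e, f] with (e, f) the translation.
fn is_integer_translation(coeffs: [f64; 6]) -> bool {
    let [a, b, c, d, e, f] = coeffs;
    (a - 1.0).abs() < EPS
        && b.abs() < EPS
        && c.abs() < EPS
        && (d - 1.0).abs() < EPS
        && (e - e.round()).abs() < EPS
        && (f - f.round()).abs() < EPS
}

/// A rect qualifies only if all four edges land on integer pixel coordinates.
fn is_pixel_aligned_rect(x0: f64, y0: f64, x1: f64, y1: f64) -> bool {
    [x0, y0, x1, y1].iter().all(|v| (v - v.round()).abs() < EPS)
}

fn main() {
    // Integer translation + integer rect: eligible for the fast path.
    assert!(is_integer_translation([1.0, 0.0, 0.0, 1.0, 3.0, -2.0]));
    assert!(is_pixel_aligned_rect(0.0, 0.0, 14.0, 14.0));
    // Fractional translation: must fall back to the standard pipeline.
    assert!(!is_integer_translation([1.0, 0.0, 0.0, 1.0, 0.5, 0.0]));
    println!("ok");
}
```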

cargo bench -p vello_bench -- render_rect
    Finished `bench` profile [optimized] target(s) in 0.12s
     Running unittests src/lib.rs (target/release/deps/vello_bench-71be2072ddfcacb3)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/main.rs (target/release/deps/main-706649511d898baa)
render_rect/14x14_via_path
                        time:   [278.46 ns 279.32 ns 280.08 ns]
                        change: [-2.5266% +0.5821% +2.7794%] (p = 0.75 > 0.05)
                        No change in performance detected.
Found 1 outliers among 50 measurements (2.00%)
  1 (2.00%) high mild
render_rect/14x14_via_rect
                        time:   [33.874 ns 33.977 ns 34.075 ns]
                        change: [-0.4290% -0.0494% +0.2754%] (p = 0.78 > 0.05)
                        No change in performance detected.
Found 2 outliers among 50 measurements (4.00%)
  2 (4.00%) high mild

@grebmeg grebmeg requested a review from LaurenzV February 19, 2026 06:34
accumulated_winding += acc;
}
}


Do we maybe want to move all of this into a separate rect.rs? Up to you!

#[inline]
pub fn is_integer_translation(transform: &Affine) -> bool {
    let [a, b, c, d, e, f] = transform.as_coeffs();
    (a - 1.0).abs() < 1e-9

Can we reuse SCALAR_NEARLY_ZERO here?


There's also an is_nearly_zero method, although it's only defined for f32. Maybe we can define it for f64 as well.
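One way to extend an f32-only helper like this to f64 is a small extension trait. The sketch below is hypothetical; the trait name and the tolerance constant are assumptions, not the crate's actual API.

```rust
// Assumed tolerance, standing in for a SCALAR_NEARLY_ZERO-style constant.
const SCALAR_NEARLY_ZERO_F64: f64 = 1e-9;

/// Illustrative trait so both f32 and f64 can share an is_nearly_zero method.
trait NearlyZero {
    fn is_nearly_zero(self) -> bool;
}

impl NearlyZero for f64 {
    fn is_nearly_zero(self) -> bool {
        self.abs() < SCALAR_NEARLY_ZERO_F64
    }
}

fn main() {
    assert!(0.0_f64.is_nearly_zero());
    assert!((-1e-12_f64).is_nearly_zero());
    assert!(!0.01_f64.is_nearly_zero());
    println!("ok");
}
```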

Comment on lines +489 to +492
let path = rect.to_path(0.0);
let start = self.allocation_group.path.len() as u32;
self.allocation_group.path.extend(&path);
let end = self.allocation_group.path.len() as u32;

Is there any difficulty in implementing this? If so, could we at least render the rectangle directly into allocation_group.path instead of allocating a new path using to_path?
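Rendering the rectangle directly into the existing buffer could look roughly like the sketch below. The `PathEl` enum mirrors kurbo's shape but is redefined here for illustration; the helper name and buffer type are assumptions.

```rust
/// Illustrative path element, mirroring the shape of kurbo's PathEl.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PathEl {
    MoveTo(f64, f64),
    LineTo(f64, f64),
    ClosePath,
}

/// Push a rectangle's five path elements straight into an existing buffer,
/// avoiding the intermediate allocation that rect.to_path(0.0) would make.
fn extend_with_rect(buf: &mut Vec<PathEl>, x0: f64, y0: f64, x1: f64, y1: f64) {
    buf.push(PathEl::MoveTo(x0, y0));
    buf.push(PathEl::LineTo(x1, y0));
    buf.push(PathEl::LineTo(x1, y1));
    buf.push(PathEl::LineTo(x0, y1));
    buf.push(PathEl::ClosePath);
}

fn main() {
    let mut buf = Vec::new();
    let start = buf.len() as u32;
    extend_with_rect(&mut buf, 0.0, 0.0, 14.0, 14.0);
    let end = buf.len() as u32;
    // A rectangle contributes exactly five elements: move, three lines, close.
    assert_eq!(end - start, 5);
    println!("ok");
}
```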

self.clip_context.get(),
);

// Generate coarse-level commands from strips (layer_id 0 = root layer).

Isn't 0 the thread index? Or why can we just assume the root layer here?

Comment on lines +203 to +226
if let Some(clip_data) = clip_path {
    self.temp_storage.clear();

    strip::render_rect_fast(
        self.level,
        clamped,
        &mut self.temp_storage.strips,
        &mut self.temp_storage.alphas,
    );
    let rect_data = PathDataRef {
        strips: &self.temp_storage.strips,
        alphas: &self.temp_storage.alphas,
    };

    intersect(self.level, clip_data, rect_data, strip_storage);
} else {
    strip::render_rect_fast(
        self.level,
        clamped,
        &mut strip_storage.strips,
        &mut strip_storage.alphas,
    );
}
}

Is it possible to deduplicate this part with the normal case? I think it should be possible if you create a new function that takes a closure for the strip rendering part.
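The closure-based deduplication suggested above could be sketched as follows. All types and names here are placeholders standing in for the real strip/alpha storage and the intersect step; this only illustrates the shape of the refactor.

```rust
/// Placeholder for the real strip/alpha storage type.
struct Storage {
    strips: Vec<u32>,
    alphas: Vec<u8>,
}

/// One helper owns the clip/no-clip branching; the caller supplies the
/// strip-rendering step as a closure, so both branches share it.
fn render_with<F>(clip: Option<&Storage>, out: &mut Storage, temp: &mut Storage, render: F)
where
    F: Fn(&mut Vec<u32>, &mut Vec<u8>),
{
    if let Some(_clip_data) = clip {
        // Render into temporary storage first, then intersect with the clip.
        temp.strips.clear();
        temp.alphas.clear();
        render(&mut temp.strips, &mut temp.alphas);
        // The real code would call intersect(clip_data, temp, out) here;
        // copying is a stand-in for that step.
        out.strips.extend_from_slice(&temp.strips);
    } else {
        // No clip: render directly into the output storage.
        render(&mut out.strips, &mut out.alphas);
    }
}

fn main() {
    let mut out = Storage { strips: vec![], alphas: vec![] };
    let mut temp = Storage { strips: vec![], alphas: vec![] };
    render_with(None, &mut out, &mut temp, |strips, _alphas| strips.push(7));
    assert_eq!(out.strips, vec![7]);
    println!("ok");
}
```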

assert!(generator.line_buf.is_empty());
assert!(storage.is_empty());
}


It seems to me like all these methods could also be made shorter by creating a new function (or just repurposing assert_strips_equal) that takes a Rect and a test name as input and does the whole logic.

Comment on lines +495 to +497
if x_end <= left_tile_x || y1 <= y0 {
    return;
}

This means negative rectangles aren't handled correctly, right? We can probably ignore that case for now, but it might be good to fix in the future.


(by that I mean maybe adding a TODO)
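For reference, handling a "negative" rectangle (x1 < x0 or y1 < y0) could be as simple as normalizing the coordinates before the early-out check. This is a hypothetical sketch with an illustrative Rect type, not the crate's actual fix.

```rust
#[derive(Debug, PartialEq)]
struct Rect {
    x0: f64,
    y0: f64,
    x1: f64,
    y1: f64,
}

/// Swap coordinates so that (x0, y0) is always the min corner and
/// (x1, y1) the max corner, making negative rects well-formed.
fn normalize(r: Rect) -> Rect {
    Rect {
        x0: r.x0.min(r.x1),
        y0: r.y0.min(r.y1),
        x1: r.x0.max(r.x1),
        y1: r.y0.max(r.y1),
    }
}

fn main() {
    let r = normalize(Rect { x0: 10.0, y0: 8.0, x1: 2.0, y1: 3.0 });
    assert_eq!(r, Rect { x0: 2.0, y0: 3.0, x1: 10.0, y1: 8.0 });
    println!("ok");
}
```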

Comment on lines +502 to +505
let (max_strips, max_alphas) =
    calc_rect_reserve_capacity(x_end - left_tile_x, tile_end_y - tile_start_y);
strip_buf.reserve(max_strips);
alpha_buf.reserve(max_alphas);

I'm wondering whether this is worth the additional "complexity", given that the strip and alpha buffers are reused across frames for each path, so I think it won't have any effect in 99%+ of cases. But up to you!
