PoC: Fast-path for skipping coarse rasterization and scheduling #1454
laurenz-canva wants to merge 1 commit into main from
```rust
/// This replicates the strip→GpuStrip conversion that normally happens across
/// `Wide::generate` + `Scheduler::do_tile`, but for the simple case where all draws
/// happen at depth=1 directly to the surface with no layers or blending.
pub(crate) fn build_gpu_strips_direct(
```
We can probably reuse existing code here. As mentioned, I haven't cleaned this up yet; this is just a PoC.
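The fast path described in the `build_gpu_strips_direct` doc comment can be pictured as a mode switch in the recorder. The following is a minimal sketch under stated assumptions, not the PoC's code: `Recorder`, `GpuStrip`, `Batch`, and every method name here are hypothetical, `Batch::Coarse` stands in for the normal `Wide::generate` + scheduler path, and the sketch assumes the fast path resumes once the layer stack is empty again.

```rust
use std::mem;

// Hypothetical, simplified strip type; not the real vello_hybrid `GpuStrip`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct GpuStrip {
    x: u16,
    y: u16,
    width: u16,
    paint: u32,
}

#[derive(Debug, PartialEq)]
enum Batch {
    /// Strips emitted directly, skipping coarse rasterization and scheduling.
    Direct(Vec<GpuStrip>),
    /// Placeholder for work routed through coarse rasterization + scheduling.
    Coarse(Vec<GpuStrip>),
}

#[derive(Default)]
struct Recorder {
    layer_depth: usize, // 0 means "drawing directly to the surface" (depth=1)
    fast_path: Vec<GpuStrip>,
    coarse: Vec<GpuStrip>,
    batches: Vec<Batch>,
}

impl Recorder {
    fn fill(&mut self, strips: &[GpuStrip]) {
        if self.layer_depth == 0 {
            // No layers or blending: strips can be emitted in submission order.
            self.fast_path.extend_from_slice(strips);
        } else {
            self.coarse.extend_from_slice(strips);
        }
    }

    fn flush_fast_path(&mut self) {
        if !self.fast_path.is_empty() {
            self.batches.push(Batch::Direct(mem::take(&mut self.fast_path)));
        }
    }

    fn push_layer(&mut self) {
        // Everything drawn so far must land before the layer, so flush first.
        self.flush_fast_path();
        self.layer_depth += 1;
    }

    fn pop_layer(&mut self) {
        assert!(self.layer_depth > 0, "pop_layer without matching push_layer");
        self.layer_depth -= 1;
        if self.layer_depth == 0 && !self.coarse.is_empty() {
            self.batches.push(Batch::Coarse(mem::take(&mut self.coarse)));
        }
    }

    fn finish(mut self) -> Vec<Batch> {
        self.flush_fast_path();
        self.batches
    }
}

fn main() {
    let s = |x: u16| GpuStrip { x, y: 0, width: 4, paint: 0 };
    let mut r = Recorder::default();
    r.fill(&[s(0)]); // fast path
    r.push_layer();
    r.fill(&[s(8)]); // inside a layer: normal path
    r.pop_layer();
    r.fill(&[s(16)]); // fast path again
    assert_eq!(
        r.finish(),
        vec![
            Batch::Direct(vec![s(0)]),
            Batch::Coarse(vec![s(8)]),
            Batch::Direct(vec![s(16)]),
        ]
    );
}
```

The flush in `push_layer` is what keeps ordering correct in this sketch: any strips already recorded on the fast path are committed before the layer's content.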
taj-p left a comment
There are parallels between this work and the blit pipeline.
If we go this route, I think we needn't flush the fast path in push layer. We could instead flush only the paths that intersect the bounds produced in pop layer. We might not even need to do that, depending on the layer type (for example, an opacity layer with source-over blending).
This could also be batched. The fast path can be re-enabled after we pop the layer.
In my work on the blit pipeline, there is a batching mechanism you could reuse if we think this is the right strategy to take.
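A minimal sketch of deferring the flush from push layer to pop layer, assuming each pending fast-path draw carries a bounding box. `Rect`, `DeferredFastPath`, `Op`, and the method names are hypothetical. One assumption differs from a literal reading of the suggestion: the sketch flushes every pending draw up to and including the last one that intersects the layer bounds, rather than only the intersecting ones, because flushing a later draw ahead of an earlier draw it overlaps would change the result.

```rust
// Hypothetical types for illustration; not the blit pipeline's batching code.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Rect {
    x0: u32,
    y0: u32,
    x1: u32,
    y1: u32,
}

impl Rect {
    /// Half-open rectangles: touching edges do not count as overlap.
    fn intersects(&self, other: &Rect) -> bool {
        self.x0 < other.x1 && other.x0 < self.x1 && self.y0 < other.y1 && other.y0 < self.y1
    }
}

#[derive(Debug, PartialEq)]
enum Op {
    /// A batch of fast-path draws (by id), rendered in order.
    Draws(Vec<u32>),
    /// A composited layer with the given bounds.
    Layer(Rect),
}

#[derive(Default)]
struct DeferredFastPath {
    pending: Vec<(u32, Rect)>, // (draw id, bounding box) in submission order
    ops: Vec<Op>,
}

impl DeferredFastPath {
    fn draw(&mut self, id: u32, bbox: Rect) {
        self.pending.push((id, bbox));
    }

    /// Called once the layer's bounds are known, i.e. at pop-layer time.
    fn pop_layer(&mut self, layer_bounds: Rect) {
        // Flush the prefix ending at the last draw that overlaps the layer.
        if let Some(last) = self.pending.iter().rposition(|(_, b)| b.intersects(&layer_bounds)) {
            let flushed: Vec<u32> = self.pending.drain(..=last).map(|(id, _)| id).collect();
            self.ops.push(Op::Draws(flushed));
        }
        // Remaining draws do not touch the layer, so they can stay batched
        // together with later fast-path draws.
        self.ops.push(Op::Layer(layer_bounds));
    }

    fn finish(mut self) -> Vec<Op> {
        if !self.pending.is_empty() {
            let rest: Vec<u32> = self.pending.drain(..).map(|(id, _)| id).collect();
            self.ops.push(Op::Draws(rest));
        }
        self.ops
    }
}

fn main() {
    let r = |x0: u32, y0: u32, x1: u32, y1: u32| Rect { x0, y0, x1, y1 };
    let mut fp = DeferredFastPath::default();
    fp.draw(0, r(0, 0, 10, 10)); // overlaps draw 1, not the layer
    fp.draw(1, r(5, 5, 30, 30)); // overlaps the layer
    fp.draw(2, r(100, 100, 110, 110)); // overlaps nothing
    fp.pop_layer(r(20, 20, 40, 40));
    fp.draw(3, r(25, 25, 35, 35)); // drawn after the layer
    assert_eq!(
        fp.finish(),
        vec![
            Op::Draws(vec![0, 1]),
            Op::Layer(r(20, 20, 40, 40)),
            Op::Draws(vec![2, 3]),
        ]
    );
}
```

In this example, draw 2 never has to be flushed early and ends up batched with draw 3, which is the kind of batching the comment above alludes to.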
Sounds good, looking forward to the PR! Yes, you could probably optimize this even further, but this is the bare minimum, and it should already be a good improvement in many cases. 😄
One thing I didn't state but hope is implied: it's AMAZING to have a PoC so quickly to validate the approach. Very cool!!! 🎉
Note that this was AI-generated. I haven't reviewed it in depth yet, and it's possible we can be smarter about how the strips are stored, so no nitpicky review please. 😄 But this should demonstrate what I was imagining, and all tests seem to pass.
As Alex rightly highlighted, this has the disadvantage of ruling out the "if there is an opaque fill, clear all previous fills" optimization. However, it seems to me that this loss should be outweighed by the savings from skipping scheduling and coarse rasterization. Here are the timings for rendering 1000 frames of the Ghostscript tiger:
Before (note in particular `Wide::generate` and `Scheduler::do_scene`):

[Image: timings before the change]

After:

[Image: timings after the change]
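The "if there is an opaque fill, clear all previous fills" optimization mentioned above works on per-tile command lists, which presumably is why a path that skips coarse rasterization cannot apply it. The following is a minimal sketch of the idea for a single tile row; `Tile`, `Cmd`, and `TILE_WIDTH` are hypothetical and do not reproduce the real `Wide` tile code.

```rust
// Hypothetical tile width for this sketch.
const TILE_WIDTH: u16 = 256;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Cmd {
    /// Fill pixels [x, x + width) of the tile with a paint.
    Fill { x: u16, width: u16, paint: u32 },
}

#[derive(Default)]
struct Tile {
    cmds: Vec<Cmd>,
}

impl Tile {
    fn fill(&mut self, x: u16, width: u16, paint: u32, paint_is_opaque: bool) {
        // An opaque fill covering the whole tile hides everything drawn
        // before it, so the earlier commands never need to be rendered.
        if paint_is_opaque && x == 0 && width == TILE_WIDTH {
            self.cmds.clear();
        }
        self.cmds.push(Cmd::Fill { x, width, paint });
    }
}

fn main() {
    let mut tile = Tile::default();
    tile.fill(10, 20, 1, true); // partial coverage: kept
    tile.fill(0, TILE_WIDTH, 2, false); // full width but translucent: kept
    tile.fill(0, TILE_WIDTH, 3, true); // full width and opaque: clears the rest
    assert_eq!(tile.cmds, vec![Cmd::Fill { x: 0, width: TILE_WIDTH, paint: 3 }]);
}
```

When strips are emitted straight to the GPU in submission order, there is no per-tile list to clear, so every earlier fill is still rasterized even if it ends up fully covered.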