Closed
Labels
A-Rendering (Drawing game state to the screen), C-Code-Quality (A section of code that is hard to understand or change), C-Usability (A targeted quality-of-life change that makes Bevy easier to use)
Description
In order to support 'unlimited' (in practice, just less-limited) numbers of point lights in a scene, we should leverage storage buffers wherever they are available, which should be WebGPU and native, just not WebGL2.
With this branch on an M1 Max, I could keep 60fps with about 2150 point lights in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters, which grows faster than linearly (I think 1000 lights took about 1.5ms and 5000 took 7.5ms). I have seen papers and presentations leveraging compute shaders that can push this to over 1 million lights, and there are probably some optimisations that can be done CPU-side too.
Solution
Implemented in #3989: [Merged by Bors] - Use storage buffers for clustered forward point lights
- Copy `UniformVec` and modify it to make a `StorageVec`
- Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability
- Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageVec` variants. Implement the necessary API on them so that usage is the same for both cases, and the only difference is at initialisation time.
- Add appropriate shader defs in the shader code to handle the two cases
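The enum-with-identical-API idea above can be sketched in plain Rust. This is a hypothetical illustration, not Bevy's actual code: the `Vec` fields stand in for the real `UniformVec`/`StorageVec` GPU-buffer types, and the uniform-path limit of 256 lights is illustrative only.

```rust
// Illustrative cap for the uniform-buffer path; the real limit depends on
// the fixed array length declared in the shader.
const MAX_UNIFORM_POINT_LIGHTS: usize = 256;

#[derive(Clone, Copy)]
struct GpuPointLight {
    position: [f32; 3],
    radius: f32,
}

// One type wraps both backends so call sites never branch on support.
enum GpuPointLights {
    Uniform { lights: Vec<GpuPointLight> },
    Storage { lights: Vec<GpuPointLight> },
}

impl GpuPointLights {
    // The variant is chosen once, at initialisation time.
    fn new(supports_storage_buffers: bool) -> Self {
        if supports_storage_buffers {
            Self::Storage { lights: Vec::new() }
        } else {
            Self::Uniform { lights: Vec::new() }
        }
    }

    // The rest of the API is identical for both variants.
    fn push(&mut self, light: GpuPointLight) {
        match self {
            Self::Uniform { lights } => {
                // Uniform buffers hold a fixed-size array, so extra
                // lights beyond the cap are dropped.
                if lights.len() < MAX_UNIFORM_POINT_LIGHTS {
                    lights.push(light);
                }
            }
            Self::Storage { lights } => lights.push(light),
        }
    }

    fn len(&self) -> usize {
        match self {
            Self::Uniform { lights } | Self::Storage { lights } => lights.len(),
        }
    }
}
```

The payoff is that light-gathering code calls `push` and `len` without caring which backend was selected; only construction differs.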
Open Questions
- ~~How should we detect storage buffer support?~~
  - As I needed 3 bindings (I just realised I could consolidate these into one binding when using a storage buffer), I checked for `max_storage_buffers_per_shader_stage >= 3`, but this feels like a hack as it knows nothing about any other storage buffers used by any other code. I am leaning toward just checking for `> 0` and then letting wgpu validation complain whenever too many are used... though we should check the limits for WebGPU and probably try to use fewer than this where possible, for the sake of compatibility and to leave space for custom shader modifications to use more.
  - Some time passed. I am still leaning toward the same conclusion: let wgpu complain for now until we better understand the different use cases for managing the available resources for bindings.
  - Striking through the question for now. We need to use something to detect support, but we can let wgpu complain about using too many storage buffers until we better understand what we need.
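The check discussed above amounts to comparing one device limit against the number of bindings used. A minimal sketch, assuming a `Limits` struct shaped like wgpu's (the field name matches wgpu's `Limits`, but this stand-in avoids the real dependency); note WebGL2 reports 0 for this limit:

```rust
// Stand-in mirroring the relevant field of wgpu's `Limits` struct.
struct Limits {
    max_storage_buffers_per_shader_stage: u32,
}

// The clustered-forward implementation in the PR used three storage
// buffer bindings; the simpler `> 0` check would replace this with 1.
const CLUSTERED_FORWARD_STORAGE_BINDINGS: u32 = 3;

fn supports_storage_buffers(limits: &Limits) -> bool {
    limits.max_storage_buffers_per_shader_stage >= CLUSTERED_FORWARD_STORAGE_BINDINGS
}
```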
- ~~Should the material key function have `WgpuOptions` as an argument? Should it be part of `specialize()`? Both? Neither?~~
  - In the case where I'm just trying to detect support for storage buffers, adding it as an argument to `specialize()` could be sufficient rather than proxying the logic through the key. However, I think there will be cases where material keys need to depend on information from `WgpuOptions`. Thinking about it some more while writing this: if we wanted to implement some form of batching based on specialization and material, we would need to group draws based on the knowledge that the pipeline specialization is the same and that the material handle is the same. If the pipeline specialization key contains all the necessary information, then it alone can be used for batching to minimise pipeline switching. This makes me think that nothing other than the prepared asset and the key should be passed to `specialize()` that can impact the pipeline configuration.
  - Some time passed and changes were made to `WgpuOptions`. `RenderDevice` has been added to the material trait `key()` functions as an argument to allow configuration of custom behaviour depending on wgpu limits/features. This feels like the right approach to me, as it retains the key as the only source of information for specialisation. Marking as done from my perspective.
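The resolution above ("the key is the only source of information for specialisation") can be sketched as follows. All names here are illustrative stand-ins, not Bevy's actual API: `RenderDevice` and `Limits` mimic the real types' shape, and the `NO_STORAGE_BUFFERS_SUPPORT` shader def is an assumed name for the fallback path.

```rust
// Stand-ins mirroring the shape of the real wgpu/Bevy types.
struct Limits {
    max_storage_buffers_per_shader_stage: u32,
}
struct RenderDevice {
    limits: Limits,
}

// Bit flag recorded in the key; hypothetical name.
const STORAGE_BUFFER_SUPPORT: u32 = 1 << 0;

// The key captures everything specialize() needs, so two draws with equal
// keys (and the same material handle) are known to share a pipeline.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct MaterialKey(u32);

// key() gets the device so it can fold limits/features into the key...
fn material_key(render_device: &RenderDevice) -> MaterialKey {
    let mut flags = 0;
    if render_device.limits.max_storage_buffers_per_shader_stage >= 3 {
        flags |= STORAGE_BUFFER_SUPPORT;
    }
    MaterialKey(flags)
}

// ...and specialize() reads only the key to pick shader defs.
fn shader_defs(key: MaterialKey) -> Vec<&'static str> {
    if key.0 & STORAGE_BUFFER_SUPPORT != 0 {
        vec![] // storage-buffer path needs no extra defs in this sketch
    } else {
        vec!["NO_STORAGE_BUFFERS_SUPPORT"] // illustrative def name
    }
}
```

Keeping the device out of `specialize()` itself is what makes the key usable as a batching/grouping discriminator later.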
- Is there a reusable pattern emerging for choosing between uniform and storage buffers? It looks like there could be, though the bit-packing of some values may make it a bit trickier, depending on how it's handled.
- Does this approach with `StorageVec` even make sense? The problem it is trying to solve is Std430 layout for variable-sized arrays in structs. It feels like a storage buffer should just be a struct, possibly containing `Vec` members, with `AsStd430` derived on it, and it would just work.
- Merge `ViewClusterBuffers` and `ViewClusterBindings`?
- Merge the 3 storage buffer bindings into 1 containing the information for all 3?