perf(client): skip GPU uploads for unchanged GLTF instance attributes #26
Open
RZDESIGN wants to merge 1 commit into hytopiagg:main
`GLTFManager._processClonedMeshes()` previously re-uploaded every instance attribute (matrix, color, opacity, light level, sky light, emissive) to the GPU every frame for all instanced meshes, even when nothing changed.

This adds per-cloned-mesh dirty tracking via a `WeakMap` cache that stores the last-uploaded values for each attribute. On each frame, current values are compared against the cache and only written to the typed array and marked for GPU upload when they actually differ. A force-update is triggered when a mesh is new, its instance index shifts, or the target `InstancedMesh` changes (e.g. after a resize).

The separate per-attribute iteration passes are merged into a single loop for better L1 cache locality.

In a typical scene with 200 instanced entities where only ~10 are moving, this skips ~95% of GPU buffer uploads per frame. The savings are both CPU (skipping typed array writes) and GPU (avoiding unnecessary `bufferSubData` calls that stall the rendering pipeline).

A new `attributeUploadsSkipped` counter is added to `GLTFStats` and the debug panel so the optimization's effectiveness is observable at runtime.

Made-with: Cursor
Summary
`GLTFManager._processClonedMeshes()` unconditionally re-uploaded every instance attribute buffer (matrix, color, opacity, light level, sky light, emissive) to the GPU on every frame for all instanced GLTF meshes, even when nothing had changed since the previous frame.

This PR adds per-cloned-mesh dirty tracking so that attribute data is only written and uploaded when it actually differs from the last frame.
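To make the dirty-tracking idea concrete, here is a minimal sketch. It is not the code from this PR: `ClonedMeshLike`, `writeInstanceIfChanged`, and the reduced set of tracked attributes (matrix and opacity only) are hypothetical stand-ins, and the field names `lastIndex` and `lastInstancedMesh` are assumed from the comparisons mentioned in the PR description.

```typescript
// Hypothetical stand-in for a cloned mesh; the real type lives in GLTFManager.ts.
interface ClonedMeshLike {
  matrix: ArrayLike<number>; // 16 world-matrix elements
  opacity: number;
}

// Assumed shape: last-uploaded values plus the layout they were written to.
interface CachedInstanceState {
  lastIndex: number;
  lastInstancedMesh: object;
  matrix: Float64Array; // kept in float64 so comparisons match the source values
  opacity: number;
}

// WeakMap keys do not keep meshes alive, so entries vanish with their mesh.
const instanceStateCache = new WeakMap<ClonedMeshLike, CachedInstanceState>();

interface WriteResult {
  matrixDirty: boolean;
  opacityDirty: boolean;
}

function writeInstanceIfChanged(
  mesh: ClonedMeshLike,
  index: number,
  instancedMesh: object,
  matrixArray: Float32Array,
  opacityArray: Float32Array,
): WriteResult {
  let cached = instanceStateCache.get(mesh);
  let forceUpdate = false;

  if (cached === undefined) {
    // New mesh: nothing has been uploaded for it yet.
    cached = {
      lastIndex: index,
      lastInstancedMesh: instancedMesh,
      matrix: new Float64Array(16),
      opacity: 0,
    };
    instanceStateCache.set(mesh, cached);
    forceUpdate = true;
  } else if (cached.lastIndex !== index || cached.lastInstancedMesh !== instancedMesh) {
    // The slot or the target buffer changed, so the cached values say
    // nothing about what is currently stored at this index.
    forceUpdate = true;
  }
  cached.lastIndex = index;
  cached.lastInstancedMesh = instancedMesh;

  // Matrix: compare element by element, stop at the first difference.
  let matrixDirty = forceUpdate;
  for (let i = 0; i < 16 && !matrixDirty; i++) {
    if (cached.matrix[i] !== mesh.matrix[i]) matrixDirty = true;
  }
  if (matrixDirty) {
    for (let i = 0; i < 16; i++) {
      cached.matrix[i] = mesh.matrix[i];
      matrixArray[index * 16 + i] = mesh.matrix[i];
    }
  }

  // Opacity: a single scalar per instance.
  const opacityDirty = forceUpdate || cached.opacity !== mesh.opacity;
  if (opacityDirty) {
    cached.opacity = mesh.opacity;
    opacityArray[index] = mesh.opacity;
  }

  return { matrixDirty, opacityDirty };
}
```

The caller would OR the returned flags across all instances to decide, per attribute, whether any upload is needed that frame.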
What changed
- Per-mesh cached state via `WeakMap`: A `CachedInstanceState` stores the last-uploaded value for each attribute (16 matrix elements, sky light, color RGB, opacity, light level, emissive RGBA). On each frame, current values are compared against the cache. Data is only written to the typed array when it differs.
- Force-update safety net: A full write is forced when a mesh is newly created, its instance index shifts (due to visibility changes reordering instances), or the target `InstancedMesh` changes (e.g. after a capacity resize). This guarantees correctness when the instancing layout changes.
- Merged single-pass loop: The previously separate per-attribute iteration passes (matrix, then color, then opacity, then light, then emissive) are merged into a single loop over cloned meshes. This improves L1 cache locality since all properties of a mesh are accessed once rather than revisited in multiple passes.
- Conditional GPU upload: At the end of the method, `clearUpdateRanges()` / `addUpdateRange()` / `needsUpdate = true` are only called on attributes that had at least one instance change. Entire buffer uploads are skipped when nothing changed, avoiding the `bufferSubData` GPU call entirely for those attributes.
- Observable stats: A new `attributeUploadsSkipped` counter is added to `GLTFStats` and the debug panel (F3), so the optimization's effectiveness can be measured at runtime.

Why this matters
In a typical scene with 200 instanced entities (trees, props, NPCs) where only ~10 are actively moving:
`bufferSubData` calls (one per attribute type)

The GPU upload savings are the primary win. Each `bufferSubData` call can stall the rendering pipeline while the driver transfers data. Skipping unnecessary uploads reduces GPU stalls and improves frame pacing.

Files changed
- `client/src/gltf/GLTFManager.ts`: Added `CachedInstanceState` type and `_instanceStateCache` `WeakMap`. Refactored `_processClonedMeshes` with dirty tracking and merged single-pass loop. Removed unused `attributes` and `clonedMeshArray` module-level working arrays.
- `client/src/gltf/GLTFStats.ts`: Added `attributeUploadsSkipped` counter.
- `client/src/core/DebugPanel.ts`: Added `attributeUploadsSkipped` to the debug panel UI.

Edge cases handled
- `forceUpdate = true` → full upload (identical to previous behavior)
- `forceUpdate`. Meshes at shifted indices → `lastIndex !== index` → `forceUpdate`
- `InstancedMesh` reference → `lastInstancedMesh !== instancedMesh` → `forceUpdate`
- `WeakMap` automatically releases cache entries when cloned meshes are garbage collected

Existing comment this addresses
The codebase already had a TODO noting this optimization opportunity (previously at lines 1256-1262):
This PR implements exactly that suggestion.
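For reference, the conditional upload step listed under "What changed" could be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `UploadableAttribute` is a structural stand-in for the three.js `BufferAttribute` members named in the description (`clearUpdateRanges()`, `addUpdateRange()`, `needsUpdate`), while `DirtyRange`, `flushAttribute`, and the choice to upload only the min-to-max dirty span are hypothetical.

```typescript
// Structural stand-in for the three.js BufferAttribute members used here.
interface UploadableAttribute {
  needsUpdate: boolean;
  clearUpdateRanges(): void;
  addUpdateRange(start: number, count: number): void;
}

// Hypothetical helper: tracks the span of instance indices written this frame.
class DirtyRange {
  private min = Number.POSITIVE_INFINITY;
  private max = -1;

  mark(index: number): void {
    if (index < this.min) this.min = index;
    if (index > this.max) this.max = index;
  }

  get isDirty(): boolean {
    return this.max >= this.min;
  }

  get start(): number {
    return this.min;
  }

  get count(): number {
    return this.max - this.min + 1;
  }

  reset(): void {
    this.min = Number.POSITIVE_INFINITY;
    this.max = -1;
  }
}

interface UploadStats {
  attributeUploadsSkipped: number;
}

// Returns true when an upload was scheduled, false when it was skipped.
function flushAttribute(
  attribute: UploadableAttribute,
  range: DirtyRange,
  itemSize: number, // components per instance, e.g. 16 for a matrix
  stats: UploadStats,
): boolean {
  if (!range.isDirty) {
    // No instance changed this attribute: leave needsUpdate untouched so
    // the renderer issues no buffer upload for it this frame.
    stats.attributeUploadsSkipped++;
    return false;
  }
  attribute.clearUpdateRanges();
  // Offsets are expressed in array components, not in instances.
  attribute.addUpdateRange(range.start * itemSize, range.count * itemSize);
  attribute.needsUpdate = true;
  range.reset();
  return true;
}
```

In this sketch, one `DirtyRange` per attribute would be marked inside the per-mesh loop whenever a value is written, and `flushAttribute` would be called once per attribute at the end of the method.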
Test plan
- `Attr Uploads Skip` increasing when entities are stationary
- `Attr El Update` drops significantly compared to `main` when most entities are idle