
perf(client): skip GPU uploads for unchanged GLTF instance attributes #26

Open
RZDESIGN wants to merge 1 commit into hytopiagg:main from RZDESIGN:perf/gltf-dirty-instance-attribute-uploads


RZDESIGN commented Mar 5, 2026

Summary

GLTFManager._processClonedMeshes() unconditionally re-uploaded every instance attribute buffer (matrix, color, opacity, light level, sky light, emissive) to the GPU on every frame for all instanced GLTF meshes — even when nothing had changed since the previous frame.

This PR adds per-cloned-mesh dirty tracking so that attribute data is only written and uploaded when it actually differs from the last frame.
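A minimal sketch of the compare-then-write step for a single attribute (the instance matrix), assuming the cache is a WeakMap keyed by cloned mesh as described below. `CachedInstanceState` is the PR's type name, but the field layout shown here is hypothetical, as are `ClonedMeshLike`, `getOrCreateState`, and `copyIfChanged`; none of them are taken from GLTFManager.ts.

```typescript
// Hypothetical stand-in for a cloned mesh; any object can serve as a WeakMap key.
interface ClonedMeshLike {
  name: string;
}

// Hypothetical layout: only the 16 matrix elements are shown. Float64Array (not
// Float32Array) keeps the cached copy identical to the JS numbers it is compared
// against; a float32 cache would make a value like 0.1 look "changed" every frame.
interface CachedInstanceState {
  matrix: Float64Array;
}

// Weakly keyed: an entry becomes collectible once its mesh is unreachable.
const instanceStateCache = new WeakMap<ClonedMeshLike, CachedInstanceState>();

function getOrCreateState(mesh: ClonedMeshLike): { state: CachedInstanceState; isNew: boolean } {
  const existing = instanceStateCache.get(mesh);
  if (existing !== undefined) return { state: existing, isNew: false };
  const state: CachedInstanceState = { matrix: new Float64Array(16) };
  instanceStateCache.set(mesh, state);
  return { state, isNew: true };
}

// Writes `src` into the instance buffer `dst` at `dstOffset` (and into the cache)
// only if `force` is set or some element differs from the cached copy.
// Returns true when the buffer was written, i.e. the attribute is now dirty.
function copyIfChanged(
  cached: Float64Array,
  src: ArrayLike<number>,
  dst: Float32Array,
  dstOffset: number,
  force: boolean,
): boolean {
  let changed = force;
  for (let i = 0; !changed && i < cached.length; i++) {
    changed = cached[i] !== src[i];
  }
  if (!changed) return false;
  for (let i = 0; i < cached.length; i++) {
    cached[i] = src[i];
    dst[dstOffset + i] = src[i];
  }
  return true;
}
```

Called twice with the same matrix, the first call (with `force` set because the state is new) writes the slot and returns true; the second returns false and touches neither the buffer nor the cache.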

What changed

  • Per-mesh cached state via WeakMap: A CachedInstanceState stores the last-uploaded value for each attribute (16 matrix elements, sky light, color RGB, opacity, light level, emissive RGBA). On each frame, current values are compared against the cache. Data is only written to the typed array when it differs.

  • Force-update safety net: A full write is forced when a mesh is newly created, its instance index shifts (due to visibility changes reordering instances), or the target InstancedMesh changes (e.g. after a capacity resize). This guarantees correctness when the instancing layout changes.

  • Merged single-pass loop: The previously separate per-attribute iteration passes (matrix, then color, then opacity, then light, then emissive) are merged into a single loop over cloned meshes. This improves L1 cache locality since all properties of a mesh are accessed once rather than revisited in multiple passes.

  • Conditional GPU upload: At the end of the method, clearUpdateRanges() / addUpdateRange() / needsUpdate = true are only called on attributes that had at least one instance change. Entire buffer uploads are skipped when nothing changed — avoiding the bufferSubData GPU call entirely for those attributes.

  • Observable stats: A new attributeUploadsSkipped counter is added to GLTFStats and the debug panel (F3), so the optimization's effectiveness can be measured at runtime.
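The conditional-upload step can be sketched against the three.js `BufferAttribute` members named above (`clearUpdateRanges()`, `addUpdateRange(start, count)`, `needsUpdate`, available in recent three.js releases). To keep it runnable without three.js, the sketch declares a local `UploadableAttribute` interface with only those members. `DirtyRange`, `flushAttribute`, and `UploadStats` are hypothetical, and covering the changed instances with one contiguous range (lowest to highest changed index) is an assumption; the PR does not say how its ranges are built.

```typescript
// Local structural view of the three.js BufferAttribute members the PR uses.
// Declared here so the sketch runs without three.js.
interface UploadableAttribute {
  itemSize: number; // floats per instance, e.g. 16 for instanceMatrix
  clearUpdateRanges(): void;
  addUpdateRange(start: number, count: number): void; // measured in array elements
  needsUpdate: boolean;
}

// Tracks the lowest and highest instance index written this frame.
class DirtyRange {
  min = Infinity;
  max = -Infinity;
  mark(index: number): void {
    if (index < this.min) this.min = index;
    if (index > this.max) this.max = index;
  }
  get isEmpty(): boolean {
    return this.min > this.max;
  }
  reset(): void {
    this.min = Infinity;
    this.max = -Infinity;
  }
}

interface UploadStats {
  attributeUploadsSkipped: number; // same name as the counter added to GLTFStats
}

// Schedules a GPU upload only if at least one instance changed this frame.
// Returns true when an upload was scheduled.
function flushAttribute(attr: UploadableAttribute, range: DirtyRange, stats: UploadStats): boolean {
  if (range.isEmpty) {
    // needsUpdate stays untouched, so three.js sees no version bump and skips the upload.
    stats.attributeUploadsSkipped++;
    return false;
  }
  attr.clearUpdateRanges();
  // Ranges are in array elements, so instance indices are scaled by itemSize.
  attr.addUpdateRange(range.min * attr.itemSize, (range.max - range.min + 1) * attr.itemSize);
  attr.needsUpdate = true;
  range.reset();
  return true;
}
```

A single min..max range keeps the bookkeeping to two numbers per attribute; the trade-off is that unchanged instances lying between two changed ones are re-uploaded too.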

Why this matters

In a typical scene with 200 instanced entities (trees, props, NPCs) where only ~10 are actively moving:

| | Before | After |
|---|---|---|
| Instance attribute writes per frame | All 200 × all attributes | Only ~10 changed × changed attributes |
| GPU buffer uploads per frame | 6-7 bufferSubData calls (one per attribute type) | Only for attributes with changes (often 0-1) |
| CPU work | 5 separate loops × 200 iterations | 1 merged loop × 200 iterations (with early-skip per mesh) |
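The first row of the table can be made concrete. Using the per-instance layout listed for CachedInstanceState (16 + 1 + 3 + 1 + 1 + 4 = 26 floats) and assuming, as a simplification, that each of the ~10 moving entities changes only its matrix, a rough count of float writes per frame:

```typescript
// Floats per instance for each attribute, per the CachedInstanceState description.
const FLOATS_PER_ATTRIBUTE = {
  matrix: 16,
  skyLight: 1,
  color: 3,
  opacity: 1,
  lightLevel: 1,
  emissive: 4,
};

const floatsPerInstance = Object.values(FLOATS_PER_ATTRIBUTE).reduce((sum, n) => sum + n, 0);

// Assumption: moving entities only change their matrix; color, opacity, etc. stay put.
function floatWritesPerFrame(total: number, moving: number): { before: number; after: number } {
  return {
    before: total * floatsPerInstance, // old path: every instance, every attribute
    after: moving * FLOATS_PER_ATTRIBUTE.matrix, // dirty path: only changed matrices
  };
}

const { before, after } = floatWritesPerFrame(200, 10);
console.log(before, after); // 5200 160
```

Under the same assumption only the matrix attribute is flagged on the GPU side, so one of the per-attribute uploads still happens and the rest are skipped.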

The GPU upload savings are the primary win. Each bufferSubData call can stall the rendering pipeline while the driver transfers data. Skipping unnecessary uploads reduces GPU stalls and improves frame pacing.

Files changed

| File | Change |
|---|---|
| client/src/gltf/GLTFManager.ts | Added CachedInstanceState type and _instanceStateCache WeakMap. Refactored _processClonedMeshes with dirty tracking and merged single-pass loop. Removed unused attributes and clonedMeshArray module-level working arrays. |
| client/src/gltf/GLTFStats.ts | Added attributeUploadsSkipped counter. |
| client/src/core/DebugPanel.ts | Wired up attributeUploadsSkipped to the debug panel UI. |

Edge cases handled

  • First frame: No cached state exists → all meshes get forceUpdate = true → full upload (identical to previous behavior)
  • Instance count change: New meshes have no cache entry → forceUpdate. Meshes at shifted indices → lastIndex !== index → forceUpdate
  • InstancedMesh resize: Different InstancedMesh reference → lastInstancedMesh !== instancedMesh → forceUpdate
  • Mesh released: WeakMap automatically releases cache entries when cloned meshes are garbage collected
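The three force-update conditions above reduce to a small predicate. A sketch, assuming hypothetical `InstanceSlotState` bookkeeping whose `lastIndex` / `lastInstancedMesh` fields follow the names used in the list; the actual GLTFManager code may structure this differently.

```typescript
// Hypothetical per-mesh bookkeeping; field names follow the edge-case list.
interface InstanceSlotState {
  lastIndex: number;
  lastInstancedMesh: object;
}

// True when every attribute of a cloned mesh must be rewritten this frame,
// bypassing the per-value comparison against the cache.
function needsForceUpdate(
  cached: InstanceSlotState | undefined,
  index: number,
  instancedMesh: object,
): boolean {
  if (cached === undefined) return true; // first frame or newly created mesh
  if (cached.lastIndex !== index) return true; // slot shifted after reordering
  if (cached.lastInstancedMesh !== instancedMesh) return true; // resized or swapped target
  return false;
}
```

An index shift forces a write even when the mesh's own values are unchanged, because the cache describes what was written to the old slot; the new slot may still hold another mesh's data.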

Existing comment this addresses

The codebase already had a TODO noting this optimization opportunity (previously at lines 1256-1262):

"Accessing all cloned meshes every animation frame, copying necessary data, and transferring it to the WebGL buffer may be costly for both the CPU and GPU. [...] If this cost becomes an issue, the following optimizations could be considered: Transfer instance attribute values to the WebGL buffer only when changes occur."

This PR implements exactly that suggestion.

Test plan

  • Verify instanced GLTF entities render correctly (position, rotation, color, opacity, emissive glow, lighting)
  • Verify entities entering/leaving view (frustum culling) still display correctly
  • Verify dynamic property changes (color change, opacity change, emissive toggle) take effect immediately
  • Verify the debug panel (F3) shows Attr Uploads Skip increasing when entities are stationary
  • Verify Attr El Update drops significantly compared to main when most entities are idle
  • Verify transparent instance rendering still works correctly
  • Verify custom texture swapping on instanced entities still works
  • Check FPS improvement in scenes with many instanced entities (e.g. 100+ trees/props)

GLTFManager._processClonedMeshes() previously re-uploaded every instance
attribute (matrix, color, opacity, light level, sky light, emissive) to
the GPU every frame for all instanced meshes, even when nothing changed.

This adds per-cloned-mesh dirty tracking via a WeakMap cache that stores
the last-uploaded values for each attribute. On each frame, current values
are compared against the cache and only written to the typed array and marked
for GPU upload when they actually differ. A force-update is triggered when
a mesh is new, its instance index shifts, or the target InstancedMesh
changes (e.g. after a resize).

The separate per-attribute iteration passes are merged into a single loop
for better L1 cache locality.
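The merged loop can be sketched with two attributes (matrix and opacity). `EntityData`, `Cache`, `Buffers`, and `processSinglePass` are hypothetical; the real method reads from three.js cloned meshes, covers all six attribute types, and also checks the InstancedMesh identity. The cache-locality benefit is the PR's claim and is not measured here; the sketch only shows the loop shape, where each entity is visited once per frame.

```typescript
// Hypothetical per-entity inputs; the real method reads these from cloned meshes.
interface EntityData {
  matrix: number[]; // 16 elements
  opacity: number;
}

// Hypothetical per-entity cache: last-written values plus the slot they went to.
interface Cache {
  matrix: Float64Array;
  opacity: number;
  index: number;
}

// Flat instance buffers, one slot per entity (16 floats for matrix, 1 for opacity).
interface Buffers {
  matrix: Float32Array;
  opacity: Float32Array;
}

// One pass per frame: every entity is visited once and all of its attributes are
// checked together, instead of one full pass over all entities per attribute.
// Returns which attributes need a GPU upload this frame.
function processSinglePass(
  entities: EntityData[],
  caches: WeakMap<EntityData, Cache>,
  buffers: Buffers,
): { matrix: boolean; opacity: boolean } {
  const dirty = { matrix: false, opacity: false };
  for (let index = 0; index < entities.length; index++) {
    const e = entities[index];
    let cache = caches.get(e);
    // New entity or shifted slot: rewrite everything for this entity.
    const force = cache === undefined || cache.index !== index;
    if (cache === undefined) {
      cache = { matrix: new Float64Array(16), opacity: 0, index };
      caches.set(e, cache);
    }
    cache.index = index;

    let matrixChanged = force;
    for (let i = 0; !matrixChanged && i < 16; i++) {
      matrixChanged = cache.matrix[i] !== e.matrix[i];
    }
    if (matrixChanged) {
      cache.matrix.set(e.matrix);
      buffers.matrix.set(e.matrix, index * 16);
      dirty.matrix = true;
    }

    if (force || cache.opacity !== e.opacity) {
      cache.opacity = e.opacity;
      buffers.opacity[index] = e.opacity;
      dirty.opacity = true;
    }
  }
  return dirty;
}
```

The returned flags are what decides, per attribute, whether an upload is scheduled at the end of the frame.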

In a typical scene with 200 instanced entities where only ~10 are moving,
this skips ~95% of per-instance attribute writes (190 of 200 instances),
and attributes with no changes skip their GPU buffer upload entirely. The
savings are on both the CPU side (skipping typed array writes) and the GPU
side (avoiding unnecessary bufferSubData calls that can stall the rendering
pipeline).

A new `attributeUploadsSkipped` counter is added to GLTFStats and the
debug panel so the optimization's effectiveness is observable at runtime.

Made-with: Cursor