Skip to content

Conversation

Beilinson
Copy link
Contributor

Description

I made this branch, a compilation of minor performance improvements that I found were possible while researching/testing for #12916 but didn't really fit in there.

I separated all the changes into separate commits, and I added PR comments on each file explaining what I see in my performance profiles and why this change works, as well as how much of an improvement each provides.

Overall, these changes pushed my measured FPS with 6x CPU Slowdown from ~60fps to ~70fps, a 16% improvement on total FPS!

I will summarize the changes here as well:

  1. Cesium3DTilesetStatistics.clone - remove destructure clone of Cesium3DTilesetStatistics.textureReferenceCounterById ~ 2% total FPS, no issues observed.
  2. Credit.clone - Change to just return the original credit. This is mostly experimental, this also can gain a ~2% improvement in total FPS, and I'm not sure why should credits be cloned every frame. No issues observed but maybe I missed something.
  3. Cesium3DTilesetTraversal.anyChildrenVisible - This can early return once one visible child is found. Results in ~10% fewer tiles getting updated every frame, with no observed changes in behavior or tiles loaded on a variety of different tilesets. Sadly, this didn't provide the desired performance boost in tileset traversal.
  4. Added ScreenSpaceCameraController.collisionHeightReference. Height clamping of the camera to the scene takes about 2.55ms/frame, mostly due to Model.pick. Adding this flag allows the user to continue clamping entities to the 3D tileset without also sampling the camera height off the 3D tileset every frame.
  5. My favorite, I mentioned this in my comment in Possible performance issues for model picking #11814: Convert Matrix4 and Matrix3 to be based on Float64Array. This provides a significant 2x speedup of Matrix multiplication and other methods, and saves memory. This results in a 4x improvement on Model.pick and propagates throughout other areas.

Testing plan

I tested all of these separately, mostly visually by comparing almost all the sandboxes between main and local.

As mentioned above, commits 2 and 3 are the ones I'm unsure about, and also incidentally provide the least noticeable impact.

Author checklist

  • I have submitted a Contributor License Agreement
  • I have added my name to CONTRIBUTORS.md
  • I have updated CHANGES.md with a short summary of my change
  • I have added or updated unit tests to ensure consistent code coverage
  • I have updated the inline documentation, and included code examples where relevant
  • I have performed a self-review of my code

@github-actions
Copy link

Thank you for the pull request, @Beilinson!

✅ We can confirm we have a CLA on file for you.

result.texturesReferenceCounterById = {
...statistics.texturesReferenceCounterById,
};
result.texturesReferenceCounterById = statistics.texturesReferenceCounterById;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I did: loaded the basic Google Photorealistic sandcastle and profiled me just moving around slowly

Performance Profile:
I ordered by Self Time descending, and saw that this Cesium3DTilesetStatistics.clone takes about 8% of total frame time.
Image

Why this change:
The performance profile showed that this object destructure was taking up the main bulk of the time of this function.

I audited the code, and saw that while the statistics object is copied frame-to-frame, this property specifically is only ever used by the new statistics object, and passing it by reference kept all the same behavior in prod. Changing it a Map is also a minor performance improvement.

Credit.clone = function (credit) {
if (defined(credit)) {
return new Credit(credit.html, credit.showOnScreen);
return credit;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I did: loaded the basic Google Photorealistic sandcastle and profiled me just moving around slowly

Performance Profile:
I analyzed multiple frames, and saw that CreditDisplay.endFrame was taking up about 2% of frame time.

Image

Why this change:

The performance profile showed that due to the credit clones only passing the original HTML, every frame for all credits:

  1. The HTML has to be sanitized
  2. created as a DOM element
  3. appended to the CreditDisplay.

By passing credits by reference, the steps 1+2 are executed only once when the first credit of that id is added, and then every frame the generated element is reused.

Credits still properly get removed/added when displayed tilesets change. I'm not sure why credits aren't passed by reference, and this isn't a significant performance improvement alone, but together with the others it stacks up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fundamentally changing the behavior of this function to not actually be a clone like it's named. I don't think we want to do this. There may be merit in reducing the need to call .clone wherever it's called but a clone function itself should not be returning the original object.

I believe @mzschwartz5 was looking into credits recently and might have some more insights. I feel like this is something that could benefit from being managed higher up where we can store the references and reduce the amount .clone needs to be called in the first place.

Copy link
Contributor Author

@Beilinson Beilinson Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sure, this was more of a POC that passing them by reference does not break rendering them fundamentally. If the team is already looking into this I'll refrain from opening this as a separate PR

Comment on lines +287 to +289
if (anyVisible) {
break;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I did: loaded the basic Google Photorealistic sandcastle and profiled me just moving around slowly

Performance Profile:

I checked in total, what part of the frametime was spent by the 3D Tileset. Between 15-30% of each frame is spent in either of the Cesium3DTileset<TYPE>Traversal.selectTiles:

Image Image

Why this change:

The performance profile showed that vast majority of time was spent in the tileset traversal updating visibility. I noticed that this method while by name acts like Array.some, which returns early after one item returns true, it actually iterates over all tile children.

This change results in about 10% fewer tiles visited total (after the change above to make visited tiles more accurate), but I didn't notice a consistent improvement in FPS from this sadly.

This change does not from my testing affect the final selected tiles.

Comment on lines +191 to +193
if (tile._visitedFrame === frameState.frameNumber) {
// Prevents another pass from visiting the frame again
return;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tileset statistics are kind of wrong today. Only tiles which are marked as visible and added to the traversal queue are marked as visited, however in anyChildrenVisible and in updateVisibility there is the possibility that additional tiles are updated and visited but because they return false they are not added to the traversal queue and therefore not marked as visited for the statistics.

This fixes the above, now all tiles that have they visibility updated are correctly marked as visited. In addition, any previously visited tiles this frame are early returned right away, which already exists but is slightly further below in the call chain somewhere inside of updateTileVisibility.

This change shows that on average, there are actually around 60% more visited tiles than previously reported!!

Comment on lines 268 to +275
/**
* When disabled, the values of <code>maximumZoomDistance</code> and <code>minimumZoomDistance</code> are ignored.
* Also used in conjunction with {@link Cesium3DTileset#enableCollision} to prevent the camera from moving through or below a 3D Tileset surface.
* This may also affect clamping behavior when using {@link HeightReference.CLAMP_TO_GROUND} on 3D Tiles.
* @type {boolean}
* @default true
* This value controls the height reference for the {@link ScreenSpaceCameraController} collision, which is used to limit the camera minimum and maximum zoom.
* When set to {@link HeightReference.NONE}, the camera can go underground, and the values of <code>maximumZoomDistance</code> and <code>minimumZoomDistance</code> are ignored.
* For all other values, the camera is constrained to be above the terrain and/or 3D Tiles (depending on {@link Cesium3DTileset#enableCollision}).
* @type {HeightReference}
* @default HeightReference.CLAMP_TO_GROUND
*/
this.enableCollisionDetection = true;
this.collisionHeightReference = HeightReference.CLAMP_TO_GROUND;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I did: loaded the basic Google Photorealistic sandcastle and profiled me just moving around slowly

Performance Profile:

Image

Why this change:

As can be seen in the sample frame above, about 25% of frametime is spent at the beginning of every frame (when the camera moved) sampling the height of the 3D tileset. While modelPick improvements could improve this, this behavior in my opinion is not needed/wanted by all users.

The naive solution provided by the ScreenSpaceCameraController currently tells us to set enableCollision=false on all 3D tilesets, but what if I do want entities to be clamped?. This forces us to either have clamping on both entities and camera, or neither.

This commit adds support for a collisionHeightReference flag (with backwards compatibility with enableCollisionDetection), which adds an additional check whether to clamp specifically the scene camera to 3D tilesets, providing feature parity with entity.<graphics>.heightReference.

This now allows me to clamp entities to a 3D tileset, while clamping the camera only to the terrain.

this[13] = column3Row1 ?? 0.0;
this[14] = column3Row2 ?? 0.0;
this[15] = column3Row3 ?? 0.0;
class Matrix4 extends Float64Array {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most important improvement here. Matrices are at the heart of 3D graphics, and any performance improvement here radiates throughout the entire codebase.

I approached this primarily to speed up pickModel, where getVertexPosition is primarily limited by the performance of Matrix4.multiplyByPoint.

By making Matrix4 a class extending Float64Array, we get a 2x speedup in that method (as well as various levels of speedups on other methods):

Image

I show the performance improvements of this combined with @javagl 's branch #12658 in #11814.

Importantly, changing the backing datastructure to Float64Array has 100% identical behavior and results as the previous implementation. As per https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number#number_encoding, regular numbers in JS are also 64 bit floats. By using the class .. extends syntax, we continue having values accessed by doing matrix[i], rather than forcing something like matrix.buffer[i].

this[6] = column2Row0 ?? 0.0;
this[7] = column2Row1 ?? 0.0;
this[8] = column2Row2 ?? 0.0;
class Matrix3 extends Float64Array {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extended the improvements on Matrix4 to Matrix3 as well

0.0,
1.0,
),
Matrix4.IDENTITY = new Matrix4(
Copy link
Contributor Author

@Beilinson Beilinson Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@jjspace jjspace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @Beilinson.

I didn't read through all your comments yet but had a few initial concerns, see below.

I also think that it may be better to break each of these changes into distinct PRs. Maybe they're small but that's the point, easier to review and validate. We may want to merge some without the others. Or there may be a few that need a lot more discussion than others and there's no reason to delay the ones that we know are easy wins for them.

Matrix3.IDENTITY = Object.freeze(
new Matrix3(1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0),
);
Matrix3.IDENTITY = new Matrix3(1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the limitation that it's not possible to .freeze() a typed array in JS (another explination) is enough to give me pause about changing this implementation.

The identity and zero matrices are used frequently across CesiumJS and I'd assume other projects that use CesiumJS. There needs to be a way to guarantee these are always the expected values. Users should not be allowed to do this:

console.log(Cesium.Matrix3.IDENTITY);
Cesium.Matrix3.IDENTITY[2] = 27.0;
console.log(Cesium.Matrix3.IDENTITY);

Currently this code throws an error but in this branch this works just fine and the identity is now wrong in every place it's used.

It's possible this could become a getter style function that always returns a new matrix but that defeats the purpose of having a shared object and could create more memory issues as it creates new objects every time.

Are there ways to mix frozen, normal arrays and typed arrays? Is there a different way to change internal structure based on expected mutability? Would that just make it more painful and inconsistent to use?

I don't know the answer to these questions but we need to figure it out before pushing forward with this change.

Copy link
Contributor Author

@Beilinson Beilinson Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no issue mixing old object matrices and new array ones, so keeping these as old frozen object based matrices is a valid solution.

However:

  1. It will slow down the total performance gains to make the frozen matrices not typed array based
  2. if anyone changes the frozen matrices that would instantly break the entire application (so in my opinion it's an unreasonable expectation)
  3. Nothing prevents in the current cesium code someone from doing Cesium.Matrix3.IDENTITY = new Cesium.Matrix3(), since the actual Matrix3,Matrix4,etc. namespace objects aren't frozen themselves.

If after further consideration you believe it's still important these specifically can be reverted to frozen object matrices

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets continue this conversation in #12973

Credit.clone = function (credit) {
if (defined(credit)) {
return new Credit(credit.html, credit.showOnScreen);
return credit;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fundamentally changing the behavior of this function to not actually be a clone like it's named. I don't think we want to do this. There may be merit in reducing the need to call .clone wherever it's called but a clone function itself should not be returning the original object.

I believe @mzschwartz5 was looking into credits recently and might have some more insights. I feel like this is something that could benefit from being managed higher up where we can store the references and reduce the amount .clone needs to be called in the first place.

@Beilinson
Copy link
Contributor Author

Hey @jjspace, that sounds fair, I thought that reviewing 5 separate tiny prs would be a nuisance 😅

I'll split them after going over your comments!

@Beilinson Beilinson mentioned this pull request Oct 10, 2025
6 tasks
@Beilinson
Copy link
Contributor Author

I also think that it may be better to break each of these changes into distinct PRs. Maybe they're small but that's the point, easier to review and validate. We may want to merge some without the others. Or there may be a few that need a lot more discussion than others and there's no reason to delay the ones that we know are easy wins for them.

Opened #12973, #12974, and #12975. Will open an issue instead for Credits, and anyChildrenVisible is something I'm unsure about its correctness.

@jjspace
Copy link
Contributor

jjspace commented Oct 14, 2025

@Beilinson thanks for breaking this apart, I think given the amount of discussion that's already happened in the other PRs it was a good decision to do so 😅
What is your remaining plan with this PR itself? Can it be closed in favor of the others? If so please make sure any remaining relevant comments/info are copied over (I think you already did but just reminding). Also don't forget to open issues if needed.

@Beilinson
Copy link
Contributor Author

Thanks for reminding @jjspace, I'll close this and open an issue for credits and for tileset traversal performance

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants