-
Notifications
You must be signed in to change notification settings - Fork 3.7k
General performance improvements #12968
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thank you for the pull request, @Beilinson! ✅ We can confirm we have a CLA on file for you. |
result.texturesReferenceCounterById = { | ||
...statistics.texturesReferenceCounterById, | ||
}; | ||
result.texturesReferenceCounterById = statistics.texturesReferenceCounterById; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I did: loaded the basic Google Photorealistic
sandcastle and profiled me just moving around slowly
Performance Profile:
I ordered by Self Time descending, and saw that this Cesium3DTilesetStatistics.clone
takes about 8% of total frame time.
Why this change:
The performance profile showed that this object destructure was taking up the main bulk of the time of this function.
I audited the code, and saw that while the statistics object is copied frame-to-frame, this property specifically is only ever used by the new statistics object, and passing it by reference kept all the same behavior in prod. Changing it a Map
is also a minor performance improvement.
Credit.clone = function (credit) { | ||
if (defined(credit)) { | ||
return new Credit(credit.html, credit.showOnScreen); | ||
return credit; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I did: loaded the basic Google Photorealistic sandcastle and profiled me just moving around slowly
Performance Profile:
I analyzed multiple frames, and saw that CreditDisplay.endFrame
was taking up about 2% of frame time.

Why this change:
The performance profile showed that due to the credit clones only passing the original HTML
, every frame for all credits:
- The HTML has to be sanitized
- created as a DOM element
- appended to the CreditDisplay.
By passing credits by reference, the steps 1+2 are executed only once when the first credit of that id is added, and then every frame the generated element is reused.
Credits still properly get removed/added when displayed tilesets change. I'm not sure why credits aren't passed by reference, and this isn't a significant performance improvement alone, but together with the others it stacks up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fundamentally changing the behavior of this function to not actually be a clone
like it's named. I don't think we want to do this. There may be merit in reducing the need to call .clone
wherever it's called but a clone function itself should not be returning the original object.
I believe @mzschwartz5 was looking into credits recently and might have some more insights. I feel like this is something that could benefit from being managed higher up where we can store the references and reduce the amount .clone
needs to be called in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For sure, this was more of a POC that passing them by reference does not break rendering them fundamentally. If the team is already looking into this I'll refrain from opening this as a separate PR
if (anyVisible) { | ||
break; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I did: loaded the basic Google Photorealistic
sandcastle and profiled me just moving around slowly
Performance Profile:
I checked in total, what part of the frametime was spent by the 3D Tileset. Between 15-30% of each frame is spent in either of the Cesium3DTileset<TYPE>Traversal.selectTiles
:


Why this change:
The performance profile showed that vast majority of time was spent in the tileset traversal updating visibility. I noticed that this method while by name acts like Array.some
, which returns early after one item returns true, it actually iterates over all tile children.
This change results in about 10% fewer tiles visited total (after the change above to make visited tiles more accurate), but I didn't notice a consistent improvement in FPS from this sadly.
This change does not from my testing affect the final selected tiles.
if (tile._visitedFrame === frameState.frameNumber) { | ||
// Prevents another pass from visiting the frame again | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tileset statistics are kind of wrong today. Only tiles which are marked as visible and added to the traversal queue are marked as visited
, however in anyChildrenVisible
and in updateVisibility
there is the possibility that additional tiles are updated and visited but because they return false they are not added to the traversal queue and therefore not marked as visited
for the statistics.
This fixes the above, now all tiles that have they visibility updated are correctly marked as visited
. In addition, any previously visited tiles this frame are early returned right away, which already exists but is slightly further below in the call chain somewhere inside of updateTileVisibility
.
This change shows that on average, there are actually around 60% more visited tiles than previously reported!!
/** | ||
* When disabled, the values of <code>maximumZoomDistance</code> and <code>minimumZoomDistance</code> are ignored. | ||
* Also used in conjunction with {@link Cesium3DTileset#enableCollision} to prevent the camera from moving through or below a 3D Tileset surface. | ||
* This may also affect clamping behavior when using {@link HeightReference.CLAMP_TO_GROUND} on 3D Tiles. | ||
* @type {boolean} | ||
* @default true | ||
* This value controls the height reference for the {@link ScreenSpaceCameraController} collision, which is used to limit the camera minimum and maximum zoom. | ||
* When set to {@link HeightReference.NONE}, the camera can go underground, and the values of <code>maximumZoomDistance</code> and <code>minimumZoomDistance</code> are ignored. | ||
* For all other values, the camera is constrained to be above the terrain and/or 3D Tiles (depending on {@link Cesium3DTileset#enableCollision}). | ||
* @type {HeightReference} | ||
* @default HeightReference.CLAMP_TO_GROUND | ||
*/ | ||
this.enableCollisionDetection = true; | ||
this.collisionHeightReference = HeightReference.CLAMP_TO_GROUND; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I did: loaded the basic Google Photorealistic
sandcastle and profiled me just moving around slowly
Performance Profile:

Why this change:
As can be seen in the sample frame above, about 25% of frametime is spent at the beginning of every frame (when the camera moved) sampling the height of the 3D tileset. While modelPick
improvements could improve this, this behavior in my opinion is not needed/wanted by all users.
The naive solution provided by the ScreenSpaceCameraController
currently tells us to set enableCollision=false
on all 3D tilesets, but what if I do want entities to be clamped?. This forces us to either have clamping on both entities and camera, or neither.
This commit adds support for a collisionHeightReference
flag (with backwards compatibility with enableCollisionDetection
), which adds an additional check whether to clamp specifically the scene camera to 3D tilesets, providing feature parity with entity.<graphics>.heightReference
.
This now allows me to clamp entities to a 3D tileset, while clamping the camera only to the terrain.
this[13] = column3Row1 ?? 0.0; | ||
this[14] = column3Row2 ?? 0.0; | ||
this[15] = column3Row3 ?? 0.0; | ||
class Matrix4 extends Float64Array { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most important improvement here. Matrices are at the heart of 3D graphics, and any performance improvement here radiates throughout the entire codebase.
I approached this primarily to speed up pickModel
, where getVertexPosition
is primarily limited by the performance of Matrix4.multiplyByPoint
.
By making Matrix4 a class extending Float64Array
, we get a 2x speedup in that method (as well as various levels of speedups on other methods):

I show the performance improvements of this combined with @javagl 's branch #12658 in #11814.
Importantly, changing the backing datastructure to Float64Array
has 100% identical behavior and results as the previous implementation. As per https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number#number_encoding, regular numbers in JS are also 64 bit floats. By using the class .. extends
syntax, we continue having values accessed by doing matrix[i]
, rather than forcing something like matrix.buffer[i]
.
this[6] = column2Row0 ?? 0.0; | ||
this[7] = column2Row1 ?? 0.0; | ||
this[8] = column2Row2 ?? 0.0; | ||
class Matrix3 extends Float64Array { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extended the improvements on Matrix4
to Matrix3
as well
0.0, | ||
1.0, | ||
), | ||
Matrix4.IDENTITY = new Matrix4( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JS does not allow freezing Typed arrays:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @Beilinson.
I didn't read through all your comments yet but had a few initial concerns, see below.
I also think that it may be better to break each of these changes into distinct PRs. Maybe they're small but that's the point, easier to review and validate. We may want to merge some without the others. Or there may be a few that need a lot more discussion than others and there's no reason to delay the ones that we know are easy wins for them.
Matrix3.IDENTITY = Object.freeze( | ||
new Matrix3(1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0), | ||
); | ||
Matrix3.IDENTITY = new Matrix3(1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the limitation that it's not possible to .freeze()
a typed array in JS (another explination) is enough to give me pause about changing this implementation.
The identity and zero matrices are used frequently across CesiumJS and I'd assume other projects that use CesiumJS. There needs to be a way to guarantee these are always the expected values. Users should not be allowed to do this:
console.log(Cesium.Matrix3.IDENTITY);
Cesium.Matrix3.IDENTITY[2] = 27.0;
console.log(Cesium.Matrix3.IDENTITY);
Currently this code throws an error but in this branch this works just fine and the identity is now wrong in every place it's used.
It's possible this could become a getter style function that always returns a new matrix but that defeats the purpose of having a shared object and could create more memory issues as it creates new objects every time.
Are there ways to mix frozen, normal arrays and typed arrays? Is there a different way to change internal structure based on expected mutability? Would that just make it more painful and inconsistent to use?
I don't know the answer to these questions but we need to figure it out before pushing forward with this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no issue mixing old object matrices and new array ones, so keeping these as old frozen object based matrices is a valid solution.
However:
- It will slow down the total performance gains to make the frozen matrices not typed array based
- if anyone changes the frozen matrices that would instantly break the entire application (so in my opinion it's an unreasonable expectation)
- Nothing prevents in the current cesium code someone from doing
Cesium.Matrix3.IDENTITY = new Cesium.Matrix3()
, since the actualMatrix3,Matrix4,etc.
namespace objects aren't frozen themselves.
If after further consideration you believe it's still important these specifically can be reverted to frozen object matrices
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets continue this conversation in #12973
Credit.clone = function (credit) { | ||
if (defined(credit)) { | ||
return new Credit(credit.html, credit.showOnScreen); | ||
return credit; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fundamentally changing the behavior of this function to not actually be a clone
like it's named. I don't think we want to do this. There may be merit in reducing the need to call .clone
wherever it's called but a clone function itself should not be returning the original object.
I believe @mzschwartz5 was looking into credits recently and might have some more insights. I feel like this is something that could benefit from being managed higher up where we can store the references and reduce the amount .clone
needs to be called in the first place.
Hey @jjspace, that sounds fair, I thought that reviewing 5 separate tiny prs would be a nuisance 😅 I'll split them after going over your comments! |
Opened #12973, #12974, and #12975. Will open an issue instead for Credits, and |
@Beilinson thanks for breaking this apart, I think given the amount of discussion that's already happened in the other PRs it was a good decision to do so 😅 |
Thanks for reminding @jjspace, I'll close this and open an issue for credits and for tileset traversal performance |
Description
I made this branch, a compilation of minor performance improvements that I found were possible while researching/testing for #12916 but didn't really fit in there.
I separated all the changes into separate commits, and I added PR comments on each file explaining what I see in my performance profiles and why this change works, as well as how much of an improvement each provides.
Overall, these changes pushed my measured FPS with
6x CPU Slowdown
from ~60fps to ~70fps, a 16% improvement on total FPS!I will summarize the changes here as well:
Cesium3DTilesetStatistics.clone
- remove destructure clone ofCesium3DTilesetStatistics.textureReferenceCounterById
~ 2% total FPS, no issues observed.Credit.clone
- Change to just return the original credit. This is mostly experimental, this also can gain a ~2% improvement in total FPS, and I'm not sure why should credits be cloned every frame. No issues observed but maybe I missed something.Cesium3DTilesetTraversal.anyChildrenVisible
- This can early return once one visible child is found. Results in ~10% fewer tiles getting updated every frame, with no observed changes in behavior or tiles loaded on a variety of different tilesets. Sadly, this didn't provide the desired performance boost in tileset traversal.ScreenSpaceCameraController.collisionHeightReference
. Height clamping of the camera to the scene takes about 2.55ms/frame, mostly due toModel.pick
. Adding this flag allows the user to continue clamping entities to the 3D tileset without also sampling the camera height off the 3D tileset every frame.Matrix4
andMatrix3
to be based onFloat64Array
. This provides a significant 2x speedup of Matrix multiplication and other methods, and saves memory. This results in a 4x improvement onModel.pick
and propagates throughout other areas.Testing plan
I tested all of these separately, mostly visually by comparing almost all the sandboxes between main and local.
As mentioned above, commits 2 and 3 are the ones I'm unsure about, and also incidentally provide the least noticeable impact.
Author checklist
CONTRIBUTORS.md
CHANGES.md
with a short summary of my change