Editorial review: Document PerformanceNavigationTiming.confidence#43528

Open
chrisdavidmills wants to merge 5 commits into mdn:main from chrisdavidmills:performancenavigationtiming-confidence

Conversation

@chrisdavidmills
Contributor

@chrisdavidmills chrisdavidmills commented Mar 23, 2026

Description

Chrome 145 adds support for the PerformanceNavigationTiming.confidence property, and the associated PerformanceTimingConfidence interface. See https://chromestatus.com/feature/5186950448283648.

This PR adds documentation for both features mentioned above.

Motivation

Additional details

Related issues and pull requests

@chrisdavidmills chrisdavidmills requested a review from a team as a code owner March 23, 2026 09:56
@chrisdavidmills chrisdavidmills requested review from hamishwillee and removed request for a team March 23, 2026 09:56
@github-actions github-actions Bot added Content:WebAPI Web API docs size/m [PR only] 51-500 LoC changed labels Mar 23, 2026
@chrisdavidmills chrisdavidmills changed the title Document PerformanceNavigationTiming.confidence Technical review: Document PerformanceNavigationTiming.confidence Mar 23, 2026
@hamishwillee
Collaborator

@chrisdavidmills This says "technical review". Is it ready for me to look at?

@chrisdavidmills
Contributor Author

@chrisdavidmills This says "technical review". Is it ready for me to look at?

@hamishwillee Not yet; I requested a tech review from the browser engineers yesterday. Once it is ready, I'll flip it to "Editorial review".


@mmocny mmocny left a comment


The docs look great, thanks for doing it!

I'm not the primary engineering contact for this, so hopefully Mike Jackson at Msft will have a chance to take a look.

One detail that is missing from the docs: how should you use this value and interpret the data on the server? This feels like the most important part of the API, and also the hardest for developers to understand.

Mike has done some presentations on this, and I see that he added a NOTE to the very bottom of this section of the spec: https://www.w3.org/TR/navigation-timing-2/#sec-PerformanceNavigationTiming

This section is intended to help RUM providers and developers interpret confidence

...that section might be worth including in docs here?

Cheers.

Comment thread files/en-us/web/api/performancenavigationtiming/confidence/index.md Outdated
Comment thread files/en-us/web/api/performancenavigationtiming/confidence/index.md Outdated
Comment thread files/en-us/web/api/performancenavigationtiming/index.md Outdated
Comment thread files/en-us/web/api/performancetimingconfidence/value/index.md Outdated
@chrisdavidmills
Contributor Author

Mike has done some presentations on this, and I see that he added a NOTE to the very bottom of this section of the spec: https://www.w3.org/TR/navigation-timing-2/#sec-PerformanceNavigationTiming

This section is intended to help RUM providers and developers interpret confidence

...that section might be worth including in docs here?

This makes sense. For the moment, I've gone for including all the text in this section in the PerformanceTimingConfidence page, under a heading of "Interpreting confidence data". I've not made many changes, except for a few tweaks, and adding links to the different values.

Anyway, I'll include that in my next commit.


@mwjacksonmsft mwjacksonmsft left a comment


These changes LGTM. Thanks!

@chrisdavidmills chrisdavidmills changed the title Technical review: Document PerformanceNavigationTiming.confidence Editorial review: Document PerformanceNavigationTiming.confidence Mar 27, 2026
@chrisdavidmills
Contributor Author

Cool, thanks, @mwjacksonmsft. I'll move this to the editorial review stage.

@hamishwillee, ready for you to have a look, if you've still got time early next week.

Comment thread files/en-us/web/api/performancetimingconfidence/index.md Outdated
Comment thread files/en-us/web/api/performancetimingconfidence/index.md Outdated

## Interpreting confidence data

Since the {{domxref("PerformanceTimingConfidence.randomizedTriggerRate", "randomizedTriggerRate")}} can vary across records, per-record weighting is needed to recover unbiased aggregates. The procedure below illustrates how weighting based on {{domxref("PerformanceTimingConfidence.value", "value")}} can be applied before computing summary statistics.
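To make the weighting concrete, here is a hypothetical sketch (not taken from the PR) of a randomized-response correction. It assumes that with probability `p` (= `randomizedTriggerRate`) the browser reports a uniformly random label and otherwise reports the true one; the spec note defines the actual noise model, so treat the formula as illustrative:

```javascript
// Hypothetical sketch: debiasing confidence labels across records.
// Assumption: with probability p the reported value is a uniformly
// random "high"/"low", and the true label otherwise.
function debiasedHighFraction(records) {
  // records: [{ value: "high" | "low", randomizedTriggerRate: number }]
  // Each record contributes an unbiased per-record estimate of
  // P(true label === "high"): (indicator - p / 2) / (1 - p).
  let sum = 0;
  for (const { value, randomizedTriggerRate: p } of records) {
    const indicator = value === "high" ? 1 : 0;
    sum += (indicator - p / 2) / (1 - p);
  }
  return sum / records.length;
}
```

With `p = 0` this reduces to the plain observed fraction; with `p = 0.5` a reported "high" contributes 1.5 and a reported "low" contributes −0.5, so the random flips cancel out on average.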
Collaborator


There is always an argument around what you should expect a developer to reasonably interpret and how much hand holding you should do. To my mind though, basic questions like "why do I need to do this" have not been answered.

Let's start backwards. Why are summary statistics needed and how do I use those?
What is an unbiased aggregate? Why do I care to recover one? Blah blah.

To put it another way, I can see myself following these instructions and generating the data, but then not knowing what to do with it.

Contributor Author


@mwjacksonmsft can you provide a short paragraph that answers these questions, which I can edit into some finished prose? I don't know the answers to these questions.


I didn't want to write new prose for @mwjacksonmsft here, but went looking for existing text on the subject.

The spec doesn't really elaborate, but the original design doc does.

Stealing from there:

Summary

Web applications may suffer from bimodal distribution in page load performance, due to factors outside of the web application’s control. For example:

  • When a user agent first launches (a "cold start" scenario), it must perform many expensive initialization tasks that compete for resources on the system.
  • Browser extensions can affect the performance of a website. For instance, some extensions run additional code on every page you visit, which can increase CPU usage and result in slower response times.
  • When a machine is busy performing intensive tasks, it can lead to slower loading of web pages.

In these scenarios, content the web app attempts to load will be in competition with other work happening on the system. This makes it difficult to detect if performance issues exist within web applications themselves, or because of external factors.

Teams we have worked with have been surprised at the difference between real-world dashboard metrics and what they observe in page profiling tools. Without more information, it is challenging for developers to understand if (and when) their applications may be misbehaving or are simply being loaded in a contended period.

A new ‘confidence’ field on the PerformanceNavigationTiming object will enable developers to discern if the navigation timings are representative for their web application.



Also, re-reading this patch, the description in the main navigation timing doc section Performance timing confidence seems to answer these questions already, so may just need linking to?


If the question here is even more broad, such as: why do developers measure performance in the field? Then I would also point to existing docs on the subject rather than answer them here.

The point of this feature (confidence) is to help segment field data into two distinct groups, with the observation that the high-confidence results are more stable over time, but the relative distribution between the groups can change.

Contributor Author


OK, thanks. I've done some research of my own, and added a bit of information on why unbiased aggregates are needed.

It seems to me that summary statistics just refers to the statistics you produce from the raw data, which you actually give people to read. I therefore don't think this needs a huge amount of explanation, but I have added a few more words to indicate that they are statistics based on the confidence data.

@hamishwillee, let me know what you think.



The comment @mmocny left captures the why.

Teams we have worked with have been surprised at the difference between real-world dashboard metrics and what they observe in page profiling tools. Without more information, it is challenging for developers to understand if (and when) their applications may be misbehaving or are simply being loaded in a contended period.

Developers can debias the data, and then focus on measuring and improving perf for things under their control.

Contributor Author


Thanks, @mwjacksonmsft. I've added a couple more sentences to capture some of these thoughts.

Comment thread files/en-us/web/api/performancenavigationtiming/confidence/index.md Outdated
Comment thread files/en-us/web/api/performancenavigationtiming/index.md Outdated

{{APIRef("Performance API")}}{{SeeCompatTable}}

The **`randomizedTriggerRate`** read-only property of the {{domxref("PerformanceTimingConfidence")}} interface is a number representing a percentage value that indicates how often noise is applied when exposing the {{domxref("PerformanceTimingConfidence.value")}}.
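As an aside on the excerpt above: since `confidence` is newly shipped, any example reading these properties should feature-detect first. A hedged sketch (only `value` and `randomizedTriggerRate` come from this PR's API surface; the guard logic is illustrative):

```javascript
// Illustrative sketch: safely reading confidence data from the
// navigation timing entry. Returns null wherever the API is unavailable.
function readNavigationConfidence() {
  if (typeof performance === "undefined" || !performance.getEntriesByType) {
    return null; // no Performance API in this context
  }
  const [nav] = performance.getEntriesByType("navigation");
  if (!nav || !nav.confidence) {
    return null; // entry missing, or confidence unsupported/withheld
  }
  return {
    value: nav.confidence.value, // "high" or "low"
    randomizedTriggerRate: nav.confidence.randomizedTriggerRate,
  };
}
```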
Collaborator

@hamishwillee hamishwillee Mar 30, 2026


This is very complete and accurate, but it is hard to parse. Possibly it is not necessary to capture this all in one sentence here, since you should do that in the value page.

Suggested change
The **`randomizedTriggerRate`** read-only property of the {{domxref("PerformanceTimingConfidence")}} interface is a number representing a percentage value that indicates how often noise is applied when exposing the {{domxref("PerformanceTimingConfidence.value")}}.
The **`randomizedTriggerRate`** read-only property of the {{domxref("PerformanceTimingConfidence")}} indicates how often noise is applied when exposing the {{domxref("PerformanceTimingConfidence.value")}}.

Collaborator

@hamishwillee hamishwillee Mar 30, 2026


Either way

  • why do we add noise? We should say, and also state what a high rate actually means vs a low rate.
  • So 100% (1) would mean every PerformanceTimingConfidence has noise applied to the value.
  • If noise is applied, does that flip the value?

Just trying to get a feel for what a developer might do or not do with this knowledge.

Contributor Author


@mwjacksonmsft can you provide answers to these questions?

Contributor Author


@hamishwillee I've done a bit of research here too, and added more details about the randomized trigger rate and noise. Let me know if that answers your questions.

Comment thread files/en-us/web/api/performancetimingconfidence/randomizedtriggerrate/index.md Outdated
Comment thread files/en-us/web/api/performancetimingconfidence/value/index.md Outdated
Comment thread files/en-us/web/api/performancetimingconfidence/index.md Outdated
Collaborator

@hamishwillee hamishwillee left a comment


Looks pretty good. I have questions.

It might be nice to mention this in https://developer.mozilla.org/en-US/docs/Web/API/Performance_API/Navigation_timing

@chrisdavidmills
Contributor Author

It might be nice to mention this in https://developer.mozilla.org/en-US/docs/Web/API/Performance_API/Navigation_timing

Good point; I've added a section to cover it.

Collaborator

@hamishwillee hamishwillee left a comment


Those changes look excellent. Can you ping me again directly when you've got the remaining answers and integrated them?

@chrisdavidmills
Contributor Author

Those changes look excellent. Can you ping me again directly when you've got the remaining answers and integrated them?

Yes, will do—cheers mate.

@hamishwillee
Collaborator

Note, I haven't been pinged back on this one AFAIK.

@chrisdavidmills
Contributor Author

@mmocny @mwjacksonmsft, there are a couple of outstanding questions that came up in the editorial review that are blocking publication of this documentation. Can you look at them and help me with some answers? I've closed all the resolved comments, so they should be easy to find. Thanks!

@mmocny

mmocny commented May 1, 2026

I spot only a single unresolved comment in the patch at this point, but the GitHub UI claims there are two unresolved comments. (Perhaps one is stale, from a line in the patch that has been removed; not sure.)

Let me know if I haven't found all the questions that need answering.

@hamishwillee hamishwillee force-pushed the performancenavigationtiming-confidence branch from 42351f9 to 9a130c8 Compare May 8, 2026 01:45
Comment on lines +35 to +39
### Interpreting confidence data

Since the {{domxref("PerformanceTimingConfidence.randomizedTriggerRate", "randomizedTriggerRate")}} can vary across records, per-record weighting is needed to recover unbiased aggregates, to improve consistency of data, cut down on compound errors, and generally produce more realistic and reliable results. The procedures below illustrate how weighting based on {{domxref("PerformanceTimingConfidence.value", "value")}} can be applied before computing summary statistics based on the confidence data.

Once you have debiased the data and computed realistic summary statistics, you can focus on measuring and improving performance for issues under your control.
Collaborator


@chrisdavidmills Same problem as I highlighted before - I didn't understand the "point" from this text and what you would do when you have the debiased data.
I asked Claude if this paragraph was just marketing and apparently it isn't :-). Apparently the term "recover unbiased aggregates" means something :-)

The point that is a bit buried is that value is not deterministic. The browser uses randomization to assign "high" or "low". When p = 0.1 (say), it means 10% of the time the value you see was randomly assigned regardless of actual conditions.

So you can't just filter out "low" records and average the "high" ones to work out your real performance — you'd be throwing away records that were actually fine but happened to get a random "low", and keeping records that were actually bad but got a random "high".

The debiasing math corrects for the random noise so that your aggregate statistics (mean, p75, etc.) are statistically valid. This is what the paragraph above did not make clear to me. Perhaps I am dim.

Claude says that what I'd do with the data if collecting navigation timing data (e.g. for a real-user monitoring dashboard):

  1. Collect records via PerformanceObserver as normal.
  2. For each record, also grab entry.confidence.value and entry.confidence.randomizedTriggerRate.
  3. When computing your p75 LCP or mean load time, apply the weighting formulas instead of a plain average — this gives you separate, corrected metrics for "typical" loads vs. "degraded" loads.
  4. Use the "high" confidence mean/percentile as your "real" performance baseline, and use the "low" one to understand how bad things get in cold-start scenarios.

This last bit is what I meant by "what do you do with the data" - use it as a new baseline.

Does my problem now make sense?
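The four-step flow described in that comment could be sketched roughly as follows. The inverse-probability weight `1 / (1 - p)` is an assumption for illustration only; the spec note gives the exact correction, and the `record`/`weightedMean` names are hypothetical:

```javascript
// Rough sketch of the RUM flow described above. The weighting scheme is
// an assumption, not the spec's exact formula.
const samples = [];

// Steps 1-2: collect navigation records plus their confidence fields.
function record(entry) {
  if (!entry.confidence) return; // unsupported or withheld
  samples.push({
    duration: entry.duration,
    value: entry.confidence.value,
    p: entry.confidence.randomizedTriggerRate,
  });
}

// Step 3: weighted mean load time for one reported label.
function weightedMean(data, label) {
  let num = 0;
  let den = 0;
  for (const s of data) {
    if (s.value !== label) continue;
    const w = 1 / (1 - s.p); // assumed inverse-probability weight
    num += w * s.duration;
    den += w;
  }
  return den ? num / den : NaN;
}

// Step 4: treat the "high" aggregate as the performance baseline, and
// the "low" aggregate as the cold-start/contended picture.
if (typeof PerformanceObserver !== "undefined") {
  new PerformanceObserver((list) => list.getEntries().forEach(record))
    .observe({ type: "navigation", buffered: true });
}
```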
