Editorial review: Document PerformanceNavigationTiming.confidence #43528
chrisdavidmills wants to merge 5 commits into mdn:main
Conversation
@chrisdavidmills This says "technical review". Is it ready for me to look at?
@hamishwillee. Not yet; I requested a tech review from the browser engineers yesterday. Once it is ready, I'll flip it to "Editorial review".
mmocny
left a comment
The docs look great, thanks for doing it!
I'm not the primary engineering contact for this, so hopefully Mike Jackson at Msft will have a chance to take a look.
One detail that is missing from the docs: how should you use this value and interpret the data on the server? This feels like the most important part of the API, and also the hardest part for developers to understand.
Mike has done some presentations on this, and I see that he added a NOTE to the very bottom of this section of the spec: https://www.w3.org/TR/navigation-timing-2/#sec-PerformanceNavigationTiming
> This section is intended to help RUM providers and developers interpret confidence
...that section might be worth including in docs here?
Cheers.
This makes sense. For the moment, I've gone for including all the text in this section. Anyway, I'll include that in my next commit.
Cool, thanks, @mwjacksonmsft. I'll move this to the editorial review stage. @hamishwillee, ready for you to have a look, if you've still got time early next week.
## Interpreting confidence data
Since the {{domxref("PerformanceTimingConfidence.randomizedTriggerRate", "randomizedTriggerRate")}} can vary across records, per-record weighting is needed to recover unbiased aggregates. The procedure below illustrates how weighting based on {{domxref("PerformanceTimingConfidence.value", "value")}} can be applied before computing summary statistics.
There is always an argument around what you should expect a developer to reasonably interpret and how much hand holding you should do. To my mind though, basic questions like "why do I need to do this?" have not been answered.
Let's start backwards. Why are summary statistics needed and how do I use those?
What is an unbiased aggregate? Why do I care to recover one? Blah blah.
To put it another way, I can see myself following these instructions and generating the data, but then not knowing what to do with it.
@mwjacksonmsft can you provide a short paragraph that answers these questions, which I can edit into some finished prose? I don't know the answers to these questions.
I didn't want to write new prose for @mwjacksonmsft here, but went looking for existing text on the subject.
The spec doesn't really elaborate, but the original design doc does.
Stealing from there:
Summary
Web applications may suffer from bimodal distribution in page load performance, due to factors outside of the web application’s control. For example:
- When a user agent first launches (a "cold start" scenario), it must perform many expensive initialization tasks that compete for resources on the system.
- Browser extensions can affect the performance of a website. For instance, some extensions run additional code on every page you visit, which can increase CPU usage and result in slower response times.
- When a machine is busy performing intensive tasks, it can lead to slower loading of web pages.
In these scenarios, content the web app attempts to load will be in competition with other work happening on the system. This makes it difficult to detect if performance issues exist within web applications themselves, or because of external factors.
Teams we have worked with have been surprised at the difference between real-world dashboard metrics and what they observe in page profiling tools. Without more information, it is challenging for developers to understand if (and when) their applications may be misbehaving or are simply being loaded in a contended period.
A new ‘confidence’ field on the PerformanceNavigationTiming object will enable developers to discern if the navigation timings are representative for their web application.
Also, re-reading this patch, the description in the main navigation timing doc section Performance timing confidence seems to answer these questions already, so may just need linking to?
If the question here is even more broad, such as: why do developers measure performance in the field? Then I would also point to existing docs on the subject rather than answer them here.
The point of this feature (confidence) is to help segment field data into two distinct groups, with the observation that the high-confidence results are more stable over time, but the relative distribution between the groups can change.
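For illustration only (not part of the patch), that segmentation step might look something like the sketch below in a RUM script, assuming `confidence` ships as documented in this PR with `value` being `"high"` or `"low"`:

```js
// Split buffered navigation timing records into the two groups described above.
// Optional chaining guards against browsers that don't expose `confidence`.
const groups = { high: [], low: [] };

new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    const bucket = entry.confidence?.value;
    if (bucket === "high" || bucket === "low") {
      groups[bucket].push(entry.duration); // duration ≈ total page-load time
    }
  }
}).observe({ type: "navigation", buffered: true });
```

The observation above is then that the `high` group should stay relatively stable over time, while the split between the two groups can shift.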
OK, thanks. I've done some research of my own, and added a bit of information on why unbiased aggregates are needed.
It seems to me that summary statistics just refers to the statistics you produce from the raw data, which you actually give people to read. I therefore don't think this needs a huge amount of explanation, but I have added a few more words to indicate that they are statistics based on the confidence data.
@hamishwillee, let me know what you think.
The comment @mmocny left captures the why.
> Teams we have worked with have been surprised at the difference between real-world dashboard metrics and what they observe in page profiling tools. Without more information, it is challenging for developers to understand if (and when) their applications may be misbehaving or are simply being loaded in a contended period.
Developers can debias the data, and then focus on measuring and improving perf for things under their control.
Thanks, @mwjacksonmsft. I've added a couple more sentences to capture some of these thoughts.
{{APIRef("Performance API")}}{{SeeCompatTable}}
The **`randomizedTriggerRate`** read-only property of the {{domxref("PerformanceTimingConfidence")}} interface is a number representing a percentage value that indicates how often noise is applied when exposing the {{domxref("PerformanceTimingConfidence.value")}}.
This is very complete and accurate, but it is hard to parse. Possibly it is not necessary to capture this all in one sentence here, since you should do that in the `value` docs.
Suggested change:

The **`randomizedTriggerRate`** read-only property of the {{domxref("PerformanceTimingConfidence")}} interface is a number representing a percentage value that indicates how often noise is applied when exposing the {{domxref("PerformanceTimingConfidence.value")}}.

The **`randomizedTriggerRate`** read-only property of the {{domxref("PerformanceTimingConfidence")}} indicates how often noise is applied when exposing the {{domxref("PerformanceTimingConfidence.value")}}.
Either way
- why do we add noise? We should say, and also state what a high rate actually means vs a low rate.
- So 100% (1) would mean every `PerformanceTimingConfidence` has noise applied to the value.
- If noise is applied, does that flip the `value`?
Just trying to get a feel for what a developer might do or not do with this knowledge.
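To make my reading of it concrete, here's a hypothetical snippet (assuming `randomizedTriggerRate` is a fraction between 0 and 1, as the `p = 0.1` phrasing elsewhere in this thread suggests, and that "noise" means the reported `value` was randomly assigned):

```js
const [nav] = performance.getEntriesByType("navigation");

if (nav?.confidence) {
  const { value, randomizedTriggerRate: p } = nav.confidence;
  // p === 0   -> `value` is always the browser's real assessment (no noise).
  // p === 1   -> `value` is always randomly assigned, so it carries no signal.
  // p === 0.1 -> roughly 10% of reported values were randomly assigned, which
  //              is why aggregates need a debiasing/weighting step.
  console.log(`confidence: ${value}, randomizedTriggerRate: ${p}`);
}
```

If that reading is wrong, that's exactly the kind of thing the docs should spell out.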
@mwjacksonmsft can you provide answers to these questions?
@hamishwillee I've done a bit of research here too, and added more details about the randomized trigger rate and noise. Let me know if that answers your questions.
hamishwillee
left a comment
Looks pretty good. I have questions.
It might be nice to mention this in https://developer.mozilla.org/en-US/docs/Web/API/Performance_API/Navigation_timing
Good point; I've added a section to cover it.
hamishwillee
left a comment
Those changes look excellent. Can you ping me again directly when you've got the remaining answers and integrated them?
Yes, will do—cheers mate.
Note, I haven't been pinged back on this one AFAIK.
@mmocny @mwjacksonmsft, there are a couple of outstanding questions that came up in the editorial review that are blocking publication of this documentation. Can you look at them and help me with some answers? I've closed all the resolved comments, so they should be easy to find. Thanks!
I spot only a single unresolved comment in the patch at this point, but the GitHub UI claims there are two unresolved comments. (Perhaps one is stale from a line in the patch that has been removed, not sure.) Let me know if I haven't found all the questions that need answering.
### Interpreting confidence data
Since the {{domxref("PerformanceTimingConfidence.randomizedTriggerRate", "randomizedTriggerRate")}} can vary across records, per-record weighting is needed to recover unbiased aggregates, to improve consistency of data, cut down on compound errors, and generally produce more realistic and reliable results. The procedures below illustrate how weighting based on {{domxref("PerformanceTimingConfidence.value", "value")}} can be applied before computing summary statistics based on the confidence data.
Once you have debiased the data and computed realistic summary statistics, you can focus on measuring and improving performance for issues under your control.
@chrisdavidmills Same problem as I highlighted before - I didn't understand the "point" from this text and what you would do when you have the debiased data.
I asked Claude if this paragraph was just marketing and apparently it isn't :-). Apparently the term "recover unbiased aggregates" means something :-)
The point that is a bit buried is that value is not deterministic. The browser uses randomization to assign "high" or "low". When p = 0.1 (say), it means 10% of the time the value you see was randomly assigned regardless of actual conditions.
So you can't just filter out "low" records and average the "high" ones to work out your real performance — you'd be throwing away records that were actually fine but happened to get a random "low", and keeping records that were actually bad but got a random "high".
The debiasing math corrects for the random noise so that your aggregate statistics (mean, p75, etc.) are statistically valid. This is what the paragraph above did not make clear to me. Perhaps I am dim.
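To make that concrete, here is a rough sketch of the kind of correction involved. It assumes the classic randomized-response model (with probability `randomizedTriggerRate` the reported `value` is replaced by a uniform random draw from `"high"`/`"low"`); the normative weighting formulas are the ones in the spec's note, so treat this purely as an illustration of the shape of the math:

```js
// Per-record weight that, in expectation, counts a truly-high record as 1 and a
// truly-low record as 0, so the random flips cancel out across a large sample.
function highWeight(entry) {
  const { value, randomizedTriggerRate: p } = entry.confidence;
  if (p >= 1) return 0; // a fully randomized record carries no signal
  return value === "high" ? (1 - p / 2) / (1 - p) : -(p / 2) / (1 - p);
}

// Debiased mean load time over the (estimated) truly high-confidence records,
// rather than naively averaging only the records *reported* as "high".
function debiasedHighConfidenceMean(entries) {
  let weightedSum = 0;
  let weightTotal = 0;
  for (const entry of entries) {
    const w = highWeight(entry);
    weightedSum += w * entry.duration;
    weightTotal += w;
  }
  return weightTotal > 0 ? weightedSum / weightTotal : NaN;
}
```

Individual weights can be negative, so this only makes sense as an aggregate over many records; that's the "per-record weighting is needed to recover unbiased aggregates" point in the patch text.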
Claude says this is what I'd do with the data if collecting navigation timing data (e.g., for a real-user monitoring dashboard):
- Collect records via `PerformanceObserver` as normal.
- For each record, also grab `entry.confidence.value` and `entry.confidence.randomizedTriggerRate`.
- When computing your p75 LCP or mean load time, apply the weighting formulas instead of a plain average — this gives you separate, corrected metrics for "typical" loads vs. "degraded" loads.
- Use the `"high"` confidence mean/percentile as your "real" performance baseline, and use the `"low"` one to understand how bad things get in cold-start scenarios.
This last bit is what I meant by "what do you do with the data" - use it as a new baseline.
Does my problem now make sense?
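For what it's worth, the collection step in that list could look roughly like this; the `/rum` endpoint and payload shape are made up for illustration, and the weighting itself would then run server-side:

```js
// Beacon each navigation timing record together with its confidence fields so
// the server has everything it needs to debias the aggregates later.
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    navigator.sendBeacon(
      "/rum",
      JSON.stringify({
        loadTime: entry.duration,
        confidence: entry.confidence?.value ?? null,
        randomizedTriggerRate: entry.confidence?.randomizedTriggerRate ?? null,
      })
    );
  }
}).observe({ type: "navigation", buffered: true });
```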
Description
Chrome 145 adds support for the `PerformanceNavigationTiming.confidence` property, and the associated `PerformanceTimingConfidence` interface. See https://chromestatus.com/feature/5186950448283648.

This PR adds documentation for both features mentioned above.
Motivation
Additional details
Related issues and pull requests