Skip to content

Conversation

ldionne
Copy link
Member

@ldionne ldionne commented Oct 2, 2025

Currently, the Run page (e.g. /db_default/v4/nts/<runid>) provides a way to select the aggregation function used when there are multiple samples in the run (that option is located under "View Options"). The supported options are min and median.

Similarly, the Graph page (e.g. /db_default/v4/nts/graph?<params>) provides a way to use mean instead of min to aggregate multiple data points for the same order on the graph. That is provided as a checkbox under the "Settings 🔧" menu.

This patch makes both selection mechanisms consistent by using a dropdown menu that provides min, max, mean and median in both cases.

As a drive-by, it reimplements how we select the sample to use as a reference for related data: we now select the point closest to the aggregated value instead of selecting the first point unconditionally.

@ldionne ldionne requested review from lukel97 and jroelofs October 2, 2025 23:10
# When aggregating multiple samples, it becomes unclear which sample to use for
# associated data like the run date, the order, etc. Use the index of the closest
# value in all the samples.
closest_value = sorted(values, key=lambda val: abs(val - agg_value))[0]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the change I called out about reimplementing how we select related data associated to the aggregated point. Previously, for min we'd use the actual data point that we extracted the minimum value from. For mean we would always pick the first point's data (see how agg_mean returned 0 for the index).

Now, we select the point closest to the value we selected. I can't really imagine a case where this would give us an incorrect answer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't all samples have the same run and so the same order? Or do we end up aggregating samples from different runs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% certain I follow. Are you suggesting that we could select any index at random and it wouldn't matter (assuming all samples are from the same run)?

So far, I have been operating under the assumption that you could have different runs that provide samples for the same test and the same order. For example, you could run benchmark foo today on commit 123abc and submit a run for that:

run order test execution_time
1 123abc foo 10

Then, a few days later you decide to do more runs for the same test and the same commit since you find that point to be a bit noisy, and want to take the median-of-3 instead. You can do that by submitting your results with lnt import /path/to/db report.json --merge append. Note that lnt import defaults to --merge replace, which will replace the data for the existing run for the same order, which IMO is a bit strange but that's how it works. Your database now looks like this:

run order test execution_time
1 123abc foo 10
... ... ... ...
45 123abc foo 11
... ... ... ...
78 123abc foo 8

The samples you have for foo at commit 123abc all come from different runs, and the question is what run information you'll pick to display on the graph. With my patch:

  • if you aggregate using median (which is 10), we'd end up selecting run information for run 1, which is correct
  • if you aggregate using mean (which is 9.67), we'd end up selecting run information for run 1 since that's the closest data point to 9.67. I feel there's not a perfect choice
  • if you aggregate using min, we'll pick run 78, which is correct
  • if you aggregate using max, we'll pick run 45, which is correct
  • if you hypothetically had another sample of value 10 and were aggregating with median (which would be 10 == median(8, 10, 10, 11)), we'd pick the run information for either of these two 10 samples. That seems acceptable.

I just tested this with synthetic data and it seems to behave the way I described above. Let me know if that makes sense to you.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll merge assuming you're happy with this explanation. If not, let's talk and I can address that in another PR.

Copy link
Contributor

@lukel97 lukel97 Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah interesting, your explanation makes sense. The way that we use aggregation on cc-perf.igalia.com is that we use --exec-multisample=3 to get multiple samples in a single run.

But I didn't consider that you might end up with multiple samples over multiple runs if you use --merge append. No problems with the fix here, thanks for talking it through.

Copy link
Contributor

@lukel97 lukel97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, tried this out locally and it works great. Btw very happy to see others hacking on LNT again!

# When aggregating multiple samples, it becomes unclear which sample to use for
# associated data like the run date, the order, etc. Use the index of the closest
# value in all the samples.
closest_value = sorted(values, key=lambda val: abs(val - agg_value))[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't all samples have the same run and so the same order? Or do we end up aggregating samples from different runs?

…ages

Currently, the Run page (e.g. /db_default/v4/nts/<runid>) provides a
way to select the aggregation function used when there are multiple
samples in the run (that option is located under "View Options").
The supported options are min and median.

Similarly, the Graph page (e.g. /db_default/v4/nts/graph?<params>)
provides a way to use mean instead of min to aggregate multiple
data points for the same order on the graph. That is provided as a
checkbox under the "Settings 🔧" menu.

This patch makes both selection mechanisms consistent by using a
dropdown menu that provides min, max, mean and median in both cases.

As a drive-by, it reimplements how we select the sample to use as a
reference for related data: we now select the point closest to the
aggregated value instead of selecting the first point unconditionally.
@ldionne ldionne force-pushed the review/introduce-consistent-aggregation branch from 32fa495 to 6dafde9 Compare October 3, 2025 20:33
@ldionne ldionne merged commit a86db1b into llvm:main Oct 3, 2025
1 of 2 checks passed
@ldionne ldionne deleted the review/introduce-consistent-aggregation branch October 3, 2025 20:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants