
ASV demo #487

Open
Micky774 wants to merge 6 commits into dev from zain/asv-demo

Conversation

@Micky774 (Contributor) commented Mar 16, 2026

Description

This PR is a port of #478 to ASV.

Benefits:

  1. Since ASV is a popular, mature OSS project broadly adopted by other large communities (e.g. pandas, scikit-learn, numpy), it carries a significantly reduced maintenance and development burden. It is already feature-rich and tested through the community, letting us focus on benchmark design rather than implementation details.
  2. ASV accumulates results per commit on dev in Artifactory, so each CI run only benchmarks the current commit on an update to dev.
  3. Benchmark classes follow a well-documented convention (setup/time_*), making it easy for anyone familiar with ASV to add new benchmarks without understanding custom infrastructure. Similarly, people unfamiliar with ASV can quickly onboard with existing documentation.
  4. Failures are reported per parameter combination without crashing the suite.
  5. ASV stores results per commit as JSON in Artifactory, enabling asv publish to generate an HTML dashboard showing performance trends across the full commit history. This can be done statically via github pages in a separate repo for a permanent benchmark/regression dashboard.
  6. The ASV CI steps are ~75 lines total (restore/run/upload), with no custom run management.
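To illustrate the setup/time_* convention from point 3, a minimal ASV benchmark class might look like the following. This is an illustrative sketch, not a benchmark from this PR; the class name, parameters, and workload are hypothetical.

```python
import numpy as np


class GEMMSuite:
    """Hypothetical ASV benchmark: times a matmul across parameterized sizes."""

    # ASV runs each parameter value as a separate benchmark entry,
    # and reports failures per combination without crashing the suite.
    params = [64, 128]
    param_names = ["size"]

    def setup(self, size):
        # setup() runs before each timing repeat and is excluded from the timing.
        rng = np.random.default_rng(0)
        self.a = rng.standard_normal((size, size))
        self.b = rng.standard_normal((size, size))

    def time_matmul(self, size):
        # Any method named time_* is discovered and timed automatically by ASV.
        self.a @ self.b
```

Dropping a class like this into the benchmark directory is all that's needed; no custom harness code is involved.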

Downsides:

  1. ASV relies on subprocess isolation per config, which is very expensive for TE given its ~4-6 s import time (resulting in minutes per benchmark, with ~99.99% of the time spent importing). Traditional ASV benchmarking can therefore take a while.
  2. ASV is a CI-first benchmarking project, which means that adapting it for local/development benchmarking requires additional infrastructure to maintain.
  3. Less initial flexibility in parsing results into e.g. custom analysis and visualizations -- though this can be remedied by allowing local (non-CI) runs to dump raw timings for custom handling.

Considerations

This PR includes a helper script and an adapter script that allow direct same-process benchmarking (avoiding the per-subprocess TE import overhead), resulting in fast and efficient benchmarks. We could use that script in CI as well; this would greatly decrease the cost of benchmarking, but would be less robust and require more careful maintenance. In practice that maintenance cost is small: ASV has only a couple of releases a year and is a stable project, so we can pin our version and not worry about API/format changes.
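The same-process idea behind the adapter script could be sketched roughly as below. The discovery and timing logic here is an illustrative assumption, not the actual script from this PR: it reuses ASV's time_*/setup naming so the expensive library import is paid once per process instead of once per benchmark subprocess.

```python
import time


class ExampleSuite:
    """Stand-in for a discovered ASV-style benchmark class."""

    def setup(self):
        self.data = list(range(1000))

    def time_sum(self):
        sum(self.data)


def run_in_process(suite_cls, repeats=5):
    """Time every time_* method of a suite in the current process.

    Unlike ASV's default subprocess-per-config model, all benchmarks here
    share one interpreter, so the import cost is amortized across the run.
    """
    suite = suite_cls()
    results = {}
    for name in dir(suite):
        if not name.startswith("time_"):
            continue
        method = getattr(suite, name)
        timings = []
        for _ in range(repeats):
            if hasattr(suite, "setup"):
                suite.setup()  # mirror ASV: setup is excluded from timing
            start = time.perf_counter()
            method()
            timings.append(time.perf_counter() - start)
        results[name] = min(timings)
    return results
```

The trade-off named above applies: this bypasses ASV's isolation guarantees (no fresh process per config), which is exactly why it is cheaper but less robust.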

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Adds ASV benchmarks
  • Updates CI to read/generate/write ASV results to artifactory
  • Adds README.md for documentation

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@Micky774 Micky774 marked this pull request as ready for review March 17, 2026 13:58
@Micky774 (Contributor, Author)

Note the CI failure is unrelated

@Micky774 (Contributor, Author)

I've added a helper script like @alextmagro had suggested, as well as corresponding documentation to the README.md.

```yaml
          EOF
          )"

      - name: Restore previous ASV results
```
Collaborator:

I think the benchmarks should go in a separate workflow from CI, i.e. both these microbenchmarks and the ones that already run with CI.

Contributor (Author):

Will doing so require a separate TE build and setup? I added it here so that we'd piggy-back off of the already-running CI.

```diff
@@ -0,0 +1,16 @@
{
```
Collaborator:

Does it need to be in the root of TE?

Contributor (Author):

No, I've updated it
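For context, an asv.conf.json along these lines might look like the sketch below. All values here are illustrative assumptions about this PR's setup, not its actual config; only the key names come from ASV's documented configuration schema.

```json
{
    "version": 1,
    "project": "transformer_engine",
    "repo": "..",
    "branches": ["dev"],
    "environment_type": "existing",
    "benchmark_dir": "benchmarks",
    "results_dir": "results",
    "html_dir": "html"
}
```

With `"repo": ".."`, the config can live in a benchmarks subdirectory rather than the repository root.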


```shell
# Derive a stable machine name from the runner label
case "${RUNNER_NAME}" in
linux-te-mi325*) MACHINE_NAME="mi325" ;;
```
Collaborator:

Why do we need it if results are uploaded with just matrix.runner name?

Contributor (Author):

So, my understanding is that the matrix.runner name is not 1-1 with the underlying system, i.e. different systems with different machine names can be part of a pool with the same runner name. ASV by default stores results by machine name. Here, we are manually specifying a generic machine name indexed by gpu arch so that each e.g. mi325 runner will store its results in a compatible way.

Ideally, we have dedicated machines for benchmarking (since this would likely be every commit or nightly even), but that's a constraint we'll need to discuss.

```shell
set -ex
pip install asv
cd /workspace
asv machine --yes --machine "$MACHINE_NAME"
```
Collaborator:

Will it re-register machine if it exists already?

Contributor (Author):

Yes, but it's registered inside the container, so the registration is transient.
