RDNA backend support and architecture-aware tests by mawad-amd · Pull Request #109 · AMDResearch/intellikit

mawad-amd · 2026-04-01T06:46:08Z

Summary

Add RDNA2 (gfx1030) and RDNA3 (gfx1100) backends with counter block limits and YAML metric definitions
Add GFX12 counter block limits to the existing RDNA4 (gfx1201) backend
Make tests and the profiling API architecture-aware: profiles filter out unavailable metrics (with a warning) instead of crashing, and explicit metric requests give clear errors
Document counter block limit provenance (ROCm/rocm-systems aqlprofile headers)
Fix device_info.py to work on non-writable home dirs and cache compiled binaries in-process

Hardware validation

Arch	GPU	Status
gfx942	MI300X	done
gfx942	MI325X	done
gfx950	MI350X	done
gfx950	MI355X	done
gfx1030	Radeon PRO V620	done
gfx1103	Radeon 780M	done
gfx1151	Radeon (RDNA3)
gfx1201	Radeon AI PRO R9700	done

Test plan

Unit tests pass on gfx942, gfx1103, gfx1030
Integration tests pass (minus timeouts on slow iGPUs)
example.py runs end-to-end on each validated arch
Test on gfx1151

🤖 Generated with Claude Code

Add per-hardware-block counter limits for RDNA2/3/4 backends so the profiler can bin-pack counters optimally instead of falling back to naive 6-per-pass chunking. Also document the provenance of all block limit values (ROCm/rocm-systems aqlprofile headers).

- gfx1201: add _get_counter_groups() + _get_counter_block_limits() with 22 GFX12 blocks (SQG, SQC, GL2C, CHA, CHC, UTCL1, etc.)
- gfx1151: inherits GFX12 limits from gfx1201 automatically
- gfx1030 (new): RDNA2 backend with 23 GFX10 block limits
- gfx1100 (new): RDNA3 backend with 23 GFX11 block limits
- gfx90a, gfx942: add source provenance to block limits docstrings
- __init__.py: register gfx1030-1032 and gfx1100-1103 aliases

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
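To illustrate the difference between per-block bin-packing and fixed-size chunking, here is a minimal sketch. It is not the code from this PR: the limit values, the `pack_counters` and `block_of` helpers, and the rule that a counter's block is the prefix before its first underscore are all assumptions made for the example, and the counter names are illustrative.

```python
from collections import Counter

# Hypothetical per-block limits; the real values come from the aqlprofile headers.
BLOCK_LIMITS = {"GL2C": 4, "GRBM": 2, "SQC": 1}


def block_of(counter: str) -> str:
    """Assumption for this sketch: the block is the prefix before the first '_'."""
    return counter.split("_", 1)[0]


def pack_counters(counters, limits, default_limit=1):
    """Greedy first-fit: put each counter in the first pass whose block has room."""
    passes = []  # each entry: (counter names, per-block usage in that pass)
    for name in counters:
        block = block_of(name)
        cap = limits.get(block, default_limit)
        for members, used in passes:
            if used[block] < cap:
                members.append(name)
                used[block] += 1
                break
        else:
            # No existing pass has room for this block, so open a new pass.
            passes.append(([name], Counter({block: 1})))
    return [members for members, _ in passes]


def naive_chunks(counters, size=6):
    """Fixed-size chunking that ignores which hardware block a counter uses."""
    return [counters[i:i + size] for i in range(0, len(counters), size)]


counters = [
    "GL2C_HIT", "GL2C_MISS", "GL2C_MC_WRREQ", "GL2C_EA_WRREQ_64B",
    "GL2C_EA_RDREQ_32B", "GRBM_GUI_ACTIVE", "GRBM_COUNT",
    "SQC_LDS_BANK_CONFLICT", "SQC_LDS_IDX_ACTIVE",
]
passes = pack_counters(counters, BLOCK_LIMITS)
```

With these made-up limits, the first pass holds seven counters because each block stays within its own budget, whereas `naive_chunks` would put five GL2C counters in its first chunk and exceed the assumed GL2C limit of four.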

query_device_specs() used to raise on arch mismatch (e.g. requesting gfx1100 on a gfx1103 device). But get_backend() already maps aliases to the right backend class, so the strict check just breaks legitimate mappings like gfx1103 -> GFX1100Backend and gfx950 -> GFX942Backend. Now uses the arch reported by the hardware directly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
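The alias mapping described above could look roughly like the following sketch. The registry layout and the `resolve_backend_class` name are hypothetical (the PR's entry point is get_backend(), whose internals are not shown here); only the gfx1103 -> GFX1100Backend and gfx950 -> GFX942Backend pairs and the gfx1100-1103 alias range come from the commit messages.

```python
class GFX942Backend:
    """Placeholder standing in for the real backend class."""


class GFX1100Backend:
    """Placeholder standing in for the real backend class."""


# Canonical architectures and the backend class that handles each one.
_BACKENDS = {"gfx942": GFX942Backend, "gfx1100": GFX1100Backend}

# Device-reported architectures that reuse another architecture's backend.
_ALIASES = {
    "gfx950": "gfx942",
    "gfx1101": "gfx1100",
    "gfx1102": "gfx1100",
    "gfx1103": "gfx1100",
}


def resolve_backend_class(device_arch: str):
    """Map the arch reported by the hardware to a backend class (hypothetical helper)."""
    canonical = _ALIASES.get(device_arch, device_arch)
    try:
        return _BACKENDS[canonical]
    except KeyError:
        raise ValueError(
            f"No backend for {device_arch!r}; known: {sorted(_BACKENDS) + sorted(_ALIASES)}"
        ) from None
```

Because the lookup already folds aliases into a canonical backend, a second strict equality check between the requested and reported arch would reject exactly the cases the alias table exists to serve.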

Add gfx1030/gfx1100/gfx1103 definitions for metrics that have available hardware counters on RDNA:

- compute.gpu_utilization (GRBM_GUI_ACTIVE / GRBM_COUNT)
- memory.l2_hit_rate (GL2C_HIT / GL2C_MISS)
- memory.l2_bandwidth (GL2C hits+misses * 128B cacheline)
- memory.bytes_transferred_l2 (GL2C total * 128B)
- memory.hbm_read_bandwidth (GL2C_EA_RDREQ 32B/64B/96B/128B)
- memory.hbm_write_bandwidth (GL2C_MC_WRREQ / GL2C_EA_WRREQ_64B)
- memory.hbm_bandwidth_utilization (read+write vs peak)
- memory.bytes_transferred_hbm (total read+write bytes)
- memory.lds_bank_conflicts (SQC_LDS_BANK_CONFLICT / SQC_LDS_IDX_ACTIVE)

Metrics NOT ported (no counters on RDNA2/3):

- compute.total_flops: no per-dtype VALU counters
- compute.*_arithmetic_intensity: needs FLOPS
- memory.l1_hit_rate: no TCP counters exposed
- memory.coalescing_efficiency: needs TCP counters

Counter names verified via rocprof --list-basic on real gfx1103 hw.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
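As a rough illustration of how the L2 metrics above can be derived from raw counter values, consider the sketch below. The 128-byte cacheline factor is taken from the commit message; the exact formulas in counter_defs.yaml are not shown in this PR text, so the hit-rate definition (hits over hits plus misses, as a percentage), the units, and the function names are assumptions.

```python
CACHELINE_BYTES = 128  # per the commit message: GL2C requests count 128B cachelines


def l2_hit_rate(hits: int, misses: int) -> float:
    """Assumed definition: percentage of L2 requests that hit."""
    total = hits + misses
    return 100.0 * hits / total if total else 0.0


def bytes_transferred_l2(hits: int, misses: int) -> int:
    """Total L2 traffic in bytes: (hits + misses) * cacheline size."""
    return (hits + misses) * CACHELINE_BYTES


def l2_bandwidth_gb_per_s(hits: int, misses: int, duration_ns: float) -> float:
    """Bytes per nanosecond equals GB/s (with 1 GB = 1e9 bytes)."""
    return bytes_transferred_l2(hits, misses) / duration_ns
```

For example, 75 hits and 25 misses over 100 ns would give a 75% hit rate, 12800 bytes transferred, and 128 GB/s under these assumed definitions.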

- API now filters unavailable metrics from profiles/presets with warnings instead of crashing. Explicit metric requests get a clear error listing available alternatives. Falls back to time-only mode when no metrics are available.
- Add requires_metric() test decorator that skips based on actual backend metric availability (no hardcoded arch families).
- Fix test_init_default hardcoded arch allowlist.
- Integration tests skip gracefully when CDNA-only metrics (coalescing, FLOPS, arithmetic intensity, L1 hit rate) are unavailable on the current GPU.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
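The filtering behavior and the skip decorator described above might be sketched as follows. This is not the PR's implementation: `select_metrics`, `UnavailableMetricError`, and the `available` argument on `requires_metric` are hypothetical names and signatures chosen to keep the example self-contained.

```python
import functools
import unittest
import warnings


class UnavailableMetricError(ValueError):
    """Raised when an explicitly requested metric is missing (hypothetical name)."""


def select_metrics(requested, available, *, from_profile):
    """Filter profile metrics with a warning; reject explicit requests with an error."""
    available = set(available)
    missing = [m for m in requested if m not in available]
    if not missing:
        return list(requested)
    if not from_profile:
        raise UnavailableMetricError(
            f"Unavailable metrics: {missing}. Available: {sorted(available)}"
        )
    warnings.warn(f"Skipping metrics unavailable on this GPU: {missing}")
    # An empty result means the caller falls back to time-only mode.
    return [m for m in requested if m in available]


def requires_metric(metric, available):
    """Skip a test unless the backend reports the metric (signature is assumed)."""
    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            if metric not in available:
                raise unittest.SkipTest(f"{metric} not available on this backend")
            return test_func(*args, **kwargs)
        return wrapper
    return decorator
```

Keying the skip on the backend's reported metric set, rather than on an architecture name, means a new backend picks up the right skips without any test changes.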

mawad-amd and others added 13 commits March 31, 2026 23:16

Fix PermissionError when home directory is not writable

2a1f45f

Fall back to /tmp/metrix for the gpu_query cache when $HOME/.cache is not writable (e.g. shared cluster nodes where $HOME is on a read-only NFS mount). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
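A fallback of this kind can be sketched as below. The `pick_cache_dir` helper is hypothetical and only illustrates the idea of trying the preferred directory first and the fallback second; the actual logic in device_info.py is not shown in this PR text.

```python
import os
from pathlib import Path


def pick_cache_dir(preferred: Path, fallback: Path) -> Path:
    """Return the first candidate that can be created and written to."""
    for candidate in (preferred, fallback):
        try:
            candidate.mkdir(parents=True, exist_ok=True)
        except OSError:
            # Covers read-only mounts, permission errors, and invalid parents.
            continue
        if os.access(candidate, os.W_OK):
            return candidate
    raise PermissionError(
        f"No writable cache directory among {preferred} and {fallback}"
    )
```

A caller might use it as `pick_cache_dir(Path.home() / ".cache" / "metrix", Path("/tmp/metrix"))`, matching the two locations named in the commit message.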

Remove cache dir from gpu_query compilation

8cdca7e

Compile gpu_query.hip to a temp file instead of caching in $HOME/.cache/metrix. The cache caused PermissionError on cluster nodes where $HOME is on a read-only NFS mount. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Cache compiled gpu_query binary in-process

aacdba6

Avoid recompiling gpu_query.hip on every get_backend() call by caching the compiled binary path in a module-level variable. One hipcc invocation per Python process instead of per backend. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
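The in-process caching pattern described above might look like the following sketch. The function and variable names are hypothetical, and the injectable `compile_fn` parameter is an addition for illustration so the caching can be exercised without hipcc installed.

```python
import subprocess
import tempfile
from pathlib import Path

# Module-level cache: set after the first successful compile in this process.
_GPU_QUERY_BINARY = None


def _compile_with_hipcc(source: Path, binary: Path) -> None:
    """Invoke hipcc once; raises CalledProcessError if compilation fails."""
    subprocess.run(["hipcc", str(source), "-o", str(binary)], check=True)


def get_gpu_query_binary(source: Path, compile_fn=_compile_with_hipcc) -> Path:
    """Compile to a temp directory on first use, then reuse the cached path."""
    global _GPU_QUERY_BINARY
    if _GPU_QUERY_BINARY is not None and _GPU_QUERY_BINARY.exists():
        return _GPU_QUERY_BINARY
    out_dir = Path(tempfile.mkdtemp(prefix="gpu_query_"))
    binary = out_dir / "gpu_query"
    compile_fn(source, binary)
    _GPU_QUERY_BINARY = binary
    return binary
```

Since the binary lives in a temp directory rather than under $HOME, this avoids the read-only home problem while still paying the compile cost only once per Python process.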

Add test script for validating metrix on any GPU architecture

f0ebe30

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Fix example.py crash when metric has no catalog entry

f503c98

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add compute.gpu_utilization to metric catalog, remove helper scripts

7baa22c

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Fix ruff lint: remove extraneous f-string prefix

8aea61c

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Fix ruff formatting

b46e374

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Strip dead formula/derived_from/interpretation fields from metric catalog

befff8e

The catalog is display metadata only. Actual computation lives in counter_defs.yaml and the backend implementations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mawad-amd requested a review from mehdi-saeedi April 1, 2026 07:24

mawad-amd force-pushed the muhaawad/rdna branch from b2cd489 to befff8e Compare April 3, 2026 03:42

mawad-amd wants to merge 13 commits into main from muhaawad/rdna
