Benchmarks for characterizing L2 cache coherence behavior on the AMD CDNA3 architecture (MI300X, gfx942).
GPU configured in SPX mode (single partition, 8 XCDs).
- cache_latency_test: Measures L1/L2 hit latencies and determines L1 write-allocation policy.
- test 1: Regular load/store behavior.
- test 2: NT load (L1 bypass) and snoop filter interaction.
- test 3: SC1 store + NT load interaction.
- test 4: Regular store + WBL2 + NT load.
- test 5: Regular store + SC1 load.
- test 6: buffer_inv sc1 behavior on cache lines.
- test 7: WBL2 + buffer invalidation interaction.
- granularity: Determines probe filter tracking granularity (128B L2 cache line level).
- capacity: Measures probe filter directory capacity per HBM stack.
- invalidation_latency: Measures cross-XCD invalidation latency for SC1 store + NT load, with home sweep. Reveals ring topology (A-B-C-D-A) and per-IOD XCD pairings {0,1}, {2,3}, {4,5}, {6,7}.
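
The list above does not say how the latency benchmarks time individual accesses. A common technique for this kind of measurement is pointer chasing: each load returns the index of the next line to read, so accesses cannot overlap and the average time per step approximates the access latency. The host-side sketch below only builds such a chain. It is an illustration under stated assumptions, not the code used by these benchmarks. The names build_chase_chain, cycle_length, byte_offset and kLineBytes are hypothetical; the 128-byte line size is taken from the granularity entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Assumption: 128-byte lines, as stated in the granularity entry.
constexpr std::size_t kLineBytes = 128;

// Hypothetical helper: byte offset of the start of a given line in the buffer.
constexpr std::size_t byte_offset(std::size_t line_index) {
    return line_index * kLineBytes;
}

// Hypothetical helper: builds a pointer-chase chain over num_lines lines.
// next[i] is the index of the line to visit after line i.
// Sattolo's algorithm produces a permutation that is one single cycle,
// so a chase starting anywhere touches every line exactly once per lap.
std::vector<std::uint32_t> build_chase_chain(std::size_t num_lines,
                                             std::uint64_t seed) {
    assert(num_lines <= std::numeric_limits<std::uint32_t>::max());
    std::vector<std::uint32_t> next(num_lines);
    std::iota(next.begin(), next.end(), 0u);
    if (num_lines < 2) {
        return next;  // nothing to shuffle
    }
    std::mt19937_64 rng(seed);
    for (std::size_t i = num_lines - 1; i > 0; --i) {
        // Pick j strictly below i; this is what distinguishes Sattolo's
        // algorithm from a plain Fisher-Yates shuffle.
        std::uniform_int_distribution<std::size_t> pick(0, i - 1);
        std::swap(next[i], next[pick(rng)]);
    }
    return next;
}

// Walks the chain from line 0 and returns the number of steps needed to
// come back to line 0. For a valid chain this equals next.size().
std::size_t cycle_length(const std::vector<std::uint32_t>& next) {
    if (next.empty()) {
        return 0;
    }
    std::size_t steps = 0;
    std::uint32_t cur = 0;
    do {
        cur = next[cur];
        ++steps;
    } while (cur != 0 && steps <= next.size());
    return steps;
}
```

In a benchmark, the chain would typically be expanded so that the word at byte_offset(i) stores next[i], copied to device memory, and then walked by a single thread while a timer brackets a fixed number of steps. Choosing a chain that fits in the cache level of interest, and walking it once to warm up before timing, is what separates a hit-latency measurement from a miss-latency one.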