
MrRoy09/AMD-CDNA3-Cache-MicroBenchmarking


CDNA3 L2 Coherence Benchmarks

Microbenchmarks for characterizing L2 cache coherence behavior on the AMD CDNA3 architecture (MI300X, gfx942).

The GPU is configured in SPX mode (single partition, 8 XCDs).

Structure

01 — Cache Latency

  • cache_latency_test: Measures L1/L2 hit latencies and determines L1 write-allocation policy.
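Latency microbenchmarks of this kind are commonly built around a pointer chase: each load's address comes from the result of the previous load, so the loads cannot overlap and the time per step approximates the hit latency. The sketch below covers only the host-side setup, written in Python for brevity, and assumes such a chase is used; this README does not describe how cache_latency_test is implemented. The helper names (`build_chase`, `cycle_length`) are hypothetical.

```python
import random

def build_chase(num_lines, seed=0):
    """Return next_index[] forming one cycle over all lines (Sattolo's algorithm)."""
    rng = random.Random(seed)
    order = list(range(num_lines))
    # Sattolo's shuffle: swapping only with a strictly earlier slot yields a
    # permutation made of a single cycle, so the chase visits every line
    # exactly once before returning to its starting point.
    for i in range(num_lines - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i
        order[i], order[j] = order[j], order[i]
    return order

def cycle_length(next_index, start=0):
    """Follow the chain from `start` and count steps until it returns."""
    steps, cur = 1, next_index[start]
    while cur != start:
        cur = next_index[cur]
        steps += 1
    return steps
```

For example, `build_chase(64)` gives a chain over 64 lines; multiplying each index by the line size (128 B for L2, per the probe filter notes in this README) turns it into byte offsets. Sizing the chain to fit within L1 or only within L2 is one way to separate the two hit latencies.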

02 — XCD Coherence Tests

  • test 1: Regular load/store behavior.
  • test 2: NT load (L1 bypass) and snoop filter interaction.
  • test 3: SC1 store + NT load interaction.
  • test 4: Regular store + WBL2 + NT load.
  • test 5: Regular store + SC1 load.
  • test 6: buffer_inv sc1 behavior on cache lines.
  • test 7: WBL2 + buffer invalidation interaction.

03 — Probe Filter

  • granularity: Determines probe filter tracking granularity (128B L2 cache line level).
  • capacity: Measures probe filter directory capacity per HBM stack.
  • invalidation_latency: SC1 store + NT load cross-XCD invalidation latency with home sweep. Reveals ring topology (A-B-C-D-A) and IOD pairings {0,1}, {2,3}, {4,5}, {6,7}.
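The granularity and topology findings above can be turned into two small helpers for interpreting results. This is a sketch, not code from the repository: the 128 B line size, the ring order A-B-C-D-A, and the XCD pairings come from the list above, while the assignment of a letter to each pair (for example {0,1} to A) and the function names are assumptions made for illustration.

```python
LINE_BYTES = 128  # L2 cache line size, the tracking granularity listed above

# XCD pairings from the README; which letter each pair gets is an assumption.
IOD_OF_XCD = {0: "A", 1: "A", 2: "B", 3: "B", 4: "C", 5: "C", 6: "D", 7: "D"}
RING = ["A", "B", "C", "D"]  # ring order A-B-C-D-A

def same_line(addr_a, addr_b):
    """True if both byte addresses fall in the same 128 B tracked line."""
    return addr_a // LINE_BYTES == addr_b // LINE_BYTES

def ring_hops(xcd_src, xcd_home):
    """Shortest number of ring hops between the IODs of two XCDs."""
    a = RING.index(IOD_OF_XCD[xcd_src])
    b = RING.index(IOD_OF_XCD[xcd_home])
    d = abs(a - b)
    # On a ring the shorter of the two directions is taken.
    return min(d, len(RING) - d)
```

Under this model, XCD pairs fall into three distance classes (0, 1, or 2 hops). Grouping measured invalidation latencies by `ring_hops` is one way to check whether they cluster accordingly; whether latency depends only on hop count is itself an assumption to be tested.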

About

Reverse engineering parts of CDNA3 L2 coherence behavior.
