@Copilot Copilot AI commented Oct 6, 2025

Problem

The codebase had hardcoded values for the number of Compute Units (CUs) / Streaming Multiprocessors (SMs) throughout examples and benchmark scripts:

  • Total SMs: hardcoded to 304 (MI300X) or 104 (MI250)
  • GEMM SMs: hardcoded to 288, 256, or 304 depending on algorithm
  • Communication SMs: hardcoded to 48 or similar values

These hardcoded values prevented the code from working optimally on different GPU architectures without manual modification.

Solution

Replaced hardcoded values with dynamic SM allocation logic inlined at each call site. The logic calls get_cu_count() from the hip module (iris/hip.py), which queries the HIP runtime for the device's CU count, and then applies algorithm-specific calculations:

1. all_scatter algorithm: Uses total CU count

  • Example: 304 CUs → 304 SMs for GEMM

2. wg_specialized algorithm: Uses the largest power of 2 that does not exceed the CU count

  • Example: 304 CUs → 256 SMs for GEMM

3. all_reduce algorithm: Reserves ~1/3 of the leftover CUs (the CUs above the largest power of 2) for communication and gives the rest to GEMM

  • Example: 304 CUs → 288 SMs for GEMM (304 - (304-256)÷3 = 288)
  • This leaves 16 SMs for communication operations
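For readers who want to try the detection step outside this repository, here is a minimal sketch. It does not use the PR's get_cu_count(). It swaps in PyTorch's device properties instead, on the assumption that a ROCm or CUDA build of PyTorch may be installed. The name detect_cu_count and its fallback parameter are hypothetical.

```python
def detect_cu_count(device_id=0, fallback=None):
    """Return the CU/SM count of a GPU, or `fallback` if it cannot be queried.

    Assumption: PyTorch is used here only as a convenient stand-in for the
    hip module's get_cu_count(); this is not the PR's implementation.
    """
    try:
        import torch  # optional dependency
    except ImportError:
        return fallback
    if not torch.cuda.is_available():
        return fallback
    # multi_processor_count is the SM count on NVIDIA and the CU count on AMD.
    return torch.cuda.get_device_properties(device_id).multi_processor_count


# The fallback value is only used when no GPU (or no PyTorch) is present.
cu_count = detect_cu_count(fallback=304)
print(cu_count)
```

The fallback keeps the sketch usable on machines without a GPU; the formulas listed above then operate on whatever count is returned.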

Changes

Updated 11 files across the repository:

  • GEMM examples (07, 08, 09, 10, 11, 12): Changed default argument values from hardcoded numbers to None, with inlined runtime detection logic in worker functions
  • Benchmark scripts: Added auto-detection with fallback to partition-based heuristics
  • Utility scripts: Updated to use dynamic detection where possible

All command-line arguments still support manual override for tuning and testing purposes.
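The None-default pattern described above can be sketched as follows. This is an illustration under stated assumptions, not code from the PR: the flag name --gemm_sms, the worker function, and the DETECTED_CUS constant (standing in for the value returned by get_cu_count()) are hypothetical.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Sketch of a None-default SM argument")
    # None means "not given on the command line"; the value is resolved at runtime.
    parser.add_argument("--gemm_sms", type=int, default=None,
                        help="SMs used for GEMM (default: detected at runtime)")
    return vars(parser.parse_args(argv))


def worker(args, cu_count):
    # Inline resolution, as in the all_scatter case: fall back to the full CU count.
    if args["gemm_sms"] is None:
        args["gemm_sms"] = cu_count
    return args["gemm_sms"]


DETECTED_CUS = 304  # hypothetical stand-in for the value returned by get_cu_count()
print(worker(parse_args([]), DETECTED_CUS))                     # no override given
print(worker(parse_args(["--gemm_sms", "128"]), DETECTED_CUS))  # manual override
```

Testing against None rather than truthiness means that any value the user passes explicitly is respected as given.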

Implementation

The SM calculation logic is inlined at each call site rather than using a helper function, making the allocation strategy explicit and localized:

# For all_scatter
args["gemm_sms"] = cu_count

# For wg_specialized
args["gemm_sms"] = 2 ** int(math.log2(cu_count)) if cu_count > 0 else 1

# For all_reduce
next_pow2 = 2 ** int(math.log2(cu_count)) if cu_count > 0 else 1
leftover = cu_count - next_pow2
args["gemm_sms"] = cu_count - leftover // 3
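As a side note on the power-of-2 step, one possible alternative (not what the PR does) is to stay in integer arithmetic with int.bit_length(), which avoids any reliance on floating-point log2. The helper name floor_pow2 is hypothetical and is used only to compare the two expressions. For realistic CU counts both give the same result.

```python
import math


def floor_pow2(n):
    """Largest power of two <= n for n >= 1; returns 1 for n <= 0 (same fallback as above)."""
    return 1 << (n.bit_length() - 1) if n > 0 else 1


def floor_pow2_log(n):
    """The expression used in the PR, wrapped only for comparison."""
    return 2 ** int(math.log2(n)) if n > 0 else 1


# Both forms agree over a range that comfortably covers real CU counts.
for n in range(1, 4097):
    assert floor_pow2(n) == floor_pow2_log(n)

print(floor_pow2(304), floor_pow2(104))
```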

Testing

Verified that the formulas produce the expected results (the comm_sms values below equal the CU count minus the wg_specialized GEMM SMs):

MI300X (304 CUs):

  • all_scatter: 304 ✓
  • wg_specialized: 256 ✓
  • all_reduce: 288 ✓
  • comm_sms: 48 ✓

MI250 (104 CUs):

  • all_scatter: 104 ✓
  • wg_specialized: 64 ✓
  • all_reduce: 91 ✓
  • comm_sms: 40 ✓
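The figures above can be reproduced with a short script. This is a sketch for checking the arithmetic only. The function name expected_sms is hypothetical, and treating comm_sms as the CU count minus the wg_specialized value is an assumption inferred from the numbers listed (304 - 256 = 48, 104 - 64 = 40).

```python
import math


def expected_sms(cu_count):
    # Same expressions as in the Implementation section.
    pow2 = 2 ** int(math.log2(cu_count)) if cu_count > 0 else 1
    leftover = cu_count - pow2
    return {
        "all_scatter": cu_count,
        "wg_specialized": pow2,
        "all_reduce": cu_count - leftover // 3,
        # Assumption: comm_sms is what wg_specialized leaves unused.
        "comm_sms": cu_count - pow2,
    }


for name, cus in (("MI300X", 304), ("MI250", 104)):
    print(name, expected_sms(cus))
```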

All files pass ruff linting and Python syntax validation.

Benefits

  • Portability: Code now works across different AMD GPU architectures without modification
  • Maintainability: No need to update hardcoded values when new GPUs are released
  • Accuracy: Always uses actual hardware capability
  • Flexibility: Manual override still available via CLI arguments
  • Clarity: Inlined logic makes SM allocation strategy explicit at each usage site

Fixes #197

Original prompt

This section details the original issue you should resolve

<issue_title>Do not use hardcoded number of CUs</issue_title>
<issue_description>The code base has hardcoded values for the number of CUs (or SMs), such as 304, and other places where we define gemm_sms to be the next smaller power of 2 (e.g., 256 when the number of CUs is 304, in which case comm_sms is 48).

All these numbers need to be programmatically set (even for default argument values). Use the hip module to get the number of CUs.</issue_description>

Comments on the Issue (you are @copilot in this section)

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
@Copilot Copilot AI changed the title from "[WIP] Do not use hardcoded number of CUs" to "Replace hardcoded CU/SM values with dynamic detection using HIP runtime" Oct 6, 2025
@Copilot Copilot AI requested a review from mawad-amd October 6, 2025 00:38
Copilot finished work on behalf of mawad-amd October 6, 2025 00:38
iris/hip.py Outdated
return cu_count.value


def get_default_gemm_sms(device_id=None, algorithm="all_scatter"):
Collaborator

@copilot do not add new functions. Inline the logic at call site.

Contributor Author

Removed the get_default_gemm_sms() function and inlined the SM calculation logic at each call site. Changes in commit 3b2e557.

…ction

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
@Copilot Copilot AI requested a review from mawad-amd October 6, 2025 00:50
Copilot finished work on behalf of mawad-amd October 6, 2025 00:50