Skip to content

Conversation

Copilot
Copy link
Contributor

@Copilot Copilot AI commented Oct 6, 2025

Add CUDA Backend Support ✅

This PR implements CUDA backend support for Iris, enabling the framework to run seamlessly on both AMD GPUs (via HIP) and NVIDIA GPUs (via CUDA) with transparent backend auto-detection.

Summary

Iris now supports both AMD GPUs (HIP backend) and NVIDIA GPUs (CUDA backend) with automatic backend detection based on available GPU libraries. All backend logic is consolidated into a single iris/hip.py file with conditional branching.

Changes Made

  1. Unified Backend Module (iris/hip.py):

    • Auto-detects backend at module load time by trying to load libamdhip64.so or libcudart.so
    • Sets _is_amd_backend flag for internal use
    • All functions branch internally based on backend type
    • Dynamic IPC handle size (64 bytes for HIP, 128 bytes for CUDA)
    • Exports get_ipc_handle_size() for use by iris.py
  2. Dynamic Sizing (iris/iris.py):

    • Added get_ipc_handle_size() import
    • Uses dynamic IPC handle size instead of hardcoded 64
    • Ensures proper IPC handle allocation for each backend
  3. Key Features:

    • Automatic backend detection based on available GPU libraries
    • Minimal code changes - original hip.py structure preserved with conditional branching
    • Consolidated implementation - all backend logic in single file
    • Zero configuration required - works out of the box
    • Falls back to HIP by default for backward compatibility
  4. Code Quality:

    • Minimal changes to existing codebase
    • No separate backend files - everything consolidated in iris/hip.py
    • Clean conditional logic throughout
    • No build-time configuration or environment variables

Usage

# Just install normally - backend auto-detected
pip install git+https://github.com/ROCm/iris.git

# Works automatically on both AMD and NVIDIA GPUs
import iris
ctx = iris.iris(heap_size=1 << 30)

Implementation Details

Backend Detection (in iris/hip.py):

  • Tries to load libamdhip64.so → sets _is_amd_backend = True
  • Falls back to libcudart.so → sets _is_amd_backend = False
  • Defaults to HIP for backward compatibility if both fail

Conditional Functions:

  • hip_try() - branches to hipGetErrorString or cudaGetErrorString
  • hipIpcMemHandle_t - 64 bytes for HIP, 128 bytes for CUDA
  • open_ipc_handle() - calls hipIpcOpenMemHandle or cudaIpcOpenMemHandle
  • All device/memory functions branch based on _is_amd_backend

Backend-Specific Behavior:

  • get_rocm_version() - returns (-1, -1) for CUDA
  • get_arch_string() - returns GCN arch for HIP, compute capability for CUDA (e.g., "sm_90")
  • get_num_xcc() - returns actual count for AMD, 1 for NVIDIA

Code Statistics

  • Modified files: iris/hip.py, iris/iris.py
  • Total additions: ~40 lines (conditional logic)
  • Files changed: 2
  • Commits: 17

Backend Detection Logic

  1. Try to load libamdhip64.so → use HIP backend
  2. Try to load libcudart.so → use CUDA backend
  3. Default to HIP for backward compatibility

Supported Hardware

AMD GPUs (HIP Backend):

  • MI300X, MI350X, MI355X
  • Requirements: ROCm 6.3.1+, PyTorch with ROCm

NVIDIA GPUs (CUDA Backend):

  • All CUDA-capable GPUs with compute capability 7.0+
  • Requirements: CUDA 11.0+, PyTorch with CUDA

Notes

  • Minimal changes per review feedback - consolidated into single file with conditional branching
  • Pure auto-detection with no configuration needed
  • Original hip.py structure preserved - only added conditional logic
  • Zero breaking changes - fully backward compatible
  • All changes follow existing Iris code style and patterns

References

Original prompt

This section details on the original issue you should resolve

<issue_title>Add CUDA backend</issue_title>
<issue_description>#### Summary
Allow Iris to be installed with --config-settings backend=nvidia so that iris.hip.* imports transparently redirect to CUDA equivalents (iris.cuda.*). This enables NVIDIA backend support without changing source code.

Goal

At install or runtime, users should be able to do:

pip install git+https://github.com/ROCm/iris.git --config-settings backend=nvidia

and have:

from iris.hip.runtime import hip_malloc

automatically resolve to:

from iris.cuda.runtime import hip_malloc

Cherry pick commits from https://github.com/astroC86/iris/tree/astroC86/cuda-port and #118</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes #199

💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Copilot AI and others added 5 commits October 6, 2025 00:47
Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
@Copilot Copilot AI changed the title [WIP] Add CUDA backend Add CUDA backend support for NVIDIA GPUs with transparent runtime switching Oct 6, 2025
@Copilot Copilot AI requested a review from mawad-amd October 6, 2025 00:59
Copilot finished work on behalf of mawad-amd October 6, 2025 00:59
README.md Outdated
To use Iris with NVIDIA GPUs, set the backend before importing:

```bash
export IRIS_BACKEND=cuda
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot Let’s use pip install git+https://github.com/ROCm/iris.git --config-settings backend=nvidia
instead of env vars

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to support pip install --config-settings backend=nvidia as the primary method. The implementation now:

  1. Uses a custom build backend (build_backend.py) that processes --config-settings
  2. Writes backend config to iris/.config/backend.txt at build time
  3. Prioritizes build-time config over environment variables
  4. Still supports IRIS_BACKEND env var as fallback for flexibility

All 11 unit tests pass, including 3 new tests for build-time configuration.

Commit: [c255dee...latest]

- Add custom build backend (build_backend.py) to support --config-settings
- Update pyproject.toml to use custom build backend
- Modify iris/hip.py to prioritize build-time config over env vars
- Update README with pip install --config-settings backend=nvidia examples
- Add 3 new unit tests for build-time configuration (11 tests total pass)
- Update example script to document both build-time and runtime methods
- Add iris/.config/ to .gitignore

Backend detection priority is now:
1. Build-time configuration (--config-settings backend=nvidia/hip)
2. IRIS_BACKEND environment variable (runtime)
3. Auto-detection based on available libraries
4. Default to HIP for backward compatibility

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
@Copilot Copilot AI changed the title Add CUDA backend support for NVIDIA GPUs with transparent runtime switching Add CUDA backend support for NVIDIA GPUs with build-time and runtime switching Oct 6, 2025
@Copilot Copilot AI requested a review from mawad-amd October 6, 2025 01:09
Copilot finished work on behalf of mawad-amd October 6, 2025 01:09
Example demonstrating how to use Iris with CUDA backend for NVIDIA GPUs.
This script shows how to set the backend and verify it's correctly loaded.
This script shows how to use the CUDA backend and verify it's correctly loaded.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot remove this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the redundant line from the docstring.

Commit: 28965e6

README.md Outdated
```shell
# Note: Backend selection is via environment variable, not install-time config
IRIS_BACKEND=cuda pip install git+https://github.com/ROCm/iris.git
pip install git+https://github.com/ROCm/iris.git --config-settings backend=hip
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot setup.py needs code like this

def build(setup_kwargs, config_settings=None):
backend = config_settings.get("backend", "amd")
os.environ["IRIS_BACKEND"] = backend
print(f"Building Iris with backend={backend}")
return setup_kwargs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the build() function to setup.py as suggested. The function:

  • Handles config_settings.get("backend", "amd")
  • Sets os.environ["IRIS_BACKEND"] = backend
  • Normalizes backend names (nvidia→cuda, amd/rocm→hip)
  • Prints "Building Iris with backend={backend}"

Also updated build_backend.py to set the environment variable for consistency.

Commit: 0a5992d

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
@Copilot Copilot AI requested a review from mawad-amd October 6, 2025 02:44
Copilot finished work on behalf of mawad-amd October 6, 2025 02:44
2. Runtime (if not set at build time):
export IRIS_BACKEND=cuda
python cuda_backend_example.py
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot remove this file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed examples/cuda_backend_example.py.

Commit: 698604c

- Add build() hook in setup.py to handle --config-settings
- Sets IRIS_BACKEND environment variable during build
- Normalizes backend names (nvidia→cuda, amd/rocm→hip)
- Updated build_backend.py to also set env var for consistency
- All 11 unit tests pass

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
old_env = os.environ.pop("IRIS_BACKEND", None)

try:
# Load hip.py directly to test detection logic
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot remove this file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed tests/unittests/test_backend_detection.py.

Commit: 698604c

@Copilot Copilot AI requested a review from mawad-amd October 6, 2025 02:51
Copilot finished work on behalf of mawad-amd October 6, 2025 03:02
iris/_cuda.py Outdated
return cu_count.value


def get_rocm_version():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot remove this funciton

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the get_rocm_version() function from iris/_cuda.py. This function was ROCm-specific and not used by the CUDA backend.

Commit: a162b9e

iris/_cuda.py Outdated
return 1


def malloc_fine_grained(size):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot remove malloc and free functions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed malloc_fine_grained(), hip_malloc(), and hip_free() functions from iris/_cuda.py. These functions were not being used.

Commit: eaf980f

build_backend.py Outdated
Custom build backend to support backend selection via --config-settings.
This allows users to install Iris with:
pip install . --config-settings backend=nvidia
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot do we need this file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, build_backend.py is needed. It's referenced in pyproject.toml as the build backend (line 6: build-backend = "build_backend"). It wraps setuptools.build_meta and processes the --config-settings to write the iris/_backend_selected.py file during wheel/sdist builds. The build() function in setup.py provides similar functionality but for different build scenarios.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot setup.py already does this. Lets remove this file. Also remove any mention of env variables — we will not use env variables.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed build_backend.py and all environment variable support:

  • Deleted build_backend.py file
  • Updated pyproject.toml to use setuptools.build_meta directly
  • Removed all IRIS_BACKEND environment variable checks from iris/hip.py
  • Removed all env var documentation from README
  • Backend detection now: 1) build-time config, 2) auto-detection, 3) default to HIP

Commit: ee8a18a

iris/hip.py Outdated
# Load the appropriate backend module directly without triggering __init__.py
_module_dir = os.path.dirname(__file__)
if _backend == "cuda":
_module_path = os.path.join(_module_dir, "_cuda.py")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot can we just import everything based on the if statement?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to use direct imports based on the if statement:

if _backend == "cuda":
    from iris._cuda import *
else:
    from iris._hip import *

This is much cleaner than the previous dynamic loading with importlib. Removed the importlib dependency as well.

Commit: eaf980f

- Removed get_rocm_version() function as it's not used in CUDA backend
- Function is ROCm-specific and not needed for NVIDIA GPUs
- get_num_xcc() in CUDA backend returns 1 directly without version check

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Copilot finished work on behalf of mawad-amd October 6, 2025 03:07
@Copilot Copilot AI requested a review from mawad-amd October 6, 2025 03:07
- Remove malloc/free functions from _cuda.py (not used)
- Simplify iris/hip.py to use direct imports instead of dynamic loading
- Remove importlib.util dependency
- Cleaner and more readable code structure

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Copilot finished work on behalf of mawad-amd October 6, 2025 03:12
- Delete build_backend.py (setup.py handles --config-settings)
- Update pyproject.toml to use setuptools.build_meta directly
- Remove IRIS_BACKEND environment variable checking from iris/hip.py
- Remove all env var documentation from README.md
- Backend now selected via: 1) build-time config, 2) auto-detection, 3) default to HIP

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
@Copilot Copilot AI changed the title Add CUDA backend support for NVIDIA GPUs with build-time and runtime switching Add CUDA backend support for NVIDIA GPUs with build-time switching Oct 6, 2025
Copilot finished work on behalf of mawad-amd October 6, 2025 04:16
- Restore README.md to state before PR changes
- Keep only code changes without documentation updates
- All linting checks pass

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Copilot finished work on behalf of mawad-amd October 6, 2025 04:24
setup.py Outdated
# Write backend selection to a Python file
with open("iris/_backend_selected.py", "w") as f:
f.write(f'BACKEND = "{backend}"\n')
print(f"Building Iris with backend={backend}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot lets remove all these options and just rely on auto detection logic. Minimize and remove any unneeded changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplified to auto-detection only. Removed all build-time configuration:

  • Removed build() function from setup.py
  • Removed build-time config check from iris/hip.py
  • Backend now purely auto-detected by trying to load GPU libraries
  • Minimal changes kept

Commit: 96f462d

- Remove build() function from setup.py (back to original)
- Simplify iris/hip.py to use only auto-detection (no build-time config)
- Remove _backend_selected.py from .gitignore (not used)
- Backend selection now purely based on library availability
- Minimal changes - keeps only essential backend auto-detection

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
@Copilot Copilot AI changed the title Add CUDA backend support for NVIDIA GPUs with build-time switching Add CUDA backend support for NVIDIA GPUs with automatic detection Oct 6, 2025
@Copilot Copilot AI requested a review from mawad-amd October 6, 2025 05:22
Copilot finished work on behalf of mawad-amd October 6, 2025 05:22
iris/_cuda.py Outdated
@@ -0,0 +1,120 @@
# SPDX-License-Identifier: MIT
# Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot let’s simplify even further and return to original hip.py and inside each function we branch for CUDA or HIP per is_amd_backend

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consolidated all backend logic into a single iris/hip.py file with conditional branching based on _is_amd_backend flag. Removed the separate _hip.py and _cuda.py files for maximum simplicity.

Commit: da69d1b

- Remove separate _hip.py and _cuda.py files
- Add backend auto-detection at module load time (_is_amd_backend flag)
- All functions now branch internally based on _is_amd_backend
- Minimal changes - keeps original hip.py structure
- Dynamic IPC handle size (64 for HIP, 128 for CUDA)
- Zero configuration required - works out of the box

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
@Copilot Copilot AI requested a review from mawad-amd October 6, 2025 05:40
Copilot finished work on behalf of mawad-amd October 6, 2025 05:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add CUDA backend
2 participants