Add CUDA backend support for NVIDIA GPUs with automatic detection #200

Copilot · 2025-10-06T00:35:20Z

Add CUDA Backend Support ✅

This PR implements CUDA backend support for Iris, enabling the framework to run seamlessly on both AMD GPUs (via HIP) and NVIDIA GPUs (via CUDA) with transparent backend auto-detection.

Summary

Iris now supports both AMD GPUs (HIP backend) and NVIDIA GPUs (CUDA backend) with automatic backend detection based on available GPU libraries. All backend logic is consolidated into a single iris/hip.py file with conditional branching.

Changes Made

Unified Backend Module (iris/hip.py):

- Auto-detects backend at module load time by trying to load libamdhip64.so or libcudart.so
- Sets _is_amd_backend flag for internal use
- All functions branch internally based on backend type
- Dynamic IPC handle size (64 bytes for HIP, 128 bytes for CUDA)
- Exports get_ipc_handle_size() for use by iris.py

Dynamic Sizing (iris/iris.py):

- Added get_ipc_handle_size() import
- Uses dynamic IPC handle size instead of hardcoded 64
- Ensures proper IPC handle allocation for each backend

Key Features:

- Automatic backend detection based on available GPU libraries
- Minimal code changes - original hip.py structure preserved with conditional branching
- Consolidated implementation - all backend logic in single file
- Zero configuration required - works out of the box
- Falls back to HIP by default for backward compatibility

Code Quality:

- Minimal changes to existing codebase
- No separate backend files - everything consolidated in iris/hip.py
- Clean conditional logic throughout
- No build-time configuration or environment variables
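
The dynamic IPC handle sizing listed above could look roughly like the following. This is a minimal sketch and not the code in iris/hip.py: the two size constants simply repeat the figures quoted in this description and should be checked against the runtime headers, and make_ipc_handle_type() is a hypothetical helper introduced for illustration.

```python
import ctypes

# Sizes as quoted in this PR description (64 bytes for HIP, 128 bytes for CUDA).
# They are assumptions here and are not verified against the runtime headers.
_IPC_HANDLE_SIZE_HIP = 64
_IPC_HANDLE_SIZE_CUDA = 128


def get_ipc_handle_size(is_amd_backend):
    """Return the IPC handle size in bytes for the active backend."""
    return _IPC_HANDLE_SIZE_HIP if is_amd_backend else _IPC_HANDLE_SIZE_CUDA


def make_ipc_handle_type(size):
    """Build an opaque ctypes struct of `size` bytes (hypothetical helper)."""

    class IpcMemHandle(ctypes.Structure):
        _fields_ = [("reserved", ctypes.c_char * size)]

    return IpcMemHandle


# Allocate a zero-initialized handle buffer sized for the CUDA branch.
handle_type = make_ipc_handle_type(get_ipc_handle_size(is_amd_backend=False))
handle = handle_type()
```

Callers such as iris.py can then size their buffers from get_ipc_handle_size() instead of a hardcoded 64.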

Usage

# Just install normally - backend auto-detected
pip install git+https://github.com/ROCm/iris.git

# Works automatically on both AMD and NVIDIA GPUs
import iris
ctx = iris.iris(heap_size=1 << 30)

Implementation Details

Backend Detection (in iris/hip.py):

Tries to load libamdhip64.so → sets _is_amd_backend = True
Falls back to libcudart.so → sets _is_amd_backend = False
Defaults to HIP for backward compatibility if both fail
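
The detection order above can be sketched with ctypes as follows. This is an illustration under assumptions, not the actual iris/hip.py code: detect_backend() and its load_library parameter are hypothetical names, and the parameter exists only so the logic can be exercised without a GPU runtime installed.

```python
import ctypes


def detect_backend(load_library=ctypes.CDLL):
    """Return (is_amd_backend, library); library is None if nothing loaded."""
    candidates = (("libamdhip64.so", True), ("libcudart.so", False))
    for name, is_amd in candidates:
        try:
            return is_amd, load_library(name)
        except OSError:
            continue  # library not available; try the next candidate
    # Neither runtime was found: default to HIP for backward compatibility.
    return True, None


# Evaluated once at import time, mirroring the module-load detection described above.
_is_amd_backend, _runtime = detect_backend()
```

Because HIP is tried first, a machine with both libraries installed would select the HIP backend in this sketch.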

Conditional Functions:

hip_try() - branches to hipGetErrorString or cudaGetErrorString
hipIpcMemHandle_t - 64 bytes for HIP, 128 bytes for CUDA
open_ipc_handle() - calls hipIpcOpenMemHandle or cudaIpcOpenMemHandle
All device/memory functions branch based on _is_amd_backend
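
One way to picture the per-function branching is a helper that maps a shared suffix to the backend's symbol name, plus an error check built on it. The names runtime_symbol() and try_call() are hypothetical and only illustrate the pattern; the real hip_try() may be structured differently.

```python
import ctypes


def runtime_symbol(base, is_amd_backend):
    """Map a shared suffix such as "GetErrorString" to the backend's symbol name."""
    return ("hip" if is_amd_backend else "cuda") + base


def try_call(runtime, err, is_amd_backend):
    """Raise RuntimeError if `err` is a non-zero runtime status code."""
    if err == 0:  # hipSuccess and cudaSuccess are both 0
        return
    name = runtime_symbol("GetErrorString", is_amd_backend)
    get_error_string = getattr(runtime, name)
    # Both runtimes take an error enum and return a C string.
    get_error_string.argtypes = [ctypes.c_int]
    get_error_string.restype = ctypes.c_char_p
    message = get_error_string(err).decode("utf-8")
    raise RuntimeError(f"{name} reported error {err}: {message}")
```

Here `runtime` is expected to be the library object returned by ctypes.CDLL for whichever backend was detected.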

Backend-Specific Behavior:

get_rocm_version() - returns (-1, -1) for CUDA
get_arch_string() - returns GCN arch for HIP, compute capability for CUDA (e.g., "sm_90")
get_num_xcc() - returns actual count for AMD, 1 for NVIDIA
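
The backend-specific return values above can be illustrated with a few small helpers. This is a sketch under assumptions: the helper names and the NO_ROCM_VERSION constant are hypothetical, and only the returned values come from the description.

```python
NO_ROCM_VERSION = (-1, -1)  # value the description gives for get_rocm_version() on CUDA


def format_compute_capability(major, minor):
    """Format an NVIDIA compute capability as an "sm_XY" arch string."""
    return f"sm_{major}{minor}"


def num_xcc(is_amd_backend, amd_xcc_count):
    """Return the queried XCC count on AMD and 1 on NVIDIA, as described above."""
    return amd_xcc_count if is_amd_backend else 1


arch = format_compute_capability(9, 0)
```

Callers that compare ROCm versions would need to treat the (-1, -1) value as "not applicable" rather than as a real version.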

Code Statistics

Modified files: iris/hip.py, iris/iris.py
Total additions: ~40 lines (conditional logic)
Files changed: 2
Commits: 17

Backend Detection Logic

Try to load libamdhip64.so → use HIP backend
Try to load libcudart.so → use CUDA backend
Default to HIP for backward compatibility

Supported Hardware

AMD GPUs (HIP Backend):

MI300X, MI350X, MI355X
Requirements: ROCm 6.3.1+, PyTorch with ROCm

NVIDIA GPUs (CUDA Backend):

All CUDA-capable GPUs with compute capability 7.0+
Requirements: CUDA 11.0+, PyTorch with CUDA

Notes

Minimal changes per review feedback - consolidated into single file with conditional branching
Pure auto-detection with no configuration needed
Original hip.py structure preserved - only added conditional logic
Zero breaking changes - fully backward compatible
All changes follow existing Iris code style and patterns

References

Based on concepts from astroC86/iris cuda-port branch
Related to PR CUDA Port #118
Resolves issue Add CUDA backend #199

Original prompt

This section details the original issue you should resolve.

<issue_title>Add CUDA backend</issue_title>
<issue_description>#### Summary
Allow Iris to be installed with --config-settings backend=nvidia so that iris.hip.* imports transparently redirect to CUDA equivalents (iris.cuda.*). This enables NVIDIA backend support without changing source code.

Goal

At install or runtime, users should be able to do:
pip install git+https://github.com/ROCm/iris.git --config-settings backend=nvidia
and have:
from iris.hip.runtime import hip_malloc
automatically resolve to:
from iris.cuda.runtime import hip_malloc
Cherry pick commits from https://github.com/astroC86/iris/tree/astroC86/cuda-port and #118</issue_description>
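
The transparent redirect the issue asks for could, in principle, be done by aliasing one module name to another in sys.modules. The sketch below uses hypothetical stand-in modules (demo_cuda_runtime, demo_hip_runtime) rather than real Iris modules, and it is not the approach this PR ended up with.

```python
import sys
import types


def alias_module(alias, target):
    """Make `import alias` resolve to an already-built module object."""
    sys.modules[alias] = target


# Hypothetical stand-in for a CUDA runtime module that keeps the hip_* names.
cuda_runtime = types.ModuleType("demo_cuda_runtime")
cuda_runtime.hip_malloc = lambda size: f"cuda allocation of {size} bytes"

# After aliasing, code written against the HIP-named module picks up the CUDA one.
alias_module("demo_hip_runtime", cuda_runtime)

from demo_hip_runtime import hip_malloc  # noqa: E402
```

The import system consults sys.modules first, which is why the alias is enough for the from-import to succeed.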

Comments on the Issue (you are @copilot in this section)

Fixes #199

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>

mawad-amd · 2025-10-06T00:59:50Z

README.md

+To use Iris with NVIDIA GPUs, set the backend before importing:
+
+```bash
+export IRIS_BACKEND=cuda


@copilot Let’s use pip install git+https://github.com/ROCm/iris.git --config-settings backend=nvidia
instead of env vars

Updated to support pip install --config-settings backend=nvidia as the primary method. The implementation now:

Uses a custom build backend (build_backend.py) that processes --config-settings

Writes backend config to iris/.config/backend.txt at build time

Prioritizes build-time config over environment variables

Still supports IRIS_BACKEND env var as fallback for flexibility

All 11 unit tests pass, including 3 new tests for build-time configuration.

Commit: [c255dee...latest]

- Add custom build backend (build_backend.py) to support --config-settings
- Update pyproject.toml to use custom build backend
- Modify iris/hip.py to prioritize build-time config over env vars
- Update README with pip install --config-settings backend=nvidia examples
- Add 3 new unit tests for build-time configuration (11 tests total pass)
- Update example script to document both build-time and runtime methods
- Add iris/.config/ to .gitignore

Backend detection priority is now:
1. Build-time configuration (--config-settings backend=nvidia/hip)
2. IRIS_BACKEND environment variable (runtime)
3. Auto-detection based on available libraries
4. Default to HIP for backward compatibility

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
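
The four-step priority in this commit message can be sketched as a single resolver. This is an illustration only: resolve_backend() and its parameters are hypothetical names, and this layered selection was removed later in the review in favor of pure auto-detection.

```python
import os


def resolve_backend(build_time=None, environ=None, detected=None):
    """Pick a backend using the priority order from the commit message."""
    environ = os.environ if environ is None else environ
    # 1. build-time config, 2. IRIS_BACKEND env var, 3. auto-detected backend
    for choice in (build_time, environ.get("IRIS_BACKEND"), detected):
        if choice:
            return choice
    # 4. Default to HIP for backward compatibility.
    return "hip"
```

For example, resolve_backend(build_time="cuda", environ={"IRIS_BACKEND": "hip"}) returns "cuda", because the build-time setting wins over the environment variable.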

mawad-amd · 2025-10-06T02:41:39Z

examples/cuda_backend_example.py

 Example demonstrating how to use Iris with CUDA backend for NVIDIA GPUs.

-This script shows how to set the backend and verify it's correctly loaded.
+This script shows how to use the CUDA backend and verify it's correctly loaded.


@copilot remove this

Removed the redundant line from the docstring.

Commit: 28965e6

README.md

mawad-amd · 2025-10-06T02:42:46Z

README.md

 ```shell
-# Note: Backend selection is via environment variable, not install-time config
-IRIS_BACKEND=cuda pip install git+https://github.com/ROCm/iris.git
+pip install git+https://github.com/ROCm/iris.git --config-settings backend=hip


@copilot setup.py needs code like this

import os

def build(setup_kwargs, config_settings=None):
    # config_settings is None when no --config-settings are passed
    backend = (config_settings or {}).get("backend", "amd")
    os.environ["IRIS_BACKEND"] = backend
    print(f"Building Iris with backend={backend}")
    return setup_kwargs

Added the build() function to setup.py as suggested. The function:

Handles config_settings.get("backend", "amd")

Sets os.environ["IRIS_BACKEND"] = backend

Normalizes backend names (nvidia→cuda, amd/rocm→hip)

Prints "Building Iris with backend={backend}"

Also updated build_backend.py to set the environment variable for consistency.

Commit: 0a5992d
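
The name normalization mentioned in this reply (nvidia→cuda, amd/rocm→hip) might be written as a small lookup. This is a sketch under assumptions: normalize_backend() and the alias table are hypothetical, and rejecting unknown names with ValueError is a design choice of this example rather than documented behavior.

```python
# Alias table assumed from the reply above: nvidia -> cuda, amd/rocm -> hip.
_BACKEND_ALIASES = {
    "nvidia": "cuda",
    "cuda": "cuda",
    "amd": "hip",
    "rocm": "hip",
    "hip": "hip",
}


def normalize_backend(name):
    """Map a user-facing backend name to the canonical "cuda" or "hip"."""
    key = name.strip().lower()
    try:
        return _BACKEND_ALIASES[key]
    except KeyError:
        expected = ", ".join(sorted(_BACKEND_ALIASES))
        raise ValueError(f"Unknown backend {name!r}; expected one of: {expected}") from None
```

Failing loudly on an unknown name avoids silently building for the wrong backend after a typo.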

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>

mawad-amd · 2025-10-06T02:50:38Z

examples/cuda_backend_example.py

+
+2. Runtime (if not set at build time):
+   export IRIS_BACKEND=cuda
+   python cuda_backend_example.py


@copilot remove this file

Removed examples/cuda_backend_example.py.

Commit: 698604c

- Add build() hook in setup.py to handle --config-settings
- Sets IRIS_BACKEND environment variable during build
- Normalizes backend names (nvidia→cuda, amd/rocm→hip)
- Updated build_backend.py to also set env var for consistency
- All 11 unit tests pass

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>

iris/_cuda.py

mawad-amd · 2025-10-06T02:51:17Z

tests/unittests/test_backend_detection.py

+    old_env = os.environ.pop("IRIS_BACKEND", None)
+
+    try:
+        # Load hip.py directly to test detection logic


@copilot remove this file

Removed tests/unittests/test_backend_detection.py.

Commit: 698604c

mawad-amd · 2025-10-06T03:03:14Z

iris/_cuda.py

+    return cu_count.value
+
+
+def get_rocm_version():


@copilot remove this function

Removed the get_rocm_version() function from iris/_cuda.py. This function was ROCm-specific and not used by the CUDA backend.

Commit: a162b9e

mawad-amd · 2025-10-06T03:03:32Z

iris/_cuda.py

+    return 1
+
+
+def malloc_fine_grained(size):


@copilot remove malloc and free functions

Removed malloc_fine_grained(), hip_malloc(), and hip_free() functions from iris/_cuda.py. These functions were not being used.

Commit: eaf980f

mawad-amd · 2025-10-06T03:04:37Z

build_backend.py

+Custom build backend to support backend selection via --config-settings.
+
+This allows users to install Iris with:
+    pip install . --config-settings backend=nvidia


@copilot do we need this file?

Yes, build_backend.py is needed. It's referenced in pyproject.toml as the build backend (line 6: build-backend = "build_backend"). It wraps setuptools.build_meta and processes the --config-settings to write the iris/_backend_selected.py file during wheel/sdist builds. The build() function in setup.py provides similar functionality but for different build scenarios.

@copilot setup.py already does this. Let's remove this file. Also remove any mention of env variables; we will not use env variables.

Removed build_backend.py and all environment variable support:

Deleted build_backend.py file

Updated pyproject.toml to use setuptools.build_meta directly

Removed all IRIS_BACKEND environment variable checks from iris/hip.py

Removed all env var documentation from README

Backend detection now: 1) build-time config, 2) auto-detection, 3) default to HIP

Commit: ee8a18a

mawad-amd · 2025-10-06T03:05:23Z

iris/hip.py

+# Load the appropriate backend module directly without triggering __init__.py
+_module_dir = os.path.dirname(__file__)
+if _backend == "cuda":
+    _module_path = os.path.join(_module_dir, "_cuda.py")


@copilot can we just import everything based on the if statement?

Changed to use direct imports based on the if statement:

if _backend == "cuda":
    from iris._cuda import *
else:
    from iris._hip import *

This is much cleaner than the previous dynamic loading with importlib. Removed the importlib dependency as well.

Commit: eaf980f

- Removed get_rocm_version() function as it's not used in CUDA backend
- Function is ROCm-specific and not needed for NVIDIA GPUs
- get_num_xcc() in CUDA backend returns 1 directly without version check

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>

- Remove malloc/free functions from _cuda.py (not used)
- Simplify iris/hip.py to use direct imports instead of dynamic loading
- Remove importlib.util dependency
- Cleaner and more readable code structure

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>

- Delete build_backend.py (setup.py handles --config-settings)
- Update pyproject.toml to use setuptools.build_meta directly
- Remove IRIS_BACKEND environment variable checking from iris/hip.py
- Remove all env var documentation from README.md
- Backend now selected via: 1) build-time config, 2) auto-detection, 3) default to HIP

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>

- Restore README.md to state before PR changes
- Keep only code changes without documentation updates
- All linting checks pass

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>

mawad-amd · 2025-10-06T05:16:24Z

setup.py

+    # Write backend selection to a Python file
+    with open("iris/_backend_selected.py", "w") as f:
+        f.write(f'BACKEND = "{backend}"\n')
+    print(f"Building Iris with backend={backend}")


@copilot let's remove all these options and just rely on the auto-detection logic. Minimize and remove any unneeded changes.

Simplified to auto-detection only. Removed all build-time configuration:

Removed build() function from setup.py

Removed build-time config check from iris/hip.py

Backend now purely auto-detected by trying to load GPU libraries

Minimal changes kept

Commit: 96f462d

- Remove build() function from setup.py (back to original)
- Simplify iris/hip.py to use only auto-detection (no build-time config)
- Remove _backend_selected.py from .gitignore (not used)
- Backend selection now purely based on library availability
- Minimal changes - keeps only essential backend auto-detection

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>

mawad-amd · 2025-10-06T05:34:04Z

iris/_cuda.py

@@ -0,0 +1,120 @@
+# SPDX-License-Identifier: MIT
+# Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.


@copilot let’s simplify even further and return to original hip.py and inside each function we branch for CUDA or HIP per is_amd_backend

Consolidated all backend logic into a single iris/hip.py file with conditional branching based on _is_amd_backend flag. Removed the separate _hip.py and _cuda.py files for maximum simplicity.

Commit: da69d1b

- Remove separate _hip.py and _cuda.py files
- Add backend auto-detection at module load time (_is_amd_backend flag)
- All functions now branch internally based on _is_amd_backend
- Minimal changes - keeps original hip.py structure
- Dynamic IPC handle size (64 for HIP, 128 for CUDA)
- Zero configuration required - works out of the box

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>

Initial plan

d312f5e

Copilot AI assigned Copilot and mawad-amd Oct 6, 2025

Copilot started work on behalf of mawad-amd October 6, 2025 00:35

Copilot AI and others added 5 commits October 6, 2025 00:47

Add CUDA backend support with runtime detection

c255dee

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>

Add unit tests and fix linting issues for CUDA backend

259afe6

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>

Update README with CUDA backend documentation

3f3e78e

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>

Add CUDA backend example script

c6a9aa3

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>

Fix linting issues in example script

0ba8573

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Add CUDA backend~~ Add CUDA backend support for NVIDIA GPUs with transparent runtime switching Oct 6, 2025

Copilot AI requested a review from mawad-amd October 6, 2025 00:59

Copilot finished work on behalf of mawad-amd October 6, 2025 00:59

mawad-amd reviewed Oct 6, 2025

Copilot started work on behalf of mawad-amd October 6, 2025 01:00

Copilot AI changed the title ~~Add CUDA backend support for NVIDIA GPUs with transparent runtime switching~~ Add CUDA backend support for NVIDIA GPUs with build-time and runtime switching Oct 6, 2025

Copilot AI requested a review from mawad-amd October 6, 2025 01:09

Copilot finished work on behalf of mawad-amd October 6, 2025 01:09

mawad-amd reviewed Oct 6, 2025

Copilot started work on behalf of mawad-amd October 6, 2025 02:42

mawad-amd reviewed Oct 6, 2025

Remove redundant line from cuda_backend_example.py docstring

0a5992d

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>

Copilot AI requested a review from mawad-amd October 6, 2025 02:44

Copilot finished work on behalf of mawad-amd October 6, 2025 02:44

Copilot started work on behalf of mawad-amd October 6, 2025 02:45

mawad-amd reviewed Oct 6, 2025

mawad-amd reviewed Oct 6, 2025

Copilot AI requested a review from mawad-amd October 6, 2025 02:51

Copilot finished work on behalf of mawad-amd October 6, 2025 03:02

mawad-amd reviewed Oct 6, 2025

Copilot started work on behalf of mawad-amd October 6, 2025 03:04

mawad-amd reviewed Oct 6, 2025

Copilot finished work on behalf of mawad-amd October 6, 2025 03:07

Copilot AI requested a review from mawad-amd October 6, 2025 03:07

Copilot started work on behalf of mawad-amd October 6, 2025 03:08

Copilot finished work on behalf of mawad-amd October 6, 2025 03:12

Copilot started work on behalf of mawad-amd October 6, 2025 04:11

Copilot AI changed the title ~~Add CUDA backend support for NVIDIA GPUs with build-time and runtime switching~~ Add CUDA backend support for NVIDIA GPUs with build-time switching Oct 6, 2025

Copilot finished work on behalf of mawad-amd October 6, 2025 04:16

Copilot started work on behalf of mawad-amd October 6, 2025 04:21

Revert README.md to original state

96f462d

- Restore README.md to state before PR changes
- Keep only code changes without documentation updates
- All linting checks pass

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>

Copilot finished work on behalf of mawad-amd October 6, 2025 04:24

mawad-amd reviewed Oct 6, 2025

Copilot started work on behalf of mawad-amd October 6, 2025 05:17

Copilot AI changed the title ~~Add CUDA backend support for NVIDIA GPUs with build-time switching~~ Add CUDA backend support for NVIDIA GPUs with automatic detection Oct 6, 2025

Copilot AI requested a review from mawad-amd October 6, 2025 05:22

Copilot finished work on behalf of mawad-amd October 6, 2025 05:22

mawad-amd reviewed Oct 6, 2025

Copilot started work on behalf of mawad-amd October 6, 2025 05:34

Copilot AI requested a review from mawad-amd October 6, 2025 05:40

Copilot finished work on behalf of mawad-amd October 6, 2025 05:40
