Add CUDA backend support for NVIDIA GPUs with automatic detection #200
base: main
Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
README.md
Outdated
To use Iris with NVIDIA GPUs, set the backend before importing:

```bash
export IRIS_BACKEND=cuda
```
@copilot Let’s use `pip install git+https://github.com/ROCm/iris.git --config-settings backend=nvidia` instead of env vars
Updated to support `pip install --config-settings backend=nvidia` as the primary method. The implementation now:
- Uses a custom build backend (`build_backend.py`) that processes `--config-settings`
- Writes backend config to `iris/.config/backend.txt` at build time
- Prioritizes build-time config over environment variables
- Still supports the `IRIS_BACKEND` env var as a fallback for flexibility
All 11 unit tests pass, including 3 new tests for build-time configuration.
Commit: [c255dee...latest]
- Add custom build backend (build_backend.py) to support --config-settings
- Update pyproject.toml to use custom build backend
- Modify iris/hip.py to prioritize build-time config over env vars
- Update README with pip install --config-settings backend=nvidia examples
- Add 3 new unit tests for build-time configuration (11 tests total pass)
- Update example script to document both build-time and runtime methods
- Add iris/.config/ to .gitignore
Backend detection priority is now:
1. Build-time configuration (--config-settings backend=nvidia/hip)
2. IRIS_BACKEND environment variable (runtime)
3. Auto-detection based on available libraries
4. Default to HIP for backward compatibility
Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
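The four-step priority order in this commit message can be sketched as a small resolver. This is only an illustration, not the code from the PR: `resolve_backend`, its parameters, and the `available_libs` set are hypothetical names, and the library names and their check order are assumptions borrowed from the PR description. Only the ordering of the four steps comes from the commit message.

```python
from typing import Mapping, Optional, Set, Tuple


def resolve_backend(
    build_config: Optional[str],
    env: Mapping[str, str],
    available_libs: Set[str],
) -> Tuple[str, str]:
    """Return (backend, source) following the four-step priority order."""
    # 1. Build-time configuration wins when present.
    if build_config:
        return build_config, "build-time config"
    # 2. Otherwise fall back to the IRIS_BACKEND environment variable.
    env_value = env.get("IRIS_BACKEND")
    if env_value:
        return env_value, "environment variable"
    # 3. Otherwise auto-detect from the libraries that are present
    #    (assumed names; HIP is checked first in this sketch).
    if "libamdhip64.so" in available_libs:
        return "hip", "auto-detection"
    if "libcudart.so" in available_libs:
        return "cuda", "auto-detection"
    # 4. Default to HIP for backward compatibility.
    return "hip", "default"


if __name__ == "__main__":
    # The environment variable beats auto-detection when no build config exists.
    print(resolve_backend(None, {"IRIS_BACKEND": "cuda"}, {"libamdhip64.so"}))
```

Passing the environment and the library set in as arguments, rather than reading them globally, keeps each step easy to exercise in isolation.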
examples/cuda_backend_example.py
Outdated
Example demonstrating how to use Iris with CUDA backend for NVIDIA GPUs.
This script shows how to set the backend and verify it's correctly loaded.
This script shows how to use the CUDA backend and verify it's correctly loaded.
@copilot remove this
Removed the redundant line from the docstring.
Commit: 28965e6
README.md
Outdated
```shell
# Note: Backend selection is via environment variable, not install-time config
IRIS_BACKEND=cuda pip install git+https://github.com/ROCm/iris.git
pip install git+https://github.com/ROCm/iris.git --config-settings backend=hip
```
@copilot setup.py needs code like this
    import os

    def build(setup_kwargs, config_settings=None):
        # config_settings may be None when no --config-settings are passed
        backend = (config_settings or {}).get("backend", "amd")
        os.environ["IRIS_BACKEND"] = backend
        print(f"Building Iris with backend={backend}")
        return setup_kwargs
Added the `build()` function to setup.py as suggested. The function:
- Handles `config_settings.get("backend", "amd")`
- Sets `os.environ["IRIS_BACKEND"] = backend`
- Normalizes backend names (nvidia→cuda, amd/rocm→hip)
- Prints "Building Iris with backend={backend}"
Also updated build_backend.py to set the environment variable for consistency.
Commit: 0a5992d
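A minimal sketch of what such a hook with name normalization might look like, assuming the alias table given in the reply (nvidia→cuda, amd/rocm→hip). The helper `normalize_backend`, the case-insensitive matching, and the `ValueError` for unknown names are illustrative assumptions, not the PR's actual code. Whether and how the build tooling invokes `build()` is not shown in this thread.

```python
import os
from typing import Mapping, Optional

# Alias table from the reply: nvidia -> cuda, amd/rocm -> hip.
# Accepting the canonical names themselves is an assumption.
_ALIASES = {
    "nvidia": "cuda",
    "cuda": "cuda",
    "amd": "hip",
    "rocm": "hip",
    "hip": "hip",
}


def normalize_backend(name: str) -> str:
    """Map a user-facing backend name to its canonical form (hypothetical helper)."""
    key = name.strip().lower()
    if key not in _ALIASES:
        raise ValueError(f"Unknown backend: {name!r}")
    return _ALIASES[key]


def build(setup_kwargs: dict, config_settings: Optional[Mapping[str, str]] = None) -> dict:
    """Record the requested backend in IRIS_BACKEND and pass setup_kwargs through."""
    # config_settings may be None when no --config-settings were given.
    requested = (config_settings or {}).get("backend", "amd")
    backend = normalize_backend(requested)
    os.environ["IRIS_BACKEND"] = backend
    print(f"Building Iris with backend={backend}")
    return setup_kwargs
```

Rejecting unknown names early is one possible design choice; silently falling back to the default would be another.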
examples/cuda_backend_example.py
Outdated
2. Runtime (if not set at build time):
   export IRIS_BACKEND=cuda
   python cuda_backend_example.py
@copilot remove this file
Removed examples/cuda_backend_example.py.
Commit: 698604c
- Add build() hook in setup.py to handle --config-settings
- Sets IRIS_BACKEND environment variable during build
- Normalizes backend names (nvidia→cuda, amd/rocm→hip)
- Updated build_backend.py to also set env var for consistency
- All 11 unit tests pass
Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
old_env = os.environ.pop("IRIS_BACKEND", None)

try:
    # Load hip.py directly to test detection logic
@copilot remove this file
Removed tests/unittests/test_backend_detection.py.
Commit: 698604c
iris/_cuda.py
Outdated
    return cu_count.value


def get_rocm_version():
@copilot remove this function
Removed the `get_rocm_version()` function from `iris/_cuda.py`. This function was ROCm-specific and not used by the CUDA backend.
Commit: a162b9e
iris/_cuda.py
Outdated
    return 1


def malloc_fine_grained(size):
@copilot remove malloc and free functions
Removed the `malloc_fine_grained()`, `hip_malloc()`, and `hip_free()` functions from `iris/_cuda.py`. These functions were not being used.
Commit: eaf980f
build_backend.py
Outdated
Custom build backend to support backend selection via --config-settings.
This allows users to install Iris with:
    pip install . --config-settings backend=nvidia
@copilot do we need this file?
Yes, `build_backend.py` is needed. It's referenced in `pyproject.toml` as the build backend (line 6: `build-backend = "build_backend"`). It wraps setuptools.build_meta and processes the `--config-settings` to write the `iris/_backend_selected.py` file during wheel/sdist builds. The `build()` function in `setup.py` provides similar functionality but for different build scenarios.
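The file-writing step described in this reply could look roughly like the sketch below. It assumes the one-line `BACKEND = "..."` file format and the `_backend_selected.py` file name mentioned in this thread. The function names, the `"hip"` default, and the reader helper are hypothetical, and the wrapping of setuptools.build_meta is deliberately left out.

```python
import tempfile
from pathlib import Path
from typing import Mapping, Optional

SELECTION_FILE = "_backend_selected.py"


def write_backend_selection(
    package_dir: Path, config_settings: Optional[Mapping[str, str]] = None
) -> Path:
    """Write the chosen backend into <package_dir>/_backend_selected.py."""
    # Defaulting to "hip" when nothing is passed is an assumption of this sketch.
    backend = (config_settings or {}).get("backend", "hip")
    target = package_dir / SELECTION_FILE
    target.write_text(f'BACKEND = "{backend}"\n')
    return target


def read_backend_selection(package_dir: Path) -> Optional[str]:
    """Read the backend back, or return None if no build-time choice was recorded."""
    target = package_dir / SELECTION_FILE
    if not target.exists():
        return None
    # The file holds a single assignment, so take the text after "=".
    _, _, value = target.read_text().partition("=")
    return value.strip().strip('"')


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_backend_selection(Path(tmp), {"backend": "cuda"})
        print(read_backend_selection(Path(tmp)))
```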
@copilot setup.py already does this. Let's remove this file. Also remove any mention of env variables. We will not use env variables.
Removed `build_backend.py` and all environment variable support:
- Deleted `build_backend.py` file
- Updated `pyproject.toml` to use `setuptools.build_meta` directly
- Removed all `IRIS_BACKEND` environment variable checks from `iris/hip.py`
- Removed all env var documentation from README
- Backend detection now: 1) build-time config, 2) auto-detection, 3) default to HIP
Commit: ee8a18a
iris/hip.py
Outdated
# Load the appropriate backend module directly without triggering __init__.py
_module_dir = os.path.dirname(__file__)
if _backend == "cuda":
    _module_path = os.path.join(_module_dir, "_cuda.py")
@copilot can we just import everything based on the if statement?
Changed to use direct imports based on the if statement:
    if _backend == "cuda":
        from iris._cuda import *
    else:
        from iris._hip import *
This is much cleaner than the previous dynamic loading with importlib. Removed the importlib dependency as well.
Commit: eaf980f
- Removed get_rocm_version() function as it's not used in CUDA backend
- Function is ROCm-specific and not needed for NVIDIA GPUs
- get_num_xcc() in CUDA backend returns 1 directly without version check
Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>

- Remove malloc/free functions from _cuda.py (not used)
- Simplify iris/hip.py to use direct imports instead of dynamic loading
- Remove importlib.util dependency
- Cleaner and more readable code structure
Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>

- Delete build_backend.py (setup.py handles --config-settings)
- Update pyproject.toml to use setuptools.build_meta directly
- Remove IRIS_BACKEND environment variable checking from iris/hip.py
- Remove all env var documentation from README.md
- Backend now selected via: 1) build-time config, 2) auto-detection, 3) default to HIP
Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>

- Restore README.md to state before PR changes
- Keep only code changes without documentation updates
- All linting checks pass
Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
setup.py
Outdated
# Write backend selection to a Python file
with open("iris/_backend_selected.py", "w") as f:
    f.write(f'BACKEND = "{backend}"\n')
print(f"Building Iris with backend={backend}")
@copilot let's remove all these options and just rely on auto-detection logic. Minimize and remove any unneeded changes.
Simplified to auto-detection only. Removed all build-time configuration:
- Removed `build()` function from setup.py
- Removed build-time config check from iris/hip.py
- Backend now purely auto-detected by trying to load GPU libraries
- Minimal changes kept
Commit: 96f462d
- Remove build() function from setup.py (back to original)
- Simplify iris/hip.py to use only auto-detection (no build-time config)
- Remove _backend_selected.py from .gitignore (not used)
- Backend selection now purely based on library availability
- Minimal changes - keeps only essential backend auto-detection
Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
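Auto-detection "by trying to load GPU libraries" can be sketched with `ctypes.CDLL`, which raises `OSError` when a shared library cannot be loaded. The function name `detect_backend`, the injectable `load` parameter, and the fallback to `"hip"` when neither library loads are assumptions of this sketch. The library names and the HIP-first order follow the PR description.

```python
import ctypes
from typing import Callable

# Candidate libraries in the order they are tried (HIP first, then CUDA).
_CANDIDATES = (
    ("libamdhip64.so", "hip"),
    ("libcudart.so", "cuda"),
)


def detect_backend(load: Callable[[str], object] = ctypes.CDLL) -> str:
    """Return "hip" or "cuda" depending on which runtime library loads first."""
    for lib_name, backend in _CANDIDATES:
        try:
            load(lib_name)
        except OSError:
            # Library not installed or not loadable; try the next candidate.
            continue
        return backend
    # Assumed fallback when nothing loads: keep the historical HIP default.
    return "hip"


if __name__ == "__main__":
    print(f"Detected backend: {detect_backend()}")
```

Taking the loader as a parameter lets the detection logic be checked on a machine without either GPU runtime installed.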
iris/_cuda.py
Outdated
@@ -0,0 +1,120 @@
# SPDX-License-Identifier: MIT
# Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.
@copilot let’s simplify even further and return to original hip.py and inside each function we branch for CUDA or HIP per is_amd_backend
Consolidated all backend logic into a single `iris/hip.py` file with conditional branching based on the `_is_amd_backend` flag. Removed the separate `_hip.py` and `_cuda.py` files for maximum simplicity.
Commit: da69d1b
- Remove separate _hip.py and _cuda.py files
- Add backend auto-detection at module load time (_is_amd_backend flag)
- All functions now branch internally based on _is_amd_backend
- Minimal changes - keeps original hip.py structure
- Dynamic IPC handle size (64 for HIP, 128 for CUDA)
- Zero configuration required - works out of the box
Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
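The per-function branching described in this commit might look like the sketch below. It uses the handle sizes quoted in the commit message (64 bytes for HIP, 128 for CUDA), which are taken as given here and not checked against the vendor headers. `make_ipc_handle_type` and `error_string_symbol` are hypothetical helpers. The flag is passed as an argument for clarity, whereas the PR keeps it as a module-level `_is_amd_backend`.

```python
import ctypes


def get_ipc_handle_size(is_amd_backend: bool) -> int:
    """Return the IPC handle size in bytes as stated in the commit message."""
    return 64 if is_amd_backend else 128


def make_ipc_handle_type(is_amd_backend: bool) -> type:
    """Build a ctypes structure whose size matches the active backend."""
    size = get_ipc_handle_size(is_amd_backend)

    class IpcMemHandle(ctypes.Structure):
        # An opaque byte buffer; the runtime fills it in.
        _fields_ = [("reserved", ctypes.c_char * size)]

    return IpcMemHandle


def error_string_symbol(is_amd_backend: bool) -> str:
    """Pick the runtime function name used to translate error codes."""
    return "hipGetErrorString" if is_amd_backend else "cudaGetErrorString"


if __name__ == "__main__":
    for flag in (True, False):
        handle_type = make_ipc_handle_type(flag)
        print(error_string_symbol(flag), ctypes.sizeof(handle_type))
```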
Add CUDA Backend Support ✅
This PR implements CUDA backend support for Iris, enabling the framework to run seamlessly on both AMD GPUs (via HIP) and NVIDIA GPUs (via CUDA) with transparent backend auto-detection.
Summary
Iris now supports both AMD GPUs (HIP backend) and NVIDIA GPUs (CUDA backend) with automatic backend detection based on available GPU libraries. All backend logic is consolidated into a single `iris/hip.py` file with conditional branching.
Changes Made
Unified Backend Module (`iris/hip.py`):
- `_is_amd_backend` flag for internal use
- `get_ipc_handle_size()` for use by iris.py
Dynamic Sizing (`iris/iris.py`):
- `get_ipc_handle_size()` import
Key Features:
Code Quality:
Usage
Implementation Details
Backend Detection (in `iris/hip.py`):
- `libamdhip64.so` → sets `_is_amd_backend = True`
- `libcudart.so` → sets `_is_amd_backend = False`
Conditional Functions:
- `hip_try()` - branches to hipGetErrorString or cudaGetErrorString
- `hipIpcMemHandle_t` - 64 bytes for HIP, 128 bytes for CUDA
- `open_ipc_handle()` - calls hipIpcOpenMemHandle or cudaIpcOpenMemHandle
- `_is_amd_backend`
Backend-Specific Behavior:
- `get_rocm_version()` - returns (-1, -1) for CUDA
- `get_arch_string()` - returns GCN arch for HIP, compute capability for CUDA (e.g., "sm_90")
- `get_num_xcc()` - returns actual count for AMD, 1 for NVIDIA
Code Statistics
- `iris/hip.py`, `iris/iris.py`
Backend Detection Logic
- `libamdhip64.so` → use HIP backend
- `libcudart.so` → use CUDA backend
Supported Hardware
AMD GPUs (HIP Backend):
NVIDIA GPUs (CUDA Backend):
Notes
References