Onumpy is a drop-in replacement for NumPy that automatically accelerates large array operations on AMD GPUs using OpenCL and rocBLAS.
- Automatic GPU Acceleration: Seamlessly uses GPU for large arrays (>100K elements)
- 100% NumPy Compatible: Drop-in replacement - no code changes needed
- AMD RDNA1 Optimized: Tuned for RX 5700 XT (wavefront 64, 20 CUs, 8GB VRAM)
- Smart Dispatch: Automatically selects GPU or CPU based on array size
- Memory Efficient: Offloads large arrays to GPU VRAM
- Native Math Functions: 2-5x speedup with
native_exp,native_log, etc. - Vectorized Operations: 1.5-3x speedup with float4 vectorization
- 10-100x speedup for matrices >1000Γ1000
- Reduced CPU memory usage
- Better GPU utilization
- 2-5x speedup with native math functions (
native_exp,native_log) - 1.5-3x speedup with vectorized operations (float4)
- Parallel execution on 20 compute units
- 10-50x speedup for arrays >1M elements
- Efficient GPU parallel reduction
# Clone repository
# NOTE: Replace YOUR_GITHUB_USERNAME with your actual GitHub username
git clone https://github.com/YOUR_GITHUB_USERNAME/Onumpy.git
cd Onumpy
# Install dependencies
pip install numpy pyopencl
# Build and install
cd numpy_gpu
python setup.py build_ext --inplace
python setup.py installSee Installation Guide for detailed instructions.
from numpy_bridge import np
# Works exactly like NumPy
arr = np.array([1, 2, 3, 4, 5])
# Automatically uses GPU for large operations
large_array = np.random.rand(1000000)
result = np.exp(large_array) # Uses GPU automatically
# Explicit GPU methods
result = np.gpu_exp(large_array) # Force GPU usage
# Check GPU availability
if np.GPU_AVAILABLE:
print("GPU acceleration enabled!")from numpy_bridge import np
# Matrix operations automatically use GPU for large matrices
a = np.random.rand(2000, 2000).astype(np.float32)
b = np.random.rand(2000, 2000).astype(np.float32)
result = np.matmul(a, b) # Automatically uses GPU
# Element-wise operations
large_array = np.random.rand(1000000).astype(np.float32)
result = np.exp(large_array) # GPU-accelerated
# Explicit GPU control
result = np.gpu_matmul(a, b) # Force GPU usage
result = np.gpu_exp(large_array) # Force GPU usage- Installation Guide - Detailed setup instructions
- Performance Benchmarks - Actual performance data
- API Reference - Complete API documentation
- Examples - Practical usage examples
- Consolidation Guide - Migration from standard NumPy
- Python: 3.8+
- NumPy: 2.0+
- OpenCL: ROCm 5.7+ for AMD GPUs
- rocBLAS: Optional, for matrix operations (recommended)
- GPU: AMD GPU with ROCm support (RX 5700 XT tested)
| Feature | NumPy | CuPy | Onumpy |
|---|---|---|---|
| AMD GPU Support | β | β Native | |
| Drop-in Replacement | - | β | β |
| Automatic Dispatch | β | β | β |
| RDNA1 Optimized | β | β | β |
| OpenCL Support | β | β | β |
| rocBLAS Integration | β | β Full |
Onumpy is specifically designed for AMD GPUs and provides native OpenCL/ROCm support, making it the ideal choice for AMD GPU users who want GPU acceleration without switching to NVIDIA hardware.
Onumpy/
βββ numpy_bridge.py # Unified NumPy bridge (auto GPU dispatch)
βββ numpy_gpu/ # GPU-accelerated extension
β βββ api/ # Python API
β βββ src/ # C/C++ implementation
β βββ kernels/ # OpenCL kernels (RDNA1-optimized)
β βββ setup.py # Build configuration
βββ benchmarks/ # Performance benchmarks
βββ docs/ # Documentation
- Volkner: Mamba SSM and Transformer operations
- Large batch training data processing
- Model weight operations
- Large-scale numerical simulations
- Matrix operations on big datasets
- Element-wise operations on large arrays
- Vector operations on large datasets
- Mathematical transformations
- Statistical computations
# Run basic tests
cd numpy_gpu
python tests/test_basic.py
# Run benchmarks
python benchmarks/benchmark_matmul.py
# Run comprehensive benchmarks
cd ../benchmarks
python run_all_benchmarks.pyContributions welcome! Please see CONTRIBUTING.md for guidelines.
https://github.com/CharlesMcGowen/ONumpy/blob/main/LICENSE
- NumPy team for the excellent base library
- AMD for ROCm and OpenCL support
- OpenCL community for GPU computing standards
- This is a custom extension, not a fork of NumPy
- It extends NumPy with GPU capabilities while maintaining 100% compatibility
- Optimized specifically for AMD RX 5700 XT (RDNA1 architecture)
- Designed for Volkner's training workloads but works for any large array operations
Status: Alpha - Actively developed and tested on AMD RX 5700 XT