Onumpy - GPU-Accelerated NumPy for AMD GPUs

Onumpy is a drop-in replacement for NumPy that automatically accelerates large array operations on AMD GPUs using OpenCL and rocBLAS.

🚀 Key Features

Automatic GPU Acceleration: Seamlessly uses GPU for large arrays (>100K elements)
100% NumPy Compatible: Drop-in replacement - only the import changes (from numpy_bridge import np)
AMD RDNA1 Optimized: Tuned for RX 5700 XT (wavefront 64, 20 CUs, 8GB VRAM)
Smart Dispatch: Automatically selects GPU or CPU based on array size
Memory Efficient: Offloads large arrays to GPU VRAM
Native Math Functions: 2-5x speedup with native_exp, native_log, etc.
Vectorized Operations: 1.5-3x speedup with float4 vectorization
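The Smart Dispatch rule can be pictured as a simple size check. The sketch below is a hypothetical illustration, not Onumpy's actual code. `choose_backend` and its parameters are invented names. Only the 100K-element threshold comes from the feature list above.

```python
# Hypothetical sketch of size-based dispatch; not Onumpy's real implementation.
GPU_THRESHOLD = 100_000  # elements; the ">100K elements" figure from the feature list


def choose_backend(num_elements: int, gpu_available: bool,
                   threshold: int = GPU_THRESHOLD) -> str:
    """Return "gpu" or "cpu" for an operation over num_elements values."""
    # Small arrays stay on the CPU: host-to-device transfer and kernel launch
    # overhead would typically outweigh any speedup from the GPU.
    if gpu_available and num_elements > threshold:
        return "gpu"
    return "cpu"


if __name__ == "__main__":
    for n in (5, 100_000, 1_000_000):
        print(n, choose_backend(n, gpu_available=True))
```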

📊 Performance

Matrix Operations (rocBLAS)

10-100x speedup for matrices >1000×1000
Reduced CPU memory usage
Better GPU utilization
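A back-of-the-envelope calculation shows why the GPU path pays off mainly for large matrices. A dense n×n matmul performs about 2n³ floating-point operations but only moves about 3n² values. The helper below is a hypothetical illustration, not part of Onumpy. It assumes float32 data (4 bytes per element) and counts one transfer each for A, B, and C.

```python
def matmul_cost(n: int, bytes_per_elem: int = 4):
    """Rough cost model for C = A @ B with square n x n float32 matrices."""
    flops = 2 * n ** 3                         # one multiply + one add per inner-product term
    bytes_moved = 3 * n * n * bytes_per_elem   # copy A and B to the GPU, copy C back
    intensity = flops / bytes_moved            # FLOPs per byte transferred
    return flops, bytes_moved, intensity


for n in (100, 1000, 2000):
    flops, moved, ai = matmul_cost(n)
    print(f"n={n}: {flops / 1e9:.3f} GFLOP, {moved / 1e6:.1f} MB, {ai:.1f} FLOP/byte")
```

Under this model the work per transferred byte grows linearly with n (n/6 FLOP per byte). Around 1000×1000 and above, the one-time transfer therefore becomes small relative to the compute.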

Element-wise Operations (OpenCL)

2-5x speedup with native math functions (native_exp, native_log)
1.5-3x speedup with vectorized operations (float4)
Parallel execution on 20 compute units
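With float4 vectorization, each work-item typically loads and processes four consecutive floats. An n-element array therefore needs about n/4 work-items, plus handling for the leftover tail when n is not a multiple of 4. The sketch below is a hypothetical host-side calculation, not Onumpy's launch code. The function name is invented, and the 64-wide work-group rounding is an assumption chosen to match the "wavefront 64" figure above.

```python
def float4_launch_shape(n: int, local_size: int = 64):
    """Work sizes for an element-wise float4 kernel over n float32 values."""
    vec_items = n // 4                       # work-items that each process one full float4
    tail = n % 4                             # 0-3 leftover elements needing scalar handling
    items = vec_items + (1 if tail else 0)   # one extra work-item covers the tail
    # OpenCL 1.x requires the global size to be a multiple of the local size,
    # so round up; work-items past the end must return early in the kernel.
    global_size = -(-items // local_size) * local_size
    return vec_items, tail, global_size


if __name__ == "__main__":
    print(float4_launch_shape(1_000_003))
```

OpenCL's native_exp, native_log, and related functions trade accuracy for speed; their precision is implementation-defined. Element-wise GPU results may therefore differ slightly from NumPy's and are best compared with a tolerance rather than exact equality.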

Reductions

10-50x speedup for arrays >1M elements
Efficient GPU parallel reduction
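GPU parallel reduction usually works as a tree. In each pass, element i is combined with element i + stride, halving the active data until one value remains. An array of n elements thus needs about log2(n) passes instead of n - 1 sequential steps. The pure-Python sketch below only simulates that pattern on the CPU. The function is hypothetical and says nothing about how Onumpy's kernels are actually written.

```python
import operator


def tree_reduce(values, op=operator.add):
    """Simulate a GPU-style pairwise reduction; returns (result, passes)."""
    data = list(values)
    if not data:
        raise ValueError("cannot reduce an empty sequence")
    passes = 0
    while len(data) > 1:
        half = (len(data) + 1) // 2
        # On a GPU each of these combinations would run in its own work-item;
        # an unpaired middle element (odd length) is carried over unchanged.
        data = [op(data[i], data[i + half]) if i + half < len(data) else data[i]
                for i in range(half)]
        passes += 1
    return data[0], passes


if __name__ == "__main__":
    print(tree_reduce(range(1, 1025)))
```

Because the additions happen in a different order than a sequential loop, floating-point sums computed this way can differ slightly from a left-to-right sum.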

🎯 Quick Start

Installation

# Clone repository
git clone https://github.com/CharlesMcGowen/ONumpy.git
cd ONumpy

# Install dependencies
pip install numpy pyopencl

# Build and install
cd numpy_gpu
python setup.py build_ext --inplace
python setup.py install

See the Installation Guide for detailed instructions.
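Before running anything, it can help to confirm that PyOpenCL actually sees the GPU through the ROCm OpenCL runtime. The script below is a sketch using only standard PyOpenCL calls. It is not part of Onumpy, and the function name is hypothetical.

```python
def list_opencl_gpus():
    """Return (platform, device, compute_units, vram_gib) for each visible OpenCL GPU."""
    try:
        import pyopencl as cl
    except ImportError:
        return []  # pyopencl is not installed
    try:
        platforms = cl.get_platforms()
    except Exception:
        return []  # no OpenCL platform/ICD found (e.g. ROCm OpenCL runtime missing)
    gpus = []
    for platform in platforms:
        try:
            devices = platform.get_devices(device_type=cl.device_type.GPU)
        except Exception:
            continue  # this platform exposes no GPU devices
        for dev in devices:
            gpus.append((platform.name, dev.name, dev.max_compute_units,
                         round(dev.global_mem_size / 2**30, 1)))
    return gpus


if __name__ == "__main__":
    found = list_opencl_gpus()
    if not found:
        print("No OpenCL GPU found; check the ROCm OpenCL runtime and pyopencl install.")
    for platform, name, cus, vram in found:
        print(f"{platform}: {name}, {cus} compute units, {vram} GiB")
```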

Basic Usage

from numpy_bridge import np

# Works exactly like NumPy
arr = np.array([1, 2, 3, 4, 5])

# Automatically uses GPU for large operations
large_array = np.random.rand(1000000)
result = np.exp(large_array)  # Uses GPU automatically

# Explicit GPU methods
result = np.gpu_exp(large_array)  # Force GPU usage

# Check GPU availability
if np.GPU_AVAILABLE:
    print("GPU acceleration enabled!")
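Because the bridge mirrors the NumPy API, scripts can fall back to plain NumPy on machines where Onumpy is not installed. The loader below is a hypothetical pattern, not something Onumpy ships. It assumes only what the example above shows: `numpy_bridge` exposes `np`, and `np.GPU_AVAILABLE` is a flag, read with `getattr` in case it is absent.

```python
import importlib


def load_array_module():
    """Return (np_module, gpu_available), preferring Onumpy's bridge over NumPy."""
    try:
        # Assumes numpy_bridge.py is importable (e.g. run from the repository root).
        np_mod = importlib.import_module("numpy_bridge").np
        return np_mod, bool(getattr(np_mod, "GPU_AVAILABLE", False))
    except ImportError:
        pass
    try:
        return importlib.import_module("numpy"), False  # plain CPU NumPy
    except ImportError:
        return None, False  # neither package is available


if __name__ == "__main__":
    np, gpu = load_array_module()
    print("GPU acceleration enabled!" if gpu else "Running on CPU")
```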

Advanced Usage

from numpy_bridge import np

# Matrix operations automatically use GPU for large matrices
a = np.random.rand(2000, 2000).astype(np.float32)
b = np.random.rand(2000, 2000).astype(np.float32)
result = np.matmul(a, b)  # Automatically uses GPU

# Element-wise operations
large_array = np.random.rand(1000000).astype(np.float32)
result = np.exp(large_array)  # GPU-accelerated

# Explicit GPU control
result = np.gpu_matmul(a, b)  # Force GPU usage
result = np.gpu_exp(large_array)  # Force GPU usage
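The `gpu_*` functions force the GPU path, but what they do when no GPU is present is not specified here. A defensive wrapper can check `np.GPU_AVAILABLE` first and otherwise fall back to the regular NumPy-named function. This is a hypothetical helper built only on the names used above (`gpu_matmul`, `gpu_exp`, `GPU_AVAILABLE`). The `gpu_<name>` naming rule is an assumption extrapolated from those two examples.

```python
from types import SimpleNamespace


def call_preferring_gpu(np_mod, name, *args, **kwargs):
    """Call np_mod.gpu_<name> when a GPU is available, else np_mod.<name>."""
    gpu_fn = getattr(np_mod, f"gpu_{name}", None)  # e.g. gpu_exp, gpu_matmul
    if gpu_fn is not None and getattr(np_mod, "GPU_AVAILABLE", False):
        return gpu_fn(*args, **kwargs)
    return getattr(np_mod, name)(*args, **kwargs)  # regular (auto-dispatch) path


if __name__ == "__main__":
    # Stand-in module so the sketch runs without Onumpy or a GPU.
    fake_np = SimpleNamespace(GPU_AVAILABLE=False,
                              exp=lambda x: ("cpu", x),
                              gpu_exp=lambda x: ("gpu", x))
    print(call_preferring_gpu(fake_np, "exp", 1.0))  # ('cpu', 1.0)
    fake_np.GPU_AVAILABLE = True
    print(call_preferring_gpu(fake_np, "exp", 1.0))  # ('gpu', 1.0)
```

With Onumpy installed, `call_preferring_gpu(np, "matmul", a, b)` would route to `np.gpu_matmul(a, b)` when the GPU is available and to `np.matmul(a, b)` otherwise.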

📖 Documentation

Installation Guide - Detailed setup instructions
Performance Benchmarks - Actual performance data
API Reference - Complete API documentation
Examples - Practical usage examples
Consolidation Guide - Migration from standard NumPy

🔧 Requirements

Python: 3.8+
NumPy: 2.0+
OpenCL: ROCm 5.7+ for AMD GPUs
rocBLAS: Optional, for matrix operations (recommended)
GPU: AMD GPU with ROCm support (RX 5700 XT tested)
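The Python-side requirements can be checked with the standard library alone. The script below is a sketch that checks Python 3.8+, NumPy 2.0+, and the presence of PyOpenCL (installed in the Quick Start). It does not check ROCm or rocBLAS versions, and the function names are hypothetical.

```python
import sys
from importlib import metadata


def major_minor(version: str) -> tuple:
    """Parse "2.0.0rc1" -> (2, 0); missing or non-numeric parts count as 0."""
    parts = (version.split(".") + ["0"])[:2]
    result = []
    for part in parts:
        digits = ""
        for ch in part:
            if not ch.isdigit():
                break
            digits += ch
        result.append(int(digits or "0"))
    return tuple(result)


def installed_version(package: str):
    """Installed distribution version, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_requirements():
    """Return {name: (ok, found_version)} for the Python-side requirements."""
    numpy_ver = installed_version("numpy")
    pyopencl_ver = installed_version("pyopencl")
    return {
        "python": (sys.version_info[:2] >= (3, 8), sys.version.split()[0]),
        "numpy": (numpy_ver is not None and major_minor(numpy_ver) >= (2, 0), numpy_ver),
        "pyopencl": (pyopencl_ver is not None, pyopencl_ver),
    }


if __name__ == "__main__":
    for name, (ok, found) in check_requirements().items():
        print(f"{'OK ' if ok else 'BAD'} {name}: {found or 'not installed'}")
```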

🆚 Why Onumpy?

Feature	NumPy	CuPy	Onumpy
AMD GPU Support	❌	⚠️ Limited	✅ Native
Drop-in Replacement	-	✅	✅
Automatic Dispatch	❌	❌	✅
RDNA1 Optimized	❌	❌	✅
OpenCL Support	❌	❌	✅
rocBLAS Integration	❌	⚠️ Partial	✅ Full

Onumpy is specifically designed for AMD GPUs and provides native OpenCL/ROCm support, making it the ideal choice for AMD GPU users who want GPU acceleration without switching to NVIDIA hardware.

🏗️ Architecture

Onumpy/
├── numpy_bridge.py          # Unified NumPy bridge (auto GPU dispatch)
├── numpy_gpu/               # GPU-accelerated extension
│   ├── api/                 # Python API
│   ├── src/                 # C/C++ implementation
│   ├── kernels/             # OpenCL kernels (RDNA1-optimized)
│   └── setup.py             # Build configuration
├── benchmarks/              # Performance benchmarks
└── docs/                    # Documentation
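One common way to build a bridge like numpy_bridge.py is a proxy object. It forwards every attribute to NumPy and intercepts only the functions that have a GPU implementation. The class below is a hypothetical sketch of that pattern, not the actual contents of numpy_bridge.py. The class name, the constructor parameters, and the 100K default threshold are assumptions, with the threshold taken from the feature list.

```python
from types import SimpleNamespace


class DispatchingBridge:
    """Proxy that forwards to a CPU module, routing large calls to GPU versions."""

    def __init__(self, cpu_module, gpu_impls, gpu_available=True, threshold=100_000):
        self._cpu = cpu_module           # e.g. the real numpy module
        self._gpu = dict(gpu_impls)      # name -> GPU implementation
        self._threshold = threshold      # element count above which the GPU is used
        self.GPU_AVAILABLE = gpu_available

    def __getattr__(self, name):
        # Only called for names not found on the proxy itself.
        cpu_attr = getattr(self._cpu, name)  # raises AttributeError for unknown names
        gpu_fn = self._gpu.get(name)
        if gpu_fn is None:
            return cpu_attr  # constants, dtypes, and non-accelerated functions

        def dispatch(*args, **kwargs):
            # Size of the largest array-like argument (objects without .size count as 0).
            size = max((getattr(a, "size", 0) for a in args), default=0)
            if self.GPU_AVAILABLE and size > self._threshold:
                return gpu_fn(*args, **kwargs)
            return cpu_attr(*args, **kwargs)

        return dispatch


if __name__ == "__main__":
    # Stand-ins: a fake CPU module and fake arrays that only carry a .size.
    cpu = SimpleNamespace(pi=3.14159, exp=lambda a: "cpu exp")
    np_bridge = DispatchingBridge(cpu, {"exp": lambda a: "gpu exp"})
    print(np_bridge.pi)                              # 3.14159 (forwarded unchanged)
    print(np_bridge.exp(SimpleNamespace(size=10)))     # cpu exp
    print(np_bridge.exp(SimpleNamespace(size=10**6)))  # gpu exp
```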

🎓 Use Cases

Machine Learning Training

Volkner: Mamba SSM and Transformer operations
Large batch training data processing
Model weight operations

Scientific Computing

Large-scale numerical simulations
Matrix operations on big datasets
Element-wise operations on large arrays

Data Processing

Vector operations on large datasets
Mathematical transformations
Statistical computations

🧪 Testing

# Run basic tests
cd numpy_gpu
python tests/test_basic.py

# Run benchmarks
python benchmarks/benchmark_matmul.py

# Run comprehensive benchmarks
cd ../benchmarks
python run_all_benchmarks.py
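When timing GPU code yourself, exclude the first call(s). OpenCL programs are compiled at runtime, so the first invocation can include kernel build and allocation time. The helper below is a generic standard-library sketch. The name `best_time` is hypothetical and not part of the bundled benchmark scripts.

```python
import time


def best_time(fn, *args, repeat=5, warmup=1):
    """Return the fastest of `repeat` timed calls to fn(*args), in seconds."""
    for _ in range(warmup):
        fn(*args)  # first calls may include OpenCL kernel compilation and allocation
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)  # the minimum is least affected by background noise


if __name__ == "__main__":
    print(f"sum over 1e6 ints: {best_time(sum, range(1_000_000)) * 1e3:.2f} ms")
```

Compare `best_time(np.matmul, a, b)` through Onumpy against the same call through plain NumPy. This gives a like-for-like speedup figure for your own hardware.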

🤝 Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.

📄 License

https://github.com/CharlesMcGowen/ONumpy/blob/main/LICENSE

🙏 Acknowledgments

NumPy team for the excellent base library
AMD for ROCm and OpenCL support
OpenCL community for GPU computing standards

📝 Notes

This is a custom extension, not a fork of NumPy
It extends NumPy with GPU capabilities while maintaining 100% compatibility
Optimized specifically for AMD RX 5700 XT (RDNA1 architecture)
Designed for Volkner's training workloads but works for any large array operations

Status: Alpha - Actively developed and tested on AMD RX 5700 XT
