|
1 | 1 | # GPU Programming 101 🚀 |
2 | 2 |
|
3 | 3 | [](https://opensource.org/licenses/MIT) |
4 | | -[](https://developer.nvidia.com/cuda-toolkit) |
5 | | -[](https://rocmdocs.amd.com/) |
| 4 | +[](https://developer.nvidia.com/cuda-toolkit) |
| 5 | +[](https://rocmdocs.amd.com/) |
6 | 6 | [](https://www.docker.com/) |
| 7 | +[](modules/) |
7 | 8 | [](https://github.com/features/actions) |
8 | 9 |
|
9 | 10 | **A comprehensive, hands-on educational project for mastering GPU programming with CUDA and HIP** |
@@ -118,20 +119,79 @@ cd modules/module1/examples |
118 | 119 | ## 🛠️ Prerequisites |
119 | 120 |
|
120 | 121 | ### Hardware Requirements |
121 | | -- **GPU**: NVIDIA GTX 1060+ or AMD RX 580+ (4GB+ VRAM recommended) |
122 | | -- **System**: 8GB+ RAM (16GB+ recommended for advanced modules) |
| 122 | + |
| 123 | +#### NVIDIA GPU Systems |
| 124 | +- **Minimum GPU**: GTX 1060 6GB, GTX 1650, RTX 2060 or better |
| 125 | +- **Recommended GPU**: RTX 3070 (8GB) / RTX 4070 (12GB), RTX 3080 (10GB or 12GB) / RTX 4080 (16GB) |
| 126 | +- **Professional/Advanced**: RTX 4090 (24GB), RTX A6000 (48GB), Tesla/Quadro series |
| 127 | +- **Architecture Support**: Maxwell, Pascal, Volta, Turing, Ampere, Ada Lovelace, Hopper |
| 128 | +- **Compute Capability**: 5.0+ (Maxwell architecture or newer) |
| 129 | + |
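The compute capability requirement above can be checked programmatically. The following is a minimal host-only sketch, assuming the major and minor values have already been read from the device (with the CUDA runtime they are exposed as `cudaDeviceProp::major` and `cudaDeviceProp::minor` via `cudaGetDeviceProperties`). The function and constant names are hypothetical and not part of this project.

```cpp
#include <cassert>

// Minimum compute capability stated in the list above (5.0, Maxwell).
constexpr int kMinMajor = 5;
constexpr int kMinMinor = 0;

// Returns true when (major, minor) is at least the stated minimum.
// Hypothetical helper: on a real system the two values would come from
// cudaGetDeviceProperties (cudaDeviceProp::major / cudaDeviceProp::minor).
constexpr bool meets_min_compute_capability(int major, int minor) {
    return major > kMinMajor || (major == kMinMajor && minor >= kMinMinor);
}

// Illustrative checks using well-known capability values.
inline void compute_capability_examples() {
    assert(meets_min_compute_capability(6, 1));   // Pascal, e.g. GTX 1060
    assert(meets_min_compute_capability(8, 9));   // Ada Lovelace, e.g. RTX 4090
    assert(!meets_min_compute_capability(3, 5));  // Kepler: below the minimum
}
```

Keeping the comparison separate from the device query lets it compile and run on machines without a CUDA toolkit installed.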
| 130 | +#### AMD GPU Systems |
| 131 | +- **Minimum GPU**: RX 580 8GB, RX 6600, RX 7600 or better |
| 132 | +- **Recommended GPU**: RX 6700 XT/7700 XT (12GB+), RX 6800 XT/7800 XT (16GB+) |
| 133 | +- **Professional/Advanced**: RX 7900 XTX (24GB), Radeon PRO W7800 (48GB), Instinct MI series |
| 134 | +- **Architecture Support**: RDNA2, RDNA3, RDNA4, GCN 5.0+, CDNA series |
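| 135 | +- **ROCm Compatibility**: Only GPUs on AMD's official ROCm support list are supported; older cards such as the RX 580 (GCN 4) fall outside the GCN 5.0+ range listed above |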
| 136 | + |
| 137 | +#### System Memory & CPU |
| 138 | +- **Minimum RAM**: 16GB system RAM |
| 139 | +- **Recommended RAM**: 32GB+ for advanced modules and multi-GPU setups |
| 140 | +- **Professional Setup**: 64GB+ for large-scale scientific computing |
| 141 | +- **CPU Requirements**: |
| 142 | + - **Intel**: Haswell (2013) or newer for PCIe atomics support |
| 143 | + - **AMD**: Zen 1 (2017) or newer for PCIe atomics support |
| 144 | +- **Storage**: 20GB+ free space for Docker containers and examples |
123 | 145 |
|
124 | 146 | ### Software Requirements |
125 | | -- **OS**: Linux (recommended), Windows 10/11, or macOS |
126 | | -- **CUDA**: 11.0+ for NVIDIA GPUs |
127 | | -- **ROCm**: 5.0+ for AMD GPUs |
128 | | -- **Compiler**: GCC 7+, Clang 8+, or MSVC 2019+ |
129 | | -- **Docker**: For containerized development (recommended) |
| 147 | + |
| 148 | +#### Operating System Support |
| 149 | +- **Linux** (Recommended): Ubuntu 22.04 LTS, RHEL 8/9, SLES 15 SP5 |
| 150 | +- **Windows**: Windows 10/11 with WSL2 recommended for optimal compatibility |
| 151 | +- **macOS**: macOS 12+ (no CUDA or ROCm support; limited to Metal Performance Shaders for basic GPU compute) |
| 152 | + |
| 153 | +#### GPU Computing Platforms |
| 154 | +- **CUDA Toolkit**: 12.0+ (Docker uses CUDA 12.9.1) |
| 155 | + - **Driver Requirements**: |
| 156 | + - Linux: 550.54.14+ for CUDA 12.4+ |
| 157 | + - Windows: 551.61+ for CUDA 12.4+ |
| 158 | +- **ROCm Platform**: 6.0+ (Docker uses ROCm 6.4.3) |
| 159 | + - **Driver Requirements**: Latest AMDGPU-PRO or open-source AMDGPU drivers |
| 160 | + - **Kernel Support**: Linux kernel 5.4+ recommended |
| 161 | + |
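The driver minimums above are dotted version strings, so comparing them as plain text or as floating-point numbers gives wrong answers (for example, "550.54.14" is not a valid number). The sketch below compares versions component by component. It assumes the installed version string is already available, for instance from `nvidia-smi --query-gpu=driver_version --format=csv,noheader`. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Splits a dotted version string such as "550.54.14" into integer components.
// Hypothetical helper; assumes every component is a non-negative decimal number.
inline std::vector<int> parse_version(const std::string& text) {
    std::vector<int> parts;
    std::stringstream stream(text);
    std::string piece;
    while (std::getline(stream, piece, '.')) {
        parts.push_back(std::stoi(piece));
    }
    return parts;
}

// Returns true when `installed` is greater than or equal to `required`,
// comparing component by component and treating missing components as 0.
inline bool driver_at_least(const std::string& installed,
                            const std::string& required) {
    std::vector<int> a = parse_version(installed);
    std::vector<int> b = parse_version(required);
    const std::size_t n = a.size() > b.size() ? a.size() : b.size();
    a.resize(n, 0);
    b.resize(n, 0);
    return a >= b;  // std::vector compares lexicographically
}
```

For example, `driver_at_least("535.104.05", "550.54.14")` is false, so that driver would not satisfy the Linux minimum listed for CUDA 12.4+.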
| 162 | +#### Development Environment |
| 163 | +- **Compilers**: |
| 164 | + - **GCC**: 9.0+ (GCC 11+ recommended for C++17 features) |
| 165 | + - **Clang**: 10.0+ (Clang 14+ recommended) |
| 166 | + - **MSVC**: 2019+ (2022 17.10+ for CUDA 12.4+ support) |
| 167 | +- **Build Tools**: Make 4.0+, CMake 3.18+ (optional) |
| 168 | +- **Docker**: 20.10+ with GPU runtime support (nvidia-container-toolkit or ROCm containers) |
| 169 | + |
| 170 | +#### Additional Tools (Included in Docker) |
| 171 | +- **Profiling**: Nsight Compute, Nsight Systems (NVIDIA), rocprof (AMD) |
| 172 | +- **Debugging**: cuda-gdb, rocgdb, compute-sanitizer |
| 173 | +- **Libraries**: cuBLAS, cuFFT, rocBLAS, rocFFT (for advanced modules) |
| 174 | + |
| 175 | +### Performance Expectations by Hardware Tier |
| 176 | + |
| 177 | +| Hardware Tier | Example GPUs | VRAM | Expected Performance | Suitable Modules | |
| 178 | +|---------------|--------------|------|---------------------|------------------| |
| 179 | +| **Entry Level** | GTX 1060 6GB, RX 580 8GB | 6-8GB | 10-50x CPU speedup | Modules 1-3 | |
| 180 | +| **Mid-Range** | RTX 3060 Ti, RX 6700 XT | 8-12GB | 50-200x CPU speedup | Modules 1-6 |
| 181 | +| **High-End** | RTX 4070 Ti, RX 7800 XT | 12-16GB | 100-500x CPU speedup | All modules |
| 182 | +| **Professional** | RTX 4090, RX 7900 XTX | 24GB | 200-1000x+ CPU speedup | All modules + research | |
130 | 183 |
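The "Nx CPU speedup" figures in the table are most naturally read as the ratio of CPU run time to GPU run time for the same workload. That reading is an assumption here, since the table does not define the term. A minimal sketch of the calculation, with a hypothetical function name:

```cpp
#include <cassert>

// Speedup as the ratio of CPU time to GPU time for the same workload.
// Both times must use the same unit, and gpu_ms must be positive.
// Hypothetical helper; not part of this project.
inline double speedup(double cpu_ms, double gpu_ms) {
    assert(gpu_ms > 0.0);
    return cpu_ms / gpu_ms;
}
```

A kernel that takes 1200 ms on the CPU and 24 ms on the GPU gives `speedup(1200.0, 24.0) == 50.0`. Whether the GPU time includes host-to-device transfers changes the result considerably, so measured figures should state which was timed.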
|
131 | 184 | ### Programming Knowledge |
132 | | -- **C/C++**: Intermediate level (pointers, memory management) |
133 | | -- **Command Line**: Basic terminal/shell usage |
134 | | -- **Math**: Linear algebra basics helpful but not required |
| 185 | +- **C/C++**: Intermediate level (pointers, memory management, basic templates) |
| 186 | +- **Parallel Programming**: Basic understanding of threads and synchronization helpful |
| 187 | +- **Command Line**: Comfortable with terminal/shell operations |
| 188 | +- **Mathematics**: Linear algebra and calculus basics beneficial for advanced modules |
| 189 | +- **Version Control**: Basic Git knowledge for contributing |
| 190 | + |
| 191 | +### Network Requirements (Docker Setup) |
| 192 | +- **Internet Connection**: Required for initial Docker image downloads (~8GB total) |
| 193 | +- **Bandwidth**: 50+ Mbps recommended for efficient container downloads |
| 194 | +- **Storage**: Additional 20GB for Docker images and build cache |
135 | 195 |
|
136 | 196 | ## 🐳 Docker Development |
137 | 197 |
|
|