Commit 4ce87e0

Merge pull request #8 from AIComputing101/coketaste/docker-update
Update docker image of CUDA
2 parents 3813568 + 2146721 commit 4ce87e0

42 files changed (+467 -140 lines)

CONTRIBUTING.md

Lines changed: 12 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -41,7 +41,7 @@ docker-compose up -d cuda-dev # For NVIDIA GPUs
4141
docker-compose up -d rocm-dev # For AMD GPUs
4242

4343
# Option 2: Native development
44-
# Install CUDA Toolkit 12.9.1+ or ROCm latest
44+
# Install CUDA Toolkit 13.0.1+ or ROCm latest
4545
# See modules/module1/README.md for detailed setup instructions
4646

4747
# Build all examples
@@ -242,7 +242,7 @@ When reporting bugs, please include:
242242
- **Operating System**: (Ubuntu 22.04, Windows 11, etc.)
243243
- **GPU**: (RTX 4090, RX 7900 XTX, etc.)
244244
- **Driver Version**: (NVIDIA 535.x, ROCm latest, etc.)
245-
- **CUDA/HIP Version**: (12.9.1, 7.0, etc.)
245+
- **CUDA/HIP Version**: (13.0.1, 7.0.1, etc.)
246246
- **Docker**: (if using containerized development)
247247

248248
### Bug Description
@@ -258,7 +258,7 @@ When reporting bugs, please include:
258258
**Environment:**
259259
- OS: Ubuntu 22.04
260260
- GPU: RTX 4080
261-
- CUDA: 12.9.1
261+
- CUDA: 13.0.1
262262
- Driver: 535.98
263263

264264
**Description:**
@@ -313,4 +313,12 @@ By contributing, you agree that your contributions will be licensed under the MI
313313

314314
---
315315

316-
Thank you for contributing to GPU Programming 101! Your efforts help make GPU computing more accessible to developers worldwide. 🚀
316+
Thank you for contributing to GPU Programming 101! Your efforts help make GPU computing more accessible to developers worldwide. 🚀
317+
318+
## 🧩 Maintaining feature docs
319+
320+
If you update examples or module content to use new CUDA or ROCm capabilities, please also:
321+
322+
- Bump the versions in `CUDA_ROCM_FEATURES.md` and re‑scan the official release notes.
323+
- Update module READMEs to mention any new minimum driver/toolkit requirements.
324+
- Avoid marketing claims; prefer links to vendor docs and measured results in our own benchmarks.

CUDA_ROCM_FEATURES.md

Lines changed: 200 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,200 @@
1+
# CUDA and ROCm Feature Guide (Living Document)
2+
3+
Last updated: 2025-09-22
4+
5+
This guide summarizes current, officially documented features of NVIDIA CUDA and AMD ROCm that we leverage across this project. It is designed to be easy to maintain as new versions ship. Where possible, we link to authoritative sources instead of restating volatile details.
6+
7+
Tip: Prefer the linked release notes and programming guides for exact, version-specific behavior. Update checklist is at the end of this document.
8+
9+
---
10+
11+
## Current Versions at a Glance
12+
13+
- CUDA: 13.0 Update 1 (13.0.U1)
14+
- Source of truth: NVIDIA CUDA Toolkit Release Notes
15+
- Driver requirement overview: CUDA Compatibility Guide for Drivers
16+
- ROCm: 7.0.1
17+
- Source of truth: ROCm Release History and ROCm docs index
18+
19+
Reference links are provided at the bottom for maintenance.
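
The version list above writes the CUDA release as "13.0 Update 1 (13.0.U1)", while the Docker tags and badges in this repository use the dotted form 13.0.1. As a minimal sketch of how the two spellings could be reconciled when maintaining the docs, the helper below normalizes either form to `MAJOR.MINOR.PATCH`. The function name `normalize_cuda_version` is hypothetical and not part of any NVIDIA tool; it only assumes that "Update N" corresponds to patch level N, as the labels in this commit suggest.

```python
import re

# Accepts "13.0", "13.0.1", "13.0 Update 1", "13.0 U1", and "13.0.U1".
_VERSION = re.compile(r"(\d+)\.(\d+)(?:[\s.]*(?:update|u)?\s*(\d+))?", re.IGNORECASE)


def normalize_cuda_version(label: str) -> str:
    """Return a dotted MAJOR.MINOR.PATCH string for a CUDA release label.

    Hypothetical helper: assumes "Update N" means patch level N and that a
    label without an update suffix is patch level 0.
    """
    match = _VERSION.fullmatch(label.strip())
    if match is None:
        raise ValueError(f"unrecognized CUDA version label: {label!r}")
    major, minor, patch = match.groups()
    return f"{int(major)}.{int(minor)}.{int(patch or 0)}"


if __name__ == "__main__":
    for label in ("13.0 Update 1", "13.0.U1", "13.0.1"):
        print(label, "->", normalize_cuda_version(label))
```

Normalizing before comparison keeps a docs check from flagging "13.0 Update 1" and "13.0.1" as different releases.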
20+
21+
---
22+
23+
## CUDA 13.x overview
24+
25+
Highlights pulled from NVIDIA’s official docs (see links):
26+
27+
- General platform
28+
- CUDA 13.x is ABI-stable within the major series; requires r580+ driver on Linux.
29+
- Increased MPS server client limits on Ampere and newer architectures (subject to architectural limits).
30+
- Compiler and runtime
31+
- NVCC/NVRTC updates; PTX ISA updates (see PTX 9.0 notes in release docs).
32+
- Programmatic Dependent Launch (PDL) support in select library kernels on sm_90+.
33+
- Developer tools
34+
- Nsight Systems and Nsight Compute continue as the primary profilers.
35+
- Compute Sanitizer updates; Visual Profiler and nvprof are removed in 13.0.
36+
- Deprecations and removals
37+
- Dropped offline compilation/library support for pre-Turing architectures (Maxwell, Pascal, Volta) in CUDA 13.0. Continue to use 12.x to target these.
38+
- Windows Toolkit no longer bundles a display driver (install separately).
39+
- Removed multi-device cooperative group launch APIs; several legacy headers removed.
40+
41+
Architectures and typical use cases (non-exhaustive):
42+
43+
- Blackwell/Blackwell Ultra (SM100 and newer): next‑gen AI/HPC; FP4/FP8 workflows via libraries.
44+
- Hopper (H100/H200, SM90): transformer engine, thread block clusters, DPX; AI training/HPC.
45+
- Ada (RTX 40): workstation/development; AV1 encode; content creation/AI dev.
46+
- Ampere (A100/RTX 30): MIG, 3rd‑gen tensor cores; research/mixed workloads.
47+
48+
Core libraries snapshot (examples; see library release notes for specifics):
49+
50+
- cuBLAS/cuBLASLt: autotuning options; improvements on newer architectures; mixed precision and block‑scaled formats.
51+
- cuFFT: new error codes; performance changes; dropped pre‑Turing support.
52+
- cuSPARSE: generic API enhancements; 64‑bit indices in SpGEMM; various bug fixes.
53+
- Math/NPP/nvJPEG: targeted perf/accuracy improvements and API cleanups.
54+
55+
Authoritative references:
56+
57+
- CUDA Toolkit Release Notes (13.0 U1)
58+
- CUDA Compatibility Guide for Drivers
59+
- Nsight Systems Release Notes; Nsight Compute Release Notes
60+
- CUDA C++ Programming Guide changelog
61+
62+
---
63+
64+
## ROCm 7.0.x overview
65+
66+
Highlights from AMD’s official docs (see links):
67+
68+
- ROCm 7.0.1 is the latest as of 2025‑09‑17; consult the release history for point updates.
69+
- HIP as the primary programming model, with CUDA‑like APIs and HIP‑Clang toolchain.
70+
- Windows support targets HIP SDK for development; full ROCm stack targets Linux.
71+
- ROCm Libraries monorepo: multiple core math and support libraries are consolidated in the ROCm Libraries monorepo for unified CI/build. Projects included (as of rocm‑7.0.1): composablekernel, hipblas, hipblas-common, hipblaslt, hipcub, hipfft, hiprand, hipsolver, hipsparse, hipsparselt, miopen, rocblas, rocfft, rocprim, rocrand, rocsolver, rocsparse, rocthrust. Shared components: rocroller, tensile, mxdatagenerator. Most of these are marked “Completed” in the monorepo migration status and the monorepo is the source of truth; see its README for current status.
72+
- Tooling and system components: ROCr runtime, ROCm SMI, rocprof/rocprofiler, rocgdb/rocm‑debug‑agent.
73+
74+
Nomenclature: project names in the monorepo are standardized to match released package names (for example, hipblas/hipfft/rocsparse instead of mixed casing).
75+
76+
Architectures (illustrative, not exhaustive):
77+
78+
- CDNA3 (MI300 family): AI training and HPC; unified memory on APUs (MI300A), large HBM configs (MI300X).
79+
- RDNA3 (Radeon 7000 series): workstation/gaming; AV1 encode/decode; hardware ray tracing.
80+
81+
Common libraries (see ROCm Libraries reference and monorepo):
82+
83+
- BLAS/solver/sparse: rocBLAS / hipBLAS, hipBLASLt, rocSOLVER / hipSOLVER, rocSPARSE / hipSPARSE, hipSPARSElt.
84+
- FFT/random/core: rocFFT / hipFFT, rocRAND / hipRAND, rocPRIM / hipCUB, rocThrust.
85+
- Kernel building blocks: composablekernel; shared dependencies like Tensile and rocRoller (used by rocBLAS/hipBLASLt).
86+
- ML/DL: MIOpen; framework integrations via the ROCm for AI guide.
87+
88+
Authoritative references:
89+
90+
- ROCm Docs index (What is ROCm?, install, reference)
91+
- ROCm Release History (7.0.1, 7.0.0, …)
92+
- ROCm libraries reference; tools/compilers/runtimes reference
93+
- ROCm Libraries monorepo (status, structure, releases): https://github.com/ROCm/rocm-libraries
94+
95+
---
96+
97+
## Cross‑platform mapping (CUDA ⇄ HIP)
98+
99+
Quick mapping for common concepts. Always check specific APIs for support and behavior differences.
100+
101+
- Kernel launch
102+
- CUDA: <<<grid, block, shared, stream>>>; HIP: the same triple‑chevron syntax under hip‑clang, or the hipLaunchKernelGGL macro
103+
- Memory management
104+
- CUDA: cudaMalloc/cudaMemcpy/etc.; HIP: hipMalloc/hipMemcpy/etc.
105+
- Streams and events
106+
- CUDA: cudaStream_t/cudaEvent_t; HIP: hipStream_t/hipEvent_t
107+
- Graphs
108+
- CUDA: cudaGraph_t and Graph Exec; HIP: hipGraph_t and equivalents; feature coverage evolves, verify against ROCm docs.
109+
- Cooperative groups
110+
- CUDA: cooperative_groups; HIP: HIP cooperative groups header; multi‑device variants differ (and the CUDA multi‑device cooperative launch APIs were removed in 13.0).
111+
- Libraries
112+
- cuBLAS ↔ hipBLAS/rocBLAS; cuFFT ↔ hipFFT/rocFFT; cuSPARSE ↔ hipSPARSE/rocSPARSE; Thrust/CUB ↔ rocThrust/hipCUB/rocPRIM.
113+
114+
Porting aids:
115+
116+
- HIPIFY tools (hipify-perl and hipify-clang) for source translation; hip‑clang for compilation.
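
To make the mapping above concrete, here is a toy sketch of the kind of identifier renaming that a source translator performs for the simplest cases. It is not the HIPIFY implementation and covers only the names listed in this section; the table `CUDA_TO_HIP` and the function `rename_cuda_identifiers` are hypothetical names chosen for illustration. Real ports also need header changes and API-by-API review.

```python
import re

# Only pairs named in the mapping list above; a real tool covers far more.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaStream_t": "hipStream_t",
    "cudaEvent_t": "hipEvent_t",
    "cudaGraph_t": "hipGraph_t",
}

# Word boundaries keep longer identifiers (e.g. cudaMemcpyAsync) untouched.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, CUDA_TO_HIP)) + r")\b")


def rename_cuda_identifiers(source: str) -> str:
    """Replace whole-word CUDA identifiers with their HIP counterparts."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(1)], source)


if __name__ == "__main__":
    snippet = "cudaStream_t s; cudaMalloc(&ptr, n); cudaMemcpyAsync(d, h, n);"
    print(rename_cuda_identifiers(snippet))
```

Identifiers missing from the table are deliberately left alone, so anything the sketch does not know about stays visible for manual review instead of being silently mistranslated.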
117+
118+
---
119+
120+
## Compatibility and supported platforms
121+
122+
- CUDA drivers and OS
123+
- See the CUDA Compatibility Guide for minimum driver versions by toolkit series (e.g., 13.x requires r580+ on Linux). Starting with 13.0, the Windows toolkit installer no longer bundles a display driver; install it separately.
124+
- CUDA architectures
125+
- 13.0 drops offline compilation/library support for Maxwell/Pascal/Volta; continue to use 12.x for those targets.
126+
- ROCm OS/GPU support
127+
- See ROCm install guides and GPU/accelerator support references for Linux and Windows HIP SDK system requirements.
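
The driver requirement above (r580+ for CUDA 13.x on Linux) can be checked mechanically. The sketch below compares the leading branch number of a driver version string against a minimum; `driver_meets_minimum` is a hypothetical helper, and the default of 580 is taken from this guide, so confirm it against the CUDA Compatibility Guide before relying on it.

```python
def driver_meets_minimum(driver_version: str, minimum_branch: int = 580) -> bool:
    """Return True if the driver's branch number is at least minimum_branch.

    Assumes the usual NVIDIA "BRANCH.MINOR[.PATCH]" layout, e.g. "550.54.14".
    The default of 580 mirrors the r580+ requirement stated in this guide.
    """
    branch = int(driver_version.strip().split(".")[0])
    return branch >= minimum_branch


if __name__ == "__main__":
    # "535.98" and "550.54.14" appear elsewhere in this repository's docs;
    # "580.0" is a made-up string used only to show a passing case.
    for version in ("535.98", "550.54.14", "580.0"):
        print(version, driver_meets_minimum(version))
```

On a real machine the version string would typically come from `nvidia-smi`; the comparison is kept separate here so it can be tested without a GPU.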
128+
129+
---
130+
131+
## Educational integration (this repository)
132+
133+
This course demonstrates both CUDA and HIP across modules. Key tool updates to note:
134+
135+
- Profiling and analysis
136+
- NVIDIA: Nsight Systems, Nsight Compute, CUPTI changes in 13.x, Compute Sanitizer
137+
- AMD: rocprof/rocprofiler, ROCm SMI
138+
- Memory and graphs
139+
- CUDA: CUDA Graphs; memory pools and VMM; asynchronous copy
140+
- ROCm: HIP graph APIs (coverage evolves); ROCr runtime memory features
141+
142+
Example module alignment (indicative; see each module’s README for details):
143+
144+
- Module 1: Runtime APIs, device queries, build/tooling
145+
- Module 2: Memory management (device, pinned, unified/coherent where available)
146+
- Module 3: Synchronization and cooperation (warp/wavefront‑level, cooperative groups)
147+
- Module 4: Streams, events, graphs, and multi‑GPU basics
148+
- Module 5: Profiling and debugging (Nsight Tools, Compute Sanitizer, rocprof, rocm‑smi)
149+
- Module 6+: Libraries (BLAS/FFT/SPARSE) and domain examples (AI/HPC)
150+
151+
### New features by module (CUDA 13.x and ROCm 7.0.x)
152+
153+
| Module | CUDA (what you’ll learn) | ROCm/HIP (what you’ll learn) |
154+
|---|---|---|
155+
| Module 1: Getting Started | Toolchain (nvcc), project layout, kernel launch basics (grid/block/thread indexing), device vs host code, cudaMalloc/cudaMemcpy, device query and error handling | Toolchain (hipcc/hip-clang), hipLaunchKernelGGL, hipMalloc/hipMemcpy, hipGetDeviceProperties, mapping CUDA concepts to HIP |
156+
| Module 2: Memory & Data Movement | Global/shared/constant/texture memory usage, coalesced access, pinned memory, unified memory and prefetch, async copies and measuring bandwidth | HIP memory APIs and ROCr memory model, pinned host buffers, unified/coherent memory notes, async transfers, using rocm-smi/rocprof to observe bandwidth |
157+
| Module 3: Parallel Patterns & Sync | Reductions, scans, sorting; warp-level primitives; cooperative groups; shared memory tiling; atomics and barriers; occupancy considerations | rocPRIM/hipCUB/rocThrust equivalents; wavefront-level ops; HIP cooperative groups; LDS usage; atomics and synchronization semantics |
158+
| Module 4: Concurrency, Streams & Multi‑GPU | Streams/events, priorities, CUDA Graphs (capture/instantiate/launch), peer-to-peer (UVA/P2P), basic multi‑GPU patterns | hipStream/hipEvent, HIP Graph API coverage and usage, peer access where supported, multi‑GPU fundamentals with ROCm tools |
159+
| Module 5: Profiling, Debugging & Sanitizers | Nsight Systems (timeline/tracing), Nsight Compute (kernel analysis), Compute Sanitizer (racecheck/memcheck), intro to CUPTI-based profiling | rocprof/rocprofiler for traces and metrics, rocm-smi telemetry, rocgdb/ROCm Debug Agent basics, best practices for profiling |
160+
| Module 6: Math & Core Libraries | cuBLAS/cuBLASLt (GEMM, batched ops, mixed precision), cuFFT, cuSPARSE, Thrust/CUB algorithms, choosing/tuning library routines | rocBLAS/hipBLAS, rocFFT/hipFFT, rocSPARSE/hipSPARSE, rocThrust/hipCUB/rocPRIM; Tensile-backed tuning in rocBLAS; API parity tips |
161+
| Module 7: Advanced Algorithms & Optimization | Tiling and cache use, shared memory bank conflicts, cooperative groups for complex patterns, intro to memory pools/VMM, kernel fusion patterns | Wavefront-aware tuning, LDS patterns, rocPRIM building blocks, HIP-specific perf tips, memory behavior across devices |
162+
| Module 8: AI/ML Workflows | cuDNN basics, TensorRT concepts (dynamic shapes/precision), mixed precision (FP16/BF16/FP8 via libs), graphs for inference pipelines | MIOpen basics, framework setup on ROCm (PyTorch/TF where supported), MIGraphX or framework runtimes, mixed precision support |
163+
| Module 9: Packaging, Deployment & Containers | CUDA containers (base/runtime-devel), driver/runtime compatibility, minimal deployment artifacts, reproducible builds | ROCm container bases (rocm/dev), runtime setup (kernel modules, groups/permissions), compatibility guidance and reproducibility |
164+
165+
---
166+
167+
## Maintenance: how to update this document
168+
169+
When CUDA or ROCm releases a new version, follow this checklist:
170+
171+
1) Update versions at the top
172+
- CUDA: consult CUDA Toolkit Release Notes page; record the latest major.minor (e.g., 13.0 Update 1) and driver requirements.
173+
- ROCm: consult ROCm Release History; record latest (e.g., 7.0.1).
174+
2) Scan notable changes
175+
- CUDA: skim “New Features”, “Deprecated or Dropped Features”, and library sections (cuBLAS/cuFFT/…); note any course‑impacting changes.
176+
- ROCm: skim “What is ROCm?”, “ROCm libraries”, and “Tools/Compilers/Runtimes” sections for new features or renamed packages.
177+
3) Verify cross‑platform notes
178+
- Confirm HIP Graph API coverage and any caveats; update mapping if needed.
179+
4) Update references
180+
- Keep the link reference list (below) current; avoid copying long tables—link out to authoritative docs.
181+
5) Record the date in “Last updated”.
182+
183+
Tip: Avoid claiming specific percentage speedups unless you include a citation. Prefer phrasing like “performance improvements in X; see release notes.”
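
Step 1 of the checklist is easy to get partly right: a version bump often misses a badge or an image tag. As a rough sketch, the hypothetical helper `find_stale_versions` below scans text for superseded version strings and reports the line numbers where they still appear. The stale versions passed in are whatever the maintainer considers outdated; nothing here is tied to a particular file layout.

```python
import re


def find_stale_versions(text: str, stale: list[str]) -> list[tuple[int, str]]:
    """Return (line_number, version) pairs for each stale version found.

    A version only matches as a whole token, so "7.0" does not match inside
    "7.0.1" or "17.0".
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for version in stale:
            pattern = rf"(?<![\d.]){re.escape(version)}(?!\.?\d)"
            if re.search(pattern, line):
                hits.append((lineno, version))
    return hits


if __name__ == "__main__":
    sample = "CUDA 12.9.1 image\nROCm 7.0.1 image\nROCm 7.0 on Ubuntu 24.04\n"
    for lineno, version in find_stale_versions(sample, ["12.9.1", "7.0"]):
        print(f"line {lineno}: still mentions {version}")
```

Running a check like this over the README files after a bump gives a short list of lines to review by hand; it does not decide whether a given mention is actually wrong.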
184+
185+
---
186+
187+
## Reference links (authoritative sources)
188+
189+
- NVIDIA
190+
- CUDA Toolkit Release Notes: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
191+
- CUDA Compatibility Guide (drivers): https://docs.nvidia.com/deploy/cuda-compatibility/index.html
192+
- CUDA C++ Programming Guide (changelog): https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#changelog
193+
- Nsight Systems Release Notes: https://docs.nvidia.com/nsight-systems/ReleaseNotes/index.html
194+
- Nsight Compute Release Notes: https://docs.nvidia.com/nsight-compute/ReleaseNotes/index.html
195+
- AMD
196+
- ROCm docs index: https://rocm.docs.amd.com/en/latest/index.html
197+
- ROCm release history: https://rocm.docs.amd.com/en/latest/release/versions.html
198+
- ROCm libraries reference: https://rocm.docs.amd.com/en/latest/reference/api-libraries.html
199+
- ROCm tools/compilers/runtimes: https://rocm.docs.amd.com/en/latest/reference/rocm-tools.html
200+
- HIP documentation: https://rocm.docs.amd.com/projects/HIP/en/latest/index.html

README.md

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
# GPU Programming 101 🚀
22

33
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
4-
[![CUDA](https://img.shields.io/badge/CUDA-12.9.1-76B900?logo=nvidia)](https://developer.nvidia.com/cuda-toolkit)
4+
[![CUDA](https://img.shields.io/badge/CUDA-13.0.1-76B900?logo=nvidia)](https://developer.nvidia.com/cuda-toolkit)
55
[![ROCm](https://img.shields.io/badge/ROCm-7.0-red?logo=amd)](https://rocmdocs.amd.com/)
66
[![Docker](https://img.shields.io/badge/Docker-Ready-2496ED?logo=docker)](https://www.docker.com/)
77
[![Examples](https://img.shields.io/badge/Examples-71-green)](modules/)
@@ -197,7 +197,7 @@ This architectural knowledge is essential for writing efficient GPU code and is
197197
|---------|-------------|
198198
| 🎯 **Complete Curriculum** | 9 progressive modules from basics to advanced topics |
199199
| 💻 **Cross-Platform** | Full CUDA and HIP support for NVIDIA and AMD GPUs |
200-
| 🐳 **Docker Ready** | Complete containerized development environment with CUDA 12.9.1 & ROCm 7.0 |
200+
| 🐳 **Docker Ready** | Complete containerized development environment with CUDA 13.0.1 & ROCm 7.0.1 |
201201
| 🔧 **Professional Quality** | Professional build systems, auto-detection, testing, and profiling |
202202
| 📊 **Performance Focus** | Optimization techniques and benchmarking throughout |
203203
| 🌐 **Community Driven** | Open source with comprehensive contribution guidelines |
@@ -319,7 +319,7 @@ Module 5: Performance Tuning
319319
- **macOS**: macOS 12+ (Metal Performance Shaders for basic GPU compute)
320320

321321
#### GPU Computing Platforms
322-
- **CUDA Toolkit**: 12.0+ (Docker uses CUDA 12.9.1)
322+
- **CUDA Toolkit**: 13.0+ recommended (Docker uses CUDA 13.0.1)
323323
- **Driver Requirements**:
324324
- Linux: 550.54.14+ for CUDA 12.4+
325325
- Windows: 551.61+ for CUDA 12.4+
@@ -385,7 +385,7 @@ Experience the full development environment with zero setup:
385385
- 🧹 Easy cleanup when done
386386

387387
**Container Specifications:**
388-
- **CUDA**: NVIDIA CUDA 12.9.1 on Ubuntu 22.04
388+
- **CUDA**: NVIDIA CUDA 13.0.1 on Ubuntu 24.04
389389
- **ROCm**: AMD ROCm 7.0 on Ubuntu 24.04
390390
- **Libraries**: Professional toolchains with debugging support
391391

docker/README.md

Lines changed: 13 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -4,10 +4,10 @@ This directory contains Docker configurations for comprehensive GPU programming
44

55
## 🚀 Latest Versions (2025)
66

7-
- **CUDA**: 12.9.1 (Latest stable release)
8-
- **ROCm**: 7.0 (Latest stable release)
9-
- **Ubuntu**: 22.04 LTS
10-
- **Nsight Tools**: 2025.1.1
7+
- **CUDA**: 13.0.1 (Toolkit 13.0 U1)
8+
- **ROCm**: 7.0.1 (Latest stable release)
9+
- **Ubuntu**: 24.04 LTS
10+
- **Nsight Tools**: 2025.3.x
1111

1212
## 🚀 Quick Start
1313

@@ -58,27 +58,27 @@ docker/
5858

5959
### CUDA Development Container
6060
**Image**: `gpu-programming-101:cuda`
61-
**Base**: `nvidia/cuda:12.9.1-devel-ubuntu22.04`
61+
**Base**: `nvidia/cuda:13.0.1-cudnn-devel-ubuntu24.04`
6262

6363
**Features**:
64-
- CUDA 12.9.1 with development tools
64+
- CUDA 13.0.1 (cuDNN devel) with development tools
6565
- NVIDIA Nsight Systems & Compute profilers
6666
- Python 3 with scientific libraries
6767
- GPU monitoring and debugging tools
6868

6969
**GPU Requirements**:
70-
- NVIDIA GPU with compute capability 3.5+
71-
- NVIDIA drivers 535+
70+
- NVIDIA GPU supported by CUDA 13.x (Turing or newer; the 13.0 toolkit drops offline compilation and library support for Maxwell, Pascal, and Volta)
71+
- NVIDIA drivers r580+
7272
- nvidia-container-toolkit
7373

7474
### ROCm Development Container
7575
**Image**: `gpu-programming-101:rocm`
76-
**Base**: `rocm/dev-ubuntu-22.04:7.0-complete`
76+
**Base**: `rocm/dev-ubuntu-24.04:7.0.1-complete`
7777

7878
**Features**:
7979
- ROCm 7.0 with HIP development environment
8080
- Cross-platform GPU programming (AMD/NVIDIA)
81-
- ROCm profiling tools (rocprof, roctracer)
81+
- ROCm profiling tools (rocprof, rocprofiler)
8282
- Python 3 with scientific libraries
8383

8484
**GPU Requirements**:
@@ -282,7 +282,7 @@ nvidia-smi # For NVIDIA
282282
rocm-smi # For AMD
283283

284284
# Verify Docker GPU support
285-
docker run --rm --gpus all nvidia/cuda:12.9.1-base nvidia-smi
285+
docker run --rm --gpus all nvidia/cuda:13.0.1-base-ubuntu24.04 nvidia-smi
286286

287287
# Check container runtime
288288
docker run --rm --device=/dev/kfd rocm/dev-ubuntu-22.04:7.0 rocminfo
@@ -297,8 +297,8 @@ docker system prune -a
297297
sudo apt update && sudo apt upgrade docker-ce docker-compose
298298

299299
# Check base image availability
300-
docker pull nvidia/cuda:12.9.1-devel-ubuntu22.04
301-
docker pull rocm/dev-ubuntu-22.04:7.0-complete
300+
docker pull nvidia/cuda:13.0.1-cudnn-devel-ubuntu24.04
301+
docker pull rocm/dev-ubuntu-24.04:7.0.1-complete
302302
```
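
The image names in the commands above follow a `version-flavor-os` pattern, for example `13.0.1-cudnn-devel-ubuntu24.04`. The sketch below splits an `nvidia/cuda` tag into those parts so a script can spot a tag that lacks an OS suffix. `parse_cuda_tag` and `CudaImageTag` are hypothetical names, and the parser only recognizes Ubuntu suffixes because those are the only ones used in this repository.

```python
from typing import NamedTuple, Optional


class CudaImageTag(NamedTuple):
    version: str            # e.g. "13.0.1"
    flavor: str             # e.g. "base" or "cudnn-devel"
    os_name: Optional[str]  # e.g. "ubuntu24.04", or None if absent


def parse_cuda_tag(image: str) -> CudaImageTag:
    """Split an nvidia/cuda image reference into version, flavor, and OS."""
    repo, _, tag = image.partition(":")
    if repo != "nvidia/cuda" or not tag:
        raise ValueError(f"not an nvidia/cuda image reference: {image!r}")
    parts = tag.split("-")
    # Assumption: only Ubuntu OS suffixes are recognized in this sketch.
    has_os = len(parts) > 1 and parts[-1].startswith("ubuntu")
    os_name = parts[-1] if has_os else None
    flavor = "-".join(parts[1:-1] if has_os else parts[1:])
    return CudaImageTag(parts[0], flavor, os_name)


if __name__ == "__main__":
    print(parse_cuda_tag("nvidia/cuda:13.0.1-cudnn-devel-ubuntu24.04"))
    print(parse_cuda_tag("nvidia/cuda:13.0.1-base-ubuntu24.04"))
```

A tag whose `os_name` comes back as `None` is worth a second look before it goes into a README, since the other tags in this commit spell out the OS explicitly.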
303303

304304
**"Permission denied errors"**
