Commit 44429a9

Implement CUDA 12.9 architecture suffixes
Add support for CUDA 12.9's family-specific ('f') and architecture-specific ('a') suffixes to NvvmArch enum. These suffixes provide different PTX compatibility modes as specified by NVIDIA.
1 parent 5a70839 commit 44429a9

3 files changed: +596 -120 lines changed

crates/cuda_builder/src/lib.rs

Lines changed: 20 additions & 6 deletions
@@ -94,14 +94,26 @@ pub struct CudaBuilder {
     /// Maxwell (5.x) will be deprecated in CUDA 12 and we anticipate for that. Moreover,
     /// `6.x` contains support for things like f64 atomic add and half precision float ops.
     ///
-    /// ## Target Features for Conditional Compilation
+    /// ## Architecture Suffixes (CUDA 12.9+)
+    ///
+    /// Starting with CUDA 12.9, architectures can have suffixes:
+    ///
+    /// - **No suffix** (e.g., `Compute70`): Forward-compatible across all future GPUs.
+    /// Best for general compatibility.
+    /// - **'f' suffix** (e.g., `Compute100f`): Family-specific features, forward-compatible
+    /// within same major version (10.0, 10.3, etc.) but NOT across major versions.
+    /// - **'a' suffix** (e.g., `Compute100a`): Architecture-specific features (mainly Tensor Cores).
+    /// Code ONLY runs on that exact compute capability, no compatibility with any other GPU.
     ///
-    /// The chosen architecture enables a target feature that can be used for
-    /// conditional compilation with `#[cfg(target_feature = "compute_XX")]`.
-    /// This feature means "at least this capability", matching NVIDIA's semantics.
+    /// Most applications should use base architectures (no suffix). Only use 'f' or 'a'
+    /// if you need specific features and understand the compatibility trade-offs.
     ///
-    /// For other patterns (exact ranges, maximum capabilities), use boolean `cfg` logic.
-    /// See the compute capabilities guide for examples.
+    /// ## Target Features for Conditional Compilation
+    ///
+    /// The chosen architecture enables target features for conditional compilation:
+    /// - Base arch: `#[cfg(target_feature = "compute_70")]` - enabled on 7.0+
+    /// - Family variant: `#[cfg(target_feature = "compute_100f")]` - enabled only on 10.x family
+    /// - Arch variant: `#[cfg(target_feature = "compute_100a")]` - enabled only on exact 10.0
     ///
     /// For example, with `.arch(NvvmArch::Compute61)`:
     /// ```ignore
@@ -110,6 +122,8 @@ pub struct CudaBuilder {
     /// // Code that requires compute capability 6.1+
     /// }
     /// ```
+    ///
+    /// See: <https://developer.nvidia.com/blog/nvidia-blackwell-and-nvidia-cuda-12-9-introduce-family-specific-architecture-features/>
     pub arch: NvvmArch,
     /// Flush denormal values to zero when performing single-precision floating point operations.
     /// `false` by default.
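
The compatibility rules stated in the new doc comment can be sketched as a small standalone model. This is only an illustration of the documented semantics, not code from the commit: `Suffix`, `Target`, and `runs_on` are hypothetical names, and the family rule is the simplified "same major version, equal or newer minor" reading of the doc comment. NVIDIA's actual family definitions may be narrower.

```rust
/// Hypothetical model of the three suffix kinds (not the real `NvvmArch` enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Suffix {
    /// No suffix, e.g. `compute_70`.
    Base,
    /// 'f' suffix, e.g. `compute_100f`.
    Family,
    /// 'a' suffix, e.g. `compute_100a`.
    Arch,
}

/// A compile target: compute capability plus suffix (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Target {
    pub major: u32,
    pub minor: u32,
    pub suffix: Suffix,
}

/// Returns true if code built for `target` is expected to run on a device
/// with compute capability `dev_major.dev_minor`, under the simplified
/// rules assumed here.
pub fn runs_on(target: Target, dev_major: u32, dev_minor: u32) -> bool {
    let target_cc = (target.major, target.minor);
    let device_cc = (dev_major, dev_minor);
    match target.suffix {
        // Base: forward-compatible with every equal or newer capability.
        Suffix::Base => device_cc >= target_cc,
        // Family: equal or newer minor, but only within the same major version.
        Suffix::Family => dev_major == target.major && dev_minor >= target.minor,
        // Arch: exact compute capability only.
        Suffix::Arch => device_cc == target_cc,
    }
}
```

Under this model, a `Family` target for 10.0 is accepted by a 10.3 device but rejected by a 12.0 device, while an `Arch` target for 10.0 is accepted only by a 10.0 device.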
