Releases: NVIDIA/cuda-tile
v13.2.0: [Release] CUDA Tile IR 13.2.0
This release is aligned with the CUDA Tile IR specification included in CUDA Toolkit 13.2.
New in CUDA Tile 13.2 open-source:
* Extended architecture support: CUDA Tile now supports compute capability 8.X (Ampere, Ada) in addition to 10.X, 11.X, and 12.X (Blackwell) architectures.
* New atan2 math operation.
* Added overflow attribute to cuda_tile.negi to control integer overflow behavior.
* Added rounding_mode attribute to cuda_tile.tanh to control floating-point rounding behavior.
* Added token result to cuda_tile.print_tko for memory ordering support.
* Added unsignedCmp flag to cuda_tile.for to support unsigned integer comparison for loop termination.
* Renamed cuda_tile.print to cuda_tile.print_tko in the textual format. Bytecode encoding is unchanged and remains backward compatible.
* Bytecode version 13.2 with explicit type tag versioning for improved forward and backward compatibility.
For more information:
* CUDA Tile IR Spec 13.2: https://docs.nvidia.com/cuda/tile-ir/13.2/index.html
* CUDA 13.2 Blog: https://developer.nvidia.com/blog/cuda-13-2-introduces-enhanced-cuda-tile-support-and-new-python-features
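The unsignedCmp flag matters because the same bit pattern can order differently under signed and unsigned comparison. The sketch below is plain Python, not Tile IR: it assumes a half-open loop (`lower <= i < upper`, positive step, no wraparound of the induction variable) in the style of MLIR's `scf.for`, and the helper names `as_signed32`, `as_unsigned32`, and `trip_count` are hypothetical. The CUDA Tile IR spec remains the authority on the actual semantics of cuda_tile.for.

```python
MASK32 = 0xFFFFFFFF


def as_signed32(bits: int) -> int:
    """Interpret the low 32 bits as a two's-complement signed integer."""
    bits &= MASK32
    return bits - (1 << 32) if bits & 0x80000000 else bits


def as_unsigned32(bits: int) -> int:
    """Interpret the low 32 bits as an unsigned integer."""
    return bits & MASK32


def trip_count(lower: int, upper: int, step: int, unsigned_cmp: bool) -> int:
    """Iterations of `for (i = lower; i < upper; i += step)` on 32-bit patterns.

    Assumption (not taken from the spec): half-open range, positive step,
    and the induction variable never wraps around.
    """
    if step <= 0:
        raise ValueError("step must be positive in this sketch")
    view = as_unsigned32 if unsigned_cmp else as_signed32
    lo, hi = view(lower), view(upper)
    if lo >= hi:
        return 0
    # Ceiling division: number of steps needed to reach or pass `hi`.
    return (hi - lo + step - 1) // step


# The upper bound 0x80000000 is negative when read as signed 32-bit,
# so a signed comparison ends the loop before the first iteration.
signed_trips = trip_count(0, 0x80000000, 0x40000000, unsigned_cmp=False)
unsigned_trips = trip_count(0, 0x80000000, 0x40000000, unsigned_cmp=True)
print(signed_trips, unsigned_trips)
```

For bounds that fit in the non-negative signed range, both comparison modes give the same trip count; the two only diverge once a bound has its top bit set.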
v13.1.8: [LLVM-FIX] Breaking commit 13c00cbc2aa2 - 2026-03-13
Fix breakage caused by LLVM commit:
commit 13c00cbc2aa2ddc9aae2e72b02bc6cb2a482e0e7
Author: Matthias Springer <me@m-sp.org>
Date: Fri Mar 13 17:27:23 2026 +0100
[mlir][IR] Rename `DenseIntOrFPElementsAttr` to `DenseTypedElementsAttr` (#185687)
v13.1.7: [LLVM-FIX] Breaking commit 5a4a5db4776f - 2026-02-24
Fix breakage caused by LLVM commit:
commit 5a4a5db4776f50a43624392eb2e18863504e5372
Author: Mehdi Amini <joker.eph@gmail.com>
Date: Tue Feb 24 17:50:38 2026 +0100
[MLIR] Remove deprecated setting usePropertiesForAttributes (#182327)
v13.1.6 - [Release] CUDA Tile IR 13.1.6
Breaking Changes:
- Remove StringType (cuda_tile.string) and all associated bytecode support.
- Remove cuda_tile_utils.py (mutex_synchronize, printf_sync_tile).
Python Bindings:
- Decouple from internal dependencies. Define self-contained element type wrappers (Int8, Int32, Float16, Float32, etc.) directly in cuda_tile_ops.py. All public APIs accept both wrappers and raw MLIR types. Fixes #4.
Test Infrastructure:
- Build and install the `not` tool when testing is enabled. Fixes #2.
- Add Python lit test support with %PYTHON substitution and ASAN preloading on Linux. Fixes #3.
- Only install LLVM test tools when CUDA_TILE_ENABLE_TESTING is on.
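As a rough illustration of what a %PYTHON substitution with ASAN preloading can look like in a lit configuration, here is a minimal sketch. It is not taken from this repository: the helper `python_substitution` and its parameters are hypothetical, and the only lit feature assumed is the standard `config.substitutions` list of `(key, value)` pairs that lit exposes inside `lit.cfg.py`.

```python
import sys


def python_substitution(python_exe, asan_runtime=None, platform=sys.platform):
    """Return the (key, value) pair for a lit %PYTHON substitution.

    Hypothetical helper for illustration. `asan_runtime` is the path to an
    ASAN runtime shared library, or None when the build is not sanitized.
    """
    if asan_runtime and platform.startswith("linux"):
        # Preload the ASAN runtime so that an uninstrumented Python
        # interpreter can load ASAN-instrumented extension modules.
        return ("%PYTHON", f"env LD_PRELOAD={asan_runtime} {python_exe}")
    return ("%PYTHON", python_exe)


# Inside a lit.cfg.py, where lit provides `config`, the pair would be
# registered roughly like this:
#   config.substitutions.append(python_substitution(sys.executable))
print(python_substitution("/usr/bin/python3", "/opt/lib/libasan.so", "linux"))
```

A test file can then use `// RUN: %PYTHON %s` and run under the same interpreter, with or without the preload, depending on how the build was configured.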
Documentation:
- Clarify cuLaunchKernel grid/block dim semantics in README. Fixes #7.
- Add versioning policy section to README.
Code Quality:
- Add .clang-format (LLVM-based style).
- Migrate to `Op::create(builder, ...)` API across bytecode reader, dialect, and transforms.
v13.1.5
[LLVM-FIX] Breaking commit b82c7fc65229 - 2026-02-20
Fix breakage caused by LLVM commit:
commit b82c7fc65229c8b2b6a964f023f6ec59b3cf9210
Author: Alexis Engelke <engelke@in.tum.de>
Date: Fri Feb 20 12:07:18 2026 +0100
[CMake][LLVM] Add PCH infrastructure and LLVMSupport PCH (#176420)
v13.1.4
[LLVM-FIX] Breaking commit 34eb59dd4bb2 - 2026-02-11
Fix breakage caused by LLVM commit:
commit 34eb59dd4bb26cab248cc3a29b57b8dbe8d46849
Author: Matthias Springer <me@m-sp.org>
Date: Wed Feb 11 17:59:20 2026 +0100
[mlir][IR][NFC] Simplify "splat" handling in `DenseIntOrFPElementsAttr` (#180965)
v13.1.3
[LLVM-FIX] Breaking commit 3d7018c70 - 2025-12-19
Fix breakage caused by LLVM commit:
commit 3d7018c70b97e6a3d6dfe08e9f11dede96242d1f
Author: Maksim Levental <maksim.levental@gmail.com>
Date: Fri Dec 19 12:51:22 2025 -0500
[MLIR][Python] remove pybind11 support (#172581)
v13.1.2
v13.1.1
[LLVM-FIX] Breaking commit cfbb4cc3121 - 2025-10-31
Fix breakage caused by LLVM commit:
commit cfbb4cc31215d615f605466aef0bcfb42aa9faa5
Author: Kazu Hirata <kazu@google.com>
Date: Fri Oct 31 09:42:07 2025 -0700
[ADT] Remove ArrayRef(std::nullopt_t) (#165831)
v13.1.0 - Initial release
[Release] CUDA Tile IR 13.1 - Initial open-source release
CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for tile-based GPU programming targeting NVIDIA Tensor Cores. This release is aligned with the CUDA Tile IR specification included in CUDA Toolkit 13.1.
Core components:
* CUDA Tile Dialect: MLIR dialect for tile-based computations
* Python Bindings: Python API for IR construction and manipulation
* Bytecode: Binary serialization/deserialization support
* Conformance Test Suite: Specification compliance tests
For more information:
* NVIDIA Developer: https://developer.nvidia.com/cuda/tile
* CUDA Tile IR Spec 13.1: https://docs.nvidia.com/cuda/tile-ir/13.1/index.html