onurbingol/cuda-tile

 
 

cutile-basic

A BASIC-to-CUDA Tile IR compiler. Write GPU kernels in BASIC, compile them to .cubin files via cuTile bytecode and tileiras, and launch them on NVIDIA GPUs.

Documentation

Overview

cutile-basic extends classic BASIC with tile-based GPU operations (TILE, OUTPUT, BID) and built-in functions like MMA, enabling concise expression of GPU kernels such as vector addition and matrix multiplication.

Installation

pip install git+https://github.com/nvidia/cuda-tile.git@basic-experimental

Quick Start

Clone the repository and check out the basic-experimental branch:

git clone https://github.com/nvidia/cuda-tile.git
cd cuda-tile
git checkout basic-experimental

Compile a BASIC program to a .cubin:

python -m cutile_basic.cli examples/vector_add.bas -o vector_add.cubin
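To compile several programs from a script, the command above can be assembled programmatically. The sketch below only builds the argument list for the invocation shown above; `build_compile_command` and the `out_dir` parameter are hypothetical names introduced here, and no CLI flags other than `-o` are assumed.

```python
import sys
from pathlib import Path


def build_compile_command(source, out_dir="."):
    """Return the argv list for compiling one .bas file to a .cubin.

    Hypothetical helper: it mirrors the documented invocation
    `python -m cutile_basic.cli <source> -o <output>` and nothing more.
    """
    src = Path(source)
    out = Path(out_dir) / (src.stem + ".cubin")
    # sys.executable keeps the call inside the current Python environment.
    return [sys.executable, "-m", "cutile_basic.cli", str(src), "-o", str(out)]


if __name__ == "__main__":
    cmd = build_compile_command("examples/vector_add.bas")
    print(" ".join(cmd))
    # To actually run it (requires cutile-basic and its prerequisites):
    #   import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a shell string avoids quoting problems when paths contain spaces.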

Run an end-to-end GPU demo:

python examples/vector_add.py

Or use the Python API:

from cutile_basic import compile_basic_to_cubin

source = """
10 INPUT N, A(), B()
20 DIM A(N), B(N), C(N)
30 TILE A(128), B(128), C(128)
40 LET C(BID) = A(BID) + B(BID)
50 OUTPUT C
60 END
"""

result = compile_basic_to_cubin(source)
print(result.cubin_path)   # path to the compiled .cubin
print(result.meta)         # kernel metadata (arrays, tile shapes, etc.)
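To build intuition for what the BASIC program above computes, here is a CPU-only emulation in plain Python. It is a sketch under one assumption not stated in this README: that `C(BID) = A(BID) + B(BID)` with a tile width of 128 means block `BID` handles elements `BID*128` through `BID*128 + 127`. How the compiler treats a final partial tile is not described here, so the clamp below is also an assumption. `vector_add_tiled` is a hypothetical name and is not part of cutile-basic.

```python
TILE = 128  # tile width, matching TILE A(128), B(128), C(128)


def vector_add_tiled(a, b, tile=TILE):
    """Emulate the tiled vector add on the CPU, one loop pass per block ID."""
    n = len(a)
    assert len(b) == n, "inputs must have the same length"
    c = [0.0] * n
    num_blocks = (n + tile - 1) // tile  # ceil(n / tile)
    for bid in range(num_blocks):        # on a GPU these blocks run in parallel
        lo = bid * tile
        hi = min(lo + tile, n)           # assumed clamp for a partial last tile
        for i in range(lo, hi):
            c[i] = a[i] + b[i]
    return c, num_blocks


if __name__ == "__main__":
    a = list(range(300))
    b = [1] * 300
    c, blocks = vector_add_tiled(a, b)
    print(blocks, c[:3], c[-1])
```

The number of blocks is the quantity a host-side launcher would typically use as the grid size.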

Examples

Program                   Description
examples/hello.bas        Variables, arithmetic, conditionals, loops
examples/vector_add.bas   GPU vector addition using BID
examples/gemm.bas         Tiled GPU matrix multiply (MMA)
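For readers new to tiling, the idea behind a tiled matrix multiply can be sketched on the CPU in plain Python. This is not the contents of examples/gemm.bas and says nothing about how MMA is lowered; it only shows the general technique of accumulating an output tile from products of input tiles. `matmul_tiled` and its `tile` parameter are hypothetical names.

```python
def matmul_tiled(a, b, tile=2):
    """Multiply a (m x k) by b (k x n) by iterating over square tiles."""
    m, k = len(a), len(a[0])
    n = len(b[0])
    assert len(b) == k, "inner dimensions must match"
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):          # tile row of the output
        for j0 in range(0, n, tile):      # tile column of the output
            for k0 in range(0, k, tile):  # walk tiles along the shared dimension
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc     # partial sum carried across k-tiles
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(matmul_tiled(a, b))
```

On a GPU, each (i0, j0) output tile would typically map to one block, with the inner tile product handled by a matrix-multiply-accumulate operation.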

End-to-end GPU demos:

python examples/vector_add.py
python examples/gemm.py
python examples/hello.py

Prerequisites

Hardware:

  • NVIDIA GPU with Compute Capability 8.x (Ampere), 10.x, 11.x, or 12.x (Blackwell)

Software:

  • NVIDIA Driver r580 or later
  • Python 3.10 or later
  • CUDA Toolkit 13.1 or later
  • Python packages: cuda-tile[tileiras], cuda-python, cuda-core[cu13], cupy-cuda13x[ctk]
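Part of this checklist can be verified from Python before attempting a build. The sketch below uses only the standard library. It assumes the packages are installed under the distribution names listed above (an install from a git branch may register a different name), and it does not check the driver or CUDA Toolkit versions. All function names are hypothetical.

```python
import sys
from importlib import metadata

REQUIRED_PYTHON = (3, 10)
# Distribution names taken from the list above, without the extras in brackets.
REQUIRED_DISTRIBUTIONS = ["cuda-tile", "cuda-python", "cuda-core", "cupy-cuda13x"]
# Major compute capability versions listed under Hardware.
SUPPORTED_CC_MAJORS = {8, 10, 11, 12}


def python_version_ok(current=None, required=REQUIRED_PYTHON):
    """True if the (major, minor) Python version meets the minimum."""
    current = tuple(sys.version_info[:2]) if current is None else tuple(current)
    return current >= required


def installed_versions(names=REQUIRED_DISTRIBUTIONS):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def compute_capability_supported(major):
    """True if the GPU's major compute capability is in the supported list."""
    return major in SUPPORTED_CC_MAJORS


if __name__ == "__main__":
    print("Python OK:", python_version_ok())
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```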

License

Apache License 2.0 with LLVM Exceptions. See LICENSE for details.

About

CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA tensor core units.
