cuTile.jl-related crash in tileiras #17

@maleadt

I have a Tile IR snippet that crashes in tileiras from CTK 13.2 when using -O1 or higher. cuda-tile-translate doesn't reveal anything, so the IR looks superficially well formed.

Julia MWE (using cuTile.jl#main):

using CUDA
import cuTile as ct

function mwe_kernel(A::ct.TileArray{Float16, 2},
                    B::ct.TileArray{Float16, 2},
                    C::ct.TileArray{Float16, 2},
                    indices::ct.TileArray{Int32, 1},
                    TM::Int, TN::Int, TK::Int)
    bid = ct.bid(1)
    row_indices = ct.gather(indices, ct.arange(TM))

    acc = zeros(Float32, TM, TN)
    num_k = cld(size(A, 2), Int32(TK))

    k = Int32(1)
    while k <= num_k
        col_indices = (k - Int32(1)) * Int32(TK) .+ ct.arange(TK)
        a = ct.gather(A, (reshape(row_indices, (TM, 1)),
                          reshape(col_indices, (1, TK))))
        b = ct.load(B; index=(k, bid), shape=(TK, TN),
                    padding_mode=ct.PaddingMode.Zero)
        acc = muladd(a, b, acc)
        k += Int32(1)
    end

    c_col_indices = (bid - Int32(1)) * Int32(TN) .+ ct.arange(TN)
    ct.scatter(C, (reshape(row_indices, (TM, 1)),
                   reshape(c_col_indices, (1, TN))),
               convert(ct.Tile{Float16}, acc))
    return nothing
end

M, K, N = 128, 512, 128
A = CUDA.rand(Float16, M, K)
B = CUDA.rand(Float16, K, N)
C = CUDA.zeros(Float16, M, N)
indices = CuArray(Int32.(1:128))

ct.launch(mwe_kernel, cld(N, 128), A, B, C, indices,
          ct.Constant(128), ct.Constant(128), ct.Constant(64))

Is there anything invalid in my IR? Comparing to the one generated by cuTile Python, there are 2 token iter_values joined outside of the loop, where Python has none, but changing our codegen to avoid that doesn't work around the issue.

The full IR is attached: mwe.zip

It crashes as follows:

❯ cuda-tile-translate --cudatilebc-to-mlir mwe.tile
cuda_tile.module @kernels {
    # works
}

❯ tileiras mwe.tile -o /tmp/mwe.cubin --gpu-name sm_120 -O1
error: failed to compile Tile IR program

❯ tileiras mwe.tile -o /tmp/mwe.cubin --gpu-name sm_120 -O0
# works

For future reference: are there ways to debug this on my end? Or better validate the IR I generate?
