cuTile.jl-related crash in tileiras #17
I have a Tile IR snippet that crashes in tileiras from CTK 13.2 when using -O1 or higher. cuda-tile-translate doesn't reveal anything, so the IR looks superficially well formed.
Julia MWE (using cuTile.jl#main):
using CUDA
import cuTile as ct
function mwe_kernel(A::ct.TileArray{Float16, 2},
                    B::ct.TileArray{Float16, 2},
                    C::ct.TileArray{Float16, 2},
                    indices::ct.TileArray{Int32, 1},
                    TM::Int, TN::Int, TK::Int)
    bid = ct.bid(1)
    row_indices = ct.gather(indices, ct.arange(TM))
    acc = zeros(Float32, TM, TN)
    num_k = cld(size(A, 2), Int32(TK))
    k = Int32(1)
    while k <= num_k
        col_indices = (k - Int32(1)) * Int32(TK) .+ ct.arange(TK)
        a = ct.gather(A, (reshape(row_indices, (TM, 1)),
                          reshape(col_indices, (1, TK))))
        b = ct.load(B; index=(k, bid), shape=(TK, TN),
                    padding_mode=ct.PaddingMode.Zero)
        acc = muladd(a, b, acc)
        k += Int32(1)
    end
    c_col_indices = (bid - Int32(1)) * Int32(TN) .+ ct.arange(TN)
    ct.scatter(C, (reshape(row_indices, (TM, 1)),
                   reshape(c_col_indices, (1, TN))),
               convert(ct.Tile{Float16}, acc))
    return nothing
end

M, K, N = 128, 512, 128
A = CUDA.rand(Float16, M, K)
B = CUDA.rand(Float16, K, N)
C = CUDA.zeros(Float16, M, N)
indices = CuArray(Int32.(1:128))
ct.launch(mwe_kernel, cld(N, 128), A, B, C, indices,
          ct.Constant(128), ct.Constant(128), ct.Constant(64))

Is there anything invalid in my IR? Compared to the IR generated by cuTile Python, mine has 2 token iter_values joined outside of the loop where Python has none, but changing our codegen to avoid that doesn't work around the issue.
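
As a point of comparison, here is a plain-Julia CPU sketch of what the kernel above appears to compute: gather rows of A through `indices`, multiply by B with Float32 accumulation, and scatter the Float16 result back into the same rows of C. The name `mwe_reference!` is hypothetical, and the sketch assumes 1-based indices and that a single launch covers the whole output, as in the 128×128 case above.

```julia
# CPU reference sketch (an assumption about the intended semantics, not cuTile code).
function mwe_reference!(C::AbstractMatrix, A::AbstractMatrix, B::AbstractMatrix,
                        indices::AbstractVector{<:Integer})
    rows = Int.(indices)                      # row indices gathered by the kernel
    acc = Float32.(A[rows, :]) * Float32.(B)  # accumulate in Float32 like `acc`
    C[rows, :] = Float16.(acc)                # scatter back, converting to Float16
    return C
end
```

With the GPU arrays from the MWE, this could be called on host copies, e.g. `mwe_reference!(zeros(Float16, M, N), Array(A), Array(B), Array(indices))`, and compared against `Array(C)` from the `-O0` build.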
I'll attach the full IR: mwe.zip
It crashes as follows:
❯ cuda-tile-translate --cudatilebc-to-mlir mwe.tile
cuda_tile.module @kernels {
# works
}
❯ tileiras mwe.tile -o /tmp/mwe.cubin --gpu-name sm_120 -O1
error: failed to compile Tile IR program
❯ tileiras mwe.tile -o /tmp/mwe.cubin --gpu-name sm_120 -O0
# works

For future reference: are there ways to debug this on my end, or to better validate the IR I generate?
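
One thing that can be scripted on the user side is the optimization-level comparison shown above. The following is a minimal sketch that only uses the `tileiras` flags appearing in this report (`-o`, `--gpu-name`, `-O<level>`); the helper names `tileiras_cmd` and `sweep` are hypothetical, and levels above `-O1` are an assumption based on "-O1 or higher".

```julia
# Build the tileiras invocation for one optimization level.
function tileiras_cmd(ir::AbstractString, out::AbstractString,
                      gpu::AbstractString, level::Integer)
    return `tileiras $ir -o $out --gpu-name $gpu -O$level`
end

# Default runner: execute the command, keep its output visible, report success.
run_ok(cmd::Cmd) = success(run(ignorestatus(cmd)))

# Compile the same IR at each level and record which levels succeed.
# `runner` is injectable so the sweep can be exercised without tileiras installed.
function sweep(ir::AbstractString; gpu::AbstractString="sm_120",
               levels=0:1, runner=run_ok)
    results = Dict{Int,Bool}()
    for level in levels
        out = joinpath(tempdir(), "mwe_O$(level).cubin")
        results[level] = runner(tileiras_cmd(ir, out, gpu, level))
    end
    return results
end
```

Calling `sweep("mwe.tile")` would then return a `Dict` mapping each level to whether compilation succeeded, which makes it easier to tell whether a codegen change moves the failure.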