In huawei-csl/pto-dsl#12 I made an arbitrary-dynamic-shape kernel example by masking out out-of-bounds tiles and elements.
However, I had to adjust valid_col inside the for-loop:
for i in scf.for_(c0, tiles_to_process, c1):
    ...
    # change `c_tile_actual` according to the remaining elements
    tb0 = pto.AllocTileOp(
        tile_buf_dynamic, valid_row=c1, valid_col=c_tile_actual
    ).result
Because AllocTileOp is lowered to a new Tile allocation + TASSIGN, VEC UB space is wasted: the buffer is not reused across iterations.
Note that buffer reuse is possible if I don't need to adjust valid_shape; see this less flexible version of the dynamic-shape kernel: huawei-csl/pto-dsl#11. But supporting arbitrary shapes is very important for practical use cases, such as a dynamic batch dim and sequence dim.
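For concreteness, the per-iteration valid width that `c_tile_actual` has to track can be sketched in plain Python. This is only an illustration of the arithmetic, not pto-dsl code; `valid_cols_per_tile`, `total_elems`, and `tile_cols` are hypothetical names, and the 1-D layout is an assumption.

```python
def valid_cols_per_tile(total_elems: int, tile_cols: int) -> list[int]:
    """Return the number of valid columns for each tile covering a 1-D range.

    Hypothetical helper: models how the valid width shrinks on the last
    tile when total_elems is not a multiple of tile_cols.
    """
    if total_elems < 0 or tile_cols <= 0:
        raise ValueError("total_elems must be >= 0 and tile_cols must be > 0")
    # Ceiling division: number of tiles needed to cover all elements.
    num_tiles = (total_elems + tile_cols - 1) // tile_cols
    # Each tile is full unless fewer than tile_cols elements remain.
    return [min(tile_cols, total_elems - i * tile_cols) for i in range(num_tiles)]
```

In this simplified model only the last tile can be partial, which is why the valid shape has to change inside the loop rather than being fixed at allocation time.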
Proposed solution
Provide a set_valid_shape API to modify the valid_shape attribute of an already-allocated tile?
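A rough sketch of the semantics I have in mind, as a plain-Python toy model rather than real pto-dsl code. `MockTile`, its fields, and `process` are all hypothetical and exist only for illustration; the point is that the tile is allocated once outside the loop and only its valid-shape metadata changes per iteration.

```python
from dataclasses import dataclass


@dataclass
class MockTile:
    """Toy stand-in for an allocated tile: fixed capacity, mutable valid shape."""
    rows: int
    cols: int
    valid_row: int
    valid_col: int

    def set_valid_shape(self, valid_row: int, valid_col: int) -> None:
        # Hypothetical method: only metadata changes, no new allocation.
        if not (0 <= valid_row <= self.rows and 0 <= valid_col <= self.cols):
            raise ValueError("valid shape exceeds the allocated tile shape")
        self.valid_row = valid_row
        self.valid_col = valid_col


def process(total_elems: int, tile_cols: int) -> list[int]:
    """Walk over total_elems in tiles, reusing one allocation throughout."""
    # Allocated once, outside the loop, at full tile capacity.
    tile = MockTile(rows=1, cols=tile_cols, valid_row=1, valid_col=tile_cols)
    seen_valid_cols = []
    num_tiles = (total_elems + tile_cols - 1) // tile_cols
    for i in range(num_tiles):
        remaining = total_elems - i * tile_cols
        # Shrink the valid width on the last, partial tile.
        tile.set_valid_shape(1, min(tile_cols, remaining))
        seen_valid_cols.append(tile.valid_col)
    return seen_valid_cols
```

For example, `process(10, 4)` visits three tiles with valid widths 4, 4, and 2 while touching a single `MockTile` instance. How this would map onto the actual lowering is an open question for the maintainers.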