How to reuse buffer space for pto.AllocTileOp with varying valid_row/valid_col? #111

@learning-chip

Description

In huawei-csl/pto-dsl#12 I made a kernel example that supports arbitrary dynamic shapes by masking out out-of-bounds tiles and elements.

However, I had to adjust valid_col inside the for-loop:

for i in scf.for_(c0, tiles_to_process, c1):
    ...
    # adjust `c_tile_actual` according to the remaining elements
    tb0 = pto.AllocTileOp(
        tile_buf_dynamic, valid_row=c1, valid_col=c_tile_actual
    ).result
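
To make the per-iteration change concrete, here is a minimal plain-Python sketch of how the valid column count would vary across tiles. It is only an illustration of the arithmetic, not code from the linked kernel: `valid_cols_per_tile`, `total_elems`, and `tile_cols` are hypothetical names, and I am assuming a 1-D problem where only the last tile is partial.

```python
def valid_cols_per_tile(total_elems: int, tile_cols: int) -> list[int]:
    """Return the valid column count of each tile needed to cover total_elems.

    Hypothetical helper for illustration; the names are not part of pto-dsl.
    """
    if total_elems < 0 or tile_cols <= 0:
        raise ValueError("total_elems must be >= 0 and tile_cols must be > 0")
    num_tiles = -(-total_elems // tile_cols)  # ceiling division
    # Every tile is full except possibly the last one, which is clamped
    # to the number of elements that remain.
    return [min(tile_cols, total_elems - i * tile_cols) for i in range(num_tiles)]


print(valid_cols_per_tile(total_elems=1000, tile_cols=256))  # [256, 256, 256, 232]
```

Only the last entry differs from the full tile width, yet in the loop above it is this one differing value that forces a fresh AllocTileOp on every iteration.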

Because each AllocTileOp is lowered to a new Tile allocation plus a TASSIGN, every iteration gets its own buffer, so VEC UB space is wasted instead of being reused across iterations.

Note that buffer reuse is possible if I don't need to adjust valid_shape; see the less flexible version of the dynamic-shape kernel in huawei-csl/pto-dsl#11. But supporting arbitrary shapes is very important for practical use cases, such as dynamic batch and sequence dimensions.

Proposed solution

Could we provide a set_valid_shape API that modifies the valid_shape attribute of an already-allocated tile?
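
A toy model of the semantics I have in mind, in plain Python. This is a sketch under assumptions, not the pto API: `ToyTile`, its fields, and `process` are hypothetical stand-ins, and `set_valid_shape` here only mimics the proposed behaviour of updating the valid shape without allocating a new buffer.

```python
from dataclasses import dataclass


@dataclass
class ToyTile:
    """Hypothetical stand-in for an allocated tile; not the pto API."""
    rows: int
    cols: int
    valid_row: int
    valid_col: int

    def set_valid_shape(self, valid_row: int, valid_col: int) -> None:
        # Update only the metadata; the underlying buffer stays the same.
        if not (0 <= valid_row <= self.rows and 0 <= valid_col <= self.cols):
            raise ValueError("valid shape must fit inside the allocated tile")
        self.valid_row = valid_row
        self.valid_col = valid_col


def process(total_elems: int, tile_cols: int) -> list[tuple[int, int]]:
    # Allocate once, outside the loop, at the full tile size.
    tile = ToyTile(rows=1, cols=tile_cols, valid_row=1, valid_col=tile_cols)
    seen = []
    num_tiles = -(-total_elems // tile_cols)  # ceiling division
    for i in range(num_tiles):
        remaining = total_elems - i * tile_cols
        # Shrink the valid region for the last, partial tile.
        tile.set_valid_shape(1, min(tile_cols, remaining))
        seen.append((id(tile), tile.valid_col))
    return seen


result = process(total_elems=1000, tile_cols=256)
print([valid_col for _, valid_col in result])  # [256, 256, 256, 232]
print(len({tile_id for tile_id, _ in result}))  # 1, the same tile object every time
```

The point of the model is that the allocation happens once and only the valid shape changes per iteration, which is what would let the UB buffer be reused.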
