Hi all,
I've been working on a project that uses Triton as a frontend for Tile IR, and I've got a full end-to-end path working inside torch.compile (PyTorch ops → Triton kernels → Tile IR). Whole-network compilation now runs fine.
I'd like to use this as a base for some research, either on the performance side or with a more academic focus on Tile IR itself, but honestly I'm a bit lost on where to start.
So I figured I'd just ask here:
- Are there any known pain points or open problems in Tile IR that are worth looking into?
- Any directions you'd find interesting or impactful (scheduling, memory analysis, op fusion, etc.)?
- Any benchmarks you'd recommend for measuring improvements at the Tile IR level?
Any thoughts welcome, thanks!