Commit 01569aa

[AMDGPU] Add TDM Descriptor Optimization Pass
AMDGPU code generation frequently creates tensor and address descriptors using sequences of insertelement instructions. When these patterns have significant field reuse or cross-iteration dependencies, the current approach generates inefficient REG_SEQUENCE operations in MIR instead of INSERT_SUBREG chains, leading to suboptimal register allocation.

This PR introduces the AMDGPUTDMOptimization pass that performs:
- Pattern Detection: Identifies descriptor creation chains with reusable constant fields
- Grouping: Groups similar patterns to maximize optimization opportunities
- Transformation: Converts insertelement chains to alloca + field updates in address space 5
- Integration: Works with SROA to generate INSERT_SUBREG operations for optimal register allocation
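The idea behind the transformation step can be illustrated with a plain C++ analogy. This is only a sketch, not LLVM IR and not code from the pass: the four-field `Descriptor` layout and the functions `buildByFullRebuild` and `buildByFieldUpdate` are hypothetical names invented for this illustration. The first function mimics an insertelement chain that rebuilds every field on each loop iteration; the second mimics a stack slot whose constant fields are written once and whose single varying field is updated in place.

```cpp
#include <array>
#include <cassert>  // used by the usage check described below
#include <cstdint>
#include <vector>

// Hypothetical 4-field descriptor; the real TDM descriptor layout is not
// shown in this commit excerpt.
using Descriptor = std::array<std::uint32_t, 4>;

// Analogue of an insertelement chain: each iteration starts from an empty
// value and re-inserts every field, including the ones that never change.
std::vector<Descriptor> buildByFullRebuild(std::uint32_t base,
                                           std::uint32_t stride,
                                           std::uint32_t flags,
                                           std::uint32_t count) {
  std::vector<Descriptor> out;
  for (std::uint32_t i = 0; i < count; ++i) {
    Descriptor d{};
    d[0] = base;        // loop-invariant
    d[1] = stride;      // loop-invariant
    d[2] = flags;       // loop-invariant
    d[3] = i * stride;  // the only field that varies per iteration
    out.push_back(d);
  }
  return out;
}

// Analogue of "alloca + field updates": the loop-invariant fields are stored
// once into a single slot, and only the varying field is rewritten.
std::vector<Descriptor> buildByFieldUpdate(std::uint32_t base,
                                           std::uint32_t stride,
                                           std::uint32_t flags,
                                           std::uint32_t count) {
  std::vector<Descriptor> out;
  Descriptor slot{};
  slot[0] = base;
  slot[1] = stride;
  slot[2] = flags;
  for (std::uint32_t i = 0; i < count; ++i) {
    slot[3] = i * stride;  // single field update per iteration
    out.push_back(slot);
  }
  return out;
}
```

Both functions return the same descriptors for the same inputs, which can be checked with `assert(buildByFullRebuild(0x1000, 16, 7, 4) == buildByFieldUpdate(0x1000, 16, 7, 4));`. The second form does less per-iteration work, which is the kind of field reuse the commit message describes.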
1 parent e3ef26d commit 01569aa

5 files changed: +1015 -0 lines changed
llvm/lib/Target/AMDGPU/AMDGPU.h

Lines changed: 3 additions & 0 deletions
@@ -59,6 +59,7 @@ FunctionPass *createAMDGPUCodeGenPreparePass();
 FunctionPass *createAMDGPULateCodeGenPrepareLegacyPass();
 FunctionPass *createAMDGPUReserveWWMRegsPass();
 FunctionPass *createAMDGPURewriteOutArgumentsPass();
+FunctionPass *createAMDGPUTDMOptimizationPass();
 ModulePass *
 createAMDGPULowerModuleLDSLegacyPass(const AMDGPUTargetMachine *TM = nullptr);
 ModulePass *createAMDGPULowerBufferFatPointersPass();
@@ -170,6 +171,8 @@ extern char &AMDGPUPrepareAGPRAllocLegacyID;
 void initializeAMDGPUReserveWWMRegsLegacyPass(PassRegistry &);
 extern char &AMDGPUReserveWWMRegsLegacyID;
 
+void initializeAMDGPUTDMOptimizationPass(PassRegistry &);
+
 void initializeAMDGPURewriteOutArgumentsPass(PassRegistry &);
 extern char &AMDGPURewriteOutArgumentsID;
 
