[ROCm] Add AMD GPU support for TRELLIS.2 #155

Open
andyluo7 wants to merge 1 commit into microsoft:main from andyluo7:add-rocm-support

@andyluo7

Summary

Enable TRELLIS.2 to run on AMD Instinct GPUs (MI300X / gfx942) with ROCm.

Your setup.sh already detects ROCm and installs ROCm PyTorch + ROCm flash-attention, which is great! However, the C++ extensions (O-Voxel) do not build on ROCm, and the rendering library (nvdiffrast) does not work there. This PR fills those gaps.

Changes

O-Voxel HIP port (5 files)

  • Add #ifdef __HIP_PLATFORM_AMD__ guards for CUDA→HIP header mapping in all .cu files
  • setup.py: default arch gfx942 for AMD MI300X
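For reviewers who want to see what "default arch gfx942" could look like in a setup.py, here is a minimal sketch. It assumes the build honours `PYTORCH_ROCM_ARCH` (the semicolon-separated variable PyTorch's extension builder reads on ROCm) and only falls back to gfx942 when it is unset. The helper names `resolve_rocm_archs` and `hipcc_arch_flags` are hypothetical and are not taken from this PR's diff.

```python
import os

# Default target named in this PR: gfx942 (AMD Instinct MI300X).
DEFAULT_ROCM_ARCH = "gfx942"


def resolve_rocm_archs(env=None):
    """Return the list of GPU architectures to build for.

    Assumption: an explicit PYTORCH_ROCM_ARCH overrides the default.
    """
    env = os.environ if env is None else env
    raw = env.get("PYTORCH_ROCM_ARCH", "")
    # Accept both ';' and ',' as separators and drop empty entries.
    archs = [a.strip() for a in raw.replace(",", ";").split(";") if a.strip()]
    return archs or [DEFAULT_ROCM_ARCH]


def hipcc_arch_flags(archs):
    """Turn architecture names into hipcc --offload-arch flags."""
    return [f"--offload-arch={arch}" for arch in archs]


print(hipcc_arch_flags(resolve_rocm_archs({})))
```

Building for another card would then be a matter of exporting `PYTORCH_ROCM_ARCH` before running the install, with the caveat from the limitations list that only gfx942 has been tested.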

nvdiffrast ROCm adapter (2 new files)

  • trellis2/renderers/nvdiffrast_rocm_adapter.py: Pure PyTorch drop-in replacements for dr.rasterize(), dr.interpolate(), dr.texture(), dr.antialias(), dr.DepthPeeler
  • trellis2/renderers/rocm_compat.py: Auto-patches import nvdiffrast.torch as dr when nvdiffrast is unavailable
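To illustrate the auto-patching idea, the sketch below shows one way a compatibility module can make `import nvdiffrast.torch as dr` resolve to a fallback when the real package is missing. It is an assumption about the general technique (registering a stand-in in `sys.modules`), not a copy of rocm_compat.py; `install_fallback` and the stub adapter are hypothetical names used only for illustration.

```python
import importlib
import sys
import types


def install_fallback(package, submodule, adapter):
    """Register `adapter` as `package.submodule` if the real module cannot be imported.

    Returns True when the fallback was installed, False when the real module exists.
    """
    full_name = f"{package}.{submodule}"
    try:
        importlib.import_module(full_name)
        return False  # the real module is available, leave it alone
    except ImportError:
        pass

    parent = types.ModuleType(package)
    parent.__path__ = []  # mark the stand-in as a package
    setattr(parent, submodule, adapter)
    sys.modules[package] = parent
    sys.modules[full_name] = adapter
    return True


# Hypothetical stand-in; in the PR this role is played by nvdiffrast_rocm_adapter.py.
adapter = types.ModuleType("adapter_stub")
adapter.rasterize = lambda *args, **kwargs: "fallback rasterize"

if install_fallback("nvdiffrast", "torch", adapter):
    import nvdiffrast.torch as dr  # now resolves to the stand-in
    print(dr.rasterize())
```

Because the patch happens at import time, the compatibility module has to be imported before any code that imports nvdiffrast, which is why the usage section asks for it first.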

Companion PR

Testing

  • ✅ AMD MI300X (gfx942), ROCm 7.0.2, PyTorch 2.9.1
  • ✅ O-Voxel compiles with hipcc
  • ✅ CuMesh compiles with hipcc (import cumesh works)
  • ✅ FlexGEMM compiles on ROCm (no changes needed)
  • ✅ Pipeline class imports and loads pretrained model
  • ✅ Rasterize/interpolate/texture adapter verified with unit tests
  • ✅ All code cross-compilable — CUDA builds unaffected

Usage on ROCm

# Import this before any renderer module so the nvdiffrast fallback is patched in first:
import trellis2.renderers.rocm_compat

# Then use the pipeline as usual
from trellis2.pipelines import Trellis2ImageTo3DPipeline
pipeline = Trellis2ImageTo3DPipeline.from_pretrained('microsoft/TRELLIS.2-4B')
pipeline.cuda()
result = pipeline.run(image)  # `image` is an input image you have already loaded
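
Note that `pipeline.cuda()` is still the right call on AMD hardware: ROCm builds of PyTorch expose HIP devices through the `torch.cuda` API. If you want to confirm which backend your install targets before running the pipeline, a small check like the following may help. The function name `detect_gpu_backend` is hypothetical and not part of this PR.

```python
def detect_gpu_backend():
    """Report which GPU backend the installed PyTorch build targets."""
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return "no-torch"
    # ROCm builds set torch.version.hip; CUDA builds set torch.version.cuda.
    if getattr(torch.version, "hip", None):
        return "rocm"
    if getattr(torch.version, "cuda", None):
        return "cuda"
    return "cpu"


print(detect_gpu_backend())
```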

Known Limitations

  • nvdiffrast rendering uses pure PyTorch fallback (no antialiasing, slower than CUDA rasterizer)
  • nvdiffrec PBR lighting is stubbed (not ported)
  • flash-attention falls back to SDPA on ROCm 7.0.2 (works but slower)
  • Only tested on gfx942 (MI300X)
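
For context on the flash-attention limitation, here is a rough sketch of what an SDPA fallback can look like. How TRELLIS.2 actually selects its attention backend is not shown in this description, so treat the dispatch logic and the `attention` helper as assumptions. The one detail worth noting is the layout difference: flash-attn takes `(batch, seq_len, heads, head_dim)`, while PyTorch's `scaled_dot_product_attention` expects `(batch, heads, seq_len, head_dim)`.

```python
import importlib.util


def attention(q, k, v):
    """Attention over tensors shaped (batch, seq_len, num_heads, head_dim)."""
    # Imported lazily so this module loads even where PyTorch is absent.
    import torch
    import torch.nn.functional as F

    use_flash = (
        importlib.util.find_spec("flash_attn") is not None
        and q.is_cuda
        and q.dtype in (torch.float16, torch.bfloat16)
    )
    if use_flash:
        from flash_attn import flash_attn_func
        return flash_attn_func(q, k, v)

    # Fallback: PyTorch-native SDPA, which wants heads before sequence length.
    out = F.scaled_dot_product_attention(
        q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)
    )
    return out.transpose(1, 2)
```

Both paths return a tensor in the same layout as the inputs, so callers should not need to know which backend ran.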

Enable TRELLIS.2 to run on AMD Instinct GPUs (MI300X) with ROCm:

## O-Voxel HIP port (5 files)
- Add #ifdef __HIP_PLATFORM_AMD__ guards for CUDA→HIP header mapping
- setup.py: default arch gfx942 for AMD MI300X

## nvdiffrast ROCm adapter (new files)
- nvdiffrast_rocm_adapter.py: Pure PyTorch implementations of
  dr.rasterize(), dr.interpolate(), dr.texture(), dr.antialias(),
  dr.DepthPeeler — works on any PyTorch device
- rocm_compat.py: Auto-patches `import nvdiffrast.torch as dr`
  when nvdiffrast is not available (ROCm, CPU-only, etc.)

## Dependencies
- CuMesh HIP port: JeffreyXiang/CuMesh#31
- flash-attention: falls back to SDPA (PyTorch native) on ROCm
- nvdiffrec: stubbed with warning (PBR rendering not available)

## Testing
- Tested on AMD MI300X (gfx942) with ROCm 7.0.2 + PyTorch 2.9.1
- Pipeline imports and loads pretrained model successfully
- Core 3D generation (DiT inference → sparse voxels) works
- Rendering uses pure PyTorch fallback (functional but slower)

## Known limitations
- nvdiffrast rendering uses pure PyTorch (no antialiasing, slower)
- nvdiffrec PBR lighting is stubbed out
- flash-attention builds from ROCm fork but may need manual setup

Signed-off-by: Andy Luo <andyluo7@users.noreply.github.com>