remesh_narrow_band_dc produces fragmented output on Blackwell (sm_120) GPUs — confirmed not a build target issue #27

@ThatButters

Environment

  • GPU: NVIDIA RTX 5070 Ti (sm_120 / Blackwell)
  • OS: Ubuntu 22.04 (WSL2)
  • PyTorch: 2.7+ with CUDA 12.8
  • CuMesh: latest from main branch, compiled with native compute_120 support

Problem

cumesh.remeshing.remesh_narrow_band_dc produces fragmented, disconnected mesh output on Blackwell GPUs. The output contains tens of thousands of disconnected components instead of a single connected mesh. This breaks the TRELLIS.2 to_glb() pipeline completely on all RTX 50-series GPUs.

Diagnostics from recompiled CuMesh with native sm_120

Mesh 0: 536,644 verts, 496,259 faces
Connected components: 58,265
Watertight: False
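For anyone who wants to reproduce this kind of report on their own output, here is a minimal sketch of the two checks. It is not the script that produced the numbers above. The function name `mesh_diagnostics` is hypothetical, "watertight" is assumed to mean that every edge is shared by exactly two faces, and vertices not referenced by any face are ignored.

```python
from collections import Counter


def mesh_diagnostics(faces):
    """Count connected components and test watertightness of a triangle mesh.

    `faces` is any sequence of (i, j, k) vertex-index triples, for example a
    list of tuples or an (F, 3) integer array.
    """
    parent = {}  # union-find forest over the vertices used by faces

    def find(v):
        # Path-halving lookup of the representative of v's set.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    edge_count = Counter()
    for tri in faces:
        a, b, c = (int(v) for v in tri)
        for v in (a, b, c):
            parent.setdefault(v, v)
        # Merge the three corners of the triangle into one set.
        root = find(a)
        parent[find(b)] = root
        parent[find(c)] = root
        # Count each undirected edge once per face that uses it.
        for u, w in ((a, b), (b, c), (c, a)):
            edge_count[(min(u, w), max(u, w))] += 1

    components = len({find(v) for v in parent})
    # Assumption: watertight means every edge has exactly two incident faces.
    watertight = bool(edge_count) and all(n == 2 for n in edge_count.values())
    return {"components": components, "watertight": watertight}
```

A closed tetrahedron should come back as one watertight component, while two unconnected triangles should come back as two components and not watertight.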

What I've ruled out

  1. Not a missing build target: I recompiled CuMesh from source with nvcc 12.8, with compute_120 / sm_120 explicitly added to the CUDA architecture targets. The output is just as fragmented.
  2. Not a CUDA hashmap bug: I reimplemented the flexible_dual_grid_to_mesh hashmap operations on the CPU in pure Python/NumPy and got the same fragmented dual grid mesh. This confirms that the raw dual grid is intentionally sparse and that the remeshing step is where connectivity should be established.
  3. Poisson reconstruction is not a viable alternative: I tried Open3D Poisson reconstruction on the raw vertices, with area-weighted normals computed from the face data. The normals are too inconsistent across the disconnected patches, and the result is blobs.
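As a companion to the first point, a quick sanity check on the PyTorch side can be sketched as follows. The helper name `classify_target` is hypothetical. `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list` are existing PyTorch calls, but they describe the PyTorch build only, not the architectures CuMesh itself was compiled for, so this complements the recompile rather than replacing it.

```python
def classify_target(capability, arch_list):
    """Classify how a build covers a device of the given compute capability.

    capability: (major, minor) tuple, e.g. (12, 0) for sm_120.
    arch_list:  strings such as "sm_90" or "compute_90".
    Returns "native", "ptx-jit", or "missing".
    """
    major, minor = capability
    device = major * 10 + minor
    if f"sm_{device}" in arch_list:
        return "native"  # machine code for exactly this architecture
    for arch in arch_list:
        kind, _, number = arch.partition("_")
        # Generic PTX can be JIT-compiled for the same or a newer architecture.
        # Entries with a letter suffix (e.g. "compute_90a") are skipped.
        if kind == "compute" and number.isdigit() and int(number) <= device:
            return "ptx-jit"
    return "missing"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        print(cap, classify_target(cap, torch.cuda.get_arch_list()))
```

On the setup described here, a "native" result would only say that PyTorch ships sm_120 code; the fragmented output persisted even with CuMesh built for sm_120.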

Conclusion

The remeshing kernel itself produces incorrect results on Blackwell's execution model, even when compiled natively for sm_120. This likely involves a behavioral difference in how Blackwell handles atomics, warp-synchronous operations, or shared memory compared to Ampere/Ada.

Impact

  • Affects all RTX 50-series GPUs (5070, 5070 Ti, 5080, 5090)
  • Completely breaks TRELLIS.2 mesh export on these GPUs
  • Existing GitHub issues on microsoft/TRELLIS.2 (#19, "feat: Add sharp edge preservation to dual contouring") and microsoft/TRELLIS (#243) mention this problem but do not identify a root cause
  • The only public workaround at the moment is the one described below

Current workaround

I've published a CPU-based voxel morphological approach (marching cubes on the raw voxel coordinates) that produces watertight meshes on Blackwell GPUs. It bypasses CuMesh remeshing entirely. Available at: https://github.com/ThatButters/trellis2-blackwell-fix — happy to collaborate on a proper kernel-level fix.
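To give a rough idea of the first step of a voxel-based approach like this, here is a minimal sketch that turns sparse voxel coordinates into a padded dense occupancy grid. It is not the code from the linked repository. The function name `voxels_to_dense` and the `pad` parameter are hypothetical, and the input is assumed to be an (N, 3) array of integer voxel indices.

```python
import numpy as np


def voxels_to_dense(coords, pad=1):
    """Rasterize sparse integer voxel coordinates into a dense boolean grid.

    coords: (N, 3) array-like of integer voxel indices (assumed layout).
    pad:    number of empty voxels added on every side of the bounding box.
    Returns (grid, origin), where grid[i, j, k] corresponds to the voxel at
    origin + (i, j, k).
    """
    coords = np.asarray(coords, dtype=np.int64)
    if coords.ndim != 2 or coords.shape[1] != 3 or coords.shape[0] == 0:
        raise ValueError("coords must be a non-empty (N, 3) array")

    origin = coords.min(axis=0) - pad
    shape = coords.max(axis=0) - origin + 1 + pad
    grid = np.zeros(tuple(shape), dtype=bool)

    # Shift the coordinates into grid space and mark the occupied cells.
    idx = coords - origin
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid, origin
```

The empty border matters because an isosurface extractor can only close the surface where occupied voxels are surrounded by empty ones. The resulting grid could then be handed to a marching cubes implementation such as `skimage.measure.marching_cubes(grid.astype(float), level=0.5)`, with `origin` added back to the returned vertices.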
