CuMesh: latest from main branch, compiled with native compute_120 support
Problem
cumesh.remeshing.remesh_narrow_band_dc produces fragmented, disconnected mesh output on Blackwell GPUs. The output contains tens of thousands of disconnected components instead of a single connected mesh. This breaks the TRELLIS.2 to_glb() pipeline completely on all RTX 50-series GPUs.
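The fragmentation described above can be quantified by counting connected components in the output. The sketch below is a minimal, library-free way to do that, assuming the mesh is available as a vertex count plus a list of triangle index triples, and that shared vertices are welded (duplicated vertices would also register as separate components). The function name `count_components` is hypothetical and not part of CuMesh.

```python
def count_components(num_vertices, faces):
    """Count connected components of a triangle mesh with union-find.

    Two vertices are connected when they appear in the same face.
    Vertices that no face references count as their own components.
    """
    parent = list(range(num_vertices))

    def find(i):
        # Path halving keeps the trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b, c in faces:
        root = find(a)
        # Merge the sets containing b and c into the set containing a.
        parent[find(b)] = root
        parent[find(c)] = root

    return len({find(i) for i in range(num_vertices)})


# Two triangles sharing vertex 2, plus one isolated triangle.
example_faces = [(0, 1, 2), (2, 3, 4), (5, 6, 7)]
example_count = count_components(8, example_faces)
print(example_count)
```

A healthy result for a single object is a count of 1 (or a small number); a count in the tens of thousands matches the failure reported here.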
Diagnostics from recompiled CuMesh with native sm_120
What I've ruled out
Not a missing build target: I recompiled CuMesh from source with compute_120 / sm_120 explicitly added to the CUDA architecture targets, using nvcc 12.8. The output is just as fragmented.
Not a CUDA hashmap bug: I reimplemented the flexible_dual_grid_to_mesh hashmap operations on the CPU in pure Python/NumPy. The dual grid mesh comes out equally fragmented, which suggests the raw dual grid is sparse by design and that the remeshing step is where connectivity should be established.
Not recoverable with Poisson reconstruction: I tried Open3D Poisson reconstruction on the raw vertices, with area-weighted normals computed from the face data. The normals are too inconsistent across the disconnected patches, and the result is blobs.
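For reference, area-weighted vertex normals can be computed as sketched below. This is an illustrative pure-Python version, not the code used in the experiment above; the function names are hypothetical. It also shows one way inconsistent patches can hurt: when neighbouring faces have opposite winding, their contributions cancel at shared vertices. Whether that is the exact failure mode in this case is an assumption.

```python
import math


def cross(u, v):
    """Cross product of two 3-vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def area_weighted_vertex_normals(vertices, faces):
    """Return one unit normal per vertex, weighting each face by its area.

    The unnormalized face normal (a cross product of two edges) has length
    2 * area, so summing it per vertex applies the area weighting implicitly.
    Vertices whose contributions cancel out get a zero vector.
    """
    sums = [[0.0, 0.0, 0.0] for _ in vertices]
    for a, b, c in faces:
        pa, pb, pc = vertices[a], vertices[b], vertices[c]
        e1 = tuple(pb[i] - pa[i] for i in range(3))
        e2 = tuple(pc[i] - pa[i] for i in range(3))
        n = cross(e1, e2)
        for idx in (a, b, c):
            for i in range(3):
                sums[idx][i] += n[i]

    normals = []
    for n in sums:
        length = math.sqrt(sum(x * x for x in n))
        if length > 1e-12:
            normals.append(tuple(x / length for x in n))
        else:
            normals.append((0.0, 0.0, 0.0))
    return normals


# A unit square in the z = 0 plane, split into two triangles.
quad = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
consistent = area_weighted_vertex_normals(quad, [(0, 1, 2), (0, 2, 3)])
flipped = area_weighted_vertex_normals(quad, [(0, 1, 2), (0, 3, 2)])
```

With consistent winding every vertex normal points along +z. With the second triangle flipped, the two shared vertices (0 and 2) end up with zero normals, which gives Poisson reconstruction nothing usable to orient the surface with.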
Conclusion
The remeshing kernel itself produces incorrect results on Blackwell's execution model, even when compiled natively for sm_120. This likely involves a behavioral difference in how Blackwell handles atomics, warp-synchronous operations, or shared memory compared to Ampere/Ada.
Impact
Affects all RTX 50-series GPUs (5070, 5070 Ti, 5080, 5090)
Completely breaks TRELLIS.2 mesh export on these GPUs
The only public workaround I'm aware of is the one listed below
Current workaround
I've published a CPU-based voxel morphological approach (marching cubes on the raw voxel coordinates) that produces watertight meshes on Blackwell GPUs. It bypasses CuMesh remeshing entirely. Available at: https://github.com/ThatButters/trellis2-blackwell-fix. Happy to collaborate on a proper kernel-level fix.
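To give a rough idea of what a voxel morphological step can look like, here is a minimal sketch of a morphological closing (dilate, then erode) on a sparse set of integer voxel coordinates. It is an assumption about the general technique, not a copy of the linked repository's implementation, and the function names are hypothetical. It uses a 3x3x3 structuring element so that one-voxel gaps get filled before meshing.

```python
from itertools import product

# 3x3x3 structuring element, including the centre voxel.
OFFSETS = list(product((-1, 0, 1), repeat=3))


def dilate(voxels):
    """Grow the occupied set by one voxel in every direction."""
    return {
        (x + dx, y + dy, z + dz)
        for (x, y, z) in voxels
        for (dx, dy, dz) in OFFSETS
    }


def erode(voxels):
    """Keep only voxels whose full 3x3x3 neighbourhood is occupied."""
    return {
        (x, y, z)
        for (x, y, z) in voxels
        if all((x + dx, y + dy, z + dz) in voxels for (dx, dy, dz) in OFFSETS)
    }


def close_voxels(voxels):
    """Morphological closing: fills small gaps while keeping every input voxel."""
    return erode(dilate(set(voxels)))


# Two occupied voxels with a one-voxel gap between them.
sparse = {(0, 0, 0), (2, 0, 0)}
closed = close_voxels(sparse)
print(sorted(closed))
```

The closed set could then be written into a dense occupancy array and meshed with a marching cubes implementation such as `skimage.measure.marching_cubes`. For real grids, a dense array with `scipy.ndimage.binary_closing` would be much faster than this set-based version.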