remesh_narrow_band_dc produces fragmented output on Blackwell (sm_120) GPUs — confirmed not a build target issue

## Environment
- **GPU:** NVIDIA RTX 5070 Ti (sm_120 / Blackwell)
- **OS:** Ubuntu 22.04 (WSL2)
- **PyTorch:** 2.7+ with CUDA 12.8
- **CuMesh:** latest from main branch, compiled with native compute_120 support

## Problem

`cumesh.remeshing.remesh_narrow_band_dc` produces fragmented, disconnected mesh output on Blackwell GPUs. The output contains tens of thousands of disconnected components instead of a single connected mesh. This breaks the TRELLIS.2 `to_glb()` pipeline completely on all RTX 50-series GPUs.

## Diagnostics from recompiled CuMesh with native sm_120

```
Mesh 0: 536,644 verts, 496,259 faces
Connected components: 58,265
Watertight: False
```
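For anyone who wants to reproduce these numbers, here is a minimal, dependency-free sketch of the two checks. It assumes "connected components" means faces linked through shared vertices and "watertight" means every edge is shared by exactly two faces. The function names are hypothetical and are not part of CuMesh.

```python
from collections import Counter


def count_components(num_vertices, faces):
    """Count groups of faces connected through shared vertices (union-find)."""
    parent = list(range(num_vertices))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for face in faces:
        root = find(face[0])
        for v in face[1:]:
            parent[find(v)] = root  # merge every vertex of the face into one set

    # Only vertices referenced by a face count; stray vertices are ignored.
    return len({find(face[0]) for face in faces})


def is_watertight(faces):
    """Assumed criterion: every undirected edge belongs to exactly two faces."""
    edge_count = Counter()
    for face in faces:
        n = len(face)
        for k in range(n):
            a, b = face[k], face[(k + 1) % n]
            edge_count[(min(a, b), max(a, b))] += 1
    return bool(edge_count) and all(c == 2 for c in edge_count.values())


# A closed tetrahedron versus two loose triangles.
tetra = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
loose = [(0, 1, 2), (3, 4, 5)]
print(count_components(4, tetra), is_watertight(tetra))
print(count_components(6, loose), is_watertight(loose))
```

A healthy remesh should behave like the tetrahedron case (one component, watertight). The output reported above behaves like many copies of the loose-triangle case.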

## What I've ruled out

1. **Not a missing build target:** Recompiled CuMesh from source with `compute_120` / `sm_120` explicitly added to the CUDA architecture targets, using nvcc 12.8. The output is just as fragmented.
2. **Not a CUDA hashmap bug:** Reimplemented the `flexible_dual_grid_to_mesh` hashmap operations on the CPU in pure Python/NumPy and got the same fragmented dual grid mesh. This suggests the raw dual grid is sparse by design and that connectivity is supposed to be established in the remeshing step.
3. **Poisson reconstruction is not a usable fallback:** Tried Open3D Poisson reconstruction on the raw vertices, with area-weighted normals computed from the face data. The normals are too inconsistent across the disconnected patches, and the result is blobs.
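For item 1, a quick sanity check is to compare the device's compute capability against a list of compiled architectures. The sketch below is a rough illustration: `has_native_target` is a hypothetical helper, and the optional part at the bottom only inspects the architectures PyTorch itself was built for (via `torch.cuda.get_device_capability()` and `torch.cuda.get_arch_list()`), not the CuMesh extension. The same comparison could be applied to whatever architecture list the CuMesh build actually used.

```python
def has_native_target(capability, arch_list):
    """Return True if arch_list contains native SASS for the given device.

    capability: (major, minor) tuple, e.g. (12, 0) for sm_120.
    arch_list:  strings such as "sm_90" or "compute_90".
    """
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


if __name__ == "__main__":
    try:
        import torch  # optional; only used for the live check below
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability()
        archs = torch.cuda.get_arch_list()
        print("device capability:", cap)
        print("PyTorch arch list:", archs)
        print("native target present:", has_native_target(cap, archs))
```

A `True` here only rules out a missing target on the PyTorch side. As the list above shows, the fragmentation persists even with a native `sm_120` build of CuMesh.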

## Conclusion

The remeshing kernel itself appears to produce incorrect results on Blackwell, even when compiled natively for sm_120. The likely cause is a behavioral difference in how Blackwell handles atomics, warp-synchronous operations, or shared memory compared to Ampere/Ada.

## Impact

- Affects all RTX 50-series GPUs (5070, 5070 Ti, 5080, 5090)
- Completely breaks TRELLIS.2 mesh export on these GPUs
- Existing GitHub issues on microsoft/TRELLIS.2 (#19) and microsoft/TRELLIS (#243) reference this but without root cause
- The only public workaround I'm aware of is the one listed below

## Current workaround

I've published a CPU-based voxel morphological approach (marching cubes on the raw voxel coordinates) that produces watertight meshes on Blackwell GPUs. It bypasses CuMesh remeshing entirely and is available at https://github.com/ThatButters/trellis2-blackwell-fix. I'm happy to collaborate on a proper kernel-level fix.
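To give a rough idea of what a morphological step on sparse voxel coordinates can look like, here is a small stdlib-only sketch of a binary closing (dilation followed by erosion) with a 3x3x3 structuring element. This illustrates the general technique only; it is not the code from the linked repository, the function names are hypothetical, and it stops before the marching cubes step.

```python
from itertools import product

# Offsets of the 3x3x3 cube structuring element, including the center.
CUBE = [d for d in product((-1, 0, 1), repeat=3)]


def dilate(voxels):
    """Add every voxel within one cell (26-neighborhood) of an occupied voxel."""
    return {(x + dx, y + dy, z + dz) for (x, y, z) in voxels for (dx, dy, dz) in CUBE}


def erode(voxels):
    """Keep only voxels whose full 3x3x3 neighborhood is occupied."""
    return {
        (x, y, z)
        for (x, y, z) in voxels
        if all((x + dx, y + dy, z + dz) in voxels for (dx, dy, dz) in CUBE)
    }


def close(voxels):
    """Binary closing: fills gaps up to one voxel wide without growing the shape."""
    return erode(dilate(set(voxels)))


# Two occupied voxels separated by a one-voxel gap along x.
sparse = {(0, 0, 0), (2, 0, 0)}
print(sorted(close(sparse)))
```

After closing, the gap voxel `(1, 0, 0)` is filled, so the occupancy grid is connected before any surface is extracted from it.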
