feat(newton): NVIDIA Warp/Newton simulation backend — GPU-native SimEngine

## Summary

Implement `NewtonSimulation(SimEngine)`, a GPU-native simulation backend built on **NVIDIA Warp + Newton 1.x**. It is the second concrete `SimEngine` subclass after MuJoCo (#85) and targets **massively parallel policy training** (4096+ envs on a single GPU), **differentiable simulation**, and soft-body / cloth / MPM workloads.

---

## Motivation

Newton fills the "fleet-scale GPU training" slot that MuJoCo (#85) cannot. It is also a multi-solver engine: under the hood it can run MuJoCo-Warp, Featherstone, XPBD, VBD, Style3D, Semi-Implicit, or Implicit MPM, all of which we expose through a single unified interface.

| Capability | MuJoCo (#85) | **Newton (this issue)** | Isaac Sim (#TBD) |
|---|:---:|:---:|:---:|
| CPU-native fast iteration | ✅ | ⚠️ (kernel compile) | ❌ |
| GPU-native | ❌ | **✅** | ✅ |
| Multi-env parallelism | 1–8 | **4096+** | 4096+ |
| Differentiable sim | ❌ | **✅ (Warp autodiff)** | partial |
| Soft bodies / cloth / MPM | ❌ | **✅** | ✅ |
| Apple Silicon support | ✅ | ❌ (CUDA-only) | ❌ |
| Setup friction | low | medium | high |

MuJoCo stays for inference / debugging / macOS. Newton becomes the training workhorse.

---

## Design

### Directory layout

```
strands_robots/simulation/
├── base.py                  # SimEngine ABC (#84)
├── models.py                # SimWorld / SimRobot / SimObject
├── factory.py               # create_simulation()
├── mujoco/                  # #85
└── newton/                  # ⭐ this issue
    ├── __init__.py          # Export NewtonSimulation
    ├── simulation.py        # class NewtonSimulation(SimEngine)
    ├── config.py            # @dataclass NewtonConfig
    ├── solvers.py           # SOLVER_MAP + per-solver adapters
    ├── procedural.py        # Procedural URDF builder for common robots
    ├── diffsim.py           # Differentiable simulation helpers
    └── tests/
        ├── test_unit.py
        └── test_gpu_integ.py  # @pytest.mark.gpu
```

### Core class

```python
from strands_robots.simulation.base import SimEngine
from strands_robots.simulation.newton.config import NewtonConfig

class NewtonSimulation(SimEngine):
    """GPU-native simulation backend built on NVIDIA Warp + Newton 1.x.
    
    Implements the SimEngine ABC. Every method delegates to Warp kernels
    where possible; falls back to host code only for I/O.
    """

    def __init__(self, config: NewtonConfig | None = None) -> None: ...

    # --- Required SimEngine methods ---
    def create_world(self, timestep=None, gravity=None, ground_plane=True) -> dict: ...
    def destroy(self) -> dict: ...
    def reset(self, env_ids: list[int] | None = None) -> dict: ...
    def step(self, n_steps: int = 1) -> dict: ...
    def get_state(self) -> dict: ...
    def add_robot(self, name, urdf_path=None, data_config=None,
                  position=None, orientation=None) -> dict: ...
    def remove_robot(self, name) -> dict: ...
    def add_object(self, name, shape="box", **kwargs) -> dict: ...
    def remove_object(self, name) -> dict: ...
    def get_observation(self, robot_name=None, camera_name=None) -> dict: ...
    def send_action(self, action, robot_name=None, n_substeps=1) -> None: ...
    def render(self, camera_name="default", width=None, height=None) -> dict: ...

    # --- Optional overrides ---
    def load_scene(self, scene_path) -> dict: ...
    def run_policy(self, robot_name, policy_provider="mock", **kwargs) -> dict: ...
    def randomize(self, **kwargs) -> dict: ...
    def get_contacts(self) -> dict: ...
    def cleanup(self) -> None: ...

    # --- Newton-specific extensions ---
    def replicate(self, num_envs: int | None = None) -> dict: ...
    def run_diffsim(self, num_steps, loss_fn, optimize_params,
                    lr=0.02, iterations=200) -> dict: ...
    def solve_ik(self, robot_name, target_position,
                 target_orientation=None) -> dict: ...
    def add_cloth(self, name, **kwargs) -> dict: ...
    def add_cable(self, name, **kwargs) -> dict: ...
    def add_particles(self, name, **kwargs) -> dict: ...  # MPM
    def add_sensor(self, name, kind, **kwargs) -> dict: ...
    def read_sensor(self, name) -> dict: ...
    def enable_dual_solver(self, articulated="featherstone",
                           soft="vbd") -> None: ...
```

### `NewtonConfig`

```python
from dataclasses import dataclass


@dataclass
class NewtonConfig:
    num_envs: int = 1
    device: str = "cuda:0"               # "cuda:N" | "cpu" (host fallback)
    solver: str = "mujoco"               # SOLVER_MAP key
    physics_dt: float = 1.0 / 60.0
    substeps: int = 4
    render_backend: str = "null"         # opengl | rerun | viser | null
    enable_cuda_graph: bool = True
    enable_differentiable: bool = False
    broad_phase: str = "sap"             # sap | bvh | none
    soft_contact_margin: float = 1e-3
    up_axis: str = "Y"                   # Axis enum (#50)
```
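
A config with this many string and numeric knobs benefits from failing fast on bad values. The sketch below is one hypothetical way to validate it: `ConfigSketch`, `validate_config`, and `substep_dt` are illustrative names, the solver keys are copied from the solver support matrix, and the convention that each substep integrates `physics_dt / substeps` is an assumption rather than confirmed Newton behaviour.

```python
import re
from dataclasses import dataclass

# Solver keys copied from the solver support matrix; the real SOLVER_MAP may differ.
KNOWN_SOLVERS = {"mujoco", "featherstone", "semi_implicit", "xpbd",
                 "vbd", "style3d", "implicit_mpm"}
DEVICE_PATTERN = re.compile(r"^(cpu|cuda:\d+)$")


@dataclass
class ConfigSketch:
    """Trimmed stand-in for NewtonConfig (only the fields validated here)."""
    num_envs: int = 1
    device: str = "cuda:0"
    solver: str = "mujoco"
    physics_dt: float = 1.0 / 60.0
    substeps: int = 4


def validate_config(cfg: ConfigSketch) -> list[str]:
    """Return human-readable problems; an empty list means the config is usable."""
    problems = []
    if cfg.num_envs < 1:
        problems.append(f"num_envs must be >= 1, got {cfg.num_envs}")
    if not DEVICE_PATTERN.match(cfg.device):
        problems.append(f"device must be 'cpu' or 'cuda:N', got {cfg.device!r}")
    if cfg.solver not in KNOWN_SOLVERS:
        problems.append(f"unknown solver {cfg.solver!r}")
    if cfg.physics_dt <= 0.0:
        problems.append(f"physics_dt must be positive, got {cfg.physics_dt}")
    if cfg.substeps < 1:
        problems.append(f"substeps must be >= 1, got {cfg.substeps}")
    return problems


def substep_dt(cfg: ConfigSketch) -> float:
    """Assumed convention: each substep integrates physics_dt / substeps."""
    return cfg.physics_dt / cfg.substeps
```

Returning a list of problems instead of raising on the first one lets `create_simulation()` report every misconfiguration in a single error message.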

### Solver support matrix

| Solver | Primary use | Blocker on current Newton release | Status |
|---|---|:---:|---|
| `mujoco` | Rigid-body manipulation (default) | none | Production |
| `featherstone` | Articulated rigid bodies | Warp 1.11 ABI mismatch | Blocked, re-test on 1.12 |
| `semi_implicit` | Soft-contact rigid bodies | none | Production |
| `xpbd` | Soft bodies / cloth-lite | NVRTC warning (benign) | Production |
| `vbd` | Soft-body only | No revolute joint support | Document as soft-only |
| `style3d` | Cloth only | No rigid support | Document as cloth-only |
| `implicit_mpm` | Granular / fluid | Needs voxel_size config | Document as MPM-only |

---

## Definition of Done

### Functional
- [ ] `create_simulation("newton")` returns a live `NewtonSimulation`
- [ ] `create_simulation("newton", solver="mujoco")` works end-to-end
- [ ] `sim.add_robot("so100")` works without explicit URDF (procedural fallback)
- [ ] `sim.add_robot("unitree_g1", urdf_path=...)` works with URDF
- [ ] `sim.step(100)` runs stably at 4096 envs on a single modern GPU
- [ ] `sim.get_observation("so100")` returns `{joint_q, joint_qd, body_q}` as numpy arrays
- [ ] `sim.render(camera_name="default")` returns RGB frame (OpenGL backend)
- [ ] `sim.run_policy("so100", policy_provider="gr00t")` integration passes

### Advanced
- [ ] `sim.replicate(num_envs=4096)` — **target: >50k aggregate steps/sec on A100-class or Jetson Thor**
- [ ] `sim.run_diffsim(...)` converges on a toy task (e.g. initial-velocity optimization)
- [ ] `sim.solve_ik("so100", target_position=[0.3, 0, 0.2])` returns valid joint_q
- [ ] At least 3 solvers working end-to-end (mujoco, xpbd, semi_implicit)
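
To make "converges on a toy task" concrete, here is a pure-Python sketch of initial-velocity optimization. It does not use Warp or Newton: the derivative is carried through the rollout by hand (forward-mode) as a stand-in for Warp autodiff, and the 1-D point mass, `rollout`, and `optimize_initial_velocity` are all hypothetical. Only the `lr=0.02` and `iterations=200` defaults are taken from the `run_diffsim` signature.

```python
def rollout(v0: float, num_steps: int, dt: float, gravity: float = -9.81):
    """Semi-implicit Euler for a 1-D point mass.

    Returns (x_final, d x_final / d v0), with the derivative propagated
    step by step alongside the state.
    """
    x, v = 0.0, v0
    dx_dv0, dv_dv0 = 0.0, 1.0
    for _ in range(num_steps):
        v = v + gravity * dt           # dv_dv0 is unchanged: gravity is independent of v0
        x = x + v * dt
        dx_dv0 = dx_dv0 + dv_dv0 * dt
    return x, dx_dv0


def optimize_initial_velocity(target: float, num_steps: int = 60,
                              dt: float = 1.0 / 60.0,
                              lr: float = 0.02, iterations: int = 200):
    """Gradient descent on v0 so the final position approaches target."""
    v0 = 0.0
    for _ in range(iterations):
        x_final, dx_dv0 = rollout(v0, num_steps, dt)
        grad = 2.0 * (x_final - target) * dx_dv0   # gradient of (x_final - target)**2
        v0 -= lr * grad
    x_final, _ = rollout(v0, num_steps, dt)
    return v0, x_final
```

A convergence check for the real `run_diffsim` could follow the same shape: run the optimizer, then assert that the final loss is below a tolerance.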

### Registry + API
- [ ] `factory._BUILTIN_BACKENDS["newton"] = ("strands_robots.simulation.newton.simulation", "NewtonSimulation")`
- [ ] Alias `"warp"` resolves to `"newton"`
- [ ] `list_backends()` includes `newton`, `warp`
- [ ] `Robot("so100", mode="sim", backend="newton")` auto-creates a Newton sim
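
The registry items above could fit together roughly as follows. This is a minimal sketch: the `_BUILTIN_BACKENDS` entry is copied from the checklist, but `_ALIASES`, `resolve_backend`, and `load_backend_class` are hypothetical names, and the real `factory.py` may be organised differently.

```python
import importlib

# Entry copied from the checklist above; other backends are omitted here.
_BUILTIN_BACKENDS = {
    "newton": ("strands_robots.simulation.newton.simulation", "NewtonSimulation"),
}
_ALIASES = {"warp": "newton"}  # hypothetical alias table


def list_backends() -> list[str]:
    """Canonical backend names plus their aliases, sorted."""
    return sorted(set(_BUILTIN_BACKENDS) | set(_ALIASES))


def resolve_backend(name: str) -> tuple[str, str]:
    """Map a backend name or alias to its (module path, class name) pair."""
    canonical = _ALIASES.get(name, name)
    try:
        return _BUILTIN_BACKENDS[canonical]
    except KeyError:
        raise ValueError(
            f"unknown backend {name!r}; available: {list_backends()}"
        ) from None


def load_backend_class(name: str) -> type:
    """Import the backend module on demand, so Warp is not imported earlier."""
    module_path, class_name = resolve_backend(name)
    module = importlib.import_module(module_path)
    return getattr(module, class_name)
```

Storing module paths as strings keeps the registry importable without Warp installed; the import cost is paid only when `load_backend_class` is called.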

### Quality
- [ ] `pip install strands-robots[newton]` extras resolves (warp-lang, newton-physics)
- [ ] Lazy imports — importing `strands_robots.simulation` does NOT trigger Warp import
- [ ] 30+ unit tests covering world lifecycle, entity management, observation/action
- [ ] `@pytest.mark.gpu` integration suite (separate GPU CI job)
- [ ] NumPy-style docstrings + type hints on every public method
- [ ] No `Any` in public signatures except `**kwargs`
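
The lazy-import requirement is easy to regress, so it is worth a dedicated test. One possible approach, sketched below, runs the import in a fresh interpreter and inspects `sys.modules`; the helper name `module_loaded_after` is hypothetical.

```python
import subprocess
import sys


def module_loaded_after(statement: str, module_name: str) -> bool:
    """Run statement in a fresh interpreter and report whether module_name was imported."""
    probe = f"{statement}\nimport sys\nprint({module_name!r} in sys.modules)"
    result = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True, text=True, check=True,
    )
    # The probe prints last, so the final stdout line is its answer.
    return result.stdout.strip().splitlines()[-1] == "True"
```

In the unit suite this would be used as `assert not module_loaded_after("import strands_robots.simulation", "warp")`. A subprocess is needed because `sys.modules` in the test process may already contain `warp` from an earlier test.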

### Documentation
- [ ] `docs/backends/newton.md` — installation, solver selection, troubleshooting
- [ ] `examples/newton_fleet_training.py` — 4096-env RL training demo
- [ ] `examples/newton_diffsim.py` — differentiable-sim toy example
- [ ] README backend comparison table updated with Newton row

---

## Non-goals

- Distributed multi-GPU Newton (single GPU is enough for v0.4)
- Full USD pipeline (scope belongs to Isaac Sim backend)
- Custom Warp kernels from user code (power-user escape hatch only, not public API)
- Apple Silicon support (CUDA-only; MuJoCo remains for macOS)

---

## Implementation plan (7 PRs)

1. **`feat(newton): stub NewtonSimulation(SimEngine)`** — skeleton, config, registry entry, lazy-import. All methods raise `NotImplementedError`. ~200 LOC.
2. **`feat(newton): world lifecycle + so100 procedural`** — `create_world/destroy/reset/step`, `add_robot` for so100. Smoke test passes. ~600 LOC.
3. **`feat(newton): observation/action API`** — `get_observation/send_action`, joint_q / body_q plumbing. ~300 LOC.
4. **`feat(newton): rendering`** — OpenGL backend, camera, `render()`. ~400 LOC.
5. **`feat(newton): replicate() + 4096-env throughput benchmark`** — GPU integ test. ~250 LOC.
6. **`feat(newton): solve_ik + diffsim`** — higher-level capabilities. ~350 LOC.
7. **`docs(newton): examples + backend docs`** — docs + 2 examples.

**Total estimated**: ~2.1K LOC + ~1K tests + ~500 docs across 7 PRs.
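
For PR 1, the stub could look roughly like the sketch below. `SimEngineSketch` is a stand-in for the real ABC from #84 and lists only three of its methods; the point is that the stub overrides every abstract method so the class can be instantiated and registered while each call still raises `NotImplementedError`.

```python
from abc import ABC, abstractmethod


class SimEngineSketch(ABC):
    """Stand-in for the SimEngine ABC (#84), reduced to three methods."""

    @abstractmethod
    def create_world(self, timestep=None, gravity=None, ground_plane=True) -> dict: ...

    @abstractmethod
    def step(self, n_steps: int = 1) -> dict: ...

    @abstractmethod
    def destroy(self) -> dict: ...


class NewtonSimulationStub(SimEngineSketch):
    """Instantiable skeleton: every method is present but unimplemented."""

    def create_world(self, timestep=None, gravity=None, ground_plane=True) -> dict:
        raise NotImplementedError("create_world is planned for PR 2")

    def step(self, n_steps: int = 1) -> dict:
        raise NotImplementedError("step is planned for PR 2")

    def destroy(self) -> dict:
        raise NotImplementedError("destroy is planned for PR 2")
```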

---

## Risks & mitigations

| Risk | Severity | Mitigation |
|---|:---:|---|
| Warp version churn (1.11 → 1.12) breaks ABI | HIGH | Pin Warp in `[newton]` extras; matrix-test on 2 versions |
| Blackwell sm_101 adjoint issue in some solvers | MEDIUM | Skip affected solvers on sm_101 with warning |
| URDF / MJCF asset loader drift | MEDIUM | Reuse MuJoCo backend's asset resolver from #85 |
| Kernel compilation slow on first run | LOW | Persist Warp's compiled-kernel cache under `~/.cache/strands-robots/newton/` |

---

## Acceptance test

```bash
python -c "
import time
from strands_robots.simulation import create_simulation

sim = create_simulation('newton', solver='mujoco', num_envs=4096)
sim.create_world()
sim.add_robot('so100')
sim.replicate(4096)
t0 = time.perf_counter()
sim.step(100)
elapsed = time.perf_counter() - t0
print(f'4096 envs × 100 steps = {elapsed:.2f}s ({4096 * 100 / elapsed:,.0f} env-steps/s)')
# Target: < 2.0s on A100-class or Jetson Thor
sim.destroy()
"
```
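
The time budget in this test and the throughput target in the Definition of Done can be related with a little arithmetic. The helpers below are illustrative and assume that "aggregate steps/sec" counts one step per environment (env-steps); under that assumption the 2.0 s budget is the stricter of the two thresholds.

```python
def aggregate_steps_per_sec(num_envs: int, n_steps: int, elapsed_s: float) -> float:
    """Env-steps per second: every env advances n_steps in elapsed_s."""
    return num_envs * n_steps / elapsed_s


def max_elapsed_for_target(num_envs: int, n_steps: int,
                           target_steps_per_sec: float) -> float:
    """Longest wall-clock time that still meets a throughput target."""
    return num_envs * n_steps / target_steps_per_sec


# Finishing in exactly 2.0 s corresponds to 204,800 env-steps/s.
budget_rate = aggregate_steps_per_sec(4096, 100, 2.0)

# The >50k steps/s target alone would allow up to about 8.19 s for the same workload.
target_time = max_elapsed_for_target(4096, 100, 50_000)
```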

---

## Related

- Depends on: #84 (SimEngine ABC), #50 (Newton `up_axis` Axis-enum fix)
- Reference: #85 (MuJoCo backend — mirror this port pattern)
- Roadmap: #94
- Newton upstream: https://github.com/newton-physics/newton (Linux Foundation: NVIDIA + Disney + DeepMind)
- Warp upstream: https://github.com/NVIDIA/warp
