
feat(newton): NVIDIA Warp/Newton simulation backend — GPU-native SimEngine #96

@cagataycali


Summary

Implement NewtonSimulation(SimEngine), a GPU-native simulation backend built on NVIDIA Warp + Newton 1.x. It is the second concrete SimEngine subclass after MuJoCo (#85) and targets massively parallel policy training (4096+ envs on a single GPU), differentiable simulation, and soft-body / cloth / MPM workloads.


Motivation

Newton fills the "fleet-scale GPU training" slot that MuJoCo (#85) cannot. It is also a multi-solver engine — under the hood it can run MuJoCo-Warp, Featherstone, XPBD, VBD, Style3D, Semi-Implicit, or Implicit MPM — which we expose through a single unified interface.

| Capability | MuJoCo (#85) | Newton (this issue) | Isaac Sim (#TBD) |
|---|---|---|---|
| CPU-native fast iteration | ✅ | ⚠️ (kernel compile) | |
| GPU-native | | | |
| Multi-env parallelism | 1–8 | 4096+ | 4096+ |
| Differentiable sim | | ✅ (Warp autodiff) | partial |
| Soft bodies / cloth / MPM | | | |
| Apple Silicon support | ✅ | ❌ (CUDA-only) | |
| Setup friction | low | medium | high |

MuJoCo stays for inference / debugging / macOS. Newton becomes the training workhorse.


Design

Directory layout

strands_robots/simulation/
├── base.py                  # SimEngine ABC (#84)
├── models.py                # SimWorld / SimRobot / SimObject
├── factory.py               # create_simulation()
├── mujoco/                  # #85
└── newton/                  # ⭐ this issue
    ├── __init__.py          # Export NewtonSimulation
    ├── simulation.py        # class NewtonSimulation(SimEngine)
    ├── config.py            # @dataclass NewtonConfig
    ├── solvers.py           # SOLVER_MAP + per-solver adapters
    ├── procedural.py        # Procedural URDF builder for common robots
    ├── diffsim.py           # Differentiable simulation helpers
    └── tests/
        ├── test_unit.py
        └── test_gpu_integ.py  # @pytest.mark.gpu

Core class

from strands_robots.simulation.base import SimEngine
from strands_robots.simulation.newton.config import NewtonConfig

class NewtonSimulation(SimEngine):
    """GPU-native simulation backend built on NVIDIA Warp + Newton 1.x.
    
    Implements the SimEngine ABC. Every method delegates to Warp kernels
    where possible; falls back to host code only for I/O.
    """

    def __init__(self, config: NewtonConfig | None = None) -> None: ...

    # --- Required SimEngine methods ---
    def create_world(self, timestep=None, gravity=None, ground_plane=True) -> dict: ...
    def destroy(self) -> dict: ...
    def reset(self, env_ids: list[int] | None = None) -> dict: ...
    def step(self, n_steps: int = 1) -> dict: ...
    def get_state(self) -> dict: ...
    def add_robot(self, name, urdf_path=None, data_config=None,
                  position=None, orientation=None) -> dict: ...
    def remove_robot(self, name) -> dict: ...
    def add_object(self, name, shape="box", **kwargs) -> dict: ...
    def remove_object(self, name) -> dict: ...
    def get_observation(self, robot_name=None, camera_name=None) -> dict: ...
    def send_action(self, action, robot_name=None, n_substeps=1) -> None: ...
    def render(self, camera_name="default", width=None, height=None) -> dict: ...

    # --- Optional overrides ---
    def load_scene(self, scene_path) -> dict: ...
    def run_policy(self, robot_name, policy_provider="mock", **kwargs) -> dict: ...
    def randomize(self, **kwargs) -> dict: ...
    def get_contacts(self) -> dict: ...
    def cleanup(self) -> None: ...

    # --- Newton-specific extensions ---
    def replicate(self, num_envs: int | None = None) -> dict: ...
    def run_diffsim(self, num_steps, loss_fn, optimize_params,
                    lr=0.02, iterations=200) -> dict: ...
    def solve_ik(self, robot_name, target_position,
                 target_orientation=None) -> dict: ...
    def add_cloth(self, name, **kwargs) -> dict: ...
    def add_cable(self, name, **kwargs) -> dict: ...
    def add_particles(self, name, **kwargs) -> dict: ...  # MPM
    def add_sensor(self, name, kind, **kwargs) -> dict: ...
    def read_sensor(self, name) -> dict: ...
    def enable_dual_solver(self, articulated="featherstone",
                           soft="vbd") -> None: ...

NewtonConfig

from dataclasses import dataclass

@dataclass
class NewtonConfig:
    num_envs: int = 1
    device: str = "cuda:0"               # "cuda:N" | "cpu" (host fallback)
    solver: str = "mujoco"               # SOLVER_MAP key
    physics_dt: float = 1.0 / 60.0
    substeps: int = 4
    render_backend: str = "null"         # opengl | rerun | viser | null
    enable_cuda_graph: bool = True
    enable_differentiable: bool = False
    broad_phase: str = "sap"             # sap | bvh | none
    soft_contact_margin: float = 1e-3
    up_axis: str = "Y"                   # Axis enum (#50)
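The config mixes free-form strings (device, solver, render and broad-phase backends) whose allowed values live only in comments. A minimal sketch of a `__post_init__` that rejects bad values at construction time is below; the accepted values are copied from the comments and the solver matrix in this issue, while the `substep_dt` property is a hypothetical helper that assumes each `physics_dt` frame is split into `substeps` equal solver steps.

```python
from dataclasses import dataclass

# Accepted values, copied from the comments and solver matrix in this issue.
SOLVER_KEYS = ("mujoco", "featherstone", "semi_implicit", "xpbd",
               "vbd", "style3d", "implicit_mpm")
RENDER_BACKENDS = ("opengl", "rerun", "viser", "null")
BROAD_PHASES = ("sap", "bvh", "none")


@dataclass
class NewtonConfig:
    num_envs: int = 1
    device: str = "cuda:0"               # "cuda:N" | "cpu" (host fallback)
    solver: str = "mujoco"               # SOLVER_MAP key
    physics_dt: float = 1.0 / 60.0
    substeps: int = 4
    render_backend: str = "null"         # opengl | rerun | viser | null
    enable_cuda_graph: bool = True
    enable_differentiable: bool = False
    broad_phase: str = "sap"             # sap | bvh | none
    soft_contact_margin: float = 1e-3
    up_axis: str = "Y"                   # Axis enum (#50)

    def __post_init__(self) -> None:
        # Reject bad values here rather than at the first kernel launch.
        if self.num_envs < 1:
            raise ValueError(f"num_envs must be >= 1, got {self.num_envs}")
        is_cuda = self.device.startswith("cuda:") and self.device[5:].isdigit()
        if self.device != "cpu" and not is_cuda:
            raise ValueError(f"device must be 'cpu' or 'cuda:N', got {self.device!r}")
        if self.solver not in SOLVER_KEYS:
            raise ValueError(f"unknown solver {self.solver!r}; expected one of {SOLVER_KEYS}")
        if self.render_backend not in RENDER_BACKENDS:
            raise ValueError(f"unknown render_backend {self.render_backend!r}")
        if self.broad_phase not in BROAD_PHASES:
            raise ValueError(f"unknown broad_phase {self.broad_phase!r}")
        if self.physics_dt <= 0.0 or self.substeps < 1:
            raise ValueError("physics_dt must be > 0 and substeps must be >= 1")
        if self.soft_contact_margin < 0.0:
            raise ValueError("soft_contact_margin must be >= 0")

    @property
    def substep_dt(self) -> float:
        # Assumption: each physics_dt frame is split into `substeps` equal solver steps.
        return self.physics_dt / self.substeps
```

Under that assumption the defaults give a solver step of 1/240 s, and a typo such as `device="gpu"` fails immediately with a readable message.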

Solver support matrix

| Solver | Primary use | Blocker on current Newton release | Status |
|---|---|---|---|
| mujoco | Rigid-body manipulation (default) | none | Production |
| featherstone | Articulated rigid bodies | Warp 1.11 ABI mismatch | Blocked, re-test on 1.12 |
| semi_implicit | Soft-contact rigid bodies | none | Production |
| xpbd | Soft bodies / cloth-lite | NVRTC warning (benign) | Production |
| vbd | Soft-body only | No revolute joint support | Document as soft-only |
| style3d | Cloth only | No rigid-body support | Document as cloth-only |
| implicit_mpm | Granular / fluid | Needs voxel_size config | Document as MPM-only |
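The matrix above is what `solvers.py` has to encode in `SOLVER_MAP`. One possible shape is sketched below, assuming a small frozen record per solver plus a guard that refuses soft-only, cloth-only, or MPM-only solvers for articulated robots and refuses blocked solvers outright. `SolverInfo`, its `scope` strings, and `check_solver` are hypothetical names; treating `semi_implicit` and `xpbd` as general-purpose is an assumption, since the table lists no restriction for them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SolverInfo:
    primary_use: str
    scope: str                      # "general" | "soft-only" | "cloth-only" | "mpm-only"
    blocker: Optional[str] = None   # set while the solver is unusable on the pinned release


# Hypothetical encoding of the solver support matrix (per-solver adapters omitted).
SOLVER_MAP: dict[str, SolverInfo] = {
    "mujoco": SolverInfo("Rigid-body manipulation (default)", "general"),
    "featherstone": SolverInfo("Articulated rigid bodies", "general",
                               blocker="Warp 1.11 ABI mismatch"),
    "semi_implicit": SolverInfo("Soft-contact rigid bodies", "general"),
    "xpbd": SolverInfo("Soft bodies / cloth-lite", "general"),
    "vbd": SolverInfo("Soft bodies", "soft-only"),
    "style3d": SolverInfo("Cloth", "cloth-only"),
    "implicit_mpm": SolverInfo("Granular / fluid", "mpm-only"),
}


def check_solver(name: str, has_articulation: bool) -> SolverInfo:
    """Return the SolverInfo for `name`, or raise if it cannot run this scene."""
    if name not in SOLVER_MAP:
        raise ValueError(f"unknown solver {name!r}; expected one of {sorted(SOLVER_MAP)}")
    info = SOLVER_MAP[name]
    if info.blocker is not None:
        raise RuntimeError(f"solver {name!r} is blocked: {info.blocker}")
    if has_articulation and info.scope != "general":
        raise ValueError(f"solver {name!r} is {info.scope}; it cannot drive an articulated robot")
    return info
```

Keeping the blocker text in the map means that re-enabling `featherstone` after the Warp 1.12 re-test is a one-line change, and the error a user sees quotes the same reason as this table.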

Definition of Done

Functional

  • create_simulation("newton") returns a live NewtonSimulation
  • create_simulation("newton", solver="mujoco") works end-to-end
  • sim.add_robot("so100") works without explicit URDF (procedural fallback)
  • sim.add_robot("unitree_g1", urdf_path=...) works with URDF
  • sim.step(100) stable at 4096 envs on a single modern GPU
  • sim.get_observation("so100") returns {joint_q, joint_qd, body_q} as numpy arrays
  • sim.render(camera_name="default") returns RGB frame (OpenGL backend)
  • sim.run_policy("so100", policy_provider="gr00t") integration passes
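For `get_observation` to return NumPy arrays per environment, the flat device buffers have to be copied to host (Warp arrays expose `.numpy()`) and split by environment. The sketch below assumes that replicated worlds are stored back to back in `joint_q`, `joint_qd`, and `body_q`, and that each body transform is 7 floats (position plus quaternion); `split_observation` is a hypothetical helper, and `joint_q` and `joint_qd` are reshaped independently because their per-env widths can differ (for example with free joints).

```python
import numpy as np

BODY_TRANSFORM_SIZE = 7  # assumed layout: (px, py, pz, qx, qy, qz, qw)


def _per_env(flat, num_envs: int, trailing: tuple = ()) -> np.ndarray:
    """Reshape a world-concatenated buffer to (num_envs, items_per_env, *trailing)."""
    arr = np.asarray(flat, dtype=np.float32)
    per_item = int(np.prod(trailing)) if trailing else 1
    if arr.size % (num_envs * per_item) != 0:
        raise ValueError(f"array of size {arr.size} does not split evenly into {num_envs} envs")
    return arr.reshape((num_envs, -1) + trailing)


def split_observation(joint_q, joint_qd, body_q, num_envs: int) -> dict:
    """Build the {joint_q, joint_qd, body_q} observation dict, one row per env."""
    return {
        "joint_q": _per_env(joint_q, num_envs),
        "joint_qd": _per_env(joint_qd, num_envs),
        "body_q": _per_env(body_q, num_envs, (BODY_TRANSFORM_SIZE,)),
    }
```

With a Warp-backed state the call would look like `split_observation(state.joint_q.numpy(), state.joint_qd.numpy(), state.body_q.numpy(), num_envs)`, where `state` stands for whatever state object the backend holds.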

Advanced

  • sim.replicate(num_envs=4096): target >50k aggregate steps/sec on an A100-class GPU or Jetson Thor
  • sim.run_diffsim(...) converges on a toy task (e.g. initial-velocity optimization)
  • sim.solve_ik("so100", target_position=[0.3, 0, 0.2]) returns valid joint_q
  • At least 3 solvers working end-to-end (mujoco, xpbd, semi_implicit)

Registry + API

  • factory._BUILTIN_BACKENDS["newton"] = ("strands_robots.simulation.newton.simulation", "NewtonSimulation")
  • Alias "warp" resolves to "newton"
  • list_backends() includes newton, warp
  • Robot("so100", mode="sim", backend="newton") auto-creates a Newton sim

Quality

  • pip install strands-robots[newton] extras resolves (warp-lang, newton-physics)
  • Lazy imports — importing strands_robots.simulation does NOT trigger Warp import
  • 30+ unit tests covering world lifecycle, entity management, observation/action
  • @pytest.mark.gpu integration suite (separate GPU CI job)
  • NumPy-style docstrings + type hints on every public method
  • No Any in public signatures except **kwargs

Documentation

  • docs/backends/newton.md — installation, solver selection, troubleshooting
  • examples/newton_fleet_training.py — 4096-env RL training demo
  • examples/newton_diffsim.py — differentiable-sim toy example
  • README backend comparison table updated with Newton row
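The diffsim toy task ("initial-velocity optimization") can be prototyped without a GPU to pin down what convergence should look like. The sketch below is a pure-Python stand-in for Warp autodiff (in Warp this role is played by recording the forward pass on a `wp.Tape` and calling its `backward`): a 1-D point mass under gravity is integrated with semi-implicit Euler, the gradient of the squared landing error with respect to the initial velocity is computed by a hand-written reverse pass, and plain gradient descent uses the `lr=0.02, iterations=200` defaults from the `run_diffsim` signature. All function names are hypothetical.

```python
def simulate(v0: float, x0: float = 0.0, g: float = -9.81,
             dt: float = 1.0 / 60.0, n_steps: int = 60) -> float:
    """Forward pass: semi-implicit Euler for a 1-D point mass under gravity."""
    x, v = x0, v0
    for _ in range(n_steps):
        v = v + g * dt      # velocity update first...
        x = x + v * dt      # ...then position with the new velocity
    return x


def loss_and_grad(v0: float, target: float, x0: float = 0.0, g: float = -9.81,
                  dt: float = 1.0 / 60.0, n_steps: int = 60) -> tuple[float, float]:
    """Squared landing error and its gradient w.r.t. v0 via a hand-written reverse pass."""
    x_final = simulate(v0, x0, g, dt, n_steps)
    loss = (x_final - target) ** 2
    adj_x = 2.0 * (x_final - target)   # dL/dx_N
    adj_v = 0.0                        # dL/dv_N: the loss does not read the final velocity
    for _ in range(n_steps):           # walk the steps backwards
        # x_{k+1} = x_k + v_{k+1} * dt: adj_x reaches x_k unchanged and adds adj_x * dt to v_{k+1}
        adj_v += adj_x * dt
        # v_{k+1} = v_k + g * dt: adj_v reaches v_k unchanged
    return loss, adj_v                 # adj_v is now dL/dv0


def optimize_v0(target: float, lr: float = 0.02, iterations: int = 200) -> float:
    """Plain gradient descent on the initial velocity."""
    v0 = 0.0
    for _ in range(iterations):
        _, grad = loss_and_grad(v0, target)
        v0 -= lr * grad
    return v0
```

With one second of simulated time each gradient step shrinks the landing error by a factor of 0.96, so 200 iterations bring a target of 2.0 m to within a few millimetres; a Warp-based `run_diffsim` should reproduce that curve on the same setup.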

Non-goals

  • Distributed multi-GPU Newton (single GPU is enough for v0.4)
  • Full USD pipeline (scope belongs to Isaac Sim backend)
  • Custom Warp kernels from user code (power-user escape hatch only, not public API)
  • Apple Silicon support (CUDA-only; MuJoCo remains for macOS)

Implementation plan (7 PRs)

  1. feat(newton): stub NewtonSimulation(SimEngine). Skeleton, config, registry entry, lazy import; all methods raise NotImplementedError. ~200 LOC.
  2. feat(newton): world lifecycle + so100 procedural. create_world/destroy/reset/step, plus add_robot for so100; smoke test passes. ~600 LOC.
  3. feat(newton): observation/action API. get_observation/send_action, joint_q / body_q plumbing. ~300 LOC.
  4. feat(newton): rendering. OpenGL backend, camera, render(). ~400 LOC.
  5. feat(newton): replicate() + 4096-env throughput benchmark. GPU integration test. ~250 LOC.
  6. feat(newton): solve_ik + diffsim. Higher-level capabilities. ~350 LOC.
  7. docs(newton): examples + backend docs. Backend docs plus 2 examples.

Total estimated: ~2.1K LOC of implementation + ~1K LOC of tests + ~500 lines of docs across 7 PRs.


Risks & mitigations

| Risk | Severity | Mitigation |
|---|---|---|
| Warp version churn (1.11 → 1.12) breaks ABI | HIGH | Pin Warp in [newton] extras; matrix-test on 2 versions |
| Blackwell sm_101 adjoint issue in some solvers | MEDIUM | Skip affected solvers on sm_101 with a warning |
| URDF / MJCF asset loader drift | MEDIUM | Reuse the MuJoCo backend's asset resolver from #85 |
| Slow kernel compilation on first run | LOW | Cache compiled Warp kernels under ~/.cache/strands-robots/newton/ (CUDA graphs are in-process objects and are re-captured each run) |
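Warp already keeps compiled kernels in an on-disk kernel cache; the mitigation amounts to relocating that cache into a project directory. A sketch is below, assuming that `wp.config.kernel_cache_dir` takes effect only when set before Warp initializes and that `wp.config.version` holds the installed Warp version string. Keying the directory on that version also keeps 1.11 and 1.12 builds apart during the version churn noted above. Both function names are hypothetical.

```python
from pathlib import Path
from typing import Optional


def newton_kernel_cache_dir(warp_version: str, root: Optional[str] = None) -> Path:
    """Per-Warp-version cache directory under ~/.cache/strands-robots/newton/."""
    base = Path(root) if root is not None else Path.home() / ".cache"
    # Keying on the Warp version keeps kernels built by different releases apart.
    return base / "strands-robots" / "newton" / f"warp-{warp_version}"


def configure_warp_kernel_cache() -> Optional[Path]:
    """Point Warp's kernel cache at the project directory; call before wp.init()."""
    try:
        import warp as wp
    except ImportError:
        return None  # Warp not installed: nothing to configure
    path = newton_kernel_cache_dir(wp.config.version)
    path.mkdir(parents=True, exist_ok=True)
    wp.config.kernel_cache_dir = str(path)
    return path
```

Calling `configure_warp_kernel_cache()` from `NewtonSimulation.__init__`, before any Warp initialization, would fit the lazy-import rule since Warp is imported only inside the function.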

Acceptance test

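python -c "
import time
import warp as wp
from strands_robots.simulation import create_simulation

sim = create_simulation('newton', solver='mujoco', num_envs=4096)
sim.create_world()
sim.add_robot('so100')
sim.replicate(4096)
sim.step(1)          # warm-up: first step pays for kernel compilation / graph capture
wp.synchronize()
t0 = time.perf_counter()
sim.step(100)
wp.synchronize()     # GPU launches are async; wait for queued work before reading the clock
print(f'4096 envs × 100 steps = {time.perf_counter() - t0:.2f}s')
# Target: < 2.0s on A100-class or Jetson Thor
sim.destroy()
"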

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request), newton (NVIDIA Newton / Warp simulation backend), simulation
Type: No type
Projects: Status: In progress
Relationships: None yet
Development: No branches or pull requests