Summary
Implement NewtonSimulation(SimEngine), a GPU-native simulation backend built on NVIDIA Warp + Newton 1.x. It is the second concrete SimEngine subclass after MuJoCo (#85) and targets massively parallel policy training (4096+ envs on a single GPU), differentiable simulation, and soft-body / cloth / MPM workloads.
Motivation
Newton fills the "fleet-scale GPU training" slot that MuJoCo (#85) cannot fill. It is also a multi-solver engine: under the hood it can run MuJoCo-Warp, Featherstone, XPBD, VBD, Style3D, Semi-Implicit, or Implicit MPM, all of which we expose through a single unified interface.
| Capability | MuJoCo (#85) | Newton (this issue) | Isaac Sim (#TBD) |
| --- | --- | --- | --- |
| CPU-native fast iteration | ✅ | ⚠️ (kernel compile) | ❌ |
| GPU-native | ❌ | ✅ | ✅ |
| Multi-env parallelism | 1–8 | 4096+ | 4096+ |
| Differentiable sim | ❌ | ✅ (Warp autodiff) | partial |
| Soft bodies / cloth / MPM | ❌ | ✅ | ✅ |
| Apple Silicon support | ✅ | ❌ (CUDA-only) | ❌ |
| Setup friction | low | medium | high |
MuJoCo stays for inference / debugging / macOS. Newton becomes the training workhorse.
Design
Directory layout
    strands_robots/simulation/
    ├── base.py              # SimEngine ABC (#84)
    ├── models.py            # SimWorld / SimRobot / SimObject
    ├── factory.py           # create_simulation()
    ├── mujoco/              # #85
    └── newton/              # ⭐ this issue
        ├── __init__.py      # Export NewtonSimulation
        ├── simulation.py    # class NewtonSimulation(SimEngine)
        ├── config.py        # @dataclass NewtonConfig
        ├── solvers.py       # SOLVER_MAP + per-solver adapters
        ├── procedural.py    # Procedural URDF builder for common robots
        ├── diffsim.py       # Differentiable simulation helpers
        └── tests/
            ├── test_unit.py
            └── test_gpu_integ.py   # @pytest.mark.gpu
Core class
    from strands_robots.simulation.base import SimEngine
    from strands_robots.simulation.newton.config import NewtonConfig


    class NewtonSimulation(SimEngine):
        """GPU-native simulation backend built on NVIDIA Warp + Newton 1.x.

        Implements the SimEngine ABC. Every method delegates to Warp kernels
        where possible; falls back to host code only for I/O.
        """

        def __init__(self, config: NewtonConfig | None = None) -> None: ...

        # --- Required SimEngine methods ---
        def create_world(self, timestep=None, gravity=None, ground_plane=True) -> dict: ...
        def destroy(self) -> dict: ...
        def reset(self, env_ids: list[int] | None = None) -> dict: ...
        def step(self, n_steps: int = 1) -> dict: ...
        def get_state(self) -> dict: ...
        def add_robot(self, name, urdf_path=None, data_config=None,
                      position=None, orientation=None) -> dict: ...
        def remove_robot(self, name) -> dict: ...
        def add_object(self, name, shape="box", **kwargs) -> dict: ...
        def remove_object(self, name) -> dict: ...
        def get_observation(self, robot_name=None, camera_name=None) -> dict: ...
        def send_action(self, action, robot_name=None, n_substeps=1) -> None: ...
        def render(self, camera_name="default", width=None, height=None) -> dict: ...

        # --- Optional overrides ---
        def load_scene(self, scene_path) -> dict: ...
        def run_policy(self, robot_name, policy_provider="mock", **kwargs) -> dict: ...
        def randomize(self, **kwargs) -> dict: ...
        def get_contacts(self) -> dict: ...
        def cleanup(self) -> None: ...

        # --- Newton-specific extensions ---
        def replicate(self, num_envs: int | None = None) -> dict: ...
        def run_diffsim(self, num_steps, loss_fn, optimize_params,
                        lr=0.02, iterations=200) -> dict: ...
        def solve_ik(self, robot_name, target_position,
                     target_orientation=None) -> dict: ...
        def add_cloth(self, name, **kwargs) -> dict: ...
        def add_cable(self, name, **kwargs) -> dict: ...
        def add_particles(self, name, **kwargs) -> dict: ...  # MPM
        def add_sensor(self, name, kind, **kwargs) -> dict: ...
        def read_sensor(self, name) -> dict: ...
        def enable_dual_solver(self, articulated="featherstone",
                               soft="vbd") -> None: ...
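The run_diffsim signature takes a step count, a loss, parameters to optimize, a learning rate, and an iteration count. As a rough illustration of what the "initial-velocity optimization" toy task could look like, here is a pure-Python sketch that rolls out a point mass and runs gradient descent on its launch velocity. It does not use Warp or Newton: the gradient is written by hand as a stand-in for Warp autodiff, and the names rollout, loss_and_grad, and optimize_initial_velocity are hypothetical. Only lr=0.02 and iterations=200 are taken from the signature above.

```python
Vec2 = tuple[float, float]


def rollout(v0: Vec2, num_steps: int, dt: float, gravity: float = -9.81) -> Vec2:
    """Semi-implicit Euler rollout of a point mass launched from the origin."""
    x, y = 0.0, 0.0
    vx, vy = v0
    for _ in range(num_steps):
        vy += gravity * dt   # update velocity first
        x += vx * dt         # then advance position with the new velocity
        y += vy * dt
    return (x, y)


def loss_and_grad(v0: Vec2, target: Vec2, num_steps: int, dt: float):
    """Squared distance to the target and its gradient w.r.t. v0.

    The final position is linear in v0 with slope num_steps * dt, so the
    adjoint is written by hand here instead of using autodiff.
    """
    x, y = rollout(v0, num_steps, dt)
    ex, ey = x - target[0], y - target[1]
    horizon = num_steps * dt
    return ex * ex + ey * ey, (2.0 * ex * horizon, 2.0 * ey * horizon)


def optimize_initial_velocity(target: Vec2, num_steps: int = 60, dt: float = 1.0 / 60.0,
                              lr: float = 0.02, iterations: int = 200) -> dict:
    """Plain gradient descent on the launch velocity."""
    v0 = (0.0, 0.0)
    history = []
    for _ in range(iterations):
        loss, (gx, gy) = loss_and_grad(v0, target, num_steps, dt)
        history.append(loss)
        v0 = (v0[0] - lr * gx, v0[1] - lr * gy)
    final_loss, _ = loss_and_grad(v0, target, num_steps, dt)
    return {"v0": v0, "loss": final_loss, "history": history}


if __name__ == "__main__":
    result = optimize_initial_velocity(target=(2.0, 0.0))
    print(result["v0"], result["loss"])
```

With a one-second horizon each iteration shrinks the position error by a constant factor, so the loss history decreases monotonically. A real run_diffsim would presumably obtain the same gradient by back-propagating through the simulated steps rather than from a closed form.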
NewtonConfig
    from dataclasses import dataclass


    @dataclass
    class NewtonConfig:
        num_envs: int = 1
        device: str = "cuda:0"            # "cuda:N" | "cpu" (host fallback)
        solver: str = "mujoco"            # SOLVER_MAP key
        physics_dt: float = 1.0 / 60.0
        substeps: int = 4
        render_backend: str = "null"      # opengl | rerun | viser | null
        enable_cuda_graph: bool = True
        enable_differentiable: bool = False
        broad_phase: str = "sap"          # sap | bvh | none
        soft_contact_margin: float = 1e-3
        up_axis: str = "Y"                # Axis enum (#50)
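Several of these fields accept only a small set of values, so early validation may save a confusing failure deep inside a kernel launch. The sketch below is one possible shape for that, not the planned implementation: NewtonConfigSketch, KNOWN_SOLVERS, and substep_dt are hypothetical names, and it assumes that substeps subdivides physics_dt evenly, which the issue does not state.

```python
import re
from dataclasses import dataclass

# Solver keys taken from the solver support matrix in this issue.
KNOWN_SOLVERS = frozenset({
    "mujoco", "featherstone", "semi_implicit", "xpbd",
    "vbd", "style3d", "implicit_mpm",
})


@dataclass
class NewtonConfigSketch:
    """Hypothetical subset of NewtonConfig with eager validation."""
    num_envs: int = 1
    device: str = "cuda:0"
    solver: str = "mujoco"
    physics_dt: float = 1.0 / 60.0
    substeps: int = 4

    def __post_init__(self) -> None:
        if self.num_envs < 1:
            raise ValueError("num_envs must be >= 1")
        if self.substeps < 1:
            raise ValueError("substeps must be >= 1")
        if self.physics_dt <= 0.0:
            raise ValueError("physics_dt must be positive")
        if self.solver not in KNOWN_SOLVERS:
            raise ValueError(f"unknown solver {self.solver!r}")
        # Accept "cpu" or "cuda:N", matching the comment on the device field.
        if self.device != "cpu" and not re.fullmatch(r"cuda:\d+", self.device):
            raise ValueError(f"unsupported device {self.device!r}")

    @property
    def substep_dt(self) -> float:
        """Time advanced per solver substep (assumes an even split)."""
        return self.physics_dt / self.substeps


if __name__ == "__main__":
    cfg = NewtonConfigSketch()
    print(cfg.substep_dt)  # 1/240 s with the defaults
```

Under that assumption the defaults give a 1/240 s substep.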
Solver support matrix
| Solver | Primary use | Blocker on current Newton release | Status |
| --- | --- | --- | --- |
| mujoco | Rigid-body manipulation (default) | none | Production |
| featherstone | Articulated rigid bodies | Warp 1.11 ABI mismatch | Blocked, re-test on 1.12 |
| semi_implicit | Soft-contact rigid bodies | none | Production |
| xpbd | Soft bodies / cloth-lite | NVRTC warning (benign) | Production |
| vbd | Soft-body only | No revolute joint support | Document as soft-only |
| style3d | Cloth only | No rigid support | Document as cloth-only |
| implicit_mpm | Granular / fluid | Needs voxel_size config | Document as MPM-only |
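One way to keep this matrix enforceable is to encode it as data next to SOLVER_MAP and check requests against it before a solver is built. The following is a sketch under assumptions: SolverInfo, SOLVER_TABLE, check_solver, and the workload labels ("rigid", "soft", "cloth", "mpm") are hypothetical, and only the solver keys, statuses, and restrictions are copied from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SolverInfo:
    primary_use: str
    status: str                           # copied verbatim from the matrix
    restricted_to: Optional[str] = None   # hypothetical workload label
    required_options: tuple[str, ...] = ()


SOLVER_TABLE = {
    "mujoco": SolverInfo("Rigid-body manipulation (default)", "Production"),
    "featherstone": SolverInfo("Articulated rigid bodies", "Blocked, re-test on 1.12"),
    "semi_implicit": SolverInfo("Soft-contact rigid bodies", "Production"),
    "xpbd": SolverInfo("Soft bodies / cloth-lite", "Production"),
    "vbd": SolverInfo("Soft-body only", "Document as soft-only", restricted_to="soft"),
    "style3d": SolverInfo("Cloth only", "Document as cloth-only", restricted_to="cloth"),
    "implicit_mpm": SolverInfo("Granular / fluid", "Document as MPM-only",
                               restricted_to="mpm", required_options=("voxel_size",)),
}


def check_solver(name: str, workload: str, options: Optional[dict] = None) -> SolverInfo:
    """Raise early if a solver cannot serve the requested workload."""
    if name not in SOLVER_TABLE:
        raise ValueError(f"unknown solver {name!r}")
    info = SOLVER_TABLE[name]
    if info.status.startswith("Blocked"):
        raise RuntimeError(f"solver {name!r} is blocked: {info.status}")
    if info.restricted_to is not None and workload != info.restricted_to:
        raise ValueError(f"solver {name!r} only supports {info.restricted_to!r} workloads")
    missing = [key for key in info.required_options if key not in (options or {})]
    if missing:
        raise ValueError(f"solver {name!r} needs options: {missing}")
    return info


if __name__ == "__main__":
    print(check_solver("mujoco", "rigid").status)
    print(check_solver("implicit_mpm", "mpm", {"voxel_size": 0.05}).primary_use)
```

Keeping the table as plain data also makes it easy to flip featherstone back to Production once the re-test passes.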
Non-goals
- Distributed multi-GPU Newton (single GPU is enough for v0.4)
- Full USD pipeline (scope belongs to Isaac Sim backend)
- Custom Warp kernels from user code (power-user escape hatch only, not public API)
- Apple Silicon support (CUDA-only; MuJoCo remains for macOS)
Implementation plan (7 PRs)
1. feat(newton): stub NewtonSimulation(SimEngine). Skeleton, config, registry entry, lazy import. All methods raise NotImplementedError. ~200 LOC.
2. feat(newton): world lifecycle + so100 procedural. create_world/destroy/reset/step, add_robot for so100. Smoke test passes. ~600 LOC.
3. feat(newton): observation/action API. get_observation/send_action, joint_q / body_q plumbing. ~300 LOC.
4. feat(newton): rendering. OpenGL backend, camera, render(). ~400 LOC.
5. feat(newton): replicate() + 4096-env throughput benchmark. GPU integ test. ~250 LOC.
6. feat(newton): solve_ik + diffsim. Higher-level capabilities. ~350 LOC.
7. docs(newton): examples + backend docs. Docs + 2 examples.
Total estimated: ~2.1K LOC + ~1K tests + ~500 docs across 7 PRs.
Risks & mitigations
| Risk | Severity | Mitigation |
| --- | --- | --- |
| Warp version churn (1.11 → 1.12) breaks ABI | HIGH | Pin Warp in [newton] extras; matrix-test on 2 versions |
| Blackwell sm_101 adjoint issue in some solvers | MEDIUM | Skip affected solvers on sm_101 with warning |
| URDF / MJCF asset loader drift | MEDIUM | Reuse MuJoCo backend's asset resolver from #85 |
| Kernel recompilation slow on first run | LOW | Cache CUDA graphs under ~/.cache/strands-robots/newton/ |
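The sm_101 mitigation ("skip affected solvers with a warning") could be a small filter applied before solver construction. The sketch below assumes the architecture is available as a string and that the set of affected solvers is supplied by the caller, since this issue does not name them. filter_solvers and the solver names in the usage example are placeholders.

```python
import warnings
from typing import Iterable, Mapping


def filter_solvers(requested: Iterable[str], arch: str,
                   affected_by_arch: Mapping[str, frozenset]) -> list[str]:
    """Drop solvers known to misbehave on the given architecture.

    affected_by_arch maps an architecture string such as "sm_101" to the
    solver names to skip there; its contents are the caller's responsibility.
    """
    blocked = affected_by_arch.get(arch, frozenset())
    kept = []
    for name in requested:
        if name in blocked:
            warnings.warn(
                f"Skipping solver {name!r} on {arch}: known adjoint issue",
                RuntimeWarning,
                stacklevel=2,
            )
        else:
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Placeholder names only; the real affected set is not given in this issue.
    table = {"sm_101": frozenset({"solver_b"})}
    print(filter_solvers(["solver_a", "solver_b"], "sm_101", table))
```

Emitting a RuntimeWarning rather than raising keeps test matrices running on affected hardware while still leaving a visible trace in the logs.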
Acceptance test
python -c "
from strands_robots.simulation import create_simulation
sim = create_simulation('newton', solver='mujoco', num_envs=4096)
sim.create_world()
sim.add_robot('so100')
sim.replicate(4096)
import time; t0 = time.time()
sim.step(100)
print(f'4096 envs × 100 steps = {time.time()-t0:.2f}s')
# Target: < 2.0s on A100-class or Jetson Thor
sim.destroy()
"
Definition of Done
Functional
- create_simulation("newton") returns a live NewtonSimulation
- create_simulation("newton", solver="mujoco") works end-to-end
- sim.add_robot("so100") works without explicit URDF (procedural fallback)
- sim.add_robot("unitree_g1", urdf_path=...) works with URDF
- sim.step(100) stable at 4096 envs on a single modern GPU
- sim.get_observation("so100") returns {joint_q, joint_qd, body_q} as numpy arrays
- sim.render(camera_name="default") returns RGB frame (OpenGL backend)
- sim.run_policy("so100", policy_provider="gr00t") integration passes
Advanced
- sim.replicate(num_envs=4096): target >50k aggregate steps/sec on A100-class or Jetson Thor
- sim.run_diffsim(...) converges on a toy task (e.g. initial-velocity optimization)
- sim.solve_ik("so100", target_position=[0.3, 0, 0.2]) returns valid joint_q
Registry + API
- factory._BUILTIN_BACKENDS["newton"] = ("strands_robots.simulation.newton.simulation", "NewtonSimulation")
- "warp" resolves to "newton"
- list_backends() includes newton, warp
- Robot("so100", mode="sim", backend="newton") auto-creates a Newton sim
Quality
- pip install strands-robots[newton] extras resolves (warp-lang, newton-physics)
- Importing strands_robots.simulation does NOT trigger a Warp import
- @pytest.mark.gpu integration suite (separate GPU CI job)
- No Any in public signatures except **kwargs
Documentation
- docs/backends/newton.md: installation, solver selection, troubleshooting
- examples/newton_fleet_training.py: 4096-env RL training demo
- examples/newton_diffsim.py: differentiable-sim toy example
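This issue states two throughput figures: more than 50k aggregate steps/sec for replicate(num_envs=4096), and 4096 envs × 100 steps in under 2.0 s in the acceptance test. A short calculation shows how they relate, assuming "aggregate steps" means env-steps (num_envs × n_steps), which the issue does not define. The helper name aggregate_steps_per_sec is hypothetical.

```python
def aggregate_steps_per_sec(num_envs: int, n_steps: int, wall_seconds: float) -> float:
    """Env-steps completed per wall-clock second."""
    return num_envs * n_steps / wall_seconds


# Rate implied by the acceptance test budget: 4096 envs, 100 steps, 2.0 s.
acceptance_rate = aggregate_steps_per_sec(4096, 100, 2.0)   # 204800.0

# Time the same workload would take at exactly the 50k steps/sec floor.
dod_floor = 50_000.0
seconds_at_floor = 4096 * 100 / dod_floor                   # about 8.192

if __name__ == "__main__":
    print(f"acceptance budget implies {acceptance_rate:.0f} env-steps/sec")
    print(f"at the 50k floor the run would take {seconds_at_floor:.3f} s")
```

Read this way, the 2.0 s acceptance budget is roughly four times stricter than the 50k steps/sec floor, so a run could meet the floor and still fail the acceptance test.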
Related
(up_axis Axis-enum fix)