feat(sim/mujoco): MjSpec-based builder (stages 0-2 of IDEA.md, closes part of #121) by cagataycali · Pull Request #127 · strands-labs/robots

cagataycali · 2026-05-05T05:10:43Z

Summary

First installment of the MJCF AST refactor proposed in IDEA.md. Stages 0, 1, 2 of 7. Tracked by umbrella issue #121.

Goal: replace the 273-line string-concat MJCFBuilder with mujoco.MjSpec (MuJoCo's official editable MJCF AST shipped in 3.2+). This eliminates ~600 lines of hand-rolled helpers across mjcf_builder.py + scene_ops.py (xyaxes math, name sanitization, ElementTree round-trips, mesh-path regex patching) and unlocks agent-authored raw MJCF.

This PR lands the additive portion. New builder lives alongside the old, behind STRANDS_SIM_USE_MJSPEC=1. Both paths tested in CI.

What this PR does

Stage 0 (prep)

Bump mujoco>=3.0.0 -> >=3.2.0 in pyproject.toml (hatch env already at 3.8.0).
Commit IDEA.md at repo root so the full staged plan is visible.

Stage 1 (new builder, feature-flagged)

New module strands_robots/simulation/mujoco/spec_builder.py (~260 LOC):
- SpecBuilder.build(world) -> mujoco.MjSpec produces compile-equivalent output to MJCFBuilder.build_objects_only.
- _geom_type(shape) - extends the shape vocabulary with ellipsoid (free bonus; legacy rejects it).
- _normalize_size(shape, size) - per-type size convention (box halves, sphere radius, cylinder/capsule radius+half-height, plane, mesh).
Simulation._compile_world gets a feature flag (STRANDS_SIM_USE_MJSPEC). Stashes the spec in _backend_state['spec'] for Stage 5 recompile use, exports spec.to_xml() to _backend_state['xml'] for legacy readers.
Simulation._use_mjspec() helper centralises env-var parsing.

Stage 2 (camera xyaxes -> quat)

_target_quat(pos, target) computes the look-at quaternion via mujoco.mju_mat2Quat - MuJoCo's own quaternion math, no hand-rolled linear algebra.
Under the new path, cameras emit quat= instead of xyaxes=. Compiled cam_mat0 differs by < 4e-7 (pure float noise).

Test updates

Two tests asserted on raw xyaxes="..." XML strings. Rewrote them to assert on model.cam_mat0 (representation-agnostic, works under BOTH paths):
- test_xyaxes_emitted_in_xml -> test_camera_orientation_written
- test_different_targets_produce_different_xyaxes -> test_different_targets_produce_different_orientations
New test file tests/simulation/mujoco/test_spec_builder.py (19 tests):
- Module-helper coverage (_geom_type including ellipsoid, _normalize_size per shape, _target_quat including degenerate cases).
- Parity class: legacy vs. spec produce identical nbody/ngeom/ncam/nu/njnt/nq/nv, identical body positions/masses, identical camera rotation matrices.
- Ellipsoid bonus test - proves the new shape the legacy rejects.

What this PR does NOT do

Does not touch scene_ops.py (Stages 5-6 will).
Does not touch robot composition (Stages 3-4 will).
Does not remove mjcf_builder.py (Stage 7).
Does not add replace_scene_mjcf / patch_scene_mjcf (Stage 6).

Safety

All existing tests pass under BOTH code paths:

	Legacy path (default)	MjSpec path (`STRANDS_SIM_USE_MJSPEC=1`)
`hatch run test tests/simulation/`	562 passed, 1 skipped	562 passed, 1 skipped
`hatch run lint`	clean	clean
`hatch run format`	clean	clean

Flag default is OFF so no existing consumer sees behaviour change. The MjSpec path is opt-in via env var for now; Stage 7 will flip the default + remove the flag once Stages 3-6 are proven.

Base branch

Based on feat/mujoco-backend (PR #85). Will rebase cleanly onto main once PR #85 merges.

Follow-ups on the project board

Tracked by umbrella #121:

sim/mujoco: MjSpec stage 3 - single-robot attach via spec.attach() (#121) #122 - Stage 3: single-robot attach via spec.attach()
sim/mujoco: MjSpec stage 4 - multi-robot compose via repeated attach() (#121) #123 - Stage 4: multi-robot compose via repeated attach
sim/mujoco: MjSpec stage 5 - port scene_ops inject_/eject_ to spec.recompile() (#121) #124 - Stage 5: port inject_*/eject_* to spec.recompile()
sim/mujoco: MjSpec stage 6 - agent-authored raw MJCF (replace_scene_mjcf / patch_scene_mjcf) (#121) #125 - Stage 6: agent-authored raw MJCF (replace_scene_mjcf, patch_scene_mjcf)
sim/mujoco: MjSpec stage 7 - remove feature flag, delete mjcf_builder.py (#121) #126 - Stage 7: remove feature flag, delete mjcf_builder.py

Complete MuJoCo simulation backend composed of focused mixins: Simulation(AgentTool) ├── PhysicsMixin # raycasting, jacobians, energy, forces, │ # mass matrix, checkpoints, inverse dynamics ├── PolicyRunnerMixin # run_policy, eval_policy, replay_episode ├── RenderingMixin # RGB/depth offscreen rendering, observations ├── RecordingMixin # LeRobot dataset recording └── RandomizationMixin # domain randomization (colors, lighting, physics) Supporting modules: - backend.py: lazy mujoco import + headless GL auto-config (EGL/OSMesa/GLFW) - mjcf_builder.py: procedural MJCF XML generation from dataclasses - scene_ops.py: XML round-trip for runtime object/camera injection - simulation.py: orchestrator dispatching 35 actions via tool_spec.json - dataset_recorder.py: LeRobot v3 format recorder (parquet + video) Key design decisions: - Simulation extends AgentTool directly: Agent(tools=[Simulation()]) works - Lazy MuJoCo import via _ensure_mujoco() — only when first needed - XML round-trip for scene modification (standard: dm_control, robosuite) - Same Policy ABC for sim and real — zero code changes for transfer Tests: 47 new tests (12 E2E + 35 physics unit tests) All use self-contained inline XML robots (no external files needed).

…anup HIGH: - Simulation now inherits SimulationBackend ABC (isinstance works) - start_policy rejects concurrent execution per robot (thread-safety) - XML injection protection via _sanitize_name() in MJCFBuilder MEDIUM: - overwrite defaults to False in start_recording - Silent frame dropping now respects strict=True (AGENTS.md strands-labs#5) LOW: - Remove dead _numpy_ify code - Replace insecure tempfile.mktemp with NamedTemporaryFile - Remove unimplemented total_reward from eval_policy - Reuse ThreadPoolExecutor in _async_utils (50Hz perf fix)

- mjcf_builder.py: move 'import re' after docstring into import block - simulation.py: sort SimulationBackend import alphabetically - dataset_recorder.py: add strict param to __init__ signature - Run ruff format on both files All checks pass: ruff check ✅, ruff format ✅, 335 tests ✅

- Wrap data.ctrl writes + mj_step calls with self._lock in run_policy and eval_policy to prevent concurrent MuJoCo data access - Apply _sanitize_name() to ALL user-provided names interpolated into MJCF XML (geom, joint, mesh, camera), not just body names - Import _sanitize_name in scene_ops for camera name validation Addresses review comments on thread-safety and XML injection. ruff check ✅, ruff format ✅, 335 tests ✅

Adds mujoco>=3.0.0,<4.0.0 to the [sim] optional-dependencies group, and includes it in [all] so CI installs it via 'uv sync --extra all --extra dev'.

Address yinsong1986's feedback to namespace the optional dependency group as [sim-mujoco] for clarity when additional sim backends are added.

…pdate lazy imports

…t stubs - Add mujoco.* to third-party ignore-missing-imports list - Add mypy override for simulation.mujoco.* with disable_error_code for attr-defined (cooperative mixin pattern), assignment (implicit Optional), override (extended signatures), and misc (MRO conflicts) - Add mypy override for _async_utils and dataset_recorder (pre-existing) - Fix add_robot/add_object/add_camera/move_object signatures: use X | None - Fix set_gravity, cleanup, __enter__/__exit__/__del__ return annotations - Fix randomization seed: int → int | None - Fix backend _ensure_mujoco return type annotation - Fix __init__.py __getattr__ type annotation

Removed 4 error categories from disable_error_code (assignment, typeddict-item, index, return-value) by fixing the actual code: - physics.py: Fix 16 implicit Optional params (X = None → X | None = None) - rendering.py: Fix 3 implicit Optional params - recording.py: Fix 2 implicit Optional params + inline ignore for fallback - policy_runner.py: Fix 5 implicit Optional params + inline ignore for narrowed arg - simulation.py: Fix send_action/create_world signatures to match base, fix variable name reuse bug (result → recompile_result), inline ignore for TypedDict ** expansion Remaining suppressed (all legitimate): - attr-defined (137): cooperative mixin pattern (self._world on mixins) - misc (3): MRO conflicts + import fallback redefinition - override (1): add_object extends base with orientation/mesh_path params - import-not-found (1): imageio optional dep - import-untyped (1): internal zenoh_mesh - has-type (1): dynamic renderer cache

…able_error_code Replace blanket disable_error_code with proper type fixes: - Add TYPE_CHECKING attribute declarations to all 5 mixins (PhysicsMixin, RenderingMixin, RecordingMixin, PolicyRunnerMixin, RandomizationMixin) so mypy can verify self._world, self._lock, etc. - Add _push_to_hub field to SimWorld dataclass (was missing) - Add orientation + mesh_path params to SimEngine.add_object base signature - Add **kwargs to RandomizationMixin.randomize to match base - Simplify SimEngine.randomize to **kwargs (backends define own params) - Add assert guards for _world None checks in rendering methods - Restructure recording.py import fallback to avoid redefinition errors - Fix _apply_sim_action Protocol stubs to match real signatures Result: 0 mypy errors, 0 disable_error_code, only 2 inline type: ignore with specific codes (arg-type for narrowed var, typeddict-item for ** expansion)

- Replace bare with for consistent optional dependency handling per project conventions - Add imageio and imageio-ffmpeg to sim-mujoco extras in pyproject.toml - Add type: ignore comment for dynamic imageio writer attribute

@awsarron

…ng, tests, XML parsing Addresses all 8 unresolved review threads from @awsarron (Apr 10): 1. pyproject.toml: Remove empty [sim] extra, move robot_descriptions into [sim-mujoco]. Update extra= reference in backend.py. (yinsong1986 thread) 2. pyproject.toml: Keep sim-mujoco naming (not just mujoco) for consistency with future sim-isaac, sim-pybullet extras. (awsarron nit — reply only) 3. mujoco/__init__.py: Stop exporting private functions (_configure_gl_backend, _ensure_mujoco, _is_headless). Internal callers already import from backend directly. (awsarron thread) 4. simulation.py: Centralize _ensure_mujoco() to __init__ — fail fast at construction time. Store as self._mj, use throughout Simulation methods. Mixins retain their own _ensure_mujoco() calls since they may be used independently. (awsarron thread) 5. backend.py: Add docstring explaining why _is_headless() is Linux-only — Windows uses WGL, macOS uses CGL, both support offscreen natively. (awsarron thread) 6. policy_runner.py: Replace duplicated private function TYPE_CHECKING stubs with a shared SimulationProtocol in new types.py module. Eliminates coupling via signature duplication. (awsarron thread) 7. test_mujoco_e2e.py: Add TestToolSpecActionCoverage — iterates every action enum in tool_spec.json and asserts hasattr(Simulation, method) via the alias map. Catches drift between spec and implementation. (awsarron thread) 8. scene_ops.py: Standardize on ElementTree for all XML manipulation. Converted inject_object_into_scene, inject_camera_into_scene, and _patch_xml_paths from regex/string.replace to ET. Kept regex fallback in _patch_xml_paths for malformed fragments. (awsarron thread)

…olicyRunnerMixin The `_: SimulationProtocol` pattern declares a class variable named `_` but does NOT propagate Protocol member declarations to mypy's understanding of the class. This caused 34 attr-defined errors in policy_runner.py. Fix: Replace with direct attribute declarations under TYPE_CHECKING, matching the pattern used by PhysicsMixin, RenderingMixin, RecordingMixin, and RandomizationMixin. The SimulationProtocol in types.py is preserved for runtime checks and documentation — it's the TYPE_CHECKING usage pattern that was incorrect. Lint: 0 errors (ruff check + ruff format + mypy) Tests: 323 passed, 2 skipped, 0 failures

Post-strands-labs#84 merge: SimWorld no longer carries MuJoCo-specific private fields (_xml, _robot_base_xml, _recording, _trajectory, _dataset_recorder, _tmpdir, _push_to_hub). These are MuJoCo backend implementation details and now live in world._backend_state, as the SimWorld docstring requests (prefer _backend_state over new fields). Migrated call sites: - mjcf_builder.py: tmpdir - policy_runner.py: recording, trajectory, dataset_recorder - recording.py: recording, trajectory, dataset_recorder, push_to_hub - scene_ops.py: robot_base_xml - simulation.py: xml, robot_base_xml, recording, trajectory Reads use dict[] where preceded by a guard that guarantees initialization (e.g. start_recording() sets before policy_runner reads), and .get() with sensible defaults where the key may be unset. Tests: 392 passed, 2 skipped (5 pre-existing test_path_validation failures are on main too — unrelated). Lint: ruff + mypy clean on 75 source files.

…le step, run_policy lock, joint→ctrl mapping Bug strands-labs#1 (CRITICAL): add_robot replaces entire world model - add_robot now uses inject_robot_into_scene() for XML round-trip composition - Robot bodies/actuators/assets/sensors are merged into existing scene XML - Existing world state (gravity, objects, cameras, other robots) is preserved - Discovered and worked around MuJoCo mj_saveLastXML global state quirk: the function always saves the last-loaded XML regardless of which MjModel is passed — fixed by reloading stored scene XML to reset the global state Bug strands-labs#2 (CRITICAL): eval_policy runs double physics steps - _apply_sim_action already calls mj_step internally - eval_policy called mj_step again unconditionally after _apply_sim_action - Fixed: mj_step only called in eval_policy when no actions are available Bug strands-labs#3 (MEDIUM): run_policy missing thread lock - eval_policy and replay_episode both use self._lock ✓ - run_policy (submitted to ThreadPoolExecutor) had no lock protection - Fixed: wrapped recording + _apply_sim_action in with self._lock Bug strands-labs#4 (MEDIUM): _apply_sim_action uses joint ID as ctrl index - Joint IDs and actuator indices are independent in MuJoCo - Old code: data.ctrl[jnt_id] — wrong when ordering differs - Fixed: uses model.actuator_trnid to find the actuator driving a given joint All 49 existing tests pass (46 passed, 3 skipped for headless GL).

…99 integration tests Bug 1: replay_episode advanced MuJoCo data.time via mj_step but never synced self._world.sim_time or step_count. After replay, get_state() reported t=0.0 — stale values that would corrupt time-series data. Bug 2: eval_policy's no-actions branch called mj_step without syncing sim_time/step_count, same class of state-tracking bug. Fix: Add sim_time = data.time and step_count += N after both code paths. Tests: New test_mujoco_simulation.py with 99 behavioral integration tests covering 14 test classes — world lifecycle, object/robot/camera management, scene injection (XML round-trip), rendering, randomization, introspection, URDF registry, policy execution, action dispatch, context manager, tool spec, viewer, and error paths. All exercised through Simulation's public API, no isinstance checks or attribute-existence tests. Coverage lift (MuJoCo simulation package): simulation.py: 20% → 79% rendering.py: 10% → 87% scene_ops.py: 7% → 68% policy_runner.py: 8% → 54% randomization.py: 18% → 100% mjcf_builder.py: 13% → 52% backend.py: 40% → 57% Quality: ruff clean, mypy clean, 148/148 tests pass.

Bug: _can_render() probes rendering by creating mj.Renderer(), which uses GLFW by default. On headless Linux without EGL/OSMesa, GLFW calls glfw_init() → C abort() (SIGABRT), killing the entire Python process. This is uncatchable by try/except Exception. The abort prevented ALL tests from running — pytest crashed during collection of test_mujoco_simulation.py. Fix: Add early-return guard in _can_render(): if _is_headless() and MUJOCO_GL is not set (meaning _configure_gl_backend found neither EGL nor OSMesa), return False immediately without probing. Logic: _ensure_mujoco() calls _configure_gl_backend() before import. If _configure_gl_backend() found EGL or OSMesa, it sets MUJOCO_GL. If MUJOCO_GL is still unset, only GLFW remains — which will abort. So the guard predicate is necessary and sufficient. Test fix: Replace duplicated _has_opengl() probe (same SIGABRT vulnerability) with import of the now-safe _can_render(). Before: Entire test suite aborts at 38% — core dump. After: 439 passed, 14 skipped, 0 new failures. Lint: ruff clean, mypy clean.

Bug: add_robot() fails with 'Error opening file .../Base.stl' when robot XML uses meshdir='assets/' but the merged scene XML uses the parent directory as meshdir. Root cause: inject_robot_into_scene() merges robot <mesh file='X.stl'> elements into the scene XML, but the scene's <compiler meshdir=...> points to the robot's base directory (e.g. trs_so_arm100/) while the mesh files are in a subdirectory (trs_so_arm100/assets/). The merged XML inherits the scene's meshdir, so MuJoCo looks for X.stl in the wrong directory. Fix: Add _rewrite_mesh_paths() that adjusts mesh file= attributes when robot and scene meshdirs differ. Converts each mesh path to absolute (via robot's meshdir), then makes it relative to the scene's meshdir. This handles the common case where MuJoCo Menagerie robots use meshdir='assets/' but the scene compiler points to the robot's parent directory. Tests: 158 passed, 8 skipped, 0 failures. mypy clean (0 errors in 50 files). Verified end-to-end: create_world → add_robot(so100) → add_robot(panda) → add_object → step — all working.

…spec dispatcher Bug: Simulation._dispatch_action filtered kwargs through a hardcoded whitelist that omitted observation_mapping, action_mapping, data_config, host, port, api_token, trust_remote_code, actions_per_step, use_processor, processor_overrides, and any other policy-specific kwarg. Agents could not wire a policy (GR00T, SmolVLA, lerobot_local) to a simulated robot through the AgentTool interface — sim joint names and canonical model keys never got reconciled, breaking sim↔real transfer. Fix: - simulation.py::_dispatch_action — replace the whitelist with a mapping-aware passthrough: for methods that declare **policy_kwargs, forward every input field that isn't already matched to a named parameter. Actions without **kwargs stay strict. - tool_spec.json — advertise observation_mapping, action_mapping, host, port, api_token, trust_remote_code, actions_per_step, use_processor, processor_overrides, device so agents can discover and use them. - tests/test_tool_spec_dispatch_policy_kwargs.py — 5 regression tests pinning the forwarding for run_policy / eval_policy / start_policy and verifying non-policy actions stay strict. End-to-end validation (MacBook Pro M-series, MPS): - create_world → add_robot(so100) → add_object(red_cube) → add_camera(camera1) → add_camera(camera2) → run_policy(policy_provider='lerobot_local', pretrained_name_or_path='lerobot/smolvla_base', device='mps', observation_mapping={'camera1': 'observation.images.camera1', 'camera2': 'observation.images.camera2', 'joint_position': 'observation.state'}, action_mapping={'action': 'joint_position'}) - SmolVLA downloaded, loaded on MPS, produced actions, sim stepped. 2 control steps / 25.4s wall → proves the full chain works. Quality: - ruff + mypy: clean (77 files) - hatch run test: 5/5 new tests pass; only pre-existing test_path_validation failures remain (noted by author on strands-labs#84).

Move policy execution out of the MuJoCo-specific PolicyRunnerMixin into a backend-agnostic PolicyRunner class at strands_robots/simulation/ policy_runner.py. Isaac, Newton, and any future backend now get run_policy / replay_episode / eval_policy for free by implementing the SimEngine primitives (get_observation, send_action, step, reset, render, list_robots, robot_joint_names). Key changes: * NEW strands_robots/simulation/policy_runner.py (464 LOC) - PolicyRunner class: obs→act→step loop using only public SimEngine API - CooperativeStop: exception hook authors raise to gracefully end a run - Zero imports from simulation.mujoco.* (enforced by test) * SimEngine (base.py) gains two new abstract methods: - list_robots() -> list[str] - robot_joint_names(robot_name) -> list[str] * SimEngine provides run_policy / start_policy / replay_episode / eval_policy as concrete facades delegating to PolicyRunner. They used to be NotImplementedError stubs. * Policy-provider kwargs are now nested under a single policy_config dict instead of leaking as 12+ top-level tool_spec.json properties (observation_mapping, action_mapping, host, port, api_token, pretrained_name_or_path, trust_remote_code, actions_per_step, use_processor, processor_overrides, device, policy_host, policy_port, model_path). The dispatcher is now fully schema-driven — no more **kwargs passthrough. * MuJoCo Simulation: - PolicyRunnerMixin removed from MRO (class deleted) - types.py::SimulationProtocol deleted (was only used by the mixin) - Overrides _make_run_policy_hook for recording + cooperative stop - Overrides start_policy to reuse the ThreadPoolExecutor for async - list_robots now returns list[str] (ABC); the pretty-printed dict shape moved to list_robots_action (matches list_urdfs_action pattern) * Tests: - NEW tests/test_policy_runner_backend_agnostic.py (9 tests) - FakeSim stub proves PolicyRunner only touches public API - Asserts policy_runner module does not import mujoco - Verifies SimEngine facade works end-to-end with FakeSim - Rewrote tests/test_tool_spec_dispatch_policy_kwargs.py to pin the nested policy_config shape and the clean tool_spec.json. - Updated tests/test_simulation_foundation.py for the 2 new abstract methods. - Updated tests/test_mujoco_simulation.py list_robots tests to call both the ABC (list[str]) and action (dict) surfaces. Net: +534 insertions, -629 deletions. 553 tests passing, 14 new tests added, 0 new failures (6 remaining failures are pre-existing on pr-85). This addresses the smell flagged by the **kwargs passthrough fix in commit 646ff02: passing everything was the right *patch* but the wrong *design*. Now every dispatcher param is explicit and the simulation tool schema is honest about its boundary.

MuJoCo's Renderer binds a GL context to the thread that creates it (CGL on macOS, GLX on Linux). Previously, renderers were cached in a plain dict on the Simulation instance — worker threads (policy execution via ThreadPoolExecutor) created renderers, cached them there, and then cleanup() on the main thread called renderer.close() → cgl.free() → SIGSEGV. Fix: replace dict with threading.local(). Each thread gets its own renderer cache; renderers die when their owning thread exits (no cross-thread close). cleanup() drops the TLS reference only (main thread's renderers, if any). MuJoCo's Renderer.__del__ handles the actual GL context release on the correct thread. Before: pytest tests/test_mujoco_simulation.py → Fatal Segfault after TestPolicyExecution::test_start_policy_and_stop After: 419 passed, 1 pre-existing failure (factory test requiring mujoco uninstalled), 6 skipped.

Two tests in TestRendererThreadSafety: - test_renderer_cache_is_thread_local: asserts main and worker threads see distinct renderer instances (the core fix invariant) - test_cleanup_after_policy_thread_no_segfault: start_policy + stop + cleanup must succeed without SIGSEGV (was fatal pre-fix) Pairs with 30c758e (the fix).

…aths MuJoCo's mj_saveLastXML is a global-state function that always emits the *last loaded* model's XML, ignoring its 'model' argument. Any renderer creation (mj.Renderer()) or ancillary model load between our last scene compile and the save call poisons the global pointer. Symptom: after any render/run_policy, remove_object silently logged 'Body X not found in MJCF XML — skipping ejection' and left the body in the scene. Any subsequent inject/eject round-trip operated on a stale/foreign XML. The 'reset via MjModel.from_xml_string(stored_xml)' workaround was already used in inject_robot_into_scene. This commit: 1. Consolidates the workaround into _save_and_patch_xml, the shared helper used by all inject/eject code paths. 2. Updates _reload_scene_from_xml to persist the current scene XML into world._backend_state['xml'] after every reload, so the stored XML always reflects the live model (not a stale pre-injection snapshot). 3. Routes eject_body_from_scene through _save_and_patch_xml so it benefits from the workaround too (previously it called mj_saveLastXML directly). Regression tests in tests/test_mujoco_simulation.py:: TestMjSaveLastXMLGlobalState (+2 tests). Before: 421 passed, after: 423 passed. 1 pre-existing failure unchanged (test_default_backend_missing requires mujoco uninstalled).

Before: Adding two so101s to one scene failed hard with either - XML Error: repeated default class name - XML Error: repeated name 'base' in body - XML Error: repeated actuator name 'shoulder_pan' The PR's injection code blindly appended the robot's <default>, <body>, <actuator>, <sensor> children into the scene. With two same-config robots, every globally unique MJCF name collided → MuJoCo rejects. This blocks the core PR strands-labs#85 use case: sim.add_robot('arm0', data_config='so101') sim.add_robot('arm1', data_config='so101') # boom Fix (scene_ops.inject_robot_into_scene): 1. New _prefix_robot_names(robot_root, prefix) walks the robot XML and prefixes every globally-named element in worldbody/actuator/sensor/equality/tendon/contact/keyframe with '<robot_name>/'. Reference attributes (joint=, body=, site=, actuator=, joint1=/2=, body1=/2=) are rewritten to match. 2. <default> classes and <asset> meshes/materials are deduped by name (not prefixed) — same-config robots legitimately share those. 3. <actuator> and <sensor> children are also deduped by name, for the same reason. SimRobot.namespace (dataclass field that existed but was dead code per AGENTS.md 'No dead code' rule) is now wired up as the source of truth for the prefix. The API layer stays config-level: - robot.joint_names remains short ('shoulder_pan', ...) - get_observation() / get_robot_state() / _apply_sim_action() prefix on lookup, fall back to raw name for back-compat with the single-robot case. - Re-discovery in _reload_scene_from_xml and add_robot uses the same namespaced-then-raw lookup. Regression: the 'all actuators' fallback (used when joint_ids is empty) now only fires when len(world.robots) == 1 — otherwise an empty joint_ids means something is actually wrong for this specific robot, and we shouldn't paper over it by claiming *all* actuators belong to it. Tests (TestMultipleSameConfigRobots, +3): - test_three_same_config_robots: three robots, disjoint joint_ids - test_per_robot_action_isolation: send_action on arm0 doesn't touch arm1's or arm2's ctrl - test_observation_returns_short_keys: obs dict exposes 'shoulder', not 'arm0/shoulder' Validated end-to-end by /tmp/pr85_smoke_so101.py (3×so101 + 3 objects + SmolVLA on MPS) and the original exercise script (30 actions). Test suite: 426 passed (was 423), 1 pre-existing factory failure.

Two follow-ups to the multi-robot namespacing feat: 1. Physics mixin takes raw body/joint/site names from the caller. After namespacing, actual MuJoCo names are '<robot>/<name>' so lookups that worked in a single-robot scene now return -1. Fix: new PhysicsMixin._resolve_mj_name() tries the raw name first, then falls back to every registered robot's namespace. Every 'mj.mj_name2id(model, mjOBJ_{BODY,JOINT,GEOM,SITE}, X)' call inside physics.py now goes through the resolver. 2. Recording mixin enumerates scene cameras to register LeRobot dataset features. Namespaced cameras ('arm0/wrist_cam') leak '/' into LeRobot which rejects with 'Feature names should not contain /'. Fix: replace '/' with '__' for LeRobot feature keys. Tests (+3): TestPhysicsNameResolution + TestRecordingSafeCameraNames. Suite: 429 passed, 1 pre-existing factory failure.

get_observation() conflated two concepts: full robot observation (joints + all attached cameras, used by policies) and single-camera rendering (which render() already owns). The camera_name parameter was never used by PolicyRunner or any policy — only by one test. Lock the ABC contract so Newton/Isaac backends have a clear target: SimEngine.get_observation(robot_name) -> { '<joint_name>': float, # short names (multi-robot safe) '<camera_name>': np.ndarray, # RGB uint8, HxWx3 } For single-camera rendering, use render(camera_name=...). For future batched multi-robot obs, add a new get_observations() method — do not extend this one. Changes: - base.py: drop camera_name from get_observation abstractmethod; document the schema as a formal contract. - mujoco/simulation.py: drop camera_name param on the public method. - mujoco/rendering.py: simplify _get_sim_observation — always render all cameras, remove now-dead cam_name branch. - policy_runner.py: update header docstring reference. - tests: update mock signatures in 3 test files; rewrite the one real call site to use get_observation without camera_name; add two new tests: * test_get_observation_schema_joints_plus_cameras — locks the return type contract (joints=float, cameras=ndarray uint8 HxWx3). * test_get_observation_signature_has_no_camera_name — regression test using inspect.signature on both SimEngine and Simulation. No behavior change for existing callers: PolicyRunner always called get_observation(robot_name=...) without camera_name, and MuJoCo always rendered all cameras by default.

…zzer Two reviewer-flagged issues from PR strands-labs#85 addressed in one commit. 1) SimEngine.run_policy had 5 flat video-recording parameters — too wide a public API for a backend-agnostic base class. The tool_spec.json already didn't expose them individually; Python callers were the only consumers. Consolidated into a single 'video: dict' kwarg backed by a typed VideoConfig dataclass in policy_runner. Before: sim.run_policy(robot_name, ..., record_video='/t.mp4', video_fps=60, video_camera='wrist', video_width=640, video_height=480) After: sim.run_policy(robot_name, ..., video={'path': '/t.mp4', 'fps': 60, 'camera': 'wrist', 'width': 640, 'height': 480}) VideoConfig.from_dict accepts both canonical keys (path/fps/camera/ width/height) and legacy aliases (record_video/video_fps/video_camera/ video_width/video_height/output_path/camera_name) so tool_spec.json's flat output_path/fps/camera_name agent API continues to work. The MuJoCo _dispatch_action now folds flat tool_spec keys into the video dict before invoking run_policy, so agent-level and Python-level callers hit the same code path. Docstrings now call out on_frame as the public extension point for telemetry/recording — backends compose via hook, not subclassing PolicyRunner. 2) Added tests/test_sanitize_name_xml_injection.py — property-based fuzzer that runs 5000 random inputs (5 seeds × 1000 samples) through _sanitize_name and asserts: * Input is either rejected (ValueError) or round-trips unchanged. * Output never contains XML-dangerous chars (<>&"'). Plus explicit payload cases for common XML injection patterns targeting the 5 interpolation sites in mjcf_builder.py and scene_ops.py. No hypothesis dep — brute-force is sufficient for a single regex contract. 3) Added tests/test_video_config.py locking: * VideoConfig dataclass defaults, .enabled, frozen behavior. * VideoConfig.from_dict canonical + legacy key handling. * SimEngine / MuJoCo Simulation / PolicyRunner run_policy signatures do NOT expose flat video params (regression guard). * Dispatcher folds flat output_path/fps/camera_name into video dict when output_path is present, and passes video=None otherwise. * Explicit video dict from caller is not clobbered by flat keys. No behavior change for existing callers — all 5 old flat kwargs still accepted via VideoConfig.from_dict aliases.

Bug 1: TrajectoryStep export broken — added module-level import so `from strands_robots.simulation.policy_runner import *` works. Bug 2: Camera sanitization mismatch drops frames — add_frame now normalizes `/` → `__` in camera keys before reconciling against the declared schema (matching start_recording's sanitization). Bug 3: Thread-safety contract violation — send_action and step now acquire self._lock, serializing ctrl writes + mj_step against PolicyRunner's worker thread. Bug 4: apply_force docstring contradicts semantics — changed to single-shot: zero qfrc_applied before mj_applyFT so forces don't accumulate across frames. Bug 5: Flat-index state copy unsafe on injection — _reload_scene_from_xml now copies per-joint by name using jnt_qposadr/jnt_dofadr, handling layout shifts from body-tree reordering. Bug 6: First-wins robot_base_xml breaks multi-config — robot mesh paths are now converted to absolute during inject_robot_into_scene, eliminating dependency on scene-level meshdir. Bug 7+8: add_object/add_camera raise RuntimeError — changed to return structured error dict + clean up leaked state (consistent with every other action). Bug 9: start_policy drops user kwargs — now forwards control_frequency, action_horizon, and video to run_policy (matching tool_spec.json). Bug 10: dataset_recorder key enforcement — normalize observation.images keys with `/` → `__` before stripping undeclared cameras. Tests: 350 passed, 14 skipped (no MuJoCo). mypy clean. ruff clean.

@yinsong1986

…ype guard Thread-safety (PR strands-labs#85 review by @yinsong1986): - Wrap reset(), set_gravity(), set_timestep() in self._lock (simulation.py) - Wrap save_state(), load_state(), apply_force(), set_joint_positions(), set_joint_velocities() in self._lock (physics.py) - Wrap randomize() model/data mutations in self._lock (randomization.py) - Add TYPE_CHECKING annotations for _lock in mixin classes apply_force docstring (PR strands-labs#85 review): - MuJoCo does NOT reset qfrc_applied in mj_step — force is latched - Updated docstring to 'latched' contract: force persists until next call - Document: call apply_force(body, force=[0,0,0]) to stop scene_ops.py (PR strands-labs#85 review): - Add defensive type guard: skip joint copy if jnt_type changed between old and new model (prevents stride mismatch → silent corruption) - Remove dead elif branch (unreachable: robot_meshdir is falsy in else) Tests: - Add test_mujoco_regressions.py with: - TestFlatIndexStatePreservation: joint survives object injection - TestApplyForceLatchedBehavior: force persists, zero stops it - TestThreadSafety: concurrent step+reset and set_joint+step

…dation T32 — forward_kinematics now accepts optional body_name: * body_name=None: full-world dump (prev behaviour). * body_name='X': single-body position/quat; errors if body absent. Matches tool_spec.json which advertised body_name but the method ignored it (silent drop before T1). T33 — get_features now accepts optional robot_name: * None: global joint/actuator/camera/robots listing (prev behaviour). * 'X': scoped to that robot's namespace (joint/actuator names starting with '{namespace}/'); the robots map is filtered to just that entry. * Unknown robot → standard 'Robot X not found.' error. T35 — register_urdf(urdf_path='X') validates the path before handing it to the registry: non-empty check, existence check, file-not-dir check, and a readability smoke test (open the file). Missing files now produce 'register_urdf: file not found: ...' instead of the registry accepting a bad entry that blows up later. T42 — register_urdf no-args is already handled by the T1 router as 'Action register_urdf requires parameter data_config.' No code change needed; covered by test. New tests: TestFeatureFilters (4), TestRegisterUrdfValidation (3). 342 -> 349 passing.

T27 — render_all flags near-uniform camera frames (variance < 1) so the LLM can tell which cameras captured nothing useful. render() now emits a 'pixel_variance' / 'pixel_mean' stats block alongside each image so render_all can annotate without decoding PNGs twice. T28 — set_geom_properties accepts the bare object name as an alias for '{object_name}_geom' (what add_object actually injects into the MJCF). No more 'Geom not found' when the caller uses the natural object name. T29 — add_object(shape='plane') auto-sets is_static=True. Explicit is_static=False on a plane now errors cleanly (planes are infinite in MuJoCo and can't be dynamic). Default changed from False to None so the plane path can distinguish 'not passed' from 'passed False' without breaking non-plane defaults. T30/T41 — add_camera(name=existing) now errors with 'camera X already exists. Remove it first.' instead of silently overwriting the registry entry while leaving the XML unchanged (the old behaviour caused the first camera to keep rendering even after a re-add). T34 — eval_policy requires an explicit robot_name (was silently picking the first robot — surprising in multi-robot scenes) and n_episodes default lowered from 10 to 1 per DoD. New tests: * TestDuplicateCameraName, TestPlaneAutoStatic, TestSetGeomPropertiesAlias, TestEvalPolicyDefaults * Updated test_plane_object_rejected_as_dynamic_body → two new tests covering auto-static success + explicit-dynamic rejection. 349 -> 356 passing.

T31 — get_recording_status returns status='success' in every lifecycle state (no world / not recording / recording) with a distinguishing message so callers can poll it unconditionally. Previously the no-world branch went through _require_world() and returned error, forcing callers to try/except. T17 — Audit of stderr pollution: no remaining print() calls in strands_robots/simulation/; model_registry and physics already use logger.warning / logger.info. No code change needed; T17 is effectively complete. Tracking via TASKS.md. T37 — Regression test for list_robots policy-status reporting (was already working, pinning it so we don't break it). 356 -> 359 passing.

T22 — add_robot: the undocumented 'name'-as-registry fallback (resolve the SimRobot instance name as a model_registry key when no urdf_path or data_config is passed) now fires a DeprecationWarning telling the caller to use data_config='<key>' instead. Kept for one release to avoid breaking existing callers; will be removed next major. T23 — get_robot_state canonical parameter is robot_name; bidirectional name/robot_name router alias (since T1) keeps legacy calls working. Docstring updated to call out the canonical name. Also folded the 'No simulation running.' error into _require_world. T25 — run_policy and start_policy accept optional n_steps (primary) or max_steps (legacy) as alternatives to the duration/control_frequency pair. duration = n_steps / control_frequency when n_steps is set. The router now exposes both names so LLM callers can say 'n_steps=500' instead of computing 'duration=10.0, control_frequency=50.0'. Validates n_steps > 0 and control_frequency > 0 before doing the division. New tests: TestPolicyHorizonUnification, TestAddRobotDeprecation. 359 -> 362 passing.

…eanup Cleanup pass after all T1-T45 fixes shipped: * Stripped 'T<N>:' / 'T<N>/T<M>:' prefixes from ~60 inline comments and ~20 docstrings across the simulation mixins. The explanation text is preserved; only the issue-tracker tag is gone. Commit messages remain the audit trail. * Inlined the 'No world.' check at every call site (26 in mixins + 14 in simulation.py) instead of going through the _require_world() helper. Semantically identical — same error text, same return shape — but mypy can now narrow 'self._world is None' across the if-branch, which the walrus-assigned helper pattern couldn't do. The Simulation-level _require_world method stays for external callers but is no longer used internally. TYPE_CHECKING stubs for _require_no_running_policy added to the mixins that need it. * Fixed two pre-existing lint F841 'unused variable' errors (drive-by): - physics.py:set_joint_velocities — 'ignored' is now actually populated and reported in the response, matching set_joint_positions behaviour (parity win). - rendering.py:_list_camera_names — removed dead 'nm = _mj.mj_name2id # silence unused' stub that was never used. * Fixed three net-new mypy errors from this PR's code: - physics.py:multi_raycast — results list typed as list[dict[str, Any]] so mixed None / float / int values type-check cleanly. - rendering.py:render_depth T21 warning capture — guarded _sys.__stderr__ against being None (Python's docs allow it). Result: ruff clean, mypy clean (102 source files, zero errors), 362 tests pass.

…ove contact naming T40 — randomize() docstring now spells out flag semantics (opt-in per axis), defaults, destructive nature, and every argument. Previous one-liner left callers guessing whether 'no flags' meant 'randomize everything' or 'randomize nothing' (it's the latter). T47 — add_robot docstring: bodies and user-added objects share the MuJoCo name table. 'name' is the robot instance namespace and MUST NOT collide with existing object / body names. Prevents the cryptic 'duplicate name' MuJoCo compile errors. T48 — add_camera docstring: objects get MJCF geoms named '{name}_geom' so cameras only collide with other cameras and body names. Duplicate camera names are rejected upfront (see T30). T49 — add_robot docstring lists the three resolution paths: 1. urdf_path (explicit) -> 2. data_config (registry) -> 3. name (deprecated). Deprecation warning fires on the name-fallback path (T22). T50 — get_contacts now resolves unnamed geoms to their parent body name + geom id ('robot_name/geom_30'), giving the LLM a meaningful handle even when MJCF doesn't carry per-geom names. T51 — randomize() with randomize_physics=True now reports per-body mass scales and per-geom friction scales in the response text (previously only the range endpoints; now you can audit what was actually applied). Seedable so reproducible. Tests stay green (362 passing).

…abs#85 D1 — CHANGELOG.md (new file, 162 lines) enumerates every behavioural change users will notice in this PR: * Breaking: router validation, camera orientation, raycast guards, negative-value validation, plane auto-static, stop_policy needs robot_name, eval_policy defaults, register_urdf validation. * Recording backend split (start_recording vs start_cameras_recording). * Resource hygiene (renderer TLS cleanup, mj_forward before reads). * Concurrency guards (list of 10 action names). * Error message consistency (unified 'No world', '<Kind> X not found.', idempotent stop family). * Deprecation: add_robot name-as-registry fallback. * New / extended actions (10+ items). * Test deltas: 256 -> 362 passing. D4 — README.md: added a 'Simulation (MuJoCo)' section before Contributing. * Install instructions with and without [lerobot]. * Quick-start snippet. * All 58 actions grouped by concern. * Common footguns (6 callouts the finding-report flagged). * Self-healing features summary. * Pointer to test_agenttool_contract.py for the full contract. Lint + format clean, 362 tests still pass.

… path The hardcoded /Users/cagatay/robots/... path broke CI. Use strands_robots.simulation.mujoco.__file__ to locate tool_spec.json relative to the installed package, so the test works regardless of checkout location.

Missing blank line after import (E303). Fixes CI lint failure.

Found by agentic E2E test (strands.Agent + Simulation tool) asked to build a 3-robot scene with 2×so100 + 1×humanoid. The third robot consistently failed across h1/unitree_h1/op3/cassie/panda with MuJoCo's XML Error: repeated default class name Root cause ---------- Every MuJoCo Menagerie model declares a nested <default> tree with class names like 'visual', 'collision', 'foot', etc. MuJoCo flattens all <default class='X'> declarations into a single GLOBAL namespace at compile time, regardless of nesting depth. So so100's <default class='so_arm100'><default class='visual'/></default> and h1's <default class='h1'><default class='visual'/></default> collide on 'visual' even though they live under different parents in the source XML. The existing dedupe only worked when two robots shared the exact same data_config (second so100 reused the first's classes — fine). Different configs blew up. Fix --- - New helpers _collect_existing_class_names() and _namespace_robot_default_classes() in scene_ops. - Track which data_configs have been merged in world._backend_state['merged_configs']. - First time a new data_config arrives: rename every class in its <default> subtree to '{data_config}__{classname}' and rewrite every class=/childclass= reference in the robot's worldbody/actuator/sensor sections. - Subsequent robots with the same data_config: skip the <default> merge entirely and rewrite their class refs to point at the already- namespaced classes. Verified with 4-robot scene (2×so100 + 1×h1 + 1×panda = 40 joints) all stepping cleanly at 500 Hz with rendered overhead camera. Added 2 regression tests under TestMixedDataConfigRobots in test_agenttool_contract.py (362 -> 364 passing).

…piled MJCF Found by agentic E2E test while chaining add/remove cycles. Reproduction: sim.add_robot('alice', data_config='so100') sim.remove_robot('alice') # status='success' but... sim.add_robot('alice', data_config='so100') # FAILS with: # XML Error: repeated name 'alice/Base' in body Root cause ---------- The old remove_robot just did `del self._world.robots[name]`. It never touched the MJCF, so alice's bodies / joints / actuators / sensors stayed in the compiled model. Users saw: - njnt / nbody unchanged after remove_robot (stale DOFs consumed physics time per step), - re-adding the same name failed because MuJoCo rejects duplicate names on compile, - list_robots reported an empty list while get_features still reported the robot's joints. Fix --- New scene_ops.eject_robot_from_scene() mirrors the scope of _prefix_robot_names (the inject-time namespacer): strips every element whose name starts with '{robot_name}/' from: - worldbody (the robot's root body + children), - actuator, sensor, equality, tendon, - all keyframes (they reference the full qpos vector and are stale). Default classes + assets stay in place so a subsequent same-data_config robot can still reference them. Scene is recompiled via _reload_scene_from_xml. Simulation.remove_robot now calls eject_robot_from_scene before popping the Python-side registry entry, and returns an error if the ejection fails (so callers don't see a success while the model is still broken). New tests (TestRemoveRobotActuallyRemoves): * test_remove_robot_empties_model — njnt goes to 0 after remove. * test_readd_same_name_after_remove — the original repro. * test_remove_middle_of_three_robots — multi-robot scene, remove middle robot, verify the other two's joint counts survive. 364 -> 367 passing, ruff + mypy clean.

factory.py tests (new test_factory.py, 19 tests): - backend registration: duplicate conflicts against built-in, built-in alias, runtime alias, and self; force=True override; alias collisions. - list_backends: sorted + deduped; includes runtime-registered names. - create_simulation: default, by alias, unknown-backend error message lists available backends, kwargs forwarded, runtime alias priority over built-in. policy_runner.py tests (new test_policy_runner_behaviour.py, 14 tests): - run() with real MuJoCo + MockPolicy, on_frame hook gets called with monotonic step indices. - evaluate() default no-success-fn path and unknown-success-fn error. - _maybe_sim_time() — fast path from sim._world.sim_time, fallback to get_state(), None on broken sim. - _require_default_robot() — empty list raises, returns first robot. - replay() ImportError handling when lerobot is not installed. - _resolve_coroutine() — passthrough for plain list, awaits coroutine. - VideoConfig.enabled with/without path. Drive-by fix: _maybe_sim_time previously tried to read 'sim_time' from the top-level status dict, which MuJoCo backend never populated — the helper silently returned None for every backend. Now reads from sim._world.sim_time directly, with the legacy get_state() fallback kept for backends that only expose the status-dict shape. Test count: 367 → 467 passing (entire tests/simulation/ tree). Coverage (simulation-only, excluding tools/): - simulation/factory.py 26% → 97% (+71pp) - simulation/policy_runner.py 52% → 91% (+39pp) - simulation/__init__.py 94% → 100% - simulation/mujoco/__init__ 29% → 100% Overall sim-module coverage: 51% → 54%. Lint + mypy clean.

backend.py tests (test_backend.py, 13 tests): - _is_headless() across Linux/darwin, DISPLAY, WAYLAND_DISPLAY. - _configure_gl_backend(): respects MUJOCO_GL, no-op on non-Linux, picks EGL when libEGL.so.1 loads, falls back to OSMesa, warns when neither is available. - _can_render() caches its probe result; short-circuits to False on headless+no-GL to avoid GLFW's fatal SIGABRT. - _ensure_mujoco() smoke-test + caching check. MJCF builder tests (test_mjcf_builder_units.py, 31 tests): - _sanitize_name() valid chars, invalid chars, length bound. - _camera_xyaxes_from_target(): unit-length / orthogonal axes for top-down and side views; None on degenerate input; up-parallel fallback. - MJCFBuilder._object_xml() per-shape branches (box/sphere/cylinder/ capsule/mesh with+without path/plane), static vs dynamic freejoint, name sanitization. Coverage bumps (tests/simulation/ tree): - simulation/mujoco/backend.py 58% → 93% (+35pp) - simulation/mujoco/__init__.py 29% → 100% - simulation/factory.py 26% → 97% Test count: 467 → 511 passing. Lint + mypy clean.

…-labs#113) format_robot_table() now accepts max_width (default 100) and truncates the Description column with an ellipsis when rows would overflow. Width constants are named (_NAME_WIDTH, _CAT_WIDTH, ...) so tests can assert against them without hardcoding digits. Drive-by fix while auditing the category list: 'aerial' category robots (crazyflie, skydio_x2) were silently missing from the rendered table even though they're fully registered. Added 'aerial' to the category iteration order so every registered robot appears in list_urdfs output. New test module tests/registry/test_format_robot_table.py (8 tests): - default-width output stays under 101 chars - narrow (80-col) widths trigger description truncation (ellipsis) - wide widths (1000) disable truncation entirely - table contains every registered robot (row count matches registry) - header + total row always present Closes GitHub issue strands-labs#113. Test count: 511 → 519 passing.

…olicies (strands-labs#112) cProfile of the T26 target scenario (500 steps @ 500Hz with mock policy) revealed ~93% of wall time spent inside mjr_render + mjr_readPixels in _get_sim_observation. The MockPolicy never consumes image obs — the renders were dead work on every step. Fix --- * Policy base: new ``requires_images: bool`` property (default True). Policies that only consume joint state (MockPolicy, pure-IK, scripted trajectories) override to False. * MockPolicy.requires_images = False. * SimEngine.get_observation(): new keyword-only ``skip_images`` param (default False for backwards compat). * PolicyRunner.run() + .evaluate(): read policy.requires_images once at entry and propagate as skip_images= to every get_observation call. * MuJoCo _get_sim_observation(skip_images=True): early-return after the joint-state dict, before the camera render loop. * Safety override: MuJoCo Simulation.get_observation() force-disables skip_images when a dataset recorder is attached (recording needs every frame regardless of the policy's preference). Perf (500 steps @ 500Hz, fast_mode=True, so100 robot): before: 0.38s (T26 target: <2s ✓ but all time in rendering) after: 0.02s (19x speedup) Also lifts the real-time (fast_mode=False) case: 500 steps @ 500Hz real-time: 2.40s → 1.50s Closes strands-labs#112. New tests (TestT26PerfBudget): * test_mock_policy_500_steps_under_budget — pins the 2s ceiling. * test_requires_images_propagates_to_observation — pins the skip_images plumbing via a spy on get_observation. Updated test scaffolding: * tests/simulation/test_policy_runner.py FakeSim.get_observation sig now matches the new abstract signature. * tests/simulation/test_foundation.py FakeSim.get_observation sig. Test count: 511 → 513 passing. Lint + mypy clean.

tool_spec is a @Property on the LLM hot path — every strands agent invocation reads it. Previously it re-opened and re-parsed the 357-line tool_spec.json on every access. Moves the json.load into module scope as _TOOL_SPEC_SCHEMA, and has the property return a reference to the cached dict. No functional change — the schema is still built from the same file at import time. Adds identity-check regression tests in test_tool_spec.py to ensure future edits don't accidentally reintroduce the per-access reload. Identified via second-opinion review of PR strands-labs#85.

Adds a repo-sweep test that greps strands_robots/ + tests/ + tests_integ/ for /Users/<name>/, /home/<name>/, and C:\\Users\\<name>\\ patterns. Rationale: PR strands-labs#85 shipped a hardcoded /Users/cagatay/robots/... path in test_agenttool_contract.py that passed locally, got committed, and was caught by CI only because CI didn't live at that exact path. A cheap regex check in the test suite prevents a repeat without needing a pre-commit framework. Narrow allowlist for: - this test itself (it defines the patterns) - _path_validation.py and its tests (those contain Windows system path prefixes like C:\\Windows\\, which are unrelated to user profile paths) Identified via second-opinion review of PR strands-labs#85.

Fixes GH strands-labs#115. load_scene previously did not populate _backend_state bookkeeping, so subsequent add_object / add_camera / remove_object calls either: * recompiled the world via MJCFBuilder.build_objects_only (silently discarding every body from the loaded scene), or * hit the XML round-trip path but fell through to mj_saveLastXML global state and emitted the wrong (robot, not scene) XML. Changes in load_scene: * Cache the on-disk scene XML in _backend_state['xml']. * Set _backend_state['scene_loaded'] = True as a marker. * Record _backend_state['scene_base_dir'] for mesh path resolution during injection round-trips. Changes in add_object / add_camera / remove_object: * Gate the XML-round-trip branch on 'robots OR scene_loaded' instead of 'robots only'. Previously the no-robots branch called _recompile_world() which rebuilds via MJCFBuilder.build_objects_only and would wipe a loaded scene's bodies and meshes. Changes in scene_ops._get_robot_base_dir: * Fall back to _backend_state['scene_base_dir'] when no robot_base_xml is registered, so mesh refs in a round-tripped scene XML still resolve under tmpdir. New test file tests/simulation/mujoco/test_load_scene_interaction.py (9 tests) covers: * _backend_state population contract (3 tests) * add_object / add_camera / remove_object preserve scene bodies * The full load_scene -> add_robot -> add_object chain * create_world does NOT set scene_loaded (regression guard) Verified that all 8 behavioural tests fail on pre-fix code and pass after the fix. The single trivially-passing test (create_world_does_not_set_scene_loaded) is a guard against accidentally setting the flag in the non-load_scene path. Full suite: 524 passed, 1 skipped (was 515; +9 new). Lint clean.

…#114) Fixes GH strands-labs#114. Previously, despite _policy_threads being keyed by robot_name (implying concurrency), the _require_no_running_policy helper blocked *any* live Future on every scene mutation AND on every start_policy call. Two VLA arms could not actually run policies in the same scene. The mixin's docstring even said 'per robot' while semantics were serial. ## Semantics changes start_policy(X): * Was: global check — rejected if ANY Future was not done. * Now: per-robot check — only rejected if X's own Future is live. Policies on different robots can coexist. remove_robot(X): * Was: errored when X had a live policy; required two-step stop_policy(X) + remove_robot(X). * Now: gracefully stops X's own policy (as before), then runs the XML-round-trip ejection. Still errors if a DIFFERENT robot has a live policy, because that robot's PolicyRunner holds cached actuator/joint IDs that the recompile invalidates. Scene mutations (add_robot, add_object, add_camera, remove_object, remove_camera, load_scene, reset, set_gravity, set_timestep, randomize, apply_force, set_body_properties, set_geom_properties, set_joint_*, load_state, move_object): still block on ANY live policy. Unchanged. The error message now NAMES the active-policy robots so the LLM can stop_policy on each without guessing: Cannot 'set_gravity' while a policy is running on 'armA', 'armB'. Stop it first: action='stop_policy'. ## New helpers * _prune_done_futures() — drops completed Futures from _policy_threads (GH strands-labs#120 companion fix). Previously the dict grew unboundedly and list_policies_running would leak historical names as 'running'. * _active_policy_robots() — returns names with LIVE policies. Prunes stale entries as a side-effect so the returned list is authoritative. * _require_no_running_policy(action_name, robot_name=None) — new keyword arg scopes the check to one robot. robot_name=None is the existing global-scope behaviour. ## New action list_policies_running — returns the names of robots with live policies. Idempotent, always succeeds, prunes stale entries. Added to tool_spec.json enum and to the tool_spec description so the LLM discovers it. ## Why this is safe MuJoCo's mj_step and ctrl[] writes are still serialized via self._lock, which is the single point that makes concurrent multi-robot policies safe: * Two policies on different robots run in parallel at the inference level (observation build, action compute — no shared state). * When either calls send_action, it serializes briefly on self._lock to write its own ctrl[] slots and advance physics. * mj_step advances the WHOLE scene — so two robots sharing a world share one physics clock. That's correct: one tick of physical time advances all bodies. * Each robot writes to a DISJOINT slice of data.ctrl[], indexed by actuator IDs specific to that robot's namespaced actuators (set up by inject_robot_into_scene via _prefix_robot_names). No ctrl[] aliasing. Documented inline on __init__ and in start_policy's docstring. ## Tests tests/simulation/mujoco/test_concurrency.py — adds TestConcurrentPerRobotPolicies class with 6 new tests: * test_start_policy_allowed_on_second_robot_while_first_runs * test_start_policy_still_rejected_on_SAME_robot * test_list_policies_running_reports_active * test_completed_futures_are_pruned (GH strands-labs#120 companion) * test_scene_mutation_lists_which_robots_are_running * test_two_policies_no_segfault_under_stress — actually runs two policies to completion and asserts both produced policy_steps > 0 Updated the existing test_remove_robot_blocked_during_policy (which encoded the old 'error when same-robot policy active' semantics) into two tests that reflect the new semantics: * test_remove_robot_stops_own_policy_and_succeeds * test_remove_robot_blocked_by_OTHER_robot_policy Verified: the new tests fail on pre-fix simulation.py (stashed to confirm), pass on post-fix code. ## Numbers * 525 -> 531 passed (+6 new) in tests/simulation/ * hatch run lint: clean (no new errors) * hatch run format: clean * CHANGELOG.md updated with both the concurrency change and the new list_policies_running action. Closes strands-labs#114. Companion fix for strands-labs#120 (stale Future pruning).

…s-labs#117) Fixes GH strands-labs#117. PolicyRunner.run previously caught ALL on_frame exceptions (other than CooperativeStop) at WARN level and kept iterating. Failure mode: a recording hook with a typo'd observation key would raise on every step, produce one log line per step for 500 steps, and complete 'successfully' with zero frames written. The resulting dataset is silently empty. Fix: count *consecutive* on_frame failures. After N in a row (default 5, overridable via new kwarg max_onframe_failures), raise RuntimeError so run() returns status=error with a clear message. A single transient failure still logs at WARN and keeps going — the counter resets on the next successful call. Plumbed the new kwarg through: * PolicyRunner.run (core) * SimEngine.run_policy (base) * Simulation.run_policy (MuJoCo override) Tests: 4 new in TestOnFrameFailureCounter class: * test_single_onframe_failure_is_tolerated * test_consecutive_onframe_failures_abort_episode * test_consecutive_counter_resets_on_success * test_default_threshold_is_5 All 535 tests pass (was 531; +4 new). Lint clean.

…strands-labs#116) Fixes GH strands-labs#116. Previously cleanup() called executor.shutdown(wait=False) right after setting self._world = None, which opened a race window where a policy worker still inside mj_step(world._model, world._data) would segfault on freed arrays. The 'policy_running = False' flag was set but never awaited. New cleanup order: 1. Signal every live policy to stop (policy_running = False). 2. Await each outstanding Future with a bounded timeout. The on_frame hook sees the flag at the top of its next call and raises CooperativeStop, which short-circuits run_policy. 3. Workers that don't stop within the timeout get logged as a warning and abandoned — cleanup proceeds rather than hanging the host process on exit. 4. Only AFTER workers have unwound do we null self._world and tear down renderers / viewer / executor. New kwarg: cleanup(policy_stop_timeout=...) for tests and edge cases. Defaults to 5.0s via a module-level _DEFAULT_POLICY_STOP_TIMEOUT constant. None (default) uses the constant. Tests: 4 new in TestCleanupGracefulShutdown: * test_cleanup_awaits_running_policy — verifies Future.done() by the time cleanup returns * test_cleanup_tolerates_wedged_policy — proves cleanup returns in bounded time even with an aggressively-short 1ms timeout * test_cleanup_is_idempotent_with_no_policies — no-op when there are no live Futures * test_cleanup_drains_multiple_concurrent_policies — pairs with GH strands-labs#114 concurrent-policy support; both robots' futures awaited All 539 tests pass (was 535; +4 new). Lint clean.

Closes GH strands-labs#119. The mutation guard (_require_no_running_policy) is the load-bearing safety mechanism that stops the LLM from scheduling a scene mutation while a policy worker is mid-step. A race between the guard and the worker's mj_step is a SIGSEGV on stale pointers. We had a partial stress test in strands-labs#114's commit (two policies run to completion), but no test that proved: * 1000 concurrent main-thread mutation attempts don't starve the worker * rapid start/stop/start/stop cycles leave _policy_threads clean * the first mutation after a policy completes succeeds (no lingering guard state) * two concurrent policies + 500 main-thread mutations don't deadlock on self._lock New TestMutationGuardStress class covers all four: * test_1000_set_gravity_calls_during_policy_never_segfault * test_rapid_start_stop_start_stop_policy * test_mutation_accepted_immediately_after_policy_completes * test_concurrent_policies_stress_no_deadlock (pairs with GH strands-labs#114) Each test asserts well-formed dict responses (no crashes), specific status invariants, and the uniform 'policy is running' error shape when blocked. All 543 tests pass (was 539; +4 new). Lint clean.

Per user request during autonomous review cycle. The em-dash (—) and horizontal ellipsis (…) unicode characters sneak in when docstrings get authored in text editors with smart-quote autocorrect. They look fine in rendered markdown but are noisy in code and diffs, don't copy-paste cleanly into terminals, and break grep with non-unicode patterns. Bulk replacements: * 424 em-dashes ('—' U+2014) -> ' - ' (with normalized spacing) or '-' (at line start, mostly bullet points) * 8 horizontal-ellipsis ('…' U+2026) -> '...' (three ASCII dots) Also fixed one arithmetic bug surfaced by the ellipsis replacement: * strands_robots/registry/robots.py: description-truncation previously subtracted 1 char (for the 1-char ellipsis) and appended 3 chars (for '...'), overflowing the table column by 2 chars. Now subtracts 3. Files touched: * 39 in strands_robots/ * 30 in tests/ * 5 in tests_integ/ * CHANGELOG.md, README.md, AGENTS.md * 82 files total No semantic changes. All 1248 tests pass (was 1236, 5 pre-existing test_path_validation failures unrelated).

…trands-labs#118) Partial address of GH strands-labs#118. The review correctly flagged that the 4-way mixin split (PhysicsMixin + RenderingMixin + RecordingMixin + RandomizationMixin) pretends to describe a decoupling when it really just describes *where lines live*. Every mixin reaches back into Simulation for self._world / self._lock / self._mj / _policy_threads / _renderer_tls, plus the cross-cutting _require_no_running_policy / _require_world / _prune_done_futures helpers. Rather than pretend otherwise, this commit makes the coupling documentary and explicit: 1. simulation.py module-level docstring replaced with a full 'Architecture notes (honest version)' block that enumerates every piece of shared state and every cross-cutting helper the mixins rely on. Cross-refs GH strands-labs#118 and commit f5c8518 (which established that the alternative -- _SimulationState extraction -- breaks mypy narrowing across the helper boundary). 2. Every mixin's class docstring rewritten to name the specific state it touches and the specific helpers it calls. Short, precise, greppable. 3. TYPE_CHECKING stubs in each mixin updated to reflect the NEW per-robot _require_no_running_policy signature (from strands-labs#114) and to add _require_world which previously was missing despite being used. Now when we edit the real helpers in simulation.py, mypy can check the mixin call sites against the intended shape. 4. Class body order normalized: docstring first, THEN TYPE_CHECKING block. Previously PhysicsMixin and RandomizationMixin had the stub block *before* the class docstring, which hid the real documentation. No runtime behavior change. Lint clean. 543 tests pass. This leaves the bigger structural question (actually extract _SimulationState, or merge mixins back into one file) open. That's tracked on strands-labs#118 -- it's an L/XL refactor and needs its own PR. For THIS PR, the goal was to stop the split from being *dishonest*.

Cosmetic/quality sweep surfaced a dead return value: _ensure_meshes returned an error dict on auto-download failure, but the caller at add_robot (line 494) discarded the return value. Result: the agent got a cryptic 'mesh not found' from MuJoCo later instead of the clear 'Auto-download failed for X: Y. Install robot_descriptions:...' that _ensure_meshes constructs. Changes: * _ensure_meshes typed as -> dict[str, Any] | None explicitly * Explicit return None on all success paths (previously the function fell off the end in places, which implicitly returned None but was not self-documenting) * Caller in add_robot now checks the return value; propagates any error dict and pops the partially-registered robot out of self._world.robots before bubbling up No test change -- the existing happy-path tests still pass, and the error path requires network-blocked CI to test cleanly (left as integration territory). Lint and all 543 tests pass.

…abs#117 (on_frame abort)

Start the string-concat -> MjSpec refactor documented in IDEA.md. The current 'builder' (mjcf_builder.MJCFBuilder) hand-writes MJCF as f-string concatenation and needs ~600 lines of scaffolding (sanitize, xyaxes math, ElementTree round-trips) to do what mujoco.MjSpec (shipped in mujoco 3.2+) does natively. This PR lands Stages 0-2: * Stage 0: bump mujoco floor to 3.2 (env already at 3.8), track IDEA.md in repo so the plan is visible to agents. * Stage 1: add spec_builder.py alongside mjcf_builder.py, behind a STRANDS_SIM_USE_MJSPEC env flag. Both code paths tested in CI. * Stage 2: replace camera xyaxes math with mujoco.mju_mat2Quat via the new _target_quat helper. MjSpec cameras use quat= instead of xyaxes=; compiled cam_mat0 is numerically identical within 4e-7. ## Changes ### New: strands_robots/simulation/mujoco/spec_builder.py (+260 LOC) Exports: * SpecBuilder.build(world) -> mujoco.MjSpec * _geom_type(shape) -> mjtGeom enum (drop-in for the shape enum lookup; also adds 'ellipsoid' which the legacy builder rejects) * _normalize_size(shape, size) -> per-type size list * _target_quat(pos, target) -> look-at quaternion via mju_mat2Quat Does NOT touch scene_ops (Stages 5-6), robots (Stage 3-4), or remove the legacy builder (Stage 7). Pure additive. ### Modified: simulation.py _compile_world gets a feature flag (STRANDS_SIM_USE_MJSPEC). When on: 1. Build MjSpec via SpecBuilder.build() 2. Stash the spec in _backend_state['spec'] 3. Compile to MjModel 4. Export _backend_state['xml'] via spec.to_xml() for legacy readers New helper Simulation._use_mjspec() centralises env-var parsing. ### Modified: test_input_validation.py Two tests were asserting on raw 'xyaxes="..."' XML strings, which aren't emitted under the SpecBuilder path (quat= instead). Rewrote them to assert on cam_mat0 (the compiled rotation matrix) which is representation-agnostic and passes under BOTH code paths: * test_xyaxes_emitted_in_xml -> test_camera_orientation_written * test_different_targets_produce_different_xyaxes -> test_different_targets_produce_different_orientations ### New: test_spec_builder.py (+170 LOC, 19 tests) Locks the SpecBuilder contract: * Module-level helpers (_geom_type, _normalize_size, _target_quat) with unit coverage including the new ellipsoid support. * Parity class: builds the same SimWorld via both paths and asserts nbody/ngeom/ncam/nu/njnt/nq/nv match exactly, plus body_pos, body_mass, cam_mat0. * Bonus: test_ellipsoid_compiles_via_spec_builder - proves the new shape that the legacy builder rejects. ### Modified: pyproject.toml mujoco>=3.0.0 -> mujoco>=3.2.0 (MjSpec API). Current hatch env is at 3.8.0 already; this is just a floor bump. ### Added: IDEA.md Full staged refactor plan at repo root (from user). Tracks what's done and what's left. ## Safety All existing tests pass under BOTH code paths: * Default (legacy): 562 passed, 1 skipped * STRANDS_SIM_USE_MJSPEC=1: 562 passed, 1 skipped Including the tests-that-were-string-coupled. Flag default is OFF so no existing consumer sees any behaviour change. ## Follow-ups (tracked in subsequent issues on PVT_kwDOD151Fs4BSRJP) * Stage 3: single-robot attach via spec.attach(robot_spec, ...) * Stage 4: multi-robot compose via repeated spec.attach() * Stage 5: port scene_ops inject_*/eject_* to spec.recompile(model, data) * Stage 6: replace_scene_mjcf / patch_scene_mjcf tool-facing entry points * Stage 7: remove feature flag, delete mjcf_builder.py

Complete the string-concat -> MjSpec migration started in ad1d298. mjcf_builder.py is deleted, scene_ops.py collapses from ~980 lines to ~295 lines of direct MjSpec AST manipulation, and agents get a new replace_scene_mjcf escape hatch for raw MJCF. ## What landed ### Stage 3-4: single + multi robot via spec.attach() - SpecBuilder.attach_robot() composes URDF/MJCF robots via mujoco.MjSpec.from_file(...) + scene_spec.attach(robot_spec, prefix=..., frame=...). No more hand-rolled name prefixing (_prefix_robot_names, ~120 lines) or default-class namespacing (_namespace_robot_default_classes, ~60 lines) - MjSpec does it. - Asset deduplication (meshes/textures/materials) is free via attach(). ### Stage 5: live inject/eject via spec.recompile(model, data) - scene_ops.inject_object_into_scene: one call - SpecBuilder.add_object(spec, obj) + spec.recompile(model, data). - scene_ops.eject_body_from_scene: spec.body(name).delete() + recompile. - scene_ops.inject_camera_into_scene, inject_robot_into_scene same shape. - spec.recompile() preserves qpos/qvel for unchanged joints automatically, no manual state-copy loop needed. - Gone: _patch_xml_paths, _rewrite_mesh_paths, _get_abs_meshdir, _save_and_patch_xml, the whole tmpdir+mj_saveLastXML dance. ### Stage 6: agent-authored raw MJCF - scene_ops.replace_scene_mjcf(world, xml): validates by actually calling spec.compile() - returns the MuJoCo compiler error verbatim on failure, no process abort. - Exposed as new tool action replace_scene_mjcf in tool_spec.json. - Simulation.replace_scene_mjcf() guards against policy-running races the same way load_scene / add_robot do. ### Stage 7: cleanup - Deleted strands_robots/simulation/mujoco/mjcf_builder.py (273 lines). - Deleted tests/simulation/mujoco/test_mjcf_builder_units.py (190 lines). - Deleted tests/simulation/mujoco/test_mjcf_xml_injection.py (124 lines) - XML injection fuzzing is no longer applicable; MjSpec validates names itself. - scene_ops.py: 980 -> 307 lines (meets the <500-line success criterion in IDEA.md). - test_spec_builder.py rewritten to assert on spec structure + compiled MjModel properties, never on exact XML strings. - New tests/simulation/mujoco/test_replace_scene_mjcf.py covers happy path (incl. <tendon> that SimObject can't express), malformed XML, semantically invalid MJCF, and the policy-running guard. ## Known constraint eject_robot_from_scene rebuilds the scene from scratch (drops qpos/qvel for in-place joints) rather than calling spec.delete() on the attached robot body. This works around a MuJoCo 3.8 double-free: calling spec.delete() on a body produced by spec.attach() and then letting the interpreter shut down crashes with a segfault in the spec destructor. Documented inline. Not a regression - legacy code path also had to rebuild for remove_robot. ## Verification - hatch run test tests/simulation/mujoco/: 415 passed, 1 skipped. - hatch run lint: ruff + mypy clean across 108 source files. - grep -r "f'<" strands_robots/simulation/mujoco/: no matches (IDEA.md success criterion). - grep -r "MJCFBuilder\|mjcf_builder" strands_robots/: only historical doc comments in spec_builder.py docstring and simulation/__init__.py package tree diagram. The 5 pre-existing failures in tests/tools/test_path_validation.py are unrelated to this refactor - they fail on base commit too. ## IDEA.md checklist - [x] mjcf_builder.py deleted - [x] scene_ops.py under 500 lines (307) - [x] All existing unit + integration tests pass - [x] Integration test proves agent can author MJCF with <tendon> (unexpressible via SimObject) - [x] No test asserts on exact XML strings - [x] grep -r "f'<" strands_robots/simulation/mujoco/ empty Closes stages 3-7 of IDEA.md. Refs strands-labs#121-strands-labs#126.

cagataycali · 2026-05-05T06:26:27Z

Superseded: both commits (stages 0-2 and 3-7) are now stacked into PR #85 on feat/mujoco-backend. Closing to avoid duplication.

cagataycali and others added 30 commits May 3, 2026 08:38

ci: add MuJoCo system deps (libosmesa6-dev + MUJOCO_GL=osmesa)

a6f12bc

feat: add [sim] extra with mujoco dependency

166bea8

Adds mujoco>=3.0.0,<4.0.0 to the [sim] optional-dependencies group, and includes it in [all] so CI installs it via 'uv sync --extra all --extra dev'.

fix: rename [sim] extra to [sim-mujoco] per review

4825e44

Address yinsong1986's feedback to namespace the optional dependency group as [sim-mujoco] for clarity when additional sim backends are added.

fix: rebase on simulation-foundation — SimulationBackend→SimEngine, u…

08eed8c

…pdate lazy imports

feat(sim): use require_optional for imageio in policy_runner

b28b410

- Replace bare with for consistent optional dependency handling per project conventions - Add imageio and imageio-ffmpeg to sim-mujoco extras in pyproject.toml - Add type: ignore comment for dynamic imageio writer attribute

cagataycali and others added 28 commits May 4, 2026 19:03

style: fix ruff format in test_agenttool_contract.py

4ddf236

Missing blank line after import (E303). Fixes CI lint failure.

docs: CHANGELOG for GH strands-labs#116 (cleanup race) + GH strands-l…

3894a43

…abs#117 (on_frame abort)

cagataycali closed this May 5, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(sim/mujoco): MjSpec-based builder (stages 0-2 of IDEA.md, closes part of #121)#127

feat(sim/mujoco): MjSpec-based builder (stages 0-2 of IDEA.md, closes part of #121)#127
cagataycali wants to merge 92 commits intostrands-labs:mainfrom
cagataycali:feat/mjspec-refactor

cagataycali commented May 5, 2026

Uh oh!

cagataycali commented May 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

cagataycali commented May 5, 2026

Summary

What this PR does

Stage 0 (prep)

Stage 1 (new builder, feature-flagged)

Stage 2 (camera xyaxes -> quat)

Test updates

What this PR does NOT do

Safety

Base branch

Follow-ups on the project board

Uh oh!

cagataycali commented May 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants