strands-labs · cagataycali · Apr 1, 2026 · Apr 1, 2026 · Apr 1, 2026 · Apr 1, 2026
diff --git a/.github/workflows/test-lint.yml b/.github/workflows/test-lint.yml
@@ -26,6 +26,11 @@ jobs:
           python-version: '3.12'
           cache: 'pip'
 
+      - name: Install system dependencies (OpenGL for MuJoCo)
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y libosmesa6-dev ffmpeg
+
       - name: Install dependencies
         run: |
           pip install --no-cache-dir hatch
@@ -35,4 +40,6 @@ jobs:
         run: hatch run lint
 
       - name: Run tests
+        env:
+          MUJOCO_GL: osmesa
         run: hatch run test -x --strict-markers
diff --git a/.gitignore b/.gitignore
@@ -10,3 +10,7 @@ dist
 .strands_robots
 .coverage
 .ideation/
+MUJOCO_LOG.TXT
+TASKS.md
+TASKS_TO_FIX_85.md
+.coverage.*
diff --git a/AGENTS.md b/AGENTS.md
@@ -1,4 +1,4 @@
-# AGENTS.md — strands-labs/robots
+# AGENTS.md - strands-labs/robots
 
 ## Overview
 
@@ -11,7 +11,7 @@
 
 > **RULE**: ALWAYS use the project board to track work. When creating follow-up items,
 > create GitHub issues and add them to this board with Status + Priority set.
-> Never track work only in local markdown — the board is the source of truth.
+> Never track work only in local markdown - the board is the source of truth.
 
 ## Repository Structure
 
@@ -61,20 +61,20 @@ hatch run format            # ruff check --fix, ruff format
 ```
 
 > **Note**: Hatch uses `uv` as installer (`installer = "uv"` in pyproject.toml) for faster
-> environment creation. No manual uv install needed — hatch handles it.
+> environment creation. No manual uv install needed - hatch handles it.
 
 ## Key Conventions
 
-1. **Python 3.12+** — `requires-python = ">=3.12"` (LeRobot >=0.5.0 requires 3.12)
-2. **Dependency bounds** — `>=1.0` deps: cap major. `<1.0` deps: cap minor. E.g. `lerobot>=0.5.0,<0.6.0`
-3. **`__init__.py` must be thin** — exports only, no logic
-4. **Imports at file top** — unless lazy-loading heavy deps with documented reason
-5. **Raise on fatal errors** — never warn-and-continue if the system will behave unexpectedly
-6. **No silent defaults on error** — returning zero-valued actions on failure is forbidden
-7. **Use `require_optional()`** — from `strands_robots/utils.py` for all optional deps
-8. **Integration tests required** — each policy needs `tests_integ/` tests with real inference
-9. **Test behavior, not implementation** — assert on outputs, not internal state
-10. **No dead code** — if it's not called and not part of base class, delete it
+1. **Python 3.12+** - `requires-python = ">=3.12"` (LeRobot >=0.5.0 requires 3.12)
+2. **Dependency bounds** - `>=1.0` deps: cap major. `<1.0` deps: cap minor. E.g. `lerobot>=0.5.0,<0.6.0`
+3. **`__init__.py` must be thin** - exports only, no logic
+4. **Imports at file top** - unless lazy-loading heavy deps with documented reason
+5. **Raise on fatal errors** - never warn-and-continue if the system will behave unexpectedly
+6. **No silent defaults on error** - returning zero-valued actions on failure is forbidden
+7. **Use `require_optional()`** - from `strands_robots/utils.py` for all optional deps
+8. **Integration tests required** - each policy needs `tests_integ/` tests with real inference
+9. **Test behavior, not implementation** - assert on outputs, not internal state
+10. **No dead code** - if it's not called and not part of base class, delete it
 
 ## PR Workflow
 
@@ -94,7 +94,7 @@ hatch run format            # ruff check --fix, ruff format
   `asimovinc/asimov-v0` which has `sim-model/xmls/asimov.xml` +
   `sim-model/assets/`. The `_safe_join` helper in `strands_robots/utils.py`
   guards against traversal (`..`).
-- **Auto-download strategy** — every robot with an `asset` block must declare
+- **Auto-download strategy** - every robot with an `asset` block must declare
   exactly one of:
     1. `asset.robot_descriptions_module` (preferred)
     2. `asset.source` with `type: "github"`

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -0,0 +1,268 @@
+# CHANGELOG
+
+All notable behavioural changes to `strands-robots` are logged here. Follows
+[Keep a Changelog](https://keepachangelog.com/) conventions.
+
+## Unreleased - PR #85 (MuJoCo backend remediation)
+
+### MJCF builder refactor: string-concat -> MjSpec AST (closes #121, #122-#126)
+
+The ``MJCFBuilder`` string-concat path and the ``scene_ops`` XML-round-trip
+machinery (~700 lines total) are replaced by direct manipulation of
+``mujoco.MjSpec`` - the editable MJCF AST shipped with MuJoCo 3.2+.
+
+What changed under the hood:
+- **New module** ``strands_robots/simulation/mujoco/spec_builder.py``. The
+  ``SpecBuilder`` class owns scene construction + mutation (``build``,
+  ``add_object``, ``remove_body``, ``add_camera``, ``remove_camera``,
+  ``attach_robot``, ``from_mjcf_string``, ``from_file``).
+- **Deleted**: ``strands_robots/simulation/mujoco/mjcf_builder.py`` (273
+  lines of f-string MJCF and the ``_camera_xyaxes_from_target`` helper).
+- **Rewrote**: ``strands_robots/simulation/mujoco/scene_ops.py`` from
+  ~980 lines of tmpdir + ``mj_saveLastXML`` + ``ElementTree`` round-trips
+  down to ~295 lines that go through ``spec.recompile(model, data)``.
+- **Bumped**: ``mujoco>=3.0.0`` -> ``>=3.2.0`` in ``pyproject.toml`` so
+  ``MjSpec`` is always available. Current hatch env runs 3.8.0.
+
+Agent-visible wins:
+- **New action** ``replace_scene_mjcf(xml=...)`` - atomically replace the
+  whole scene with agent-authored MJCF. Validated by actually compiling
+  it, so ``<tendon>``, ``<equality>``, ``<pair>``, custom solref/solimp,
+  sites, hfield, etc. all work without needing new ``SimObject`` shape
+  vocabulary. On malformed XML returns a clean error dict (no process
+  abort).
+- **``ellipsoid`` shape** now works in ``add_object`` - it's a free
+  bonus MuJoCo geom type the string-concat builder rejected.
+- **Camera orientation** uses ``quat`` (computed via
+  ``mujoco.mju_mat2Quat``) instead of a hand-rolled ``xyaxes`` string.
+  Compiled ``cam_mat0`` is numerically identical within ~4e-7.
+- **``spec.recompile(model, data)``** preserves existing joint qpos/qvel
+  for unchanged joints automatically - no manual "copy state by name"
+  loop. Object freejoints added post-compile get initialised to the
+  body's ``pos``/``quat``.
+- **No more XML injection surface**: names go straight into MjSpec which
+  validates them itself, so the old ``_sanitize_name`` regex gate +
+  fuzz test are no longer needed.
+
+Downstream API is unchanged: ``add_object``, ``add_robot``, ``remove_object``,
+``remove_robot``, ``add_camera``, ``remove_camera``, ``load_scene`` all keep
+their tool-action signatures. Tests that asserted on exact XML strings
+were rewritten to assert on compiled ``MjModel`` properties (``cam_mat0``,
+``mj_name2id``) so they are representation-agnostic.
+
+Known constraint: ``remove_robot`` now rebuilds the scene from scratch
+(drops joint qpos state) rather than going through ``spec.delete()`` on
+attached bodies. This sidesteps a MuJoCo 3.8 double-free bug where
+``spec.delete(attached_body)`` + interpreter shutdown crashes. Trade-off
+is documented in ``scene_ops.eject_robot_from_scene``.
+
+### Breaking
+
+These changes tighten the MuJoCo AgentTool contract. Legacy callers that
+silently worked by accident will now receive a clear error instead:
+
+- **Router input validation**: The ``_dispatch_action`` router rejects any
+  top-level parameter that isn't declared on the target method. Passing
+  ``step(num_steps=5)`` (wrong name) or ``set_gravity(device="mps")``
+  (stray kwarg) now errors with *"Unknown parameter X for action Y.
+  Valid: [...]"* instead of silently dropping the value. Methods whose
+  Python signature includes ``**kwargs`` (e.g. ``add_object``) keep their
+  pass-through semantics.
+- **Missing required args**: produce *"Action X requires parameter Y."*
+  instead of a raw Python ``TypeError``.
+- **Vector dimension validation**: ``position``, ``target``, ``origin``,
+  ``force``, ``torque``, ``gravity``, ``direction``, ``point``, ``orientation``
+  (quaternion), and ``color`` (rgba) all validated for length + numeric
+  dtype before reaching numpy/MuJoCo.
+- **Camera orientation**: ``add_camera(target=[x,y,z])`` is now honoured
+  by baking ``xyaxes`` into the MJCF ``<camera>``. Previously the target
+  was silently dropped and every custom camera rendered a default view.
+  Degenerate case (``target == position``) errors.
+- **Render camera validation**: ``render(camera_name="missing")`` errors
+  with *"Camera 'missing' not found."* instead of silently falling back
+  to the free camera while claiming to render from the named one.
+- **Raycast zero-direction guard**: ``raycast(direction=[0,0,0])`` now
+  errors with *"direction vector is zero-length"*. Previously MuJoCo's
+  C-level ``mj_ray`` would abort the Python process.
+- **apply_force requires a non-zero vector**: passing neither ``force``
+  nor ``torque`` (or both zero) errors. Previously the call silently
+  succeeded with no effect.
+- **step(n_steps<0)** rejected (previously it corrupted ``step_count``).
+- **Negative mass / timestep / size** rejected per shape; previously
+  ``set_body_properties(mass=-1)`` and ``set_timestep(-0.01)`` silently
+  succeeded.
+- **Plane objects auto-static**: ``add_object(shape="plane")`` now forces
+  ``is_static=True`` (planes are infinite in MuJoCo). Explicit
+  ``is_static=False`` on a plane is a hard error.
+- **Duplicate camera name** rejected. Previously a second ``add_camera``
+  with an existing name silently overwrote the registry entry while
+  leaving the old camera in the XML - ghost behaviour. Use
+  ``remove_camera`` + ``add_camera`` to replace.
+- **stop_policy(robot_name='')** errors with *"stop_policy requires
+  'robot_name'."* instead of silently matching the first robot.
+- **eval_policy** requires an explicit ``robot_name``. Default
+  ``n_episodes`` lowered from 10 to 1.
+- **register_urdf** validates the path: file must exist, be a file, and
+  be readable. Previously bad paths were cached and blew up later.
+
+### Recording backend split
+
+- ``start_recording`` (LeRobotDataset: parquet + per-camera MP4) still
+  requires the ``[lerobot]`` extra. Its error message when lerobot is
+  missing now points callers at ``start_cameras_recording`` for plain
+  MP4 (which runs under ``[sim-mujoco]`` alone via imageio-ffmpeg).
+- No API change - the fix is informational.
+
+### Resource hygiene
+
+- ``destroy()`` and ``cleanup()`` now close renderers on the main thread
+  and empty the TLS cache. Previously each ``create_world/destroy``
+  cycle leaked one ``mujoco.Renderer`` + its GL context (~33 MB per
+  cycle measured). Worker-thread renderers still release themselves on
+  thread teardown (we avoid cross-thread ``close()`` to prevent
+  ``cgl.free()`` SIGSEGVs on macOS).
+- ``get_mass_matrix`` and ``get_contacts`` run ``mj_forward`` first so
+  values are valid immediately after a ``reset`` or ``add_robot``
+  (previously returned stale / uninitialised memory).
+
+### Concurrency guards
+
+Write-mutations are now refused while a policy is running on any robot
+in the world. Previously these could race the policy worker thread and
+produce undefined behaviour or SIGSEGV:
+
+    reset, set_gravity, set_timestep, set_joint_positions,
+    set_joint_velocities, apply_force, set_body_properties,
+    set_geom_properties, load_state, randomize, move_object
+
+The error now lists *which* robot(s) are active so the LLM can
+``stop_policy`` on each without guessing: *"Cannot 'X' while a policy
+is running on 'armA', 'armB'. Stop it first: action='stop_policy'."*
+
+### Concurrent per-robot policies (GH #114)
+
+Multiple ``start_policy`` calls on *different* robots now run
+concurrently. MuJoCo physics is still serialized via ``self._lock``
+(``mj_step`` and ``ctrl[]`` writes are not thread-safe for concurrent
+mutation), but each policy owns a disjoint slice of ``data.ctrl[]`` so
+two VLA arms can operate in the same scene without semantic conflict.
+
+- ``start_policy("armA")`` + ``start_policy("armB")`` both succeed.
+  Second call no longer hits a global "policy already running" gate.
+- ``start_policy`` on the *same* robot while its policy is active
+  still errors (unchanged).
+- ``remove_robot("X")`` now gracefully stops X's own policy before
+  removing, instead of requiring a prior ``stop_policy("X")``. Still
+  errors if a *different* robot has an active policy (XML round-trip
+  invalidates cached IDs everywhere).
+- New action ``list_policies_running`` returns the names of robots
+  with live policies. Prunes completed Futures as a side-effect.
+- Completed policy Futures are no longer retained forever in
+  ``_policy_threads`` (GH #120 companion fix).
+
+### Policy-hook robustness (GH #117)
+
+``PolicyRunner.run`` previously caught *all* ``on_frame`` exceptions at
+WARN level and kept iterating. A recording hook with a typo'd observation
+key would log 500 lines and produce an empty dataset. Now we count
+*consecutive* failures and abort the episode after a threshold (default
+5, tunable via new ``max_onframe_failures`` kwarg).
+
+- A single transient failure still logs + continues; counter resets on
+  the next successful call.
+- ``N`` consecutive failures raise ``RuntimeError`` so ``run()`` returns
+  ``status='error'`` with a clear message, preventing silent dataset
+  corruption.
+
+### Cleanup graceful shutdown (GH #116)
+
+``Simulation.cleanup()`` no longer races the policy worker. Previously
+cleanup set ``self._world = None`` and called ``executor.shutdown(wait=False)``
+nearly simultaneously - a policy still inside ``mj_step`` segfaulted on
+freed arrays. Now cleanup:
+
+1. Signals every live policy to stop (``policy_running = False``).
+2. Awaits each outstanding Future with a bounded timeout (default 5s,
+   overridable via new ``cleanup(policy_stop_timeout=...)`` kwarg).
+3. Only AFTER workers unwind do we null ``self._world`` and tear down
+   renderers / viewer / executor.
+
+Wedged workers that don't stop in time get logged as a warning - cleanup
+proceeds rather than hanging the host process on exit.
+
+### Error message consistency
+
+- All "no world" paths return the same string:
+  *"No world. Call create_world (or load_scene) first."*
+- Unknown-name errors use a uniform ``<Kind> 'X' not found.`` shape
+  (Robot / Object / Body / Geom / Joint / Sensor / Camera / Checkpoint).
+- ``stop_recording``, ``stop_cameras_recording``, ``stop_policy``,
+  ``close_viewer`` are now **idempotent**: calling them when nothing
+  is running returns ``status="success"`` with a *"Was not ..."* message
+  so callers can invoke them unconditionally.
+- ``get_recording_status`` returns success in every lifecycle state
+  (no world / not recording / recording).
+
+### Deprecations
+
+- **add_robot name-as-registry fallback**: passing ``name="my_bot"``
+  without ``urdf_path`` or ``data_config`` used to resolve ``my_bot`` in
+  the model registry. This now fires a ``DeprecationWarning``. Use
+  ``add_robot(name="...", data_config="<registry_key>")`` instead. Will
+  be removed next major release.
+
+### New / extended actions
+
+- ``forward_kinematics(body_name="X")`` filters to a single body.
+- ``get_features(robot_name="X")`` filters to a single robot's joints
+  and actuators.
+- ``set_geom_properties(geom_name="X")`` accepts the bare object name
+  as an alias for the injected ``"{name}_geom"``.
+- ``render_all`` flags cameras whose frame has near-zero pixel variance
+  (``"⚠️ camera 'X': image appears empty (variance < 1)"``).
+- ``render_depth`` surfaces MuJoCo's one-time ``ARB_clip_control``
+  warning in the response text on macOS, so the LLM knows when depth
+  accuracy is reduced.
+- ``render`` / ``render_depth``: width/height validated up front;
+  oversized requests get a plain-English message naming the actual
+  framebuffer cap (``<global offwidth=...>``) instead of MuJoCo's raw
+  error.
+- ``run_policy`` / ``start_policy``: accept optional ``n_steps``
+  (primary) or legacy ``max_steps`` as an alternative to
+  ``duration``+``control_frequency``. ``duration = n_steps /
+  control_frequency`` when ``n_steps`` is set.
+- **New ``list_policies_running``** action returns the names of robots
+  with a live policy - pairs with the new concurrent-policy support
+  (see *Concurrent per-robot policies* above).
+- ``randomize(randomize_physics=True)`` now reports per-body mass scales
+  and per-geom friction scales in the response (not just range
+  endpoints).
+- ``get_contacts`` resolves unnamed geoms to
+  ``"<body_name>/geom_<id>"`` so contact pairs are always human-readable.
+- ``get_sensor_data(sensor_name="X")`` on a model with no sensors now
+  distinguishes *"Sensor 'X' not found. Model has no sensors."* from
+  the generic "no sensors in model" success.
+
+### Tests
+
+- New: ``tests/simulation/mujoco/test_agenttool_contract.py`` - ~50
+  tests that lock in router validation, tool_spec ↔ method parity,
+  unified error messages, idempotent stop family, ``mj_forward`` before
+  reads, render-dim validation, feature filters, camera duplicate
+  policy, plane auto-static, policy horizon unification, and more.
+- New: ``tests/simulation/mujoco/test_renderer_hygiene.py`` - 4 tests
+  asserting TLS cache is emptied on ``destroy``, renderer reuse works
+  for identical ``(w,h)``, and ``create_world`` after ``destroy``
+  rebuilds cleanly.
+- New: ``tests/simulation/mujoco/test_recording_backends.py`` - 2 tests
+  (one skipped when ``lerobot`` IS installed) pinning the
+  MP4-without-lerobot backend.
+- New: ``tests/simulation/mujoco/test_input_validation.py`` - 11 tests
+  for step/raycast/apply_force validation.
+- New: ``tests_integ/test_resource_hygiene.py`` - 3 integration tests
+  (require ``psutil``): 50 create/destroy cycles grow RSS < 50 MB; 500
+  renders at fixed dims grow RSS < 100 MB; TLS cache cleared on destroy.
+
+Test count: **256 → 362** (+106 new regression tests), zero
+regressions. ``hatch run lint`` (ruff + mypy) clean across 102 source
+files.