Problem
The mutation guard (`_require_no_running_policy`) is the load-bearing safety
mechanism that stops the LLM from scheduling a scene mutation while a policy
worker is mid-step. If it has a race, we get a SIGSEGV on stale model/data
pointers.
We currently trust that:
- `self._policy_threads[robot_name] = future` and `any(not f.done() for f in ...)` are atomic enough in practice
- The executor's `submit()` of the policy worker happens-before the next mutation's check (i.e. the dict assignment is visible before `_require_no_running_policy` runs)
- A policy worker finishing and setting its future to `done()` doesn't race with a mutation that checks "is anything running" mid-transition
No test proves any of this.
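To make the three assumptions concrete, here is a minimal, hypothetical stand-in for the guard pattern described above. `GuardedScene`, the `start_policy(robot_name, stop)` signature, the stop `Event`, and the error-dict shape are illustrative assumptions, not the real class; only `_policy_threads`, `_require_no_running_policy`, and `set_gravity` come from this issue. The comments mark where each assumption sits in the sketch.

```python
from __future__ import annotations

import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


class GuardedScene:
    """Hypothetical stand-in: models only the policy-future bookkeeping and the guard."""

    def __init__(self) -> None:
        self._executor = ThreadPoolExecutor(max_workers=4)
        self._policy_threads: dict[str, Future] = {}
        self.gravity = (0.0, 0.0, -9.81)

    def start_policy(self, robot_name: str, stop: threading.Event) -> None:
        future = self._executor.submit(self._policy_worker, stop)
        # Assumptions 1 and 2: the worker may already be stepping before this
        # store runs, so a mutation checked in between sees no running policy.
        self._policy_threads[robot_name] = future

    def _policy_worker(self, stop: threading.Event) -> None:
        while not stop.is_set():
            time.sleep(0.001)  # stands in for one physics + policy step

    def _require_no_running_policy(self) -> str | None:
        # Snapshot the values: iterating the live dict while another thread
        # inserts can raise "dictionary changed size during iteration".
        futures = list(self._policy_threads.values())
        # Assumption 3: concurrent.futures only marks a future done() after the
        # worker callable returns, so racing with completion should, at worst,
        # produce a spurious rejection in this sketch.
        if any(not f.done() for f in futures):
            return "A policy is running; stop it before mutating the scene."
        return None

    def set_gravity(self, g: tuple[float, float, float]) -> dict:
        error = self._require_no_running_policy()
        if error is not None:
            return {"status": "error", "message": error}
        # The check above and this write are separate steps; a policy started
        # from another thread in between would not be seen by the guard.
        self.gravity = g
        return {"status": "success"}

    def close(self) -> None:
        self._executor.shutdown(wait=True)
```

The last comment in `set_gravity` flags a gap adjacent to the list above: the guard and the mutation are two separate steps, so unless policy starts and mutations are issued from the same thread, the guard is only as strong as whatever serializes them.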
Proposal
Add `tests/simulation/mujoco/test_concurrency.py`:
- Test 1: mutation-during-policy rejected. Start a policy that runs for ~5 seconds. From the main thread, rapidly call `set_gravity()` 1000 times. Assert that every single call returns a friendly error, none succeed, and nothing segfaults.
- Test 2: rapid start→stop→start→stop. Stress the future lifecycle. Assert that `_policy_threads` stays consistent (no orphaned or duplicate entries, and each robot's future is `done()` once it has been stopped).
- Test 3: mutation accepted immediately after policy completes. Assert that once the future is done, the very next mutation succeeds, with no lingering state.

Use a cheap mock policy (`lambda obs: {"qpos": [0]*n}`) so the stress loop
doesn't take minutes.
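A minimal sketch of what the proposed test file could look like. `FakeSim`, its `start_policy(robot, policy, duration_s)` signature, the returned stop `Event`, and `N_JOINTS` are hypothetical stand-ins; the real tests would drive the actual MuJoCo simulation class and its own stop API instead.

```python
"""Hypothetical sketch of tests/simulation/mujoco/test_concurrency.py.

FakeSim is a stand-in for the real simulation class; only the policy-future
bookkeeping and the mutation guard are modeled.
"""
from __future__ import annotations

import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor

N_JOINTS = 7  # hypothetical joint count for the mock policy


def mock_policy(obs):
    # The cheap mock from the issue: a constant zero command, no inference.
    return {"qpos": [0] * N_JOINTS}


class FakeSim:
    def __init__(self) -> None:
        self._executor = ThreadPoolExecutor(max_workers=2)
        self._policy_threads: dict[str, Future] = {}
        self.gravity = (0.0, 0.0, -9.81)

    def start_policy(self, robot: str, policy, duration_s: float) -> threading.Event:
        stop = threading.Event()

        def worker() -> None:
            deadline = time.monotonic() + duration_s
            while time.monotonic() < deadline and not stop.is_set():
                policy({})  # one mock policy step
                time.sleep(0.001)

        self._policy_threads[robot] = self._executor.submit(worker)
        return stop  # hypothetical stop handle

    def set_gravity(self, g) -> dict:
        if any(not f.done() for f in list(self._policy_threads.values())):
            return {"status": "error", "message": "Stop the running policy first."}
        self.gravity = g
        return {"status": "success"}

    def close(self) -> None:
        self._executor.shutdown(wait=True)


def test_mutation_during_policy_rejected():
    sim = FakeSim()
    stop = sim.start_policy("arm", mock_policy, duration_s=5.0)
    try:
        results = [sim.set_gravity((0.0, 0.0, -1.0)) for _ in range(1000)]
        assert all(r["status"] == "error" for r in results)
        assert sim.gravity == (0.0, 0.0, -9.81)  # no call slipped through
    finally:
        stop.set()
        sim.close()


def test_rapid_start_stop_keeps_state_consistent():
    sim = FakeSim()
    try:
        for _ in range(200):
            stop = sim.start_policy("arm", mock_policy, duration_s=5.0)
            stop.set()
            sim._policy_threads["arm"].result(timeout=5)
            # Exactly one entry for the robot, and it is finished after stop.
            assert list(sim._policy_threads) == ["arm"]
            assert sim._policy_threads["arm"].done()
    finally:
        sim.close()


def test_mutation_accepted_right_after_policy_completes():
    sim = FakeSim()
    try:
        sim.start_policy("arm", mock_policy, duration_s=0.05)
        sim._policy_threads["arm"].result(timeout=5)  # returns only once done()
        assert sim.set_gravity((0.0, 0.0, -1.0))["status"] == "success"
    finally:
        sim.close()
```

Two caveats on this shape: the "no segfault" criterion only means something against the real MuJoCo-backed class, and Test 1 as written exercises the steady state. The submit-then-assign window from the second assumption would need mutations issued from another thread while `start_policy` is still in flight.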
Acceptance
- Tests marked `pytest.mark.slow` if needed (run in `hatch run test-integ`)
- Or kept fast enough for `hatch run test` if the mock is trivial
- Any discovered races get fixed + regression-tested before closing this issue
Surfaced by a second-opinion review of PR #85.