sim/mujoco: _policy_threads dict accumulates completed Future refs forever

## Problem

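`start_policy` sets `self._policy_threads[robot_name] = future`, but the
entry is **never removed** when the policy completes. Because the dict is
keyed by `robot_name`, a rerun on the same robot overwrites its entry, so
growth tracks the number of distinct robot names: after 100 episodes across
100 robot names, you have 100 entries in the dict, at least 99 of which are
`done()`.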

Consequences today:
- `_require_no_running_policy` iterates all entries on every mutation. O(n),
  microsecond-scale — not a real perf issue at today's scale.
- `list_running_policies()` (if/when it exists) would return stale robot
  names as "running" unless it filters on `.done()`.
- Memory: trivial (each Future is small).
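
A minimal sketch of the stale-entry behavior, using a hypothetical `SimSketch` stand-in rather than the real simulator class. The executor, the placeholder policy body, and the `list_running_policies` implementation are all assumptions made for illustration; only the dict assignment comes from the issue.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class SimSketch:
    """Hypothetical stand-in for the simulator; only the bookkeeping is modeled."""

    def __init__(self) -> None:
        self._executor = ThreadPoolExecutor(max_workers=2)
        self._policy_threads: dict[str, Future] = {}

    def start_policy(self, robot_name: str) -> Future:
        # Same assignment as in the issue; nothing ever deletes the entry.
        future = self._executor.submit(lambda: None)  # placeholder policy body
        self._policy_threads[robot_name] = future
        return future

    def list_running_policies(self) -> list[str]:
        # Filtering on done() hides stale entries but does not remove them.
        return [n for n, f in self._policy_threads.items() if not f.done()]


sim = SimSketch()
for i in range(5):
    sim.start_policy(f"robot_{i}").result()  # wait for each policy to finish

stale_count = len(sim._policy_threads)  # entries survive completion
running = sim.list_running_policies()   # nothing is actually running
sim._executor.shutdown()
```

Here `stale_count` is 5 while `running` is empty, which is the gap any consumer of `_policy_threads` has to paper over with a `.done()` filter.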

## Fix options

**A.** In `_require_no_running_policy`, prune as you iterate:
```python
# Collect finished keys first: deleting while iterating the dict raises RuntimeError.
done_keys = [k for k, f in self._policy_threads.items() if f.done()]
for k in done_keys:
    del self._policy_threads[k]
```
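
For context, option A might look like this as a standalone function. This is a sketch, not the project's code: the free-function form, the `RuntimeError`, and its message are assumptions, since the issue does not show what `_require_no_running_policy` does when a policy is still live.

```python
from concurrent.futures import Future


def require_no_running_policy(policy_threads: dict[str, Future]) -> None:
    # Prune finished entries so only live futures remain in the dict.
    for name in [n for n, f in policy_threads.items() if f.done()]:
        del policy_threads[name]
    # Anything left is still running (assumed behavior: refuse the mutation).
    if policy_threads:
        raise RuntimeError(f"policies still running: {sorted(policy_threads)}")


# Futures built by hand so the example is deterministic.
finished: Future = Future()
finished.set_result(None)
pending: Future = Future()

threads = {"arm_a": finished, "arm_b": pending}
try:
    require_no_running_policy(threads)
    message = ""
except RuntimeError as exc:
    message = str(exc)
```

After the call, `threads` holds only `arm_b`. Pruning inside the check means the dict never holds more than the live policies plus whatever finished since the last mutation.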

**B.** Track `running` on the Robot dataclass (already has `policy_running`)
and iterate `self._world.robots` instead of the dict.

**C.** Don't track futures at all; use `robot.policy_running: bool` as the
source of truth. Futures become an implementation detail, not session state.
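
A rough sketch of what options B and C rely on: the flag is set when the policy starts and cleared in a `finally`, so it resets even if the policy raises. The `Robot` fields other than `policy_running`, the `start_policy` signature, and the error type are hypothetical. It also assumes `start_policy` is only called from one thread, since the check-then-set is not atomic.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class Robot:
    # Hypothetical shape; the issue only says the dataclass has policy_running.
    name: str
    policy_running: bool = False


def start_policy(executor: ThreadPoolExecutor, robot: Robot,
                 policy: Callable[[], None]) -> Future:
    if robot.policy_running:
        raise RuntimeError(f"policy already running on {robot.name}")
    robot.policy_running = True  # set before submit so a second caller sees it

    def _run() -> None:
        try:
            policy()
        finally:
            robot.policy_running = False  # cleared on success and on exception

    return executor.submit(_run)


def _failing_policy() -> None:
    raise ValueError("policy crashed")


robot = Robot("arm_a")
with ThreadPoolExecutor(max_workers=1) as executor:
    start_policy(executor, robot, lambda: None).result()
    flag_after_success = robot.policy_running
    crashed = start_policy(executor, robot, _failing_policy)
    error = crashed.exception()  # waits for completion without re-raising
    flag_after_failure = robot.policy_running
```

With this shape there is no per-run state to prune: the set of running policies is just the robots whose flag is `True`.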

Related: this intersects with #114 (`_policy_threads` semantics).

## Acceptance

- Pick fix, land it
- Regression test: run 10 policies sequentially, assert `_policy_threads`
  (or equivalent state) doesn't grow unboundedly
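
One possible shape for the regression check, written against callables so it does not depend on which fix lands. Everything here is an assumption: the helper name, the `max_entries=1` bound (which allows the most recently finished entry to linger until the next prune), and the fake that stands in for a fixed `start_policy`.

```python
from concurrent.futures import Future
from typing import Callable


def check_policy_state_bounded(
    run_policy: Callable[[str], Future],
    state_size: Callable[[], int],
    robot_names: list[str],
    max_entries: int = 1,
) -> None:
    """Run one policy per name to completion; tracked state must stay small."""
    for name in robot_names:
        run_policy(name).result(timeout=5)  # sequential: finish before the next start
        assert state_size() <= max_entries, f"{state_size()} entries after {name}"


tracked: dict[str, Future] = {}


def fake_run_policy(name: str) -> Future:
    # Stand-in for start_policy with option A applied: prune, then record.
    for done_name in [n for n, f in tracked.items() if f.done()]:
        del tracked[done_name]
    future: Future = Future()
    future.set_result(None)  # pretend the policy finished immediately
    tracked[name] = future
    return future


names = [f"robot_{i}" for i in range(10)]
check_policy_state_bounded(fake_run_policy, lambda: len(tracked), names)
```

In the real test, `run_policy` and `state_size` would wrap the simulator's `start_policy` and whatever state replaces `_policy_threads`.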

---

Surfaced by a second-opinion review of PR #85.
