sim/mujoco: _policy_threads dict accumulates completed Future refs forever #120

@cagataycali

Problem

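start_policy sets self._policy_threads[robot_name] = future, but the
entry is never removed when the policy completes. Because the dict is
keyed by robot_name, a rerun on the same robot overwrites that robot's
entry, so the dict grows with the number of distinct robot names: after
100 episodes across 100 robots, you have 100 entries in the dict, at
least 99 of which are done().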

Consequences today:

  • _require_no_running_policy iterates all entries on every mutation. O(n),
    microsecond-scale — not a real perf issue at today's scale.
  • list_running_policies() (if/when it exists) would return stale robot
    names as "running" unless it filters on .done().
  • Memory: trivial (each Future is small).
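The stale-name consequence can be illustrated with a minimal sketch. list_running_policies does not exist yet, so the signature below is an assumption, and the hand-built Future objects stand in for whatever start_policy actually stores.

```python
from concurrent.futures import Future


def list_running_policies(policy_threads: dict[str, Future]) -> list[str]:
    """Hypothetical helper: names whose policy Future has not finished."""
    return [name for name, fut in policy_threads.items() if not fut.done()]


# Hand-built Futures standing in for executor-submitted policies.
finished = Future()
finished.set_result(None)   # a completed policy whose entry was never removed
pending = Future()          # a policy that is still running

threads = {"arm_a": finished, "arm_b": pending}

# Without the .done() filter, "arm_a" would be reported as running too.
running = list_running_policies(threads)
```

The filter hides stale entries from callers but leaves them in the dict, so it addresses the reporting problem only, not the accumulation.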

Fix options

A. In _require_no_running_policy, prune as you iterate:

done_keys = [k for k, f in self._policy_threads.items() if f.done()]
for k in done_keys:
    del self._policy_threads[k]
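For context, here is a sketch of how option A might sit inside the class. SimSketch, its executor, and the RuntimeError raised for live entries are assumptions made for illustration; only _policy_threads, start_policy, and _require_no_running_policy are names taken from this issue.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class SimSketch:
    """Hypothetical stand-in for the sim class; only the dict handling matters."""

    def __init__(self) -> None:
        self._executor = ThreadPoolExecutor(max_workers=4)
        self._policy_threads: dict[str, Future] = {}

    def _require_no_running_policy(self) -> None:
        # Option A: drop finished entries first, then check what is left.
        done_keys = [k for k, f in self._policy_threads.items() if f.done()]
        for k in done_keys:
            del self._policy_threads[k]
        if self._policy_threads:
            names = sorted(self._policy_threads)
            raise RuntimeError(f"policies still running: {names}")

    def start_policy(self, robot_name: str, policy) -> Future:
        future = self._executor.submit(policy)
        self._policy_threads[robot_name] = future
        return future


sim = SimSketch()
for i in range(5):
    # .result() blocks until the policy has finished.
    sim.start_policy(f"robot_{i}", lambda: None).result()

size_before = len(sim._policy_threads)   # finished entries are still present
sim._require_no_running_policy()         # prunes them, then finds nothing live
size_after = len(sim._policy_threads)
sim._executor.shutdown()
```

Collecting done_keys before deleting avoids mutating the dict while iterating it. If start_policy can be called from more than one thread, the prune and the insert would presumably need to share a lock.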

B. Track running on the Robot dataclass (already has policy_running)
and iterate self._world.robots instead of the dict.

C. Don't track futures at all — use robot.policy_running: bool as the
source of truth. Futures become implementation detail, not session state.
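Options B and C could look roughly like the sketch below. The Robot fields other than policy_running, the run_policy wrapper, and the module-level start_policy are assumptions, not the project's actual code. The flag is cleared in a try/finally inside the worker rather than in Future.add_done_callback, because a done callback may run slightly after a caller blocked in result() has already resumed.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Robot:
    """Hypothetical minimal Robot; the issue only confirms policy_running."""
    name: str
    policy_running: bool = False


def run_policy(robot: Robot, policy) -> None:
    # Clear the flag even if the policy raises, so no stale "running" state.
    try:
        policy()
    finally:
        robot.policy_running = False


def start_policy(executor: ThreadPoolExecutor, robot: Robot, policy) -> Future:
    if robot.policy_running:
        raise RuntimeError(f"policy already running on {robot.name}")
    robot.policy_running = True
    # The Future is returned to the caller but not stored as session state.
    return executor.submit(run_policy, robot, policy)


robots = {"arm": Robot("arm")}
with ThreadPoolExecutor(max_workers=1) as executor:
    for _ in range(3):
        start_policy(executor, robots["arm"], lambda: None).result()

# "Which policies are running?" becomes a scan over the robots themselves.
running = [r.name for r in robots.values() if r.policy_running]
```

With this shape there is nothing to prune: the state lives on the robot and is reset by the same code path that ran the policy.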

Related: this intersects with #114 (_policy_threads semantics).

Acceptance

  • Pick fix, land it
  • Regression test: run 10 policies sequentially, assert _policy_threads
    (or equivalent state) doesn't grow unboundedly
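A possible shape for the regression test, written against a hypothetical FakeSim that prunes on insert; the real test would target the actual sim class and whichever fix lands. Distinct robot names are used on purpose: assigning to the same key overwrites the previous entry, so a same-robot loop would pass even without a fix.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class FakeSim:
    """Hypothetical stand-in that prunes finished entries in start_policy."""

    def __init__(self) -> None:
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._policy_threads: dict[str, Future] = {}

    def start_policy(self, robot_name: str, policy) -> Future:
        # Remove finished entries before recording the new one.
        for key in [k for k, f in self._policy_threads.items() if f.done()]:
            del self._policy_threads[key]
        future = self._executor.submit(policy)
        self._policy_threads[robot_name] = future
        return future


def test_policy_threads_does_not_grow() -> list[int]:
    sim = FakeSim()
    sizes = []
    for i in range(10):
        # Wait for each policy so the runs are strictly sequential.
        sim.start_policy(f"robot_{i}", lambda: None).result()
        sizes.append(len(sim._policy_threads))
    sim._executor.shutdown()
    # With pruning, only the most recent entry should ever be present.
    assert max(sizes) <= 1, sizes
    return sizes


sizes = test_policy_threads_does_not_grow()
```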

Surfaced by a second-opinion review of PR #85.
