fix: improve libero eval reset stability for 90%+ success rate #6
Conversation
- env_libero.py: remove the extra `reset()` call after `set_init_state()` → prevents the "executing action in terminated episode" bug
- sim_env.py: add `num_steps_wait=10` dummy actions after reset → waits for the physics simulation to stabilize (the key fix, 0%→100% success)
Ref: Isaac-GR00T/examples/Libero/eval/run_libero_eval.py
Validated: libero_spatial 10 tasks × 3 episodes = 90% (27/30) on g6.12xlarge
yinsong1986
left a comment
PR Review — fix: improve libero eval reset stability for 90%+ success rate
Summary: Removes a redundant reset() call after set_init_state() and adds 10 physics warm-up steps post-reset to fix near-0% success rate in LIBERO eval runs.
🔴 Issues (blocking)
- [sim_env.py:_execute_task_async] The stabilization action `[0, 0, 0, 0, 0, 0, -1]` is hardcoded — the `-1` value (gripper closed?) is not explained or parameterized. If a task starts with an open gripper, these 10 steps will actively interfere. `num_steps_wait` should use a zero/neutral action `[0]*7` unless a closed gripper is explicitly correct for all LIBERO tasks. At minimum this needs a comment explaining why `-1` is correct here, not just a reference to GR00T's example.
- [sim_env.py:_execute_task_async] The `step()` return value is `observation, _, _, _` — errors/done signals during the warm-up phase are silently discarded. If the env terminates during stabilization, the policy will start on a dead episode anyway.
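A minimal sketch of what surfacing those signals could look like — shown synchronously for brevity (the real `_execute_task_async` awaits `step`), assuming the Gym-style `(obs, reward, done, info)` return shape; `run_warmup`, `WarmupAborted`, and the neutral action are hypothetical names, not code from the PR:

```python
NEUTRAL_ACTION = [0.0] * 7  # zero deltas on all 7 channels (assumption, per the review)

class WarmupAborted(RuntimeError):
    """Raised when the episode terminates during physics warm-up."""

def run_warmup(env, num_steps=10, action=None):
    """Step the sim with a no-op action, surfacing done instead of discarding it."""
    action = NEUTRAL_ACTION if action is None else action
    obs = None
    for i in range(num_steps):
        obs, _reward, done, _info = env.step({"action": action})
        if done:
            # Don't let the policy start on a dead episode.
            raise WarmupAborted(f"episode terminated at warm-up step {i}")
    return obs
```

Raising (rather than silently breaking) makes a mid-warm-up termination visible to the caller, which can then re-reset instead of handing a terminated episode to the policy.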
🟡 Suggestions (non-blocking)
- [sim_env.py] `num_steps_wait=10` is magic; consider pulling it into a constant or a param on `_execute_task_async` so it can be tuned without a code change.
- [env_libero.py] The removed `obs = self.env.reset()` was assigning to `obs` — confirm upstream code does not rely on the returned value from that call path.
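For the `num_steps_wait` suggestion, a sketch of pulling the magic number into a module-level default that callers can override per run; the constant and function names here are illustrative, not from the PR:

```python
# Module-level default so the wait can be tuned without touching the loop body.
NUM_STEPS_WAIT_DEFAULT = 10
NEUTRAL_ACTION = [0.0] * 7  # assumed neutral action; see the blocking issue above

def warmup_plan(num_steps_wait=NUM_STEPS_WAIT_DEFAULT, action=None):
    """Expand the warm-up config into the per-step actions to send after reset."""
    action = NEUTRAL_ACTION if action is None else action
    return [list(action) for _ in range(num_steps_wait)]
```

A caller (or config file) can then pass `num_steps_wait=0` to disable the warm-up entirely for envs that don't need it.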
✅ Verdict
Needs changes — the hardcoded -1 gripper action and silently-dropped done/error signals during warm-up need justification or fixing before merge.
yinsong1986
left a comment
PR #6 — fix: improve libero eval reset stability for 90%+ success rate
Summary: Removes a redundant reset() call after set_init_state() in env_libero.py and adds 10 dummy stabilization steps in sim_env.py to prevent premature episode termination and physics instability during LIBERO evaluation.
🔴 Issues (blocking)
- [sim_env.py:_execute_task_async] The stabilization loop sends a hardcoded 7-DOF action `[0, 0, 0, 0, 0, 0, -1]` with no validation that this action shape matches the environment's action space. If `sim_env.step()` raises an exception here (e.g. wrong action dimensions for a non-LIBERO env), the error is silently swallowed by the outer try/except and the episode proceeds from an unstabilized state. The wait loop should be guarded or the action parameterized.
- [sim_env.py:_execute_task_async] `num_steps_wait` is hardcoded to 10 with no way to configure it per-environment or per-task. If this path is hit for non-LIBERO envs, 10 dummy steps may produce incorrect behaviour (e.g. moving a robot arm from a valid starting pose). Should be gated on env type or exposed as a config parameter.
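A sketch of the gating and validation this asks for, assuming the wrapper exposes an env-type string and an action dimension; all names here are hypothetical, not code from the PR:

```python
LIBERO_ENV_TYPES = {"libero"}  # assumption: wrapper exposes an env_type string
LIBERO_NOOP_ACTION = [0, 0, 0, 0, 0, 0, -1]

def warmup_action_for(env_type, action_dim):
    """Return the warm-up action for this env, or None if warm-up should be skipped."""
    if env_type not in LIBERO_ENV_TYPES:
        return None  # non-LIBERO envs: don't blindly send 10 dummy steps
    if len(LIBERO_NOOP_ACTION) != action_dim:
        # Fail fast instead of letting the outer try/except swallow a shape error.
        raise ValueError(
            f"LIBERO no-op action has {len(LIBERO_NOOP_ACTION)} dims, "
            f"env expects {action_dim}"
        )
    return list(LIBERO_NOOP_ACTION)
```

A `None` return lets the caller skip the wait loop entirely, so the LIBERO-specific fix never leaks into other environments.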
🟡 Suggestions (non-blocking)
- [env_libero.py:reset] The removed `reset()` call left behind the local variable `obs` that was previously assigned by it. Confirm there's no dead assignment warning or confusion — looks clean from the diff, just worth a quick check that nothing downstream expected `obs` to be set from that path.
- [sim_env.py] The reference comment cites `Isaac-GR00T/examples/Libero/eval/run_libero_eval.py` — good practice, but consider linking or citing the specific commit/line in the comment for future maintainability.
✅ Verdict
Needs changes (minor) — the fix is directionally correct and the 90% (27/30) validation is solid evidence it works. The hardcoded action shape and the unconditional application to all envs are the issues to address before merging.
yinsong1986
left a comment
PR #6 — fix: improve libero eval reset stability for 90%+ success rate
Summary: Removes a redundant reset() call after set_init_state() in env_libero.py (which was terminating episodes prematurely) and adds a 10-step dummy-action warm-up loop in sim_env.py to let physics stabilize before policy execution.
🔴 Issues (blocking)
- [sim_env.py:_execute_task_async] The warm-up step loop does not check the `done` flag from each step. If the episode ends mid-warm-up, all 10 steps silently execute inside a terminated episode — the exact bug this PR is trying to fix. Should guard:

```python
for _ in range(10):
    observation, _, done, _ = await self.sim_env.step({"action": [0, 0, 0, 0, 0, 0, -1]})
    if done:
        break  # don't silently continue in a terminated episode
```

- [sim_env.py:_execute_task_async] `num_steps_wait=10` is a magic number with no named constant or parameter. Define it at module level:

```python
NUM_PHYSICS_WARMUP_STEPS = 10
```

🟡 Suggestions (non-blocking)
- [env_libero.py:reset] The returned `obs` is from before `set_init_state()` is applied. A comment clarifying that callers should re-fetch the observation before policy execution would prevent future confusion.
- [sim_env.py] The gripper value `-1` in `[0, 0, 0, 0, 0, 0, -1]` means "closed." If a task starts with an open gripper, warm-up with a closed gripper could set the wrong initial state. Worth confirming this is intentional.
✅ Verdict
Needs changes (minor) — core fix is correct and well-reasoned. Blocking issue is low-risk in practice but is a real correctness gap — the warm-up loop needs a done guard.
- Add NUM_PHYSICS_WARMUP_STEPS=10 module-level constant (no magic number)
- Guard warm-up loop with done check to prevent stepping in terminated episode
- Add comment in env_libero.reset() clarifying obs is pre-set_init_state

Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
Gripper closed (`-1`) matches LIBERO task initial states and the Isaac-GR00T eval convention. Add note clarifying that `action[6]` is a delta command, not `gripper_qpos`, so the observation state cannot be substituted directly.
yinsong1986
left a comment
All previous concerns addressed — approving. LGTM, ready to merge.
Summary
- Remove the `reset()` call after `set_init_state()` in `env_libero.py` to prevent the "executing action in terminated episode" bug
- Add `num_steps_wait=10` dummy actions after reset in `sim_env.py` to allow the physics simulation to stabilize before policy execution

Motivation
During LIBERO evaluation on AWS EC2 g6.12xlarge, the original code exhibited near-0% success rate due to the simulation not fully stabilizing before the policy started executing. These two changes together bring success rate to 90%+ (27/30 tasks across libero_spatial).
Root cause analysis:
- `set_init_state()` followed immediately by `reset()` caused the episode to terminate prematurely, triggering the "executing action in terminated episode" error

Validation
- `libero_spatial` with 10 tasks × 3 episodes = 90% success rate (27/30) on g6.12xlarge
- `Isaac-GR00T/examples/Libero/eval/run_libero_eval.py` uses the same stabilization pattern

Changes
- `strands_robots_sim/envs/env_libero.py`: remove post-`set_init_state` `reset()` call
- `strands_robots_sim/sim_env.py`: add `num_steps_wait=10` dummy action loop after reset

🤖 Generated with Claude Code
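Putting the merged changes together, the post-reset flow can be sketched roughly as follows — `NUM_PHYSICS_WARMUP_STEPS` is the constant named in the follow-up commit, the loop is shown synchronously for brevity, and the function name and env interface are illustrative assumptions:

```python
NUM_PHYSICS_WARMUP_STEPS = 10  # named constant from the follow-up commit
LIBERO_NOOP_ACTION = [0, 0, 0, 0, 0, 0, -1]  # action[6] = -1 is a gripper delta command

def reset_and_stabilize(env, init_state):
    """Reset once, apply the init state without a second reset, then warm up physics."""
    obs = env.reset()  # NOTE: obs here is pre-set_init_state; re-fetch before policy use
    env.set_init_state(init_state)  # no reset() after this (the removed call)
    for _ in range(NUM_PHYSICS_WARMUP_STEPS):
        obs, _reward, done, _info = env.step({"action": LIBERO_NOOP_ACTION})
        if done:
            break  # guarded per review: never keep stepping a terminated episode
    return obs
```

The single `reset()` avoids the premature-termination bug, and the warm-up loop gives the physics time to settle before the policy's first action.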