Skip to content

fix: improve libero eval reset stability for 90%+ success rate#6

Merged
yinsong1986 merged 3 commits intostrands-labs:mainfrom
byongwu:fix/libero-eval-reset-stability
May 2, 2026
Merged

fix: improve libero eval reset stability for 90%+ success rate#6
yinsong1986 merged 3 commits intostrands-labs:mainfrom
byongwu:fix/libero-eval-reset-stability

Conversation

@byongwu
Copy link
Copy Markdown
Contributor

@byongwu byongwu commented Apr 25, 2026

Summary

  • Remove redundant reset() call after set_init_state() in env_libero.py to prevent "executing action in terminated episode" bug
  • Add num_steps_wait=10 dummy actions after reset in sim_env.py to allow physics simulation to stabilize before policy execution

Motivation

During LIBERO evaluation on AWS EC2 g6.12xlarge, the original code exhibited near-0% success rate due to the simulation not fully stabilizing before the policy started executing. These two changes together bring success rate to 90%+ (27/30 tasks across libero_spatial).

Root cause analysis:

  • set_init_state() followed immediately by reset() caused the episode to terminate prematurely, triggering the "executing action in terminated episode" error
  • Without post-reset stabilization steps, the physics engine had not settled, causing the robot to start from an unstable state

Validation

  • Tested on libero_spatial with 10 tasks × 3 episodes = 90% success rate (27/30) on g6.12xlarge
  • References: Isaac-GR00T/examples/Libero/eval/run_libero_eval.py uses the same stabilization pattern

Changes

  • strands_robots_sim/envs/env_libero.py: remove post-set_init_state reset() call
  • strands_robots_sim/sim_env.py: add num_steps_wait=10 dummy action loop after reset

🤖 Generated with Claude Code

- env_libero.py: set_init_state 후 reset() 재호출 제거
  → "executing action in terminated episode" 버그 방지
- sim_env.py: reset 후 num_steps_wait=10 dummy action 추가
  → 물리 시뮬레이션 안정화 대기 (핵심 수정, 0%→100% success)

Ref: Isaac-GR00T/examples/Libero/eval/run_libero_eval.py
Validated: libero_spatial 10 tasks × 3 episodes = 90% (27/30) on g6.12xlarge
Copy link
Copy Markdown
Contributor

@yinsong1986 yinsong1986 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review — fix: improve libero eval reset stability for 90%+ success rate

Summary: Removes a redundant reset() call after set_init_state() and adds 10 physics warm-up steps post-reset to fix near-0% success rate in LIBERO eval runs.

🔴 Issues (blocking)

  • [sim_env.py:_execute_task_async] The stabilization action [0, 0, 0, 0, 0, 0, -1] is hardcoded — the -1 value (gripper closed?) is not explained or parameterized. If a task starts with an open gripper, these 10 steps will actively interfere. num_steps_wait should use a zero/neutral action [0]*7 unless a closed gripper is explicitly correct for all LIBERO tasks. At minimum this needs a comment explaining why -1 is correct here, not just a reference to GR00T's example.
  • [sim_env.py:_execute_task_async] The step() return value is observation, _, _, _ — errors/done signals during the warm-up phase are silently discarded. If the env terminates during stabilization, the policy will start on a dead episode anyway.

🟡 Suggestions (non-blocking)

  • [sim_env.py] num_steps_wait=10 is magic; consider pulling it into a constant or a param on _execute_task_async so it can be tuned without a code change.
  • [env_libero.py] The removed obs = self.env.reset() was assigning to obs — confirm upstream code does not rely on the returned value from that call path.

✅ Verdict

Needs changes — the hardcoded -1 gripper action and silently-dropped done/error signals during warm-up need justification or fixing before merge.

Copy link
Copy Markdown
Contributor

@yinsong1986 yinsong1986 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #6 — fix: improve libero eval reset stability for 90%+ success rate

Summary: Removes a redundant reset() call after set_init_state() in env_libero.py and adds 10 dummy stabilization steps in sim_env.py to prevent premature episode termination and physics instability during LIBERO evaluation.

🔴 Issues (blocking)

  • [sim_env.py:_execute_task_async] The stabilization loop sends a hardcoded 7-DOF zero-action [0, 0, 0, 0, 0, 0, -1] with no validation that this action shape matches the environment's action space. If sim_env.step() raises an exception here (e.g. wrong action dimensions for a non-LIBERO env), the error is silently swallowed by the outer try/except and the episode proceeds from an unstabilized state. The wait loop should be guarded or the action parameterized.

  • [sim_env.py:_execute_task_async] num_steps_wait is hardcoded to 10 with no way to configure it per-environment or per-task. If this path is hit for non-LIBERO envs, 10 dummy steps may produce incorrect behaviour (e.g. moving a robot arm from a valid starting pose). Should be gated on env type or exposed as a config parameter.

🟡 Suggestions (non-blocking)

  • [env_libero.py:reset] The removed reset() call left behind the local variable obs that was previously assigned by it. Confirm there's no dead assignment warning or confusion — looks clean from the diff, just worth a quick check that nothing downstream expected obs to be set from that path.

  • [sim_env.py] The reference comment cites Isaac-GR00T/examples/Libero/eval/run_libero_eval.py — good practice, but consider linking or citing the specific commit/line in the comment for future maintainability.

✅ Verdict

Needs changes (minor) — the fix is directionally correct and the 90%→27/30 validation is solid evidence it works. The hardcoded action shape and unconditional application to all envs are the issues to address before merging.

Copy link
Copy Markdown
Contributor

@yinsong1986 yinsong1986 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #6 — fix: improve libero eval reset stability for 90%+ success rate

Summary: Removes a redundant reset() call after set_init_state() in env_libero.py (which was terminating episodes prematurely) and adds a 10-step dummy-action warm-up loop in sim_env.py to let physics stabilize before policy execution.

🔴 Issues (blocking)

  • [sim_env.py:_execute_task_async] The warm-up step loop does not check the done flag from each step. If the episode ends mid-warm-up, all 10 steps silently execute inside a terminated episode — the exact bug this PR is trying to fix. Should guard:
for _ in range(10):
    observation, _, done, _ = await self.sim_env.step({"action": [0, 0, 0, 0, 0, 0, -1]})
    if done:
        break  # don't silently continue in a terminated episode
  • [sim_env.py:_execute_task_async] num_steps_wait=10 is a magic number with no named constant or parameter. Define it at module level:
NUM_PHYSICS_WARMUP_STEPS = 10

🟡 Suggestions (non-blocking)

  • [env_libero.py:reset] The returned obs is from before set_init_state() is applied. A comment clarifying that callers should re-fetch observation before policy execution would prevent future confusion.
  • [sim_env.py] The gripper value -1 in [0,0,0,0,0,0,-1] means "closed." If a task starts with an open gripper, warm-up with a closed gripper could set wrong initial state. Worth confirming this is intentional.

✅ Verdict

Needs changes (minor) — core fix is correct and well-reasoned. Blocking issue is low-risk in practice but is a real correctness gap — the warm-up loop needs a done guard.

byongwu and others added 2 commits April 30, 2026 21:43
- Add NUM_PHYSICS_WARMUP_STEPS=10 module-level constant (no magic number)
- Guard warm-up loop with done check to prevent stepping in terminated episode
- Add comment in env_libero.reset() clarifying obs is pre-set_init_state

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
Gripper closed (-1) matches LIBERO task initial states and the Isaac-GR00T
eval convention. Add note clarifying action[6] is a delta command, not
gripper_qpos, so observation state cannot be substituted directly.

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
Copy link
Copy Markdown
Contributor

@yinsong1986 yinsong1986 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All previous concerns addressed — approving. LGTM, ready to merge.

@yinsong1986 yinsong1986 merged commit 8310bc1 into strands-labs:main May 2, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants