fix: improve libero eval reset stability for 90%+ success rate by byongwu · Pull Request #6 · strands-labs/robots-sim

byongwu · 2026-04-25T13:17:03Z

Summary

Remove redundant reset() call after set_init_state() in env_libero.py to prevent "executing action in terminated episode" bug
Add num_steps_wait=10 dummy actions after reset in sim_env.py to allow physics simulation to stabilize before policy execution

Motivation

During LIBERO evaluation on AWS EC2 g6.12xlarge, the original code exhibited near-0% success rate due to the simulation not fully stabilizing before the policy started executing. These two changes together bring success rate to 90%+ (27/30 tasks across libero_spatial).

Root cause analysis:

set_init_state() followed immediately by reset() caused the episode to terminate prematurely, triggering the "executing action in terminated episode" error
Without post-reset stabilization steps, the physics engine had not settled, causing the robot to start from an unstable state

Validation

Tested on libero_spatial with 10 tasks × 3 episodes = 90% success rate (27/30) on g6.12xlarge
References: Isaac-GR00T/examples/Libero/eval/run_libero_eval.py uses the same stabilization pattern

Changes

strands_robots_sim/envs/env_libero.py: remove post-set_init_state reset() call
strands_robots_sim/sim_env.py: add num_steps_wait=10 dummy action loop after reset

🤖 Generated with Claude Code

- env_libero.py: set_init_state 후 reset() 재호출 제거 → "executing action in terminated episode" 버그 방지 - sim_env.py: reset 후 num_steps_wait=10 dummy action 추가 → 물리 시뮬레이션 안정화 대기 (핵심 수정, 0%→100% success) Ref: Isaac-GR00T/examples/Libero/eval/run_libero_eval.py Validated: libero_spatial 10 tasks × 3 episodes = 90% (27/30) on g6.12xlarge

yinsong1986

PR Review — fix: improve libero eval reset stability for 90%+ success rate

Summary: Removes a redundant reset() call after set_init_state() and adds 10 physics warm-up steps post-reset to fix near-0% success rate in LIBERO eval runs.

🔴 Issues (blocking)

[sim_env.py:_execute_task_async] The stabilization action [0, 0, 0, 0, 0, 0, -1] is hardcoded — the -1 value (gripper closed?) is not explained or parameterized. If a task starts with an open gripper, these 10 steps will actively interfere. num_steps_wait should use a zero/neutral action [0]*7 unless a closed gripper is explicitly correct for all LIBERO tasks. At minimum this needs a comment explaining why -1 is correct here, not just a reference to GR00T's example.
[sim_env.py:_execute_task_async] The step() return value is observation, _, _, _ — errors/done signals during the warm-up phase are silently discarded. If the env terminates during stabilization, the policy will start on a dead episode anyway.

🟡 Suggestions (non-blocking)

[sim_env.py] num_steps_wait=10 is magic; consider pulling it into a constant or a param on _execute_task_async so it can be tuned without a code change.
[env_libero.py] The removed obs = self.env.reset() was assigning to obs — confirm upstream code does not rely on the returned value from that call path.

✅ Verdict

Needs changes — the hardcoded -1 gripper action and silently-dropped done/error signals during warm-up need justification or fixing before merge.

yinsong1986

PR #6 — fix: improve libero eval reset stability for 90%+ success rate

Summary: Removes a redundant reset() call after set_init_state() in env_libero.py and adds 10 dummy stabilization steps in sim_env.py to prevent premature episode termination and physics instability during LIBERO evaluation.

🔴 Issues (blocking)

[sim_env.py:_execute_task_async] The stabilization loop sends a hardcoded 7-DOF zero-action [0, 0, 0, 0, 0, 0, -1] with no validation that this action shape matches the environment's action space. If sim_env.step() raises an exception here (e.g. wrong action dimensions for a non-LIBERO env), the error is silently swallowed by the outer try/except and the episode proceeds from an unstabilized state. The wait loop should be guarded or the action parameterized.
[sim_env.py:_execute_task_async] num_steps_wait is hardcoded to 10 with no way to configure it per-environment or per-task. If this path is hit for non-LIBERO envs, 10 dummy steps may produce incorrect behaviour (e.g. moving a robot arm from a valid starting pose). Should be gated on env type or exposed as a config parameter.

🟡 Suggestions (non-blocking)

[env_libero.py:reset] The removed reset() call left behind the local variable obs that was previously assigned by it. Confirm there's no dead assignment warning or confusion — looks clean from the diff, just worth a quick check that nothing downstream expected obs to be set from that path.
[sim_env.py] The reference comment cites Isaac-GR00T/examples/Libero/eval/run_libero_eval.py — good practice, but consider linking or citing the specific commit/line in the comment for future maintainability.

✅ Verdict

Needs changes (minor) — the fix is directionally correct and the 90%→27/30 validation is solid evidence it works. The hardcoded action shape and unconditional application to all envs are the issues to address before merging.

yinsong1986

PR #6 — fix: improve libero eval reset stability for 90%+ success rate

Summary: Removes a redundant reset() call after set_init_state() in env_libero.py (which was terminating episodes prematurely) and adds a 10-step dummy-action warm-up loop in sim_env.py to let physics stabilize before policy execution.

🔴 Issues (blocking)

[sim_env.py:_execute_task_async] The warm-up step loop does not check the done flag from each step. If the episode ends mid-warm-up, all 10 steps silently execute inside a terminated episode — the exact bug this PR is trying to fix. Should guard:

for _ in range(10):
    observation, _, done, _ = await self.sim_env.step({"action": [0, 0, 0, 0, 0, 0, -1]})
    if done:
        break  # don't silently continue in a terminated episode

[sim_env.py:_execute_task_async] num_steps_wait=10 is a magic number with no named constant or parameter. Define it at module level:

NUM_PHYSICS_WARMUP_STEPS = 10

🟡 Suggestions (non-blocking)

[env_libero.py:reset] The returned obs is from before set_init_state() is applied. A comment clarifying that callers should re-fetch observation before policy execution would prevent future confusion.
[sim_env.py] The gripper value -1 in [0,0,0,0,0,0,-1] means "closed." If a task starts with an open gripper, warm-up with a closed gripper could set wrong initial state. Worth confirming this is intentional.

✅ Verdict

Needs changes (minor) — core fix is correct and well-reasoned. Blocking issue is low-risk in practice but is a real correctness gap — the warm-up loop needs a done guard.

- Add NUM_PHYSICS_WARMUP_STEPS=10 module-level constant (no magic number) - Guard warm-up loop with done check to prevent stepping in terminated episode - Add comment in env_libero.reset() clarifying obs is pre-set_init_state Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering>

Gripper closed (-1) matches LIBERO task initial states and the Isaac-GR00T eval convention. Add note clarifying action[6] is a delta command, not gripper_qpos, so observation state cannot be substituted directly. Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering>

yinsong1986

All previous concerns addressed — approving. LGTM, ready to merge.

yinsong1986 requested changes Apr 26, 2026

View reviewed changes

yinsong1986 requested changes Apr 27, 2026

View reviewed changes

yinsong1986 requested changes Apr 30, 2026

View reviewed changes

byongwu and others added 2 commits April 30, 2026 21:43

yinsong1986 approved these changes May 1, 2026

View reviewed changes

yinsong1986 merged commit 8310bc1 into strands-labs:main May 2, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: improve libero eval reset stability for 90%+ success rate#6

fix: improve libero eval reset stability for 90%+ success rate#6
yinsong1986 merged 3 commits intostrands-labs:mainfrom
byongwu:fix/libero-eval-reset-stability

byongwu commented Apr 25, 2026

Uh oh!

yinsong1986 left a comment

Uh oh!

yinsong1986 left a comment

Uh oh!

yinsong1986 left a comment

Uh oh!

yinsong1986 left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

byongwu commented Apr 25, 2026

Summary

Motivation

Validation

Changes

Uh oh!

yinsong1986 left a comment

Choose a reason for hiding this comment

PR Review — fix: improve libero eval reset stability for 90%+ success rate

🔴 Issues (blocking)

🟡 Suggestions (non-blocking)

✅ Verdict

Uh oh!

yinsong1986 left a comment

Choose a reason for hiding this comment

PR #6 — fix: improve libero eval reset stability for 90%+ success rate

🔴 Issues (blocking)

🟡 Suggestions (non-blocking)

✅ Verdict

Uh oh!

yinsong1986 left a comment

Choose a reason for hiding this comment

PR #6 — fix: improve libero eval reset stability for 90%+ success rate

🔴 Issues (blocking)

🟡 Suggestions (non-blocking)

✅ Verdict

Uh oh!

yinsong1986 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants