Port reference impl → SafeMiniRunner + policies + tests#1
Port reference impl → SafeMiniRunner + policies + tests#1JustinJLeopard merged 1 commit intomainfrom
Conversation
…n-parser/worktree modules + integration tests
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 096b13a8c8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if "COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT" in shaped.output: | ||
| break | ||
| if final_tests_pass(cwd, command=self.final_check_command): | ||
| success = True |
There was a problem hiding this comment.
Evaluate tests before exiting on completion marker
In SafeMiniRunner.run, the loop breaks immediately when observation output contains COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT, before checking final_tests_pass. If the agent emits that marker in the same action that fixes the repo, success is never set to True, and the final RunResult.success remains false even though tests pass (success and tests_pass). This causes false-negative outcomes for otherwise successful runs that follow the completion protocol.
Useful? React with 👍 / 👎.
| observation_budget_exhausted = ( | ||
| observation_budget_exhausted | ||
| or observation_chars_used > budget.observation_budget | ||
| ) |
There was a problem hiding this comment.
Stop execution when observation budget is exceeded
The runner records observation_budget_exhausted once observation_chars_used crosses the limit, but it does not terminate the loop or prevent success afterward. As written, additional commands continue to execute and the run can still return success=True, which violates the documented two-axis budget enforcement for observation characters.
Useful? React with 👍 / 👎.
| provisioner = WorktreeProvisioner(self.repo_path) | ||
| cwd = provisioner.provision() |
There was a problem hiding this comment.
Retain worktree lifetime when keep_worktree is true
With keep_worktree=True, run() returns worktree_path but only keeps the WorktreeProvisioner in a local variable. Since the provisioner owns a TemporaryDirectory, dropping that last reference after return can trigger automatic cleanup and delete the returned path. This makes worktree_path unreliable for post-run inspection.
Useful? React with 👍 / 👎.
Factors the 270-line reference/lab_safe_mini_agent.py into a proper module structure.
Changes:
Validation: