Port reference impl → SafeMiniRunner + policies + tests by JustinJLeopard · Pull Request #1 · JustinJLeopard/safe-mini

JustinJLeopard · 2026-05-01T05:48:24Z

Factors the 270-line reference/lab_safe_mini_agent.py into a proper module structure.

Changes:

Replaces the SafeMiniRunner stub with a concrete substrate runner.
Adds action parsing for fenced bash and JSON protocols.
Adds open, safe, and allowlist executor policy classes.
Adds full, tail, headtail, structured, and structured+raw-tail observation policies.
Adds worktree provisioning and the seven-class failure classifier.
Moves ScriptedModel into tests as a test double and adds unit/integration coverage.

Validation:

pytest -q: 51 passed
pytest --cov=safe_mini: 95% total coverage
ruff check safe_mini tests: clean
ruff format --check safe_mini tests: clean
mypy safe_mini: clean

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 096b13a8c8

chatgpt-codex-connector · 2026-05-01T05:51:20Z

+                if "COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT" in shaped.output:
+                    break
+                if final_tests_pass(cwd, command=self.final_check_command):
+                    success = True


Evaluate tests before exiting on completion marker

In SafeMiniRunner.run, the loop breaks as soon as the observation output contains COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT, before final_tests_pass is checked. If the agent emits that marker in the same action that fixes the repo, success is never set to True, so the final RunResult.success (success and tests_pass) remains false even though the tests pass. Runs that follow the completion protocol and actually succeed are therefore reported as failures.
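The reordering suggested here might look roughly like the sketch below. It is a simplified stand-in for the loop in SafeMiniRunner.run, not the project's code: `run_steps`, `outputs`, and the `tests_pass` callable are hypothetical names introduced for illustration. The only point it shows is that the final check runs before the completion marker is honored.

```python
from __future__ import annotations

from typing import Callable, Iterable

COMPLETION_MARKER = "COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT"


def run_steps(outputs: Iterable[str], tests_pass: Callable[[], bool]) -> bool:
    """Hypothetical loop; each item in `outputs` stands in for one shaped observation."""
    success = False
    for output in outputs:
        # Run the final check first, so the marker cannot skip it.
        if tests_pass():
            success = True
            break
        # Only then honor the completion marker.
        if COMPLETION_MARKER in output:
            break
    return success
```

With this ordering, a single step that both fixes the repo and emits the marker returns True, while a marker emitted with failing tests still ends the run unsuccessfully.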

chatgpt-codex-connector · 2026-05-01T05:51:20Z

+                observation_budget_exhausted = (
+                    observation_budget_exhausted
+                    or observation_chars_used > budget.observation_budget
+                )


Stop execution when observation budget is exceeded

The runner records observation_budget_exhausted once observation_chars_used crosses the limit, but it does not terminate the loop or prevent success afterward. As written, additional commands continue to execute and the run can still return success=True, which violates the documented two-axis budget enforcement for observation characters.
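A minimal sketch of the stricter behavior, assuming that exceeding the observation budget should both stop the loop and rule out success. The `Outcome` dataclass, `run_with_observation_budget`, and the `tests_pass` callable are hypothetical names for illustration; only the `observation_chars_used > observation_budget` comparison comes from the diff above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Outcome:
    steps_executed: int
    observation_budget_exhausted: bool
    success: bool


def run_with_observation_budget(
    observations: Iterable[str],
    observation_budget: int,
    tests_pass: Callable[[], bool],
) -> Outcome:
    """Hypothetical loop; each observation stands in for one command's shaped output."""
    observation_chars_used = 0
    steps = 0
    for observation in observations:
        steps += 1
        observation_chars_used += len(observation)
        if observation_chars_used > observation_budget:
            # Hard stop: no further commands run, and success is not granted.
            return Outcome(steps, True, False)
        if tests_pass():
            return Outcome(steps, False, True)
    return Outcome(steps, False, False)
```

Returning from inside the loop means the exhausted flag and the termination cannot drift apart, which is the inconsistency the comment points at.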

chatgpt-codex-connector · 2026-05-01T05:51:20Z

+        provisioner = WorktreeProvisioner(self.repo_path)
+        cwd = provisioner.provision()


Retain worktree lifetime when keep_worktree is true

With keep_worktree=True, run() returns worktree_path but only keeps the WorktreeProvisioner in a local variable. Since the provisioner owns a TemporaryDirectory, dropping that last reference after return can trigger automatic cleanup and delete the returned path. This makes worktree_path unreliable for post-run inspection.
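The lifetime issue can be reproduced and avoided with the standard library alone. In the sketch below, `SketchProvisioner` and `run` are hypothetical stand-ins for WorktreeProvisioner and SafeMiniRunner.run; the assumption is that the real provisioner wraps `tempfile.TemporaryDirectory`, as the comment describes. When the caller asks to keep the directory, the sketch uses `tempfile.mkdtemp`, which has no automatic cleanup.

```python
import os
import shutil
import tempfile


class SketchProvisioner:
    """Hypothetical stand-in for a provisioner that owns a temporary directory."""

    def __init__(self, keep: bool) -> None:
        self._tmp = None
        if keep:
            # mkdtemp leaves cleanup to the caller, so the path outlives this object.
            self.path = tempfile.mkdtemp(prefix="safe-mini-sketch-")
        else:
            # TemporaryDirectory may remove the directory once it is garbage collected.
            self._tmp = tempfile.TemporaryDirectory(prefix="safe-mini-sketch-")
            self.path = self._tmp.name


def run(keep_worktree: bool) -> str:
    provisioner = SketchProvisioner(keep=keep_worktree)
    cwd = provisioner.path
    # ... the agent loop would execute commands inside cwd here ...
    return cwd  # the provisioner itself goes out of scope on return
```

A caller that keeps the worktree then owns its cleanup, for example `shutil.rmtree(path)` after inspection. An alternative design is to return the provisioner alongside the path so the owner stays referenced.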

feat(port): SafeMiniRunner concrete impl + executor/observation/action-parser/worktree modules + integration tests

096b13a

chatgpt-codex-connector Bot reviewed May 1, 2026

JustinJLeopard merged commit b3d3a41 into main May 1, 2026
2 checks passed

JustinJLeopard deleted the refactor/port-from-reference branch May 1, 2026 07:36

JustinJLeopard merged 1 commit into main from refactor/port-from-reference
