Skip to content

Add plan review pass and improve Codex streaming output#14

Merged
axeldelafosse merged 1 commit intomainfrom
feat/plan-review-pass
Feb 24, 2026
Merged

Add plan review pass and improve Codex streaming output#14
axeldelafosse merged 1 commit intomainfrom
feat/plan-review-pass

Conversation

@axeldelafosse
Copy link
Copy Markdown
Owner

Summary

  • make --proof optional and only inject proof requirements into prompts when non-empty
  • add --review-plan [other|claude|codex] plus a plan-review step after plain-text prompt planning
  • improve Codex app-server pretty rendering by parsing raw delta/completed events, preserving nested text, and inserting clean message boundaries
  • update README/help text and expand args/prompts/task/runner tests for the new behavior

Testing

  • bun run check
  • bun test

@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello @axeldelafosse, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the loop tool by making proof requirements optional and introducing an automated plan review process, improving the robustness of task execution. Additionally, it refines the user experience for Codex users by providing a much cleaner and more readable streaming output, ensuring that complex agent interactions are easier to follow.

Highlights

  • Optional Proof Requirements: The --proof command-line argument is now optional. Proof requirements will only be injected into prompts if a non-empty value is provided.
  • Automatic Plan Review Pass: A new --review-plan [other|claude|codex] option has been introduced, enabling an automatic review step for PLAN.md after plain-text prompts are used for planning. By default, the review is performed by the 'other' model (the one not chosen as the primary agent).
  • Improved Codex Streaming Output: The pretty rendering of Codex app-server output has been enhanced. It now parses raw delta/completed events, preserves nested text content, and inserts clean message boundaries for better readability.
  • Documentation and Test Updates: The README.md and help text have been updated to reflect the new --proof and --review-plan options. Corresponding tests for arguments, prompts, tasks, and the runner have been expanded to cover the new behavior.
Changelog
  • README.md
    • Updated --proof argument description to indicate it is optional.
    • Added --review-plan argument description and its possible values.
    • Included new example usage for --review-plan.
  • src/loop/args.ts
    • Removed the REQUIRED_PROOF_ERROR constant.
    • Imported PlanReviewMode type.
    • Implemented parsePlanReviewValue to validate --review-plan argument values.
    • Added parsePlanReviewArg to handle parsing of the --review-plan command-line argument.
    • Integrated --review-plan parsing into the consumeArg function.
    • Removed the mandatory check for the --proof argument.
  • src/loop/constants.ts
    • Modified the help text for --proof to remove the 'required' annotation.
    • Added help text for the new --review-plan option.
  • src/loop/prompts.ts
    • Introduced appendProofRequirements helper function to conditionally add proof text.
    • Added buildPlanReviewPrompt for generating prompts specific to plan review.
    • Updated buildWorkPrompt and buildReviewPrompt to utilize the new appendProofRequirements helper.
  • src/loop/runner.ts
    • Defined AppServerEvent and CodexRenderState interfaces for internal state management.
    • Added several utility functions (asRecord, asString, parseJsonLine, collectDeltaText, parseDeltaText, collectMessageText, parseCompletedMessage, parseItemId) for robust parsing of Codex app-server streaming events.
    • Implemented handleDeltaLine and handleCompletedLine to process streaming delta and completion events, respectively.
    • Refactored runCodexAgent to incorporate the new event parsing and rendering logic, ensuring proper message boundaries and nested text preservation in pretty output.
    • Ensured a final newline is printed in pretty output if the last message does not end with one.
  • src/loop/task.ts
    • Imported buildPlanReviewPrompt and PlanReviewMode.
    • Added resolvePlanReviewer function to determine the appropriate agent for plan review based on configuration.
    • Modified runPlanMode to include an automatic plan review step after PLAN.md is created, using the resolved reviewer.
  • src/loop/types.ts
    • Added PlanReviewMode type to define valid options for plan review agents.
    • Included reviewPlan as an optional property in the Options interface.
  • tests/loop/args.test.ts
    • Removed the test case that asserted an error for missing required proof.
    • Updated the test for default argument parsing to reflect that --proof is now optional and defaults to an empty string.
    • Added test cases for --review-plan to verify correct parsing of 'other', 'claude', 'codex' values, the equals form, and error handling for invalid values.
  • tests/loop/prompts.test.ts
    • Imported buildPlanReviewPrompt for testing.
    • Added a test case for buildPlanReviewPrompt to ensure it generates the correct review instructions.
    • Added a test case for buildReviewPrompt to confirm proof requirements are omitted when the proof string is empty.
  • tests/loop/runner.test.ts
    • Updated the mock signature for runCodexTurn to include callbacks for onParsed and onRaw.
    • Added a test case to verify that runAgent inserts a blank line after message completion in pretty mode.
    • Added a test case to confirm runAgent correctly preserves nested delta newline content in pretty mode.
  • tests/loop/task.test.ts
    • Modified makeOptions to accept partial overrides for flexibility in testing.
    • Updated resolveTask test cases to account for the new plan review step, verifying two calls to runAgentMock and the correct reviewer selection.
    • Added test cases to specifically verify the resolvePlanReviewer logic, including the default 'other' model behavior.
    • Added a test case to ensure resolveTask throws an error if the plan review step exits with a non-zero code.
Activity
  • The author has performed local testing using bun run check and bun test to ensure code quality and functionality.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The pull request introduces a new plan review pass and improves the Codex streaming output. The --proof argument is now optional, and a new --review-plan argument has been added to specify a reviewer for the automatic plan review step. The Codex output rendering has been enhanced to better handle raw delta/completed events, preserving nested text and inserting clean message boundaries. The changes also include updates to the README, help text, and expanded tests for the new functionalities. Overall, the changes improve the flexibility and output clarity of the loop tool.

Comment thread src/loop/task.ts
Comment on lines +40 to +48
const reviewer = resolvePlanReviewer(opts.reviewPlan, opts.agent);
console.log(`\n[loop] reviewing PLAN.md with ${reviewer}.`);
const reviewPrompt = buildPlanReviewPrompt(task);
const review = await runAgent(reviewer, reviewPrompt, opts);
if (review.exitCode !== 0) {
throw new Error(
`[loop] plan review ${reviewer} exited with code ${review.exitCode}`
);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This new block introduces the plan review step after PLAN.md is created. It uses the resolvePlanReviewer to select the reviewer and throws an error if the review fails. This is a critical new feature for improving plan quality.

Comment thread tests/loop/args.test.ts
Comment on lines +168 to +172
test("parseArgs rejects invalid --review-plan value", () => {
expect(() =>
parseArgs(["--review-plan=claudex", "--proof", "verify"])
).toThrow("Invalid --review-plan value: claudex");
});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This test correctly asserts that an invalid value for --review-plan (like claudex) throws an error. This is crucial for maintaining data integrity and providing clear error messages to users.

Comment thread tests/loop/runner.test.ts
Comment on lines +182 to +234
test("runAgent inserts pretty-mode blank line after message completion", async () => {
runCodexTurn.mockImplementation((_prompt, _opts, callbacks) => {
callbacks.onRaw(
JSON.stringify({
method: "item/agentMessage/delta",
params: { itemId: "item-1", delta: "I checked the repo." },
})
);
callbacks.onRaw(
JSON.stringify({
method: "item/completed",
params: {
item: {
id: "item-1",
type: "agentMessage",
content: [{ text: "I checked the repo." }],
},
},
})
);
callbacks.onRaw(
JSON.stringify({
method: "item/agentMessage/delta",
params: { itemId: "item-2", delta: "I am updating PLAN.md." },
})
);
return Promise.resolve(makeResult());
});

const originalWrite = process.stdout.write;
const writes: string[] = [];
const writeSpy = mock((chunk: string | Uint8Array): boolean => {
writes.push(
typeof chunk === "string" ? chunk : Buffer.from(chunk).toString("utf8")
);
return true;
});
(process.stdout as { write: typeof writeSpy }).write = writeSpy;

try {
const result = await runAgent(
"codex",
"say hi",
makeOptions({ format: "pretty" })
);
expect(result.parsed).toBe("I checked the repo.\nI am updating PLAN.md.");
expect(writes.join("")).toContain(
"I checked the repo.\n\nI am updating PLAN.md.\n"
);
} finally {
process.stdout.write = originalWrite;
}
});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This new test case verifies that runAgent correctly inserts a blank line after a message completion in pretty mode. This is a crucial test for the improved Codex streaming output's readability.

Comment thread tests/loop/runner.test.ts
Comment on lines +236 to +273
test("runAgent preserves nested delta newline content in pretty mode", async () => {
runCodexTurn.mockImplementation((_prompt, _opts, callbacks) => {
callbacks.onRaw(
JSON.stringify({
method: "item/agentMessage/delta",
params: {
delta: {
text: "Heading",
content: [{ text: "\n- one" }, { text: "\n- two" }],
},
},
})
);
return Promise.resolve(makeResult());
});

const originalWrite = process.stdout.write;
const writes: string[] = [];
const writeSpy = mock((chunk: string | Uint8Array): boolean => {
writes.push(
typeof chunk === "string" ? chunk : Buffer.from(chunk).toString("utf8")
);
return true;
});
(process.stdout as { write: typeof writeSpy }).write = writeSpy;

try {
const result = await runAgent(
"codex",
"say hi",
makeOptions({ format: "pretty" })
);
expect(result.parsed).toBe("Heading\n- one\n- two");
expect(writes.join("")).toContain("Heading\n- one\n- two\n");
} finally {
process.stdout.write = originalWrite;
}
});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This new test case ensures that runAgent correctly preserves nested newline content within delta events in pretty mode. This is important for accurately rendering complex output from Codex.

Comment thread tests/loop/task.test.ts
Comment on lines +209 to +226
test("resolveTask throws when plan review exits non-zero", async () => {
let calls = 0;
const { resolveTask } = await loadResolveTask({
isFile: (path) => path === "PLAN.md",
runAgent: () => {
calls += 1;
return {
combined: "",
exitCode: calls === 1 ? 0 : 2,
parsed: "",
};
},
});

await expect(resolveTask(makeOptions("ship feature"))).rejects.toThrow(
"[loop] plan review claude exited with code 2"
);
});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This new test case verifies that resolveTask correctly throws an error when the plan review step exits with a non-zero code. This is important for robust error handling in the new workflow.

@axeldelafosse axeldelafosse merged commit abd6f93 into main Feb 24, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant