Skip to content

Codex integration fixes + agent version detection#407

Open
openasocket wants to merge 19 commits intoRunMaestro:mainfrom
openasocket:feat/agent-version-detection
Open

Codex integration fixes + agent version detection#407
openasocket wants to merge 19 commits intoRunMaestro:mainfrom
openasocket:feat/agent-version-detection

Conversation

@openasocket
Copy link
Contributor

@openasocket openasocket commented Feb 18, 2026

Summary

  • Fix Codex CLI integration end-to-end (8 bugs across parser, exit handler, stall detection, UX)
  • Add dual-format parser supporting both old (v0.41.0-era) and current Codex CLI JSONL output formats
  • Add agent version detection infrastructure with spawn-time warnings for unsupported versions
  • Display detected agent versions across 3 UX surfaces (Edit Modal, Wizard, AgentCard)

Motivation

Maestro's Codex parser was originally built against Codex CLI v0.41.0, which used a flat item-envelope JSONL format ({ type: 'item.completed', item: {...} }). Subsequent Codex CLI releases (current: v0.103.0) changed to a nested msg-envelope format ({ id, msg: { type: 'agent_message', ... } }). Users who upgraded Codex CLI got zero parsed output with no explanation because Maestro's parser only understood the old format.

Beyond the format mismatch, the initial Codex integration had several bugs that caused:

  • Reasoning text leaking into user-visible results on process exit
  • Tool names lost between tool_call and tool_result events (Codex emits these as separate events)
  • streamedText clobbered by reasoning, breaking result extraction
  • Auto Run sessions hanging indefinitely with no stall detection
  • showThinking defaulting to 'off' for Codex, hiding the primary visibility into agent activity
  • Parser singleton state leaking across sessions

This PR fixes all of the above and adds version detection so Maestro can warn users before they hit these silent failures.

Changes

Codex Integration Fixes (8 commits)

Commit Fix
0cf3fc4f Separate codexPendingResult from streamedText to prevent reasoning text from clobbering the actual result
3f1f3ca8 Default showThinking to 'on' for Codex across 7 tab creation sites; add length bypass for block-style reasoning
e07454d3 Fix Auto Run synopsis extraction for short Codex responses; add result capture logging
d1c83270 Carry tool names across tool_calltool_result events; truncate large tool outputs (10K limit)
d23ece56 Add modelArgs for Codex (-m flag); fix parser singleton config staleness with refreshConfig()
6079a161 Bridge IPC gap for Auto Run stall detection — track activity from thinking/tool events, add 5-min inactivity timeout
b51f7e5c Fix ExitHandler reasoning leak (toolType-aware branching), singleton state leak (lastToolName reset), dead stall detection
4857e93e Dual-format parser — auto-detect old vs new Codex CLI JSONL format via routeMessage(), support both with format-specific transformers

Agent Version Detection (5 commits)

Commit Change
33c7e957 Add detectedVersion field to AgentConfig, generic detectAgentVersion() helper (3s timeout, semver scan), AGENT_MIN_VERSIONS constant
a4c8cbc6 Emit inline [Maestro] Warning in terminal when spawning an agent below minimum version; skip for SSH remote sessions
41bdb4d9 11 unit tests covering all detectAgentVersion() branches (success, failure, timeout, unparseable, empty, unavailable, no versionArgs) + 3 integration tests for version comparison
be33df8f Display detectedVersion in Edit Agent Modal, Wizard agent tiles, and shared AgentCard component
a827f80a Expand version detection to all 7 agents (was only Claude Code + Codex); fix v-prefix parsing (v0.29.00.29.0)

Key Technical Details

Dual-Format Codex Parser

The parser auto-detects the format by inspecting each JSON line:

  • parsed.msg exists → new msg-envelope format (current versions)
  • parsed.type is string → old item-envelope format (v0.41.0-era)
  • parsed.model / parsed.prompt → config metadata line

New format event mapping:

New Event Maps To
task_started type: 'system' with raw.type: 'turn.started' (triggers StdoutHandler turn reset)
agent_reasoning type: 'text' (thinking/reasoning display)
agent_message type: 'result' (final answer)
exec_command_begin type: 'tool_use' with tool name from msg.message.name
exec_command_end type: 'tool_use' with output from msg.output, tool name from call_id map
token_count type: 'usage' with token counts

Version Detection

  • Runs <binary> --version during agent detection (non-blocking, 3s timeout)
  • Parses version by scanning stdout for semver-like tokens (/^\d+\.\d+/)
  • Strips v/V prefix for formats like Gemini CLI v0.29.0
  • Current minimum versions: Codex 0.41.0 (baseline tested format), Claude Code 1.0.0 (stream-json support)
  • All 7 non-terminal agents now have versionArgs: ['--version']

Files Changed (26 files, +1731 / -206)

Main Process

  • src/main/parsers/codex-output-parser.ts — Dual-format parser rewrite (+534 lines)
  • src/main/agents/detector.tsdetectAgentVersion(), AGENT_MIN_VERSIONS
  • src/main/agents/definitions.tsdetectedVersion, versionArgs on all agents
  • src/main/ipc/handlers/process.ts — Spawn-time version warning
  • src/main/process-manager/handlers/StdoutHandler.tscodexPendingResult, turn reset
  • src/main/process-manager/handlers/ExitHandler.ts — toolType-aware result extraction

Renderer

  • src/renderer/App.tsxshowThinking: 'on' defaults for Codex
  • src/renderer/components/NewInstanceModal.tsx — Version in edit modal
  • src/renderer/components/Wizard/screens/AgentSelectionScreen.tsx — Version on wizard tiles
  • src/renderer/components/shared/AgentSelector.tsx — Version in AgentCard
  • src/renderer/components/TerminalOutput.tsx — Codex tool input extraction
  • src/renderer/hooks/agent/useAgentExecution.ts — Codex showThinking defaults
  • src/renderer/hooks/batch/useBatchProcessor.ts — Stall detection timer
  • src/renderer/hooks/batch/useDocumentProcessor.ts — Synopsis extraction fix

Tests

  • src/__tests__/main/parsers/codex-output-parser.test.ts — 80 tests (rewritten)
  • src/__tests__/main/agents/detector.test.ts — 15 new version detection tests

Test plan

  • npm run lint — all TypeScript configs pass
  • npm run test — 19,759 tests pass (464 test files)
  • Manual: spawn Codex agent → verify output parses correctly
  • Manual: open Edit Agent modal → verify version displays next to provider name
  • Manual: step through Wizard → verify version on agent tiles
  • Manual: open New Agent modal → verify version in agent cards
  • Manual: verify no version warning for current Codex (v0.103.0, above minimum v0.41.0)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Agent version detection: displays detected versions on agent tiles and selector
    • Version requirement checks for Codex (v0.41.0+) and Claude Code with warnings
    • Codex sessions now default to showing thinking/reasoning output
    • Batch processing stall detection: automatically skips documents with no progress
    • Activity monitoring for agent tasks with auto-termination after 5 minutes
  • Improvements

    • Enhanced Codex format support for newer message envelope protocol
    • Parser configuration refresh between spawns
    • Tool output truncation with size limits

@coderabbitai
Copy link

coderabbitai bot commented Feb 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

PR introduces agent version detection with enforced minimum versions, refactors Codex parser for dual format support (item/msg-envelope), implements pending-result tracking for multi-turn handling, adds activity/stall detection for hung tasks, establishes group chat operation locking, and displays detected agent versions in UI.

Changes

Cohort / File(s) Summary
Agent Version Detection
src/main/agents/definitions.ts, src/main/agents/detector.ts, src/__tests__/main/agents/detector.test.ts
Added detectedVersion and versionArgs fields to AgentConfig; introduced AGENT_MIN_VERSIONS mapping; implemented detectAgentVersion with 3-second timeout; integrated version probing into detection flow with comprehensive test coverage for version parsing, timeouts, and format variants.
Codex Capabilities & Test Updates
src/main/agents/capabilities.ts, src/__tests__/main/agents/capabilities.test.ts
Changed Codex supportsResultMessages from false to true, aligning with new msg-envelope format agent_message emission.
Codex Parser Refactor
src/main/parsers/codex-output-parser.ts, src/__tests__/main/parsers/codex-output-parser.test.ts
Refactored to support dual formats (legacy item-envelope and new msg-envelope); added refreshConfig method; introduced internal state (toolNamesByCallId, lastToolName) for tool-name carryover; added format routing, specialized handlers for task_started/agent_reasoning/agent_message/exec_command events; enhanced error detection and tool-output truncation with +700/-12 test expansion.
Parser Interface & Process Handler Integration
src/main/parsers/agent-output-parser.ts, src/main/process-manager/spawners/ChildProcessSpawner.ts
Added optional refreshConfig() method to AgentOutputParser interface; ChildProcessSpawner now calls refreshConfig on parser before spawn if method exists.
Multi-turn Result Handling
src/main/process-manager/types.ts, src/main/process-manager/handlers/StdoutHandler.ts, src/main/process-manager/handlers/ExitHandler.ts
Added codexPendingResult to ManagedProcess interface; StdoutHandler resets state on new Codex turns, accumulates results in codexPendingResult instead of streamedText; ExitHandler prioritizes codexPendingResult over streamedText when emitting final Codex results.
Version Enforcement & Logging
src/main/ipc/handlers/process.ts, src/main/utils/agent-args.ts
Added version check comparing detectedVersion against AGENT_MIN_VERSIONS, emitting warnings for below-minimum versions; added agentVersion to spawn logging; introduced readOnlyMode-aware filtering of batchModeArgs to exclude sandbox-bypass flags when readOnlyArgs present.
Activity & Stall Detection
src/renderer/hooks/agent/useAgentExecution.ts, src/renderer/hooks/agent/useAgentListeners.ts, src/renderer/hooks/batch/useBatchProcessor.ts, src/__tests__/renderer/hooks/useAgentExecution.test.ts
Added 5-minute activity timeout with 30-second check interval for hung task auto-kill; integrated activity signals (onData, onThinkingChunk, onToolExecution); added length-based guards for thinking-chunk processing; implemented per-document stall detection (MAX_CONSECUTIVE_NO_CHANGES=2) with stalled-document tracking and detailed logging.
Group Chat Operations Locking
src/main/group-chat/group-chat-lock.ts, src/main/process-listeners/types.ts, src/main/process-listeners/exit-listener.ts, src/main/process-listeners/__tests__/exit-listener.test.ts, src/main/index.ts
New module provides per-chat operation locks (5-minute stale timeout), synthesis-in-progress tracking, and lock-release mechanisms; exit listener now integrates lock release and synthesis-state clearance; added getPendingParticipants and lock functions to ProcessListenerDependencies.
Document Processing & Result Logging
src/renderer/hooks/batch/useDocumentProcessor.ts
Added structured result logging (agent type, success, response details, elapsed time); enhanced short-summary extraction with useFullText logic for determining whether to use full response or first paragraph as source.
UI Version Display & Thinking Defaults
src/renderer/components/Wizard/screens/AgentSelectionScreen.tsx, src/renderer/components/shared/AgentSelector.tsx, src/renderer/App.tsx, src/renderer/hooks/tabs/useTabHandlers.ts, src/renderer/hooks/modal/useModalHandlers.ts, src/renderer/global.d.ts, src/renderer/types/index.ts
Added detectedVersion display in agent tiles and selector cards; updated new-session creation to conditionally enable showThinking for Codex (set to 'on' if agentId is 'codex'), applied in multiple paths (App.tsx, useTabHandlers, useModalHandlers).
Test Infrastructure & Utilities
src/__tests__/main/process-manager/handlers/StdoutHandler.test.ts, src/__tests__/main/utils/agent-args.test.ts
Enhanced StdoutHandler tests with createCodexParser helper and reasoning-message event coverage; added readOnlyMode interaction tests with batchModeArgs filtering validation.
Formatting & Configuration
.prettierignore, package.json, src/__tests__/renderer/components/DocumentGraph/mindMapLayouts.test.ts, src/__tests__/renderer/components/TerminalOutput.test.tsx, src/__tests__/renderer/components/auto-scroll.test.tsx, src/__tests__/renderer/hooks/useBatchProcessor.test.ts
Updated .prettierignore with image extensions; modified package.json lint-staged to use --ignore-unknown flag; applied multi-line formatting in test files without functional changes.

Sequence Diagram(s)

sequenceDiagram
    participant Detector as AgentDetector
    participant Binary as Agent Binary
    participant VersionCheck as Version Validator
    participant Config as AgentConfig

    Detector->>Binary: spawn with versionArgs (--version)
    activate Binary
    Binary-->>Detector: version output (timeout: 3s)
    deactivate Binary
    
    Detector->>Detector: parseVersionString(output)
    Detector->>Config: set detectedVersion
    
    VersionCheck->>Config: read detectedVersion
    VersionCheck->>VersionCheck: compareVersions(detected, MIN_VERSION)
    alt version < minimum
        VersionCheck->>VersionCheck: emit warning
    else version >= minimum
        VersionCheck->>VersionCheck: proceed normally
    end
Loading
sequenceDiagram
    participant Parser as CodexOutputParser
    participant Line as OutputLine
    participant Router as routeMessage
    participant Handlers as Format Handlers
    participant State as Internal State

    Line->>Parser: incoming message line
    Parser->>Parser: parseJSON(line)
    
    Router->>Router: detect envelope structure
    alt msg-envelope detected
        Router->>Handlers: transformNewFormat()
        Handlers->>Handlers: route by event.type
        Handlers->>State: manage toolNamesByCallId
    else item-envelope detected
        Router->>Handlers: transformOldFormat()
        Handlers->>State: manage lastToolName
    else config/non-envelope
        Router->>Handlers: transformConfigLine()
        Handlers->>State: update model, contextWindow
    end
    
    Handlers->>Parser: return formatted result
Loading
sequenceDiagram
    participant Agent as Agent Execution
    participant Activity as Activity Tracker
    participant Monitor as Watchdog Monitor
    participant Process as Process Manager

    Agent->>Activity: markActivity() on onData
    Agent->>Activity: markActivity() on onThinkingChunk
    Agent->>Activity: markActivity() on onToolExecution
    
    Monitor->>Activity: check lastActivityTime (30s interval)
    Monitor->>Monitor: elapsed = now - lastActivityTime
    
    alt elapsed > 5 minutes
        Monitor->>Monitor: log inactivity
        Monitor->>Process: kill(taskId)
        Monitor->>Monitor: clear activity interval
    else elapsed <= 5 minutes
        Monitor->>Monitor: continue monitoring
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 79.31% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main changes: Codex integration fixes (multi-format parser, exit handler, stall detection) and agent version detection (detection logic, UI display, spawn-time warnings). It is specific, actionable, and reflects the primary scope of the 26-file changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/renderer/hooks/agent/useAgentExecution.ts`:
- Around line 202-229: The watchdog currently swallows errors from
window.maestro.process.kill(...) via .catch(() => {}); update the inactivity
check in useAgentExecution (the interval using ACTIVITY_TIMEOUT_MS and
activityCheckInterval) to surface kill failures by replacing the empty catch
with an error reporter: call the app logger at error level with context
(targetSessionId, idleMs, agentType) and invoke the Sentry utility
captureException (imported from src/utils/sentry) with the caught error and the
same context so failures are recorded; do not silently suppress the exception.

Comment on lines 202 to 229
// Activity tracking for per-task stall detection.
// Resets on any signal: data output, thinking chunks, or tool execution.
// This bridges the IPC channel gap for agents like Codex whose reasoning
// and tool events travel on separate channels from data output.
let lastActivityTime = Date.now();
const markActivity = () => {
lastActivityTime = Date.now();
};

// Per-task activity timeout: if no activity (data, thinking, tool) for
// this duration, assume the agent is hung and kill it. This catches
// mid-execution hangs that the post-execution stall detector in
// useBatchProcessor cannot detect (it only triggers after task completion).
const ACTIVITY_TIMEOUT_MS = 5 * 60 * 1000; // 5 minutes
const activityCheckInterval = setInterval(() => {
const idleMs = Date.now() - lastActivityTime;
if (idleMs > ACTIVITY_TIMEOUT_MS) {
clearInterval(activityCheckInterval);
window.maestro.logger.log('warn', 'Auto Run task killed due to inactivity', 'AgentExecution', {
targetSessionId,
idleMs,
agentType: session.toolType,
});
// Kill the hung process — onExit listener will fire and resolve the promise
window.maestro.process.kill(targetSessionId).catch(() => {});
}
}, 30_000); // Check every 30 seconds

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don’t silently swallow watchdog kill failures.

kill() errors are currently ignored, which hides failures in the inactivity watchdog. Please capture and report the error so it surfaces in Sentry.

Proposed fix
-import { useCallback, useRef } from 'react';
+import { useCallback, useRef } from 'react';
+import { captureException } from '../../../utils/sentry';
@@
-							window.maestro.process.kill(targetSessionId).catch(() => {});
+							window.maestro.process.kill(targetSessionId).catch((err) => {
+								captureException(err, {
+									contexts: {
+										activityWatchdog: {
+											targetSessionId,
+											agentType: session.toolType,
+											idleMs,
+										},
+									},
+								});
+							});

As per coding guidelines: "Do NOT silently swallow exceptions with try-catch-console.error blocks. Let unhandled exceptions bubble up to Sentry for error tracking. Handle only expected/recoverable errors explicitly with specific error codes. Let unexpected errors bubble up to Sentry. Use Sentry utilities (captureException, captureMessage) from 'src/utils/sentry' for explicit error reporting with context."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Activity tracking for per-task stall detection.
// Resets on any signal: data output, thinking chunks, or tool execution.
// This bridges the IPC channel gap for agents like Codex whose reasoning
// and tool events travel on separate channels from data output.
let lastActivityTime = Date.now();
const markActivity = () => {
lastActivityTime = Date.now();
};
// Per-task activity timeout: if no activity (data, thinking, tool) for
// this duration, assume the agent is hung and kill it. This catches
// mid-execution hangs that the post-execution stall detector in
// useBatchProcessor cannot detect (it only triggers after task completion).
const ACTIVITY_TIMEOUT_MS = 5 * 60 * 1000; // 5 minutes
const activityCheckInterval = setInterval(() => {
const idleMs = Date.now() - lastActivityTime;
if (idleMs > ACTIVITY_TIMEOUT_MS) {
clearInterval(activityCheckInterval);
window.maestro.logger.log('warn', 'Auto Run task killed due to inactivity', 'AgentExecution', {
targetSessionId,
idleMs,
agentType: session.toolType,
});
// Kill the hung process — onExit listener will fire and resolve the promise
window.maestro.process.kill(targetSessionId).catch(() => {});
}
}, 30_000); // Check every 30 seconds
import { useCallback, useRef } from 'react';
import { captureException } from '../../../utils/sentry';
// ... other code ...
// Activity tracking for per-task stall detection.
// Resets on any signal: data output, thinking chunks, or tool execution.
// This bridges the IPC channel gap for agents like Codex whose reasoning
// and tool events travel on separate channels from data output.
let lastActivityTime = Date.now();
const markActivity = () => {
lastActivityTime = Date.now();
};
// Per-task activity timeout: if no activity (data, thinking, tool) for
// this duration, assume the agent is hung and kill it. This catches
// mid-execution hangs that the post-execution stall detector in
// useBatchProcessor cannot detect (it only triggers after task completion).
const ACTIVITY_TIMEOUT_MS = 5 * 60 * 1000; // 5 minutes
const activityCheckInterval = setInterval(() => {
const idleMs = Date.now() - lastActivityTime;
if (idleMs > ACTIVITY_TIMEOUT_MS) {
clearInterval(activityCheckInterval);
window.maestro.logger.log('warn', 'Auto Run task killed due to inactivity', 'AgentExecution', {
targetSessionId,
idleMs,
agentType: session.toolType,
});
// Kill the hung process — onExit listener will fire and resolve the promise
window.maestro.process.kill(targetSessionId).catch((err) => {
captureException(err, {
contexts: {
activityWatchdog: {
targetSessionId,
agentType: session.toolType,
idleMs,
},
},
});
});
}
}, 30_000); // Check every 30 seconds
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/renderer/hooks/agent/useAgentExecution.ts` around lines 202 - 229, The
watchdog currently swallows errors from window.maestro.process.kill(...) via
.catch(() => {}); update the inactivity check in useAgentExecution (the interval
using ACTIVITY_TIMEOUT_MS and activityCheckInterval) to surface kill failures by
replacing the empty catch with an error reporter: call the app logger at error
level with context (targetSessionId, idleMs, agentType) and invoke the Sentry
utility captureException (imported from src/utils/sentry) with the caught error
and the same context so failures are recorded; do not silently suppress the
exception.

@openasocket
Copy link
Contributor Author

CodeRabbit Review — Fix Applied

Finding: Watchdog kill failure silently swallowed (⚠️ Potential issue | 🟠 Major)

File: src/renderer/hooks/agent/useAgentExecution.ts:226

Issue: The inactivity watchdog's .catch(() => {}) on process.kill() was silently discarding errors when the watchdog failed to kill a hung Auto Run process. Per project error handling guidelines, exceptions should not be swallowed — they need to surface for diagnostics.

Fix (commit 9c7147fc): Replaced the empty catch with structured error logging:

// Before
window.maestro.process.kill(targetSessionId).catch(() => {});

// After
window.maestro.process.kill(targetSessionId).catch((err) => {
    window.maestro.logger.log('error', 'Failed to kill hung Auto Run process', 'AgentExecution', {
        targetSessionId,
        agentType: session.toolType,
        idleMs,
        error: String(err),
    });
});

Used window.maestro.logger rather than direct Sentry captureException because the renderer has no Sentry utility wrapper — only main.tsx imports @sentry/electron/renderer directly. The logger route ensures the error is recorded in the application log with full context (session ID, agent type, idle duration).

🤖 Generated with Claude Code

… debug logs

- Add codexPendingResult field to ManagedProcess to separate result
  capture from reasoning text accumulation in streamedText
- Update StdoutHandler to store agent_message results in
  codexPendingResult and flush from there at turn.completed
- Update ExitHandler fallback to prefer codexPendingResult over
  streamedText for Codex processes
- Add turn.started reset to clear resultEmitted, codexPendingResult,
  and streamedText for multi-turn session support
- Set supportsResultMessages: true for Codex in capabilities.ts
- Remove 4 debug console.log statements from StdoutHandler.ts and
  GroupChat debug block from ExitHandler.ts
- Update existing test to verify codexPendingResult usage and add new
  test ensuring reasoning text is never emitted as result

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/renderer/hooks/agent/useAgentExecution.ts`:
- Around line 220-233: The watchdog kill catch currently only logs and swallows
errors; import Sentry helpers (captureException, captureMessage) from
'src/utils/sentry' and, inside the catch for
window.maestro.process.kill(targetSessionId), call captureException(err, {
extra: { targetSessionId, agentType: session.toolType, idleMs } }) (or
captureMessage for non-throw values) and then rethrow or return
Promise.reject(err) so the failure surfaces; ensure you only swallow
known/recoverable errors by explicitly checking error codes before swallowing
and keep the existing window.maestro.logger.log('error', ...) call for local
logs.

openasocket and others added 13 commits February 18, 2026 01:59
…to 'on' for Codex agents, add length bypass for block-style reasoning

- Override showThinking default to 'on' for Codex agents across 7 tab creation sites in App.tsx
  (worktree tabs, new tabs, new sessions, error recovery) since reasoning is the primary
  visibility into Codex agent activity
- Add length-based bypass (<=500 chars) for isLikelyConcatenatedToolNames checks in
  useAgentListeners.ts — Codex emits reasoning as complete blocks, not incremental streams,
  so large content should not be misidentified as malformed tool names
- Verified: isResultMessage correctly excludes reasoning events (type 'text' vs 'result'),
  no thinking/result duplication occurs
- Verified: TerminalOutput renders thinking with accent border + MarkdownRenderer, correctly
  displaying formatReasoningText's **section** markers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…re logging and completion safety checks

- Add logger.info at result capture point in useDocumentProcessor with agent type, response length, and document name for debugging Codex Auto Run flows
- Add safety warning log when agent exits successfully but captures no result text (prevents silent failures with Codex)
- Improve synopsis extraction for short responses: use entire cleaned text as summary source for responses < 500 chars that don't follow structured **Summary:**/**Details:** format (agent-agnostic, benefits Codex's concise response style)
- Document absence of per-task timeout in useBatchProcessor with note about Promise.race() as future enhancement

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…s events, truncate large outputs, add Codex arg extraction

- Add lastToolName tracking to carry tool name from tool_call to tool_result events
  (Codex emits these as separate item.completed events, losing the name on tool_result)
- Truncate tool output to 10K chars to prevent UI performance degradation from large outputs
- Add Codex-specific tool input extraction (path, cmd, code, content) in TerminalOutput
- Add 10 new tests covering tool name carryover and output truncation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… selection via -m flag, fix singleton config staleness

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…add activity tracking from thinking/tool events, guard responseText, add task completion logging

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ing leak, singleton state leak, dead stall detection

- ExitHandler fallback now uses toolType-aware branching: Codex only emits
  codexPendingResult (never reasoning from streamedText), other agents keep
  streamedText fallback. Fixes reasoning text appearing as user-visible result
  on Codex process exit without agent_message.
- ExitHandler JSON buffer path adds codexPendingResult to fallback chain
  (event.text → codexPendingResult → streamedText) for consistency.
- CodexOutputParser.refreshConfig() now resets lastToolName to prevent tool
  name leakage across sessions (parser is a singleton).
- Wire up lastActivityTime as actual stall detector: 5-minute inactivity
  timeout kills hung Auto Run agent processes mid-execution via setInterval
  check. Interval cleared on normal exit via cleanupFns.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…-detection in parser

Support both old (item-envelope, ≤v0.x) and new (msg-envelope, v0.41.0+) Codex CLI
output formats with automatic detection. New format uses { id, msg: { type, ... } }
envelope with exec_command_begin/end, agent_reasoning, agent_message, token_count
events. Tool name carryover uses call_id mapping for concurrent commands.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… field, detectAgentVersion() helper, AGENT_MIN_VERSIONS constants

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…orted version

Emits an inline [Maestro] warning in the terminal when an agent's detected
version is below AGENT_MIN_VERSIONS. Skipped for SSH remote sessions where
local version is irrelevant. Also logs agentVersion in spawn debug metadata.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…vering all detectAgentVersion() branches

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tCard

Add detectedVersion field to renderer AgentConfig types (global.d.ts and
types/index.ts) and display it on 3 UI surfaces:

- Edit Agent Modal: version shown inline after agent provider name
- Wizard agent tiles: version shown below description for detected agents
- AgentCard (shared): version appended after path with dot separator

All version display is conditional — agents without detectedVersion render
identically to before. No layout shifts, no new IPC calls.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…5 agents, fix v-prefix parsing

- Fix detectAgentVersion() to strip leading v/V prefix before matching version tokens
  (e.g., "Gemini CLI v0.29.0" → "0.29.0", "Factory Droid V0.4.0" → "0.4.0")
- Add versionArgs: ['--version'] to Gemini CLI, Qwen3 Coder, OpenCode, Factory Droid, and Aider
- Add 4 new tests: v-prefix, uppercase V, path-prefixed output, and aider format
- No changes to AGENT_MIN_VERSIONS (no enforcement for newly-probed agents)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Log error details (targetSessionId, agentType, idleMs) when the
inactivity watchdog fails to kill a hung Auto Run process, instead
of silently ignoring the failure with .catch(() => {}).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@openasocket openasocket force-pushed the feat/agent-version-detection branch from 9c7147f to 31dc215 Compare February 18, 2026 07:04
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/renderer/hooks/batch/useBatchProcessor.ts (1)

820-835: ⚠️ Potential issue | 🟠 Major

Stalled documents can re-run forever in loop mode.
Stalled docs are skipped for the current pass but are retried in later loops, and anyTasksProcessedThisIteration stays true even when no progress is made. If all documents stall and maxLoops is unset, Auto Run can loop indefinitely.

Suggested fix
 			// Process each document in order
 			for (let docIndex = 0; docIndex < documents.length; docIndex++) {
+				const docEntry = documents[docIndex];
+				if (stalledDocuments.has(docEntry.filename)) {
+					continue;
+				}
-				const docEntry = documents[docIndex];
@@
 			}
+
+			if (loopEnabled && stalledDocuments.size === documents.length) {
+				addFinalLoopSummary('All documents stalled');
+				break;
+			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/renderer/hooks/batch/useBatchProcessor.ts` around lines 820 - 835, The
loop can run forever because stalled documents are retried and still count as
progress; change the logic so that stalled documents are permanently skipped
across loops (persist stalledDocuments between passes) and ensure skipping a doc
due to hitting MAX_CONSECUTIVE_NO_CHANGES does NOT set
anyTasksProcessedThisIteration to true. Concretely, update the per-document
stall tracking (consecutiveNoChangeCount and stalledDocuments) so once a doc is
added to stalledDocuments it is excluded from future iterations, and adjust the
main loop condition to break when no non-stalled documents made progress in an
iteration (treat stalls as no progress) so the auto-run terminates when all
remaining docs are stalled or when maxLoops is reached/unspecified.
🧹 Nitpick comments (1)
src/renderer/App.tsx (1)

5228-5230: Align Codex default thinking across all session creation paths.

Right now this sets Codex showThinking only for direct creation. Wizard- and Symphony-created Codex sessions still use defaultShowThinking (or omit it), so the UX diverges depending on how the agent is created. Consider centralizing this default and reusing it in those paths for consistency.

♻️ Suggested refactor (reuse a single helper)
-				showThinking: agentId === 'codex' ? 'on' : defaultShowThinking,
+				showThinking: getInitialShowThinking(agentId, defaultShowThinking),
// e.g., near createNewSession helpers
const getInitialShowThinking = (agentId: ToolType, fallback: ThinkingMode) =>
	agentId === 'codex' ? 'on' : fallback;

Apply the same helper where initial tabs are created in the wizard and symphony flows.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/renderer/App.tsx` around lines 5228 - 5230, Centralize the Codex default
for showThinking by adding a helper (e.g., getInitialShowThinking(agentId:
ToolType, fallback: ThinkingMode)) that returns 'on' when agentId === 'codex'
and fallback otherwise, then replace the inline ternary at showThinking (and any
other locations that set initial tabs or session options) with calls to this
helper; update the createNewSession helpers plus the wizard and symphony
session/tab creation code paths to call getInitialShowThinking when initializing
showThinking so all session creation flows use the same logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/main/parsers/codex-output-parser.ts`:
- Around line 464-466: The fallback chain for building the tool output in
codex-output-parser is missing msg.stderr and should prefer
msg.aggregated_output even if empty; update the expression that sets rawOutput
(used by truncateToolOutput) to use nullish coalescing for aggregated_output and
include stderr in the chain (e.g., msg.aggregated_output ?? msg.stderr ??
msg.stdout ?? msg.formatted_output ?? '') so that stderr from the
CodexNewExecCommandEnd payload is returned when other fields are absent; locate
this logic where rawOutput is defined and where truncateToolOutput(rawOutput) is
called.

In `@src/renderer/components/shared/AgentSelector.tsx`:
- Around line 129-133: The JSX block inside the AgentSelector component uses
space indentation for the newly added lines (the <p> element showing {agent.path
?? agent.command} and the conditional {agent.detectedVersion} span); re-indent
those lines and any nested JSX to use tabs (matching the file's existing
tab-based style) so the <p> line, the {agent.detectedVersion && (...)} span, and
their closing tags align with surrounding code using tabs instead of spaces.

---

Outside diff comments:
In `@src/renderer/hooks/batch/useBatchProcessor.ts`:
- Around line 820-835: The loop can run forever because stalled documents are
retried and still count as progress; change the logic so that stalled documents
are permanently skipped across loops (persist stalledDocuments between passes)
and ensure skipping a doc due to hitting MAX_CONSECUTIVE_NO_CHANGES does NOT set
anyTasksProcessedThisIteration to true. Concretely, update the per-document
stall tracking (consecutiveNoChangeCount and stalledDocuments) so once a doc is
added to stalledDocuments it is excluded from future iterations, and adjust the
main loop condition to break when no non-stalled documents made progress in an
iteration (treat stalls as no progress) so the auto-run terminates when all
remaining docs are stalled or when maxLoops is reached/unspecified.

---

Duplicate comments:
In `@src/renderer/hooks/agent/useAgentExecution.ts`:
- Around line 202-233: The watchdog's kill failure currently only logs and
swallows all errors—update the activityCheckInterval handler so the catch on
window.maestro.process.kill(targetSessionId) distinguishes expected benign
errors (e.g., already-exited) and ignores those, but for any unexpected error
call Sentry utilities (import captureException/captureMessage from
src/utils/sentry) to report with context (targetSessionId, idleMs,
session.toolType), and still call window.maestro.logger.log('error', ...) with
the same context; keep clearing the interval and existing log messages around
ACTIVITY_TIMEOUT_MS and lastActivityTime/markActivity unchanged.

---

Nitpick comments:
In `@src/renderer/App.tsx`:
- Around line 5228-5230: Centralize the Codex default for showThinking by adding
a helper (e.g., getInitialShowThinking(agentId: ToolType, fallback:
ThinkingMode)) that returns 'on' when agentId === 'codex' and fallback
otherwise, then replace the inline ternary at showThinking (and any other
locations that set initial tabs or session options) with calls to this helper;
update the createNewSession helpers plus the wizard and symphony session/tab
creation code paths to call getInitialShowThinking when initializing
showThinking so all session creation flows use the same logic.

Comment on lines 464 to 466
// Use aggregated_output or stdout, with truncation
const rawOutput = msg.aggregated_output || msg.stdout || msg.formatted_output || '';
const output = this.truncateToolOutput(rawOutput);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the file and understand the context around lines 464-466
cat -n src/main/parsers/codex-output-parser.ts | sed -n '450,480p'

Repository: RunMaestro/Maestro

Length of output: 1028


🏁 Script executed:

# Let's see the full context of the function containing these lines
cat -n src/main/parsers/codex-output-parser.ts | sed -n '440,500p'

Repository: RunMaestro/Maestro

Length of output: 2136


🏁 Script executed:

# Search for where msg object is created and how fields are populated
rg "aggregated_output|exec_command_end" src/main/parsers/ -A 3 -B 3

Repository: RunMaestro/Maestro

Length of output: 4437


🏁 Script executed:

# Look for message type definitions or interfaces
rg "interface.*exec_command_end|type.*exec_command_end" src/

Repository: RunMaestro/Maestro

Length of output: 1410


🏁 Script executed:

# Search for stderr usage in this file
rg "stderr" src/main/parsers/codex-output-parser.ts -B 2 -A 2

Repository: RunMaestro/Maestro

Length of output: 765


Include stderr in the exec_command_end output fallback chain.

The CodexNewExecCommandEnd interface includes a stderr field, but the current code at line 465 omits it from the output fallback. If a command writes only to stderr and aggregated_output, stdout, and formatted_output are all missing or empty, the tool result will be empty instead of including the stderr output.

Proposed fix
-		const rawOutput = msg.aggregated_output || msg.stdout || msg.formatted_output || '';
+		const rawOutput =
+			msg.aggregated_output ??
+			(msg.stdout || msg.stderr || msg.formatted_output || '');

Using ?? (nullish coalescing) for aggregated_output also ensures that if it is explicitly provided—even as an empty string—it takes precedence; the fallback chain activates only when aggregated_output is undefined or null.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Use aggregated_output or stdout, with truncation
const rawOutput = msg.aggregated_output || msg.stdout || msg.formatted_output || '';
const output = this.truncateToolOutput(rawOutput);
// Use aggregated_output or stdout, with truncation
const rawOutput =
msg.aggregated_output ??
(msg.stdout || msg.stderr || msg.formatted_output || '');
const output = this.truncateToolOutput(rawOutput);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/parsers/codex-output-parser.ts` around lines 464 - 466, The fallback
chain for building the tool output in codex-output-parser is missing msg.stderr
and should prefer msg.aggregated_output even if empty; update the expression
that sets rawOutput (used by truncateToolOutput) to use nullish coalescing for
aggregated_output and include stderr in the chain (e.g., msg.aggregated_output
?? msg.stderr ?? msg.stdout ?? msg.formatted_output ?? '') so that stderr from
the CodexNewExecCommandEnd payload is returned when other fields are absent;
locate this logic where rawOutput is defined and where
truncateToolOutput(rawOutput) is called.

Comment on lines 129 to 133
<p className="text-xs truncate" style={{ color: theme.colors.textDim }}>
{agent.path ?? agent.command}
{agent.detectedVersion && (
<span style={{ opacity: 0.7 }}> · v{agent.detectedVersion}</span>
)}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix indentation to tabs in this block.

The newly added lines are still indented with spaces; please re-indent using tabs to match repo style.

As per coding guidelines, src/**/*.{ts,tsx,js,jsx}: Use tabs for indentation, not spaces. Always match existing file indentation when editing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/renderer/components/shared/AgentSelector.tsx` around lines 129 - 133, The
JSX block inside the AgentSelector component uses space indentation for the
newly added lines (the <p> element showing {agent.path ?? agent.command} and the
conditional {agent.detectedVersion} span); re-indent those lines and any nested
JSX to use tabs (matching the file's existing tab-based style) so the <p> line,
the {agent.detectedVersion && (...)} span, and their closing tags align with
surrounding code using tabs instead of spaces.

@pedramamini
Copy link
Collaborator

@greptile what do you think about this PR?

@greptile-apps
Copy link

greptile-apps bot commented Feb 18, 2026

This is a high-quality infrastructure PR that solves real user pain points with a disciplined, test-driven approach. Here's my assessment:

Strengths

1. Clear Problem Statement
The motivation is excellent—users upgrading Codex CLI got "zero parsed output with no explanation" due to a silent format incompatibility. This is a catastrophic UX failure, and the PR addresses it head-on.

2. Comprehensive Fix Scope
Rather than band-aiding the parser, you identified and fixed 8 related bugs:

  • Output clobbering between reasoning/results (codexPendingResult separation)
  • Tool name tracking across event boundaries
  • Stall detection for Auto Run
  • Config staleness (refreshConfig())
  • Exit handler reasoning leaks

This shows strong systems thinking—you traced the symptom to its root causes.

3. Backward Compatibility
The dual-format parser (routeMessage() auto-detection) is the right approach. Supporting both v0.41.0 and v0.103.0 formats means no breaking changes for users on older Codex versions while unblocking upgrades.

4. Proactive Observability
Version detection with spawn-time warnings is excellent defensive UX. The detectedVersion field surfaced across 3 UI locations (Edit Modal, Wizard, AgentCard) gives users immediate visibility into potential issues.

5. Test Coverage

  • 80 rewritten parser tests
  • 15 new version detection tests (including edge cases: timeout, unparseable, empty)
  • 19,759 total tests passing

This is exemplary. The unit test breakdown (success/failure/timeout/unparseable) shows you thought through failure modes.

Concerns / Questions

1. Version Detection Timeout
The 3-second timeout for --version calls is reasonable, but what happens in the (rare) case where it times out during spawn?

  • Does the agent still spawn, just without version info?
  • Is the warning message clear that version detection failed (vs. version is actually too old)?

2. Tool Output Truncation
d1c83270 adds a 10K character limit for tool outputs. This is pragmatic for UI performance, but:

  • Is truncation logged somewhere? (so users know they're not seeing the full output)
  • Could this break workflows where tool outputs are piped between tools?

3. Stall Detection Timeout
The 5-minute inactivity timeout for Auto Run is hardcoded. Is this tuneable? Some workflows (e.g., long test suites, large file processing) might legitimately take >5min between tool calls.

4. Breaking Change Risk
The showThinking: 'on' default change for Codex is mentioned as a fix, but this changes existing behavior for users who had implicit 'off' defaults. Was this intentional? (seems right, but worth confirming)

5. AGENT_MIN_VERSIONS Maintenance
AGENT_MIN_VERSIONS is now a central constant. How do you plan to keep this updated as agent CLIs evolve? Is there a process to test new versions before updating minimums?

Minor Nits

  • Commit 4857e93e (+534 lines): This is a large single commit for the dual-format parser. Consider splitting format detection (routeMessage) from transformer logic in future similar changes.
  • File naming: codex-output-parser.ts is now multi-format. Consider renaming to codex-parser.ts or adding a comment clarifying it handles both formats.
  • Version display UX: Does the version badge have a tooltip explaining why it matters? (e.g., "Minimum v0.41.0 required")

Verdict

Merge with confidence. This PR demonstrates mature engineering:

  • Solves a critical user-facing bug
  • Adds defensive infrastructure (version detection)
  • Maintains backward compatibility
  • Includes comprehensive tests

The 8-bug fix scope shows you didn't just patch the immediate issue—you hardened the entire Codex integration path. The test-driven approach (80 parser tests, 15 version tests) gives me confidence this won't regress.

Recommended follow-ups (not blockers):

  1. Add observability for tool output truncation (log when 10K limit is hit)
  2. Make stall timeout configurable (or at least document it)
  3. Add telemetry to track version distribution in the wild (helps prioritize future compatibility work)

Ship it. 🚀

…se chat lock on moderator failure

- agent-args.ts: Only filter --dangerously-bypass-approvals-and-sandbox from
  batchModeArgs in readOnly mode, preserving --skip-git-repo-check which Codex
  requires even in read-only sandbox mode
- exit-listener.ts: Release chat lock when moderator exits with no output,
  empty parsed output, or chat load failure — prevents chats from becoming
  permanently locked and undeletable
- Updated agent-args tests for new selective filtering behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@openasocket openasocket reopened this Feb 28, 2026
@greptile-apps
Copy link

greptile-apps bot commented Feb 28, 2026

Greptile Summary

This PR comprehensively fixes Codex CLI integration end-to-end and adds agent version detection infrastructure across the application.

Codex Integration Fixes (8 bugs resolved)

  • Dual-format parser now auto-detects and supports both old (v0.41.0-era item-envelope) and current (v0.103.0+ msg-envelope) Codex CLI JSONL formats via routeMessage(), solving the silent parsing failure users experienced after upgrading Codex CLI
  • Separated codexPendingResult from streamedText in StdoutHandler to prevent reasoning text from clobbering actual results
  • Tool-type-aware result extraction in ExitHandler prevents reasoning leaks by using codexPendingResult for Codex (not streamedText)
  • Turn reset on turn.started events enables multi-turn Codex sessions by resetting result guard
  • Tool names now carried across tool_calltool_result and exec_command_beginexec_command_end event pairs using lastToolName and toolNamesByCallId maps
  • Tool output truncated at 10K characters to prevent UI freezing on large outputs
  • Parser singleton state properly managed via refreshConfig() called before each spawn in ChildProcessSpawner
  • showThinking defaults to 'on' for Codex across 7 tab creation sites, restoring visibility into agent activity
  • Auto Run stall detection added with 5-minute inactivity timeout tracked across data/thinking/tool events, prevents indefinite hanging

Version Detection Infrastructure

  • Non-blocking version detection added to all 7 agents with versionArgs: ['--version']
  • detectAgentVersion() runs with 3s timeout during agent detection, strips v-prefix, parses semver from stdout
  • Inline [Maestro] Warning emitted in terminal when spawning agents below AGENT_MIN_VERSIONS (Codex: 0.41.0, Claude Code: 1.0.0), skipped for SSH remote sessions
  • Detected versions displayed across 3 UX surfaces: Edit Agent Modal, Wizard agent tiles, and AgentCard component

Additional Fixes

  • Read-only mode now filters dangerous --dangerously-bypass-approvals-and-sandbox flag while preserving safe flags like --skip-git-repo-check
  • Group chat locking prevents race conditions during delete/update operations with 5-minute stale lock timeout
  • Synopsis result extraction fixed for short Codex responses

Test Coverage

  • 80+ Codex parser tests covering both formats, auto-detection, tool name carryover, and error handling
  • 15 version detection tests covering success, timeout, parse failures, v-prefix stripping
  • All 19,759 tests pass across 464 test files

Confidence Score: 5/5

  • This PR is safe to merge with high confidence
  • Comprehensive solution to documented problems with excellent test coverage (95+ new tests), backward compatibility maintained for old Codex format, non-blocking version detection with proper timeout handling, all edge cases covered, and all 19,759 existing tests pass. Changes are well-isolated and follow existing patterns
  • No files require special attention - all changes are well-implemented with comprehensive test coverage

Important Files Changed

Filename Overview
src/main/parsers/codex-output-parser.ts Comprehensive dual-format parser rewrite supporting both old (v0.41.0) and new (v0.103.0) Codex CLI JSONL formats with auto-detection, proper singleton handling via refreshConfig(), and tool output truncation at 10K chars
src/main/agents/detector.ts Added non-blocking version detection with 3s timeout, v-prefix stripping, and AGENT_MIN_VERSIONS constant. Returns null gracefully on failures without blocking agent detection
src/main/process-manager/handlers/StdoutHandler.ts Added codexPendingResult field separate from streamedText to prevent reasoning clobbering, turn reset on turn.started for multi-turn sessions, and Codex-specific result emission at turn completion
src/main/process-manager/handlers/ExitHandler.ts Fixed tool-type-aware result extraction at exit - uses codexPendingResult for Codex (not streamedText which contains reasoning) to prevent reasoning leaks into final results
src/main/ipc/handlers/process.ts Emits inline version warnings when detectedVersion < AGENT_MIN_VERSIONS, skipped for SSH remote sessions. Warning appears in terminal before agent starts
src/renderer/hooks/agent/useAgentExecution.ts Added per-task stall detection with 5-minute inactivity timeout tracked across data/thinking/tool events, prevents Auto Run from hanging indefinitely on stalled agents
src/main/utils/agent-args.ts Filters dangerous sandbox-bypass flags when read-only mode is active, preserves safe flags like --skip-git-repo-check. Prevents conflicts with --sandbox read-only
src/main/group-chat/group-chat-lock.ts New file implementing chat-level operation locks and synthesis guards to prevent race conditions in group chat delete/update operations during processing, with 5-minute stale lock timeout

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    Start([Agent Spawn Request]) --> Detect[Agent Detection]
    Detect --> VersionCheck{Has versionArgs?}
    VersionCheck -->|Yes| RunVersion[Run --version<br/>3s timeout]
    VersionCheck -->|No| Skip[Skip version check]
    RunVersion --> ParseVer[Parse semver<br/>Strip v-prefix]
    ParseVer --> StoreVer[Store detectedVersion]
    Skip --> StoreVer
    StoreVer --> CompareVer{Version < minimum?}
    CompareVer -->|Yes & not SSH| WarnUser[Emit inline warning<br/>to terminal]
    CompareVer -->|No| SpawnProc
    WarnUser --> SpawnProc[Spawn Process]
    
    SpawnProc --> InitParser[Initialize Parser]
    InitParser --> RefreshConfig[refreshConfig<br/>Read ~/.codex/config.toml]
    RefreshConfig --> StreamData[Stream JSONL Output]
    
    StreamData --> ParseLine{Parse JSON Line}
    ParseLine --> RouteMsg[routeMessage]
    RouteMsg --> DetectFormat{Has msg field?}
    
    DetectFormat -->|Yes| NewFormat[New Format Handler<br/>msg-envelope]
    DetectFormat -->|No| OldFormat[Old Format Handler<br/>item-envelope]
    
    NewFormat --> NewEvents[task_started<br/>agent_reasoning<br/>agent_message<br/>exec_command_begin/end<br/>token_count]
    OldFormat --> OldEvents[thread.started<br/>turn.started<br/>item.completed<br/>turn.completed]
    
    NewEvents --> Transform[Transform to ParsedEvent]
    OldEvents --> Transform
    
    Transform --> EmitEvent{Event Type?}
    EmitEvent -->|text & isPartial| ThinkingChunk[Emit thinking-chunk<br/>Mark activity]
    EmitEvent -->|tool_use| ToolExec[Emit tool-execution<br/>Mark activity]
    EmitEvent -->|result| ResultCheck{toolType === codex?}
    
    ThinkingChunk --> ActivityTimer
    ToolExec --> ActivityTimer
    ResultCheck -->|Yes| StorePending[Store in codexPendingResult]
    ResultCheck -->|No| EmitResult[Emit result immediately]
    
    StorePending --> TurnComplete{Turn completed?}
    TurnComplete -->|Yes| EmitFinal[Emit codexPendingResult]
    TurnComplete -->|No| Continue[Continue streaming]
    
    EmitResult --> ActivityTimer[Update lastActivityTime]
    EmitFinal --> ActivityTimer
    Continue --> StreamData
    
    ActivityTimer --> StallCheck{Idle > 5 min?}
    StallCheck -->|Yes| KillStalled[Kill stalled agent]
    StallCheck -->|No| StreamData
    
    KillStalled --> ExitHandler
    ActivityTimer --> ExitHandler[Process Exit]
    
    ExitHandler --> FinalResult{Has result?}
    FinalResult -->|No & Codex| UseCodexPending[Use codexPendingResult]
    FinalResult -->|No & Other| UseStreamed[Use streamedText]
    FinalResult -->|Yes| EmitExit[Emit exit event]
    
    UseCodexPending --> EmitExit
    UseStreamed --> EmitExit
    EmitExit --> End([Complete])
Loading

Last reviewed commit: a222d80

@openasocket openasocket reopened this Feb 28, 2026
@openasocket openasocket reopened this Feb 28, 2026
@openasocket
Copy link
Contributor Author

Closing/reopening to refresh GitHub mergeability cache

@openasocket openasocket reopened this Feb 28, 2026
openasocket and others added 2 commits February 28, 2026 22:59
…detection

# Conflicts:
#	src/main/agents/definitions.ts
#	src/main/parsers/codex-output-parser.ts
#	src/main/process-manager/handlers/ExitHandler.ts
#	src/main/process-manager/handlers/StdoutHandler.ts
#	src/renderer/App.tsx
#	src/renderer/components/NewInstanceModal.tsx
#	src/renderer/components/TerminalOutput.tsx
#	src/renderer/components/shared/AgentSelector.tsx
#	src/renderer/hooks/worktree/useWorktreeHandlers.ts
…all loop, observability

- Include stderr in Codex exec_command_end fallback chain so commands
  that only write to stderr don't produce blank tool results
- Fix stalled document re-run loop: skip already-stalled documents at
  the start of each pass and break when all documents are stalled
- Add logger.warn when tool output truncation occurs (10K char limit)
- Extract 5-minute inactivity timeout to module-level constant with
  TODO for settings integration
- Document AGENT_MIN_VERSIONS maintenance strategy in JSDoc

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/main/parsers/codex-output-parser.ts (1)

498-500: ⚠️ Potential issue | 🟡 Minor

Use nullish coalescing for aggregated_output precedence.

Using || treats '' as absent and can incorrectly fall back to other fields. aggregated_output should remain authoritative when explicitly present but empty.

Suggested fix
-		const rawOutput =
-			msg.aggregated_output || msg.stdout || msg.stderr || msg.formatted_output || '';
+		const rawOutput =
+			msg.aggregated_output ?? msg.stdout ?? msg.stderr ?? msg.formatted_output ?? '';
#!/bin/bash
set -euo pipefail
rg -n "const rawOutput" src/main/parsers/codex-output-parser.ts -C2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/parsers/codex-output-parser.ts` around lines 498 - 500, Change the
fallback for msg fields to use nullish coalescing so an explicitly present empty
aggregated_output is honored: in the block that sets rawOutput (variable
rawOutput) replace the current logical-OR chain with a nullish-coalescing chain
that checks msg.aggregated_output ?? msg.stdout ?? msg.stderr ??
msg.formatted_output ?? '' before passing into this.truncateToolOutput (method
truncateToolOutput) and assigning to output; ensure you only change the operator
semantics, leaving variable names rawOutput and output and the call to
truncateToolOutput intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/main/agents/detector.ts`:
- Around line 61-67: The timeout currently aborting version detection doesn't
kill the child because execFileNoThrow is called with the old signature (env
passed as plain object) and ExecOptions lacks an env field; extend the
ExecOptions type to include env: NodeJS.ProcessEnv, update execFileNoThrow and
execFileWithInput to accept an options object and respect both timeout and env,
and modify detector.ts to call execFileNoThrow(binaryPath, versionArgs, { env,
timeout: VERSION_DETECT_TIMEOUT_MS }) so the underlying child process is
terminated on timeout and env vars are applied.

In `@src/main/parsers/codex-output-parser.ts`:
- Around line 361-363: The conditional detecting old-format messages only checks
for parsed.error being a string, so payloads like { "error": { "message": "..."
} } are missed; update the if in codex-output-parser.ts (the branch that calls
this.transformOldFormat(parsed as unknown as CodexOldMessage)) to treat
parsed.error as either a string OR an object with a message property (e.g.,
typeof parsed.error === 'string' || (typeof parsed.error === 'object' &&
parsed.error?.message)), so those nested error objects are routed to
transformOldFormat like true string errors.

---

Duplicate comments:
In `@src/main/parsers/codex-output-parser.ts`:
- Around line 498-500: Change the fallback for msg fields to use nullish
coalescing so an explicitly present empty aggregated_output is honored: in the
block that sets rawOutput (variable rawOutput) replace the current logical-OR
chain with a nullish-coalescing chain that checks msg.aggregated_output ??
msg.stdout ?? msg.stderr ?? msg.formatted_output ?? '' before passing into
this.truncateToolOutput (method truncateToolOutput) and assigning to output;
ensure you only change the operator semantics, leaving variable names rawOutput
and output and the call to truncateToolOutput intact.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 328641b and b241c86.

📒 Files selected for processing (16)
  • .prettierignore
  • package.json
  • src/__tests__/main/parsers/codex-output-parser.test.ts
  • src/__tests__/main/process-manager/handlers/StdoutHandler.test.ts
  • src/__tests__/renderer/components/DocumentGraph/mindMapLayouts.test.ts
  • src/__tests__/renderer/components/TerminalOutput.test.tsx
  • src/__tests__/renderer/components/auto-scroll.test.tsx
  • src/__tests__/renderer/hooks/useBatchProcessor.test.ts
  • src/main/agents/definitions.ts
  • src/main/agents/detector.ts
  • src/main/index.ts
  • src/main/ipc/handlers/process.ts
  • src/main/parsers/codex-output-parser.ts
  • src/main/process-manager/handlers/ExitHandler.ts
  • src/main/process-manager/handlers/StdoutHandler.ts
  • src/main/process-manager/spawners/ChildProcessSpawner.ts
✅ Files skipped from review due to trivial changes (5)
  • src/tests/renderer/components/TerminalOutput.test.tsx
  • .prettierignore
  • src/tests/renderer/components/auto-scroll.test.tsx
  • src/tests/renderer/hooks/useBatchProcessor.test.ts
  • src/tests/renderer/components/DocumentGraph/mindMapLayouts.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/main/process-manager/spawners/ChildProcessSpawner.ts
  • src/main/ipc/handlers/process.ts

Comment on lines +61 to +67
const resultPromise = execFileNoThrow(binaryPath, versionArgs, undefined, env);
const timeoutPromise = new Promise<null>((resolve) =>
setTimeout(() => resolve(null), VERSION_DETECT_TIMEOUT_MS)
);

const result = await Promise.race([resultPromise, timeoutPromise]);
if (!result) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Inspecting detectAgentVersion timeout behavior ==="
rg -n "detectAgentVersion|Promise\.race|VERSION_DETECT_TIMEOUT_MS|execFileNoThrow" src/main/agents/detector.ts -C4

echo ""
echo "=== Full context of timeout handling section ==="
sed -n '54,104p' src/main/agents/detector.ts

Repository: RunMaestro/Maestro

Length of output: 3855


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Inspecting execFileNoThrow implementation ==="
rg -n "function execFileNoThrow|export.*execFileNoThrow|interface ExecOptions|type ExecOptions" src/main/utils/execFile.ts -C4

echo ""
echo "=== Full execFileNoThrow function implementation ==="
rg -n "execFileNoThrow" src/main/utils/execFile.ts -A30

Repository: RunMaestro/Maestro

Length of output: 1886


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Checking for AbortController/AbortSignal usage in execFile ==="
rg -n "AbortController|AbortSignal|signal|timeout" src/main/utils/execFile.ts

echo ""
echo "=== Checking for process termination patterns in detector ==="
rg -n "kill|abort|cleanup|unref" src/main/agents/detector.ts

Repository: RunMaestro/Maestro

Length of output: 901


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Full execFileNoThrow implementation with timeout handling ==="
sed -n '86,155p' src/main/utils/execFile.ts

echo ""
echo "=== Check other execFileNoThrow calls in detector.ts ==="
rg -n "execFileNoThrow" src/main/agents/detector.ts -B2 -A2

Repository: RunMaestro/Maestro

Length of output: 3003


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Verify if env is being merged into ExecOptions elsewhere ==="
rg -n "ExecOptions|{ env" src/main/agents/detector.ts -B2 -A2

echo ""
echo "=== Check how execFileWithInput handles both timeout and env ==="
sed -n '160,210p' src/main/utils/execFile.ts

Repository: RunMaestro/Maestro

Length of output: 248


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Full ExecOptions interface definition ==="
sed -n '7,15p' src/main/utils/execFile.ts

echo ""
echo "=== Check if ExecOptions includes env property ==="
rg -n "interface ExecOptions" -A10 src/main/utils/execFile.ts

Repository: RunMaestro/Maestro

Length of output: 820


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Verify execFileWithInput function signature ==="
sed -n '160,175p' src/main/utils/execFile.ts

echo ""
echo "=== Check how other callers pass env with timeout ==="
rg -n "execFileNoThrow" src/ -B1 -A1 | head -50

Repository: RunMaestro/Maestro

Length of output: 3317


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Check if ExecOptions needs env property added ==="
rg -n "env.*ExecOptions|ExecOptions.*env" src/

echo ""
echo "=== Verify execFileWithInput does NOT accept env ==="
sed -n '160,210p' src/main/utils/execFile.ts

echo ""
echo "=== Check actual Node.js execFile behavior with env ==="
sed -n '114,130p' src/main/utils/execFile.ts

Repository: RunMaestro/Maestro

Length of output: 1996


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Verify spawn call in execFileWithInput lacks env ==="
sed -n '160,180p' src/main/utils/execFile.ts

echo ""
echo "=== Check what properties spawn accepts for env ==="
sed -n '172,178p' src/main/utils/execFile.ts

echo ""
echo "=== Confirm detector.ts line 344 also passes env without timeout ==="
sed -n '340,350p' src/main/agents/detector.ts

Repository: RunMaestro/Maestro

Length of output: 1292


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Summary: ExecOptions interface definition ==="
cat src/main/utils/execFile.ts | sed -n '7,11p'

echo ""
echo "=== How detector.ts passes env (backward compat path) ==="
sed -n '98,107p' src/main/utils/execFile.ts

echo ""
echo "=== Conclusion: Can detector.ts pass both env AND timeout? ==="
echo "ExecOptions has only: input, timeout"
echo "No env property in ExecOptions interface"
echo "Calling with env object goes to old signature path (line 105)"
echo "Old path does NOT extract timeout (lines 98-107)"

Repository: RunMaestro/Maestro

Length of output: 904


Version probe timeout does not terminate the child process.

Promise.race returns after 3s, but the underlying execFileNoThrow call continues running. The timeout is never passed to the exec helper—detector.ts passes env as a plain object (old signature), so the timeout option in ExecOptions is ignored. Additionally, ExecOptions lacks an env property, making it impossible to pass both environment variables and timeout simultaneously with the current API design.

Processes can accumulate across detections when --version commands hang. Extend ExecOptions to include env: NodeJS.ProcessEnv, update detector.ts to pass { env, timeout: VERSION_DETECT_TIMEOUT_MS }, and ensure execFileWithInput also applies env variables.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/agents/detector.ts` around lines 61 - 67, The timeout currently
aborting version detection doesn't kill the child because execFileNoThrow is
called with the old signature (env passed as plain object) and ExecOptions lacks
an env field; extend the ExecOptions type to include env: NodeJS.ProcessEnv,
update execFileNoThrow and execFileWithInput to accept an options object and
respect both timeout and env, and modify detector.ts to call
execFileNoThrow(binaryPath, versionArgs, { env, timeout:
VERSION_DETECT_TIMEOUT_MS }) so the underlying child process is terminated on
timeout and env vars are applied.

Comment on lines +361 to +363
// Old format detection: top-level `type` field or standalone `error` field
if (typeof parsed.type === 'string' || typeof parsed.error === 'string') {
return this.transformOldFormat(parsed as unknown as CodexOldMessage);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Top-level object error payloads are not routed to old-format handling.

A line like { "error": { "message": "..." } } currently falls through as system instead of becoming an error event.

Suggested fix
-		if (typeof parsed.type === 'string' || typeof parsed.error === 'string') {
+		if (
+			typeof parsed.type === 'string' ||
+			typeof parsed.error === 'string' ||
+			(typeof parsed.error === 'object' && parsed.error !== null)
+		) {
 			return this.transformOldFormat(parsed as unknown as CodexOldMessage);
 		}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Old format detection: top-level `type` field or standalone `error` field
if (typeof parsed.type === 'string' || typeof parsed.error === 'string') {
return this.transformOldFormat(parsed as unknown as CodexOldMessage);
// Old format detection: top-level `type` field or standalone `error` field
if (
typeof parsed.type === 'string' ||
typeof parsed.error === 'string' ||
(typeof parsed.error === 'object' && parsed.error !== null)
) {
return this.transformOldFormat(parsed as unknown as CodexOldMessage);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/parsers/codex-output-parser.ts` around lines 361 - 363, The
conditional detecting old-format messages only checks for parsed.error being a
string, so payloads like { "error": { "message": "..." } } are missed; update
the if in codex-output-parser.ts (the branch that calls
this.transformOldFormat(parsed as unknown as CodexOldMessage)) to treat
parsed.error as either a string OR an object with a message property (e.g.,
typeof parsed.error === 'string' || (typeof parsed.error === 'object' &&
parsed.error?.message)), so those nested error objects are routed to
transformOldFormat like true string errors.

…essor

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@openasocket
Copy link
Contributor Author

Review Fixes — Complete Summary

All CodeRabbit actionable items (3) and Greptile concerns (5) have been addressed. Branch is now fully merged with upstream/main and GitHub reports MERGEABLE.

CodeRabbit Fixes

  • Watchdog kill failure swallowing (useAgentExecution.ts) — Previously addressed in 31dc2151. The inactivity watchdog's .catch(() => {}) on process.kill() was silently discarding errors. Replaced with window.maestro.logger.log('error', ...) to surface kill failures.

  • stderr missing from exec_command_end fallback chain (codex-output-parser.ts:464) — The output fallback chain msg.aggregated_output || msg.stdout || msg.formatted_output || '' omitted stderr. If a command wrote only to stderr and other fields were empty, the tool result was blank. Added msg.stderr to the || chain. Decision: kept || semantics throughout rather than mixing ?? and || for consistency with the existing pattern.

  • Stalled document re-run loop (useBatchProcessor.ts:820-835) — Stalled docs were skipped per pass but retried in later loops. If all documents stalled and maxLoops was unset, Auto Run would loop forever. Added early skip at top of document loop for already-stalled docs plus explicit "all documents stalled" break condition. The existing anyTasksProcessedThisIteration check serves as an additional safety net.

Greptile Concerns

  • Version detection timeout blocking agent spawn — Verified: version detection runs during detectAgents() at app startup (cached), not during spawn. The spawn path in process-manager/ has no version detection calls. The 3s timeout in detectAgentVersion via Promise.race ensures detection never hangs.

  • Tool output truncation observability — Added logger.warn in truncateToolOutput() logging original and truncated character counts when 10K character truncation occurs. Without this, silent truncation could cause confusing behavior during debugging.

  • Hardcoded 5-minute inactivity timeout — Extracted to module-level constant AUTO_RUN_ACTIVITY_TIMEOUT_MS with TODO for settings integration. Decision: deferred full settings UI to a separate task to keep this PR focused on version detection and Codex fixes.

  • AGENT_MIN_VERSIONS maintenance strategy — Added JSDoc block documenting when to update (new agent CLI release with breaking changes), add (new agent type), and remove (deprecated agent) entries. This prevents version constants from going stale as agent CLIs evolve.

  • AgentSelector indentation — Verified with cat -A: lines 127-129 already use tabs. No change needed.

Merge & CI Fixes

  • Merge with upstream/main (577e1c28) — Resolved 9 conflicting files (definitions.ts, codex-output-parser.ts, ExitHandler.ts, StdoutHandler.ts, App.tsx, NewInstanceModal.tsx, TerminalOutput.tsx, AgentSelector.tsx, useWorktreeHandlers.ts). Branch is now 0 commits behind main.

  • Prettier formatting (1bdd525b) — Fixed formatting failures in detector.test.ts and useDocumentProcessor.ts that were causing CI lint-and-format job to fail after the merge.

Verification

  • ✅ 508 test files pass, 21,604 tests pass
  • tsc clean across all 3 configs (lint, main, cli)
  • ✅ GitHub reports MERGEABLE
  • ⬜ Manual testing pending: version detection display in Edit Modal, Wizard tiles, and AgentCard across all supported agents

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants