Skip to content

fix: auto-scroll follows thinking streams + Codex tool display#410

Merged
pedramamini merged 6 commits intoRunMaestro:mainfrom
openasocket:fix/auto-scroll-improvements
Feb 19, 2026
Merged

fix: auto-scroll follows thinking streams + Codex tool display#410
pedramamini merged 6 commits intoRunMaestro:mainfrom
openasocket:fix/auto-scroll-improvements

Conversation

@openasocket
Copy link
Contributor

@openasocket openasocket commented Feb 18, 2026

Summary

  • Replace filteredLogs.length auto-scroll dependency with MutationObserver to detect in-place content changes (thinking stream text growth, tool status updates)
  • Add programmatic scroll guard to prevent the scroll handler from falsely pausing auto-scroll during effect-driven scrollTo() calls
  • Fix Codex tool execution display: carry tool name from tool_call to tool_result, handle array-typed command args, add Codex-specific arg extraction, truncate large outputs

Motivation

Auto-scroll stops following thinking/tool streams (Fixes #402, supersedes #406)

The auto-scroll feature (PR #390) depends on filteredLogs.length to trigger scroll-to-bottom. However, thinking chunks update the last log entry's text in-place without changing the array length:

// useAgentListeners.ts — thinking chunk handler
logs: [...tab.logs.slice(0, -1), { ...lastLog, text: combinedText }]
//                                               ^^^^^^^^^^^^^^^^^^
//                                               text grows, length stays the same

Timeline of the bug:

T1: First thinking chunk → new log added → length 1→2 → scroll fires ✓
T2: Second thinking chunk → text updated → length still 2 → NO scroll ✗
T3: Third thinking chunk → text updated → length still 2 → NO scroll ✗
T4: Tool event → new log added → length 2→3 → scroll fires ✓
T5: More thinking → text updated → length still 3 → NO scroll ✗

PR #406 proposed a content-fingerprint memo (lastVisibleLogActivityKey) that creates a composite key from the last log's metadata. This works but couples the scroll trigger to React's data model. Our approach uses a MutationObserver on the scroll container instead, which:

  1. Catches all content changes — new DOM nodes, text mutations — regardless of how the React data model produces them
  2. Is framework-agnostic — doesn't need to know about filteredLogs, collapsedLogs, or log data shapes
  3. Naturally throttles — MutationObserver batches mutations and delivers them asynchronously, providing built-in coalescing
  4. Has zero React overhead — no additional memos, no dependency array churn, no re-renders

Programmatic scroll falsely pauses auto-scroll (secondary race condition)

When the auto-scroll effect calls scrollTo({ behavior: 'auto' }), the instant scroll fires the onScroll handler synchronously. If scrollHeight just grew from new content but the scroll position hasn't propagated yet, the handler computes atBottom = false and sets autoScrollPaused = true — pausing auto-scroll during a programmatic scroll. This creates intermittent freezes where the viewport stops following output mid-stream.

Fixed with a ref-based guard (isProgrammaticScrollRef) set before scrollTo() and cleared via queueMicrotask. The scroll handler checks this flag before pausing.

Codex tool executions show no command details

Codex bash tool entries render as just bash ● / bash ✓ with no information about what command was executed. Three sub-bugs:

Missing tool name on completion: Codex emits tool_call and tool_result as separate item.completed JSONL events. The parser transforms tool_result without a toolName field (only tool_call items include item.tool). StdoutHandler's gate (event.toolName check) silently drops the event — the renderer never receives "completed" status.

Array commands not rendered: Codex tool_call items use args: { command: ['ls', '-la'] } (string array). The renderer's safeStr(toolInput.command) returns null for non-strings, so no command detail is displayed. This bug is a new finding not addressed in PR #407's tool fix (commit 451f8871), which added lastToolName tracking and Codex arg patterns but didn't handle the array command format.

No output truncation: Large tool outputs (e.g., cat of a big file) create oversized log entries with no limit.

Changes

Commit 1: 638db6ff — MutationObserver auto-scroll

Replace the filteredLogs.length dependency in the auto-scroll useEffect with a MutationObserver on scrollContainerRef:

const observer = new MutationObserver(() => {
    if (shouldAutoScroll()) scrollToBottom();
});
observer.observe(container, {
    childList: true,    // New/removed DOM nodes (new log entries)
    subtree: true,      // Watch all descendants
    characterData: true, // Text node mutations (thinking stream growth)
});

The observer is set up and torn down inside the effect, keyed on [session.inputMode, autoScrollAiMode, autoScrollPaused]. The filteredLogs.length dependency is completely removed from this effect.

Intentionally kept: The "detect new messages" effect and handleScrollInner still use filteredLogs.length for unread badge counting and read state tracking — in-place text updates should NOT increment unread counts.

Commit 2: 35b86947 — Programmatic scroll guard

  • Add isProgrammaticScrollRef (boolean ref)
  • Set true before scrollTo(), clear via queueMicrotask after
  • Check !isProgrammaticScrollRef.current in handleScrollInner before setting autoScrollPaused = true

Why queueMicrotask not setTimeout(0): microtasks run after the current task's synchronous work but before the next macro task. Since behavior: 'auto' scrolls synchronously, the onScroll event fires in the same task. The microtask clears the flag before any subsequent user scroll could fire.

Commit 3: fdf8329f — Tests

4 new tests in auto-scroll.test.tsx:

  • Thinking text growth triggers auto-scroll (same array length)
  • Unread badge does NOT increment on in-place text updates
  • Tool event appends trigger auto-scroll
  • Genuine user scroll-up still pauses auto-scroll (guard doesn't interfere)

Includes a class-based MockMutationObserver for JSDOM compatibility.

Commit 4: c276056c — Codex tool display

Parser (codex-output-parser.ts):

  • Add lastToolName field: stored on tool_call, carried to tool_result, cleared after use
  • Add MAX_TOOL_OUTPUT_LENGTH (10K chars) truncation in decodeToolOutput

Renderer (TerminalOutput.tsx):

  • Add safeCommand() helper: handles both string and string-array command values (joins arrays with spaces)
  • Add truncate() helper for length-limited display
  • Add Codex-specific arg patterns: path, cmd, code, content
  • Change safeStr(toolInput.command)safeCommand(toolInput.command)

Tests (codex-output-parser.test.ts):

  • Tool name carryover from tool_call to tool_result
  • lastToolName reset after consumption (no stale names)
  • Output truncation at 10K chars with marker

Files Changed (4 files, +375 / -29)

File Changes
src/renderer/components/TerminalOutput.tsx MutationObserver effect, isProgrammaticScrollRef guard, safeCommand() helper, Codex arg extraction
src/main/parsers/codex-output-parser.ts lastToolName tracking, output truncation
src/__tests__/renderer/components/auto-scroll.test.tsx 4 new tests + MutationObserver mock
src/__tests__/main/parsers/codex-output-parser.test.ts 3 new tests for tool name carryover + truncation

Related

Test plan

  • npm run lint — clean
  • npm run test — 468 test files, 19,984 tests pass
  • New auto-scroll tests: thinking stream growth, unread badge isolation, tool event scroll, user scroll-up guard
  • New Codex parser tests: tool name carryover, stale name reset, output truncation
  • Manual: enable thinking display → start streaming prompt → verify viewport stays pinned during thinking text growth
  • Manual: scroll up during streaming → verify auto-scroll pauses → scroll back down → verify it resumes
  • Manual: Codex agent → run a command → verify tool entries show command details (not just "bash ●")

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Tool context is preserved across consecutive tool events so displayed tool names remain consistent.
    • Terminal displays additional tool details and trims very long tool content for readability.
  • Bug Fixes

    • Improved auto-scroll: programmatic scrolling no longer interrupts auto-scroll; user scrolls still pause it.
    • Unread badge/alignment updated to reflect actual scroll-bottom state.
  • Tests

    • Added coverage for tool name carryover, output truncation, and mutation-driven scroll behavior (includes test polyfill).

openasocket and others added 4 commits February 18, 2026 11:09
…erver

The auto-scroll effect previously depended on filteredLogs.length, which
missed in-place text updates during thinking/tool streaming (the array
length doesn't change when thinking chunks update the last log entry).

Replace with a MutationObserver on the scroll container that detects all
DOM mutations — childList for new nodes, characterData for text changes,
subtree to watch all descendants. This catches both new log entries and
in-place thinking stream growth.

Also adds a clarifying comment to the detect-new-messages effect
explaining why it intentionally still uses filteredLogs.length (unread
badge counts should only increment on new entries, not text updates).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ausing

Adds isProgrammaticScrollRef that the MutationObserver scrollToBottom sets
before calling scrollTo() and clears via queueMicrotask. The scroll handler
checks this flag before pausing auto-scroll, preventing the race condition
where a programmatic scroll triggers onScroll with a stale scrollTop and
falsely sets autoScrollPaused=true.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…scroll guard

Add thinking stream auto-scroll tests verifying:
- In-place text growth triggers auto-scroll via MutationObserver
- Unread badge does not increment on thinking text growth
- Tool event appended after thinking triggers auto-scroll

Add programmatic scroll guard test verifying:
- Genuine user scroll-up still pauses auto-scroll

Includes MutationObserver mock (class-based) for JSDOM compatibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…, handle array commands

Add lastToolName tracking in Codex parser to carry tool name from
tool_call to tool_result events (Codex emits these as separate
item.completed events). Fix TerminalOutput tool detail extraction
to handle array-typed command args (Codex uses ['ls', '-la'] format).
Add Codex-specific arg patterns (path, cmd, code, content).
Truncate large tool outputs to 10K chars.

Fixes missing command details in Codex bash tool execution display.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link

coderabbitai bot commented Feb 18, 2026

No actionable comments were generated in the recent review. 🎉


📝 Walkthrough

Walkthrough

Adds tool-name carryover and output truncation to CodexOutputParser, replaces filteredLogs-triggered auto-scroll with a MutationObserver-driven flow in TerminalOutput (with a programmatic-scroll guard and Codex command/content handling), and adds test scaffolding for MutationObserver plus parser tests for carryover and truncation.

Changes

Cohort / File(s) Summary
Parser Implementation
src/main/parsers/codex-output-parser.ts
Added lastToolName state to carry a tool_call's tool name to the next tool_result, introduced MAX_TOOL_OUTPUT_LENGTH = 10000, improved decodeToolOutput to handle strings and byte arrays robustly, and truncate long outputs with a truncation marker and fallback/error reporting.
Parser Tests
src/__tests__/main/parsers/codex-output-parser.test.ts
New tests validating tool name carryover from tool_call to tool_result, consumption (clearing) of carried name after use, and truncation behavior for outputs exceeding 10,000 characters versus shorter outputs.
TerminalOutput UI & Auto-scroll
src/renderer/components/TerminalOutput.tsx
Replaced filteredLogs-length trigger with a MutationObserver to detect DOM mutations and invoke scroll-to-bottom when appropriate. Added isProgrammaticScrollRef to ignore programmatic scrolls for pause/resume logic. Added safeCommand and truncate helpers and expanded Codex tool detail fields (path, cmd, code, truncated content).
Auto-Scroll Test Scaffolding
src/__tests__/renderer/components/auto-scroll.test.tsx
Introduced a MockMutationObserver polyfill for JSDOM tests (stores callback on window.__mutationCallback, provides observe, disconnect, takeRecords) to allow manual triggering of mutation callbacks; scaffolding duplicated across test sections.

Sequence Diagram(s)

sequenceDiagram
    participant User as User
    participant Terminal as TerminalOutput
    participant Observer as MutationObserver
    participant Container as Scroll Container

    User->>Container: Scroll away from bottom
    Terminal->>Terminal: Detect user scroll -> pause auto-scroll
    Note over Observer,Container: Streaming updates mutate DOM
    Observer->>Terminal: Mutation callback
    Terminal->>Terminal: shouldAutoScroll() false -> do not scroll
    User->>Terminal: Scroll to bottom or re-enable auto-scroll
    Terminal->>Terminal: Set isProgrammaticScrollRef = true
    Terminal->>Container: scrollToBottom()
    Terminal->>Terminal: Clear isProgrammaticScrollRef
Loading
sequenceDiagram
    participant ToolCall as tool_call Event
    participant Parser as CodexOutputParser
    participant Storage as lastToolName
    participant ToolResult as tool_result Event

    ToolCall->>Parser: Receive tool_call (toolName='shell')
    Parser->>Storage: Store lastToolName = 'shell'
    Parser->>Parser: Emit tool_call event

    ToolResult->>Parser: Receive tool_result (no toolName)
    Parser->>Storage: Read lastToolName ('shell')
    Parser->>Parser: Emit tool_result with toolName='shell' and status='completed'
    Parser->>Storage: Clear lastToolName

    ToolResult->>Parser: Receive another tool_result
    Parser->>Storage: Read lastToolName (null)
    Parser->>Parser: Emit tool_result with toolName=undefined
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed Title clearly summarizes the main changes: auto-scroll behavior following thinking streams and improvements to Codex tool display, matching the primary objectives of the PR.
Linked Issues check ✅ Passed All coding requirements from issue #402 are met: MutationObserver detects streaming updates, programmatic scroll guard prevents auto-scroll pausing, and unread-badge semantics are preserved.
Out of Scope Changes check ✅ Passed All changes are directly related to addressing auto-scroll issues and Codex tool display improvements outlined in the PR objectives; no unrelated modifications detected.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps
Copy link

greptile-apps bot commented Feb 18, 2026

Greptile Summary

This PR fixes two critical auto-scroll bugs and enhances Codex tool display. The MutationObserver-based approach elegantly solves thinking stream auto-scroll failures by observing DOM mutations instead of React state, and the isProgrammaticScrollRef guard prevents race conditions during programmatic scrolls.

Key Changes:

  • Replaced filteredLogs.length dependency with MutationObserver to detect in-place text updates (thinking streams, tool status)
  • Added isProgrammaticScrollRef guard with queueMicrotask to prevent false auto-scroll pausing during effect-driven scrolls
  • Fixed Codex tool display: carry tool name from tool_call to tool_result, handle array-typed command args, add Codex arg patterns (path, cmd, code, content)
  • Added output truncation (10K chars) to prevent oversized log entries
  • Comprehensive test coverage: 4 new auto-scroll tests with MockMutationObserver, 3 new Codex parser tests

Architecture Highlights:

  • The separation of concerns is well-executed: MutationObserver handles scroll triggering, while filteredLogs.length effect handles unread badge logic (only NEW entries increment badge, not in-place updates)
  • The queueMicrotask timing is correct for synchronous behavior: 'auto' scrolls — flag is cleared after the same-tick onScroll event but before next user interaction
  • Tests validate both the fix (thinking text growth triggers scroll) and that existing behavior isn't broken (user scroll-up still pauses, unread badge doesn't increment on in-place updates)

Confidence Score: 5/5

  • Safe to merge — well-architected fix with comprehensive test coverage and clear separation of concerns
  • All 468 test files pass (19,984 tests), new tests validate both fixes and unchanged behavior, code follows project patterns (tabs, error handling, performance memos), MutationObserver is a standard browser API with proper cleanup, programmatic scroll guard uses correct microtask timing
  • No files require special attention

Important Files Changed

Filename Overview
src/renderer/components/TerminalOutput.tsx MutationObserver replaces filteredLogs.length dep for auto-scroll, isProgrammaticScrollRef guard added to prevent false pauses, safeCommand() handles array-typed commands, Codex-specific arg patterns added
src/main/parsers/codex-output-parser.ts lastToolName state added to carry tool name from tool_call to tool_result, output truncation at 10K chars in decodeToolOutput()
src/tests/renderer/components/auto-scroll.test.tsx 4 new tests: thinking text growth triggers scroll, unread badge doesn't increment on in-place updates, tool events trigger scroll, user scroll-up still pauses. Includes MockMutationObserver for JSDOM.
src/tests/main/parsers/codex-output-parser.test.ts 3 new tests: tool name carryover from tool_call to tool_result, lastToolName reset after consumption, output truncation at 10K chars with marker

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[New log entry or text update] --> B{Input Mode?}
    B -->|Terminal| C[Auto-scroll enabled]
    B -->|AI| D{autoScrollAiMode?}
    D -->|No| E[No auto-scroll]
    D -->|Yes| F{autoScrollPaused?}
    F -->|Yes| E
    F -->|No| C
    
    C --> G[MutationObserver detects change]
    G --> H[Calls scrollToBottom]
    H --> I[Set isProgrammaticScrollRef = true]
    I --> J[scrollTo top: scrollHeight, behavior: auto]
    J --> K[onScroll handler fires synchronously]
    K --> L{isProgrammaticScrollRef?}
    L -->|Yes| M[Skip pause logic]
    L -->|No| N{At bottom?}
    N -->|No| O[Set autoScrollPaused = true]
    N -->|Yes| P[Clear new message badge]
    M --> Q[queueMicrotask clears flag]
    Q --> R[Ready for next interaction]
    
    S[User scrolls up] --> T[onScroll handler]
    T --> U{isProgrammaticScrollRef?}
    U -->|No genuine user scroll| V[Set autoScrollPaused = true]
    U -->|Yes| W[Ignore - programmatic]
    
    X[filteredLogs.length changes] --> Y{New entries added?}
    Y -->|Yes| Z{At bottom?}
    Z -->|No| AA[Increment unread badge]
    Z -->|Yes| AB[Update read state]
    Y -->|No in-place update| AC[No badge change]
Loading

Last reviewed commit: c276056

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/main/parsers/codex-output-parser.ts`:
- Around line 364-368: Replace the silent catch around Buffer.from(output) so it
reports the caught error to Sentry before falling back: add an import from
'../utils/sentry' and in the catch block capture the exception (e.g.,
Sentry.captureException(err)) where err is the caught error, then set decoded =
output.toString(); ensure you reference the same decoded and output variables
and the Buffer.from(...) call when locating the code to change.

In `@src/renderer/components/TerminalOutput.tsx`:
- Around line 1489-1502: The guard is cleared too early via queueMicrotask
causing a race with the throttled onScroll; update the logic around
isProgrammaticScrollRef and the throttled handler (the ref,
scrollContainerRef.current.scrollTo call, and the useThrottledCallback-based
onScroll) so the guard remains true for the throttle window: either postpone
clearing the guard by using the same throttle delay (e.g., setTimeout with the
throttle duration) after calling scrollTo, or move the guard-reset into the
throttled onScroll callback itself and check isProgrammaticScrollRef at the
start of that throttled handler before it clears the flag; adjust whichever
approach you pick to use the existing throttle duration used by
useThrottledCallback to avoid the race.

…tion

Address CodeRabbit review on PR RunMaestro#410:

1. codex-output-parser.ts: Report Buffer.from() decode failures to
   Sentry with context (operation, outputType, outputLength) instead
   of silently swallowing the error.

2. TerminalOutput.tsx: Fix race condition where queueMicrotask cleared
   isProgrammaticScrollRef before the throttled scroll handler (16ms
   delay) could check it. The guard flag is now consumed (checked and
   cleared) inside handleScrollInner itself, eliminating the timing
   dependency between microtasks and the throttle window.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@openasocket
Copy link
Contributor Author

Addressed Review Findings — ce7b086d

Both actionable items from the CodeRabbit review have been fixed:

1. Sentry reporting for Buffer.from() decode errors

File: src/main/parsers/codex-output-parser.ts

The catch block in decodeToolOutput previously swallowed errors silently. Now imports captureException from ../utils/sentry and reports the error with context (operation, outputType, outputLength) before falling back to output.toString(). This follows the project's error handling guidelines — recoverable errors are handled gracefully but still reported to Sentry for visibility.

2. Throttle race condition on isProgrammaticScrollRef

File: src/renderer/components/TerminalOutput.tsx

The bug: scrollToBottom() set isProgrammaticScrollRef = true, called scrollTo(), then cleared the flag via queueMicrotask (~0ms). But handleScroll is wrapped in useThrottledCallback(fn, 16) — a leading-edge throttle that defers trailing calls via setTimeout(fn, delay - elapsed). When a scroll event fell into the throttle window, the deferred handler fired ~16ms later, after the microtask had already cleared the guard, causing a false auto-scroll pause.

The fix: Removed the queueMicrotask clear entirely. Instead, the guard flag is now consumed (checked and cleared) inside handleScrollInner itself:

} else if (autoScrollAiMode) {
    if (isProgrammaticScrollRef.current) {
        // Consume the guard — this was our own scrollTo()
        isProgrammaticScrollRef.current = false;
    } else {
        // Genuine user scroll-up
        setAutoScrollPaused(true);
    }
}

This eliminates the timing dependency between microtasks and the throttle window. The flag persists until the scroll handler actually processes it, regardless of when the throttled invocation fires.

Tests: All 65 tests pass (auto-scroll + codex parser), TypeScript compiles clean.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/main/parsers/codex-output-parser.ts`:
- Around line 167-171: The parser currently uses lastToolName to associate
tool_result with the preceding tool_call which breaks when events are
out-of-order; add an optional tool_call_id?: string field to the CodexItem
interface, extract the tool_call_id from incoming Codex output in parseJsonLine
(or the code that builds CodexItem), and replace lastToolName-based pairing with
an ID map (e.g., toolCallIdToName: Map<string,string>) that records tool_call id
→ tool name when handling items of type 'tool_call' and looks up/removes that
entry when handling 'tool_result'; update any places using lastToolName (and the
lastToolName field itself) to use this id-based map so tool_result maps
correctly and the existing codex-session-storage cleanup logic (which expects
tool_call_id) receives the id on CodexItem.

In `@src/renderer/components/TerminalOutput.tsx`:
- Around line 1372-1382: The guard isProgrammaticScrollRef.current can remain
true if scrollToBottom is a no-op, so update the scroll handler around
autoScrollAiMode (the throttled handler that currently checks
isProgrammaticScrollRef.current and calls setAutoScrollPaused(true)) to always
clear isProgrammaticScrollRef.current on any scroll event (not only when true)
and additionally, in the code path that performs scrollToBottom, start a short
fallback timer (>= the throttle delay, e.g. 16ms) that clears
isProgrammaticScrollRef.current after the delay to cover no-scroll cases; ensure
the timer is cancelled or reset on subsequent programmatic scrolls to avoid
races and reuse the same references (isProgrammaticScrollRef, scrollToBottom,
setAutoScrollPaused) so behavior remains consistent.

Comment on lines +167 to +171
// Track tool name from tool_call to carry over to tool_result
// (Codex emits tool_call and tool_result as separate item.completed events,
// but tool_result doesn't include the tool name)
private lastToolName: string | null = null;

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the codex-output-parser.ts file to understand the context
wc -l src/main/parsers/codex-output-parser.ts

Repository: RunMaestro/Maestro

Length of output: 105


🏁 Script executed:

# Read the relevant sections of the file
sed -n '160,180p' src/main/parsers/codex-output-parser.ts
sed -n '285,330p' src/main/parsers/codex-output-parser.ts

Repository: RunMaestro/Maestro

Length of output: 1903


🏁 Script executed:

# Search for test files or sample data related to Codex
find . -type f \( -name "*codex*" -o -name "*test*" -o -name "*spec*" \) | head -20

Repository: RunMaestro/Maestro

Length of output: 853


🏁 Script executed:

# Look for tool_call/tool_result in the codebase
rg "tool_call|tool_result" -t ts -t tsx -t json -C 3 | head -100

Repository: RunMaestro/Maestro

Length of output: 89


🏁 Script executed:

# Search for codex parser tests more specifically
find . -type f -name "*.test.ts" -o -name "*.spec.ts" | xargs grep -l "codex\|CodexOutputParser" 2>/dev/null | head -10

Repository: RunMaestro/Maestro

Length of output: 570


🏁 Script executed:

# Search for Codex type definitions or interfaces
rg "interface.*Codex|type.*Codex|tool_call|tool_result" -t ts --max-count 20

Repository: RunMaestro/Maestro

Length of output: 9060


🏁 Script executed:

# Look for any fixture or test data files
find . -type f \( -name "*fixture*" -o -name "*mock*" -o -name "*.json" \) | grep -i codex

Repository: RunMaestro/Maestro

Length of output: 44


🏁 Script executed:

# Check the entire codex-output-parser.ts file structure
cat -n src/main/parsers/codex-output-parser.ts | head -100

Repository: RunMaestro/Maestro

Length of output: 3967


🏁 Script executed:

# Read the test file to understand test coverage
cat -n src/__tests__/main/parsers/codex-output-parser.test.ts

Repository: RunMaestro/Maestro

Length of output: 20682


🏁 Script executed:

# Check the CodexItem interface definition
sed -n '1,200p' src/main/parsers/codex-output-parser.ts | grep -A 20 "interface CodexItem"

Repository: RunMaestro/Maestro

Length of output: 505


🏁 Script executed:

# Look for more context on tool_call_id usage
rg "tool_call_id" -B 2 -A 2 src/main/parsers/ src/__tests__/main/parsers/

Repository: RunMaestro/Maestro

Length of output: 44


🏁 Script executed:

# Search for Codex documentation references
rg "codex|Codex" src/main/parsers/codex-output-parser.ts | head -10

Repository: RunMaestro/Maestro

Length of output: 501


🏁 Script executed:

# Look at how codex-session-storage actually uses tool_call_id
sed -n '1,50p' src/main/storage/codex-session-storage.ts
sed -n '150,250p' src/main/storage/codex-session-storage.ts | grep -B 5 -A 5 tool_call_id

Repository: RunMaestro/Maestro

Length of output: 1740


🏁 Script executed:

# Check if there's any real Codex output sample or fixture
find . -type f \( -name "*.json" -o -name "*.jsonl" -o -name "*.txt" \) -exec grep -l "item.completed.*tool_call\|tool_call.*tool_result" {} \; 2>/dev/null

Repository: RunMaestro/Maestro

Length of output: 44


🏁 Script executed:

# Verify tool_call_id is actually in CodexItem or if it's missing
rg "tool_call_id" src/main/parsers/codex-output-parser.ts src/__tests__/main/parsers/codex-output-parser.test.ts

Repository: RunMaestro/Maestro

Length of output: 44


🏁 Script executed:

# Get the actual codex-session-storage code that uses tool_call_id
sed -n '150,200p' src/main/storage/codex-session-storage.ts

Repository: RunMaestro/Maestro

Length of output: 1481


🏁 Script executed:

# Check the CodexItem definition more carefully for any ID field
sed -n '1,150p' src/main/parsers/codex-output-parser.ts | grep -A 30 "interface CodexItem"

Repository: RunMaestro/Maestro

Length of output: 421


🏁 Script executed:

# Search for any evidence of Codex sending tool_call_id or similar ID mapping
rg "tool_call_id\|item\.id.*tool" -C 3 src/main/storage/ src/main/parsers/

Repository: RunMaestro/Maestro

Length of output: 44


🏁 Script executed:

# Read more of codex-session-storage.ts to see tool_call_id context
wc -l src/main/storage/codex-session-storage.ts
sed -n '200,300p' src/main/storage/codex-session-storage.ts

Repository: RunMaestro/Maestro

Length of output: 3589


🏁 Script executed:

# Search for where tool_call_id is mentioned in session storage
grep -n "tool_call_id" src/main/storage/codex-session-storage.ts

Repository: RunMaestro/Maestro

Length of output: 259


🏁 Script executed:

# Check the actual implementation of parseJsonLine to see if it ever extracts or stores IDs
sed -n '200,350p' src/main/parsers/codex-output-parser.ts

Repository: RunMaestro/Maestro

Length of output: 3894


🏁 Script executed:

# Check what interface at line 1373 has tool_call_id
sed -n '1360,1380p' src/main/storage/codex-session-storage.ts

Repository: RunMaestro/Maestro

Length of output: 548


🏁 Script executed:

# Get the full context of how tool_call_id is used in cleanup logic
sed -n '1455,1475p' src/main/storage/codex-session-storage.ts

Repository: RunMaestro/Maestro

Length of output: 865


🏁 Script executed:

# Check if there are any comments in the parser about Codex output format or IDs
sed -n '1,50p' src/main/parsers/codex-output-parser.ts

Repository: RunMaestro/Maestro

Length of output: 1870


🏁 Script executed:

# Search for any references to Codex CLI schema or what fields it actually outputs
rg "tool_call_id|Codex.*output|Codex.*schema" -C 2 | head -50

Repository: RunMaestro/Maestro

Length of output: 3840


🏁 Script executed:

# Check if there's a reference link to Codex CLI documentation
rg "github.com/openai/codex" -B 5 -A 5

Repository: RunMaestro/Maestro

Length of output: 3526


🏁 Script executed:

# Look for any TODO or FIXME related to tool_call_id or ID mapping
rg "TODO|FIXME" src/main/parsers/codex-output-parser.ts src/main/storage/codex-session-storage.ts

Repository: RunMaestro/Maestro

Length of output: 44


Add tool_call_id field to CodexItem interface and use ID-based mapping for tool_call/tool_result pairing.

The current implementation relies on strict sequencing via lastToolName, which is fragile if Codex emits multiple tool_calls before tool_results or delivers them out of order. The codex-session-storage.ts cleanup logic already expects tool_call_id to be available (line 1462), but the parser doesn't extract or surface it. Add tool_call_id?: string to the CodexItem interface, extract it from Codex output, and use it for mapping tool results to their corresponding tool calls instead of relying solely on sequential tracking.

Test case showing current failure mode (lines 427–443):
it('should reset lastToolName after tool_result consumption', () => {
	const p = new CodexOutputParser();
	// tool_call → tool_result (consumes name) → another tool_result (no name)
	p.parseJsonLine(JSON.stringify({
		type: 'item.completed',
		item: { type: 'tool_call', tool: 'shell', args: {} },
	}));
	p.parseJsonLine(JSON.stringify({
		type: 'item.completed',
		item: { type: 'tool_result', output: 'ok' },
	}));
	const orphan = p.parseJsonLine(JSON.stringify({
		type: 'item.completed',
		item: { type: 'tool_result', output: 'orphan' },
	}));
	expect(orphan?.toolName).toBeUndefined();
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/parsers/codex-output-parser.ts` around lines 167 - 171, The parser
currently uses lastToolName to associate tool_result with the preceding
tool_call which breaks when events are out-of-order; add an optional
tool_call_id?: string field to the CodexItem interface, extract the tool_call_id
from incoming Codex output in parseJsonLine (or the code that builds CodexItem),
and replace lastToolName-based pairing with an ID map (e.g., toolCallIdToName:
Map<string,string>) that records tool_call id → tool name when handling items of
type 'tool_call' and looks up/removes that entry when handling 'tool_result';
update any places using lastToolName (and the lastToolName field itself) to use
this id-based map so tool_result maps correctly and the existing
codex-session-storage cleanup logic (which expects tool_call_id) receives the id
on CodexItem.

Comment on lines 1372 to +1382
} else if (autoScrollAiMode) {
// Pause auto-scroll when user scrolls away from bottom
setAutoScrollPaused(true);
if (isProgrammaticScrollRef.current) {
// This scroll event was triggered by our own scrollTo() call —
// consume the guard flag here inside the throttled handler to avoid
// the race where queueMicrotask clears the flag before a deferred
// throttled invocation fires (throttle delay is 16ms > microtask).
isProgrammaticScrollRef.current = false;
} else {
// Genuine user scroll away from bottom — pause auto-scroll
setAutoScrollPaused(true);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Guard can stick true when scrollToBottom doesn’t emit a scroll event.
If the scroll lands at bottom or is a no-op, isProgrammaticScrollRef stays true and the first user scroll away won’t pause auto-scroll. That can yank the user back on the next mutation. Consider clearing the guard on any scroll and add a short fallback timer after scrollToBottom (>= throttle delay) to handle no-scroll cases.

🛠️ Suggested fix
 const handleScrollInner = useCallback(() => {
 	if (!scrollContainerRef.current) return;
 	const { scrollTop, scrollHeight, clientHeight } = scrollContainerRef.current;
 	// Consider "at bottom" if within 50px of the bottom
 	const atBottom = scrollHeight - scrollTop - clientHeight < 50;
+	const wasProgrammatic = isProgrammaticScrollRef.current;
+	if (wasProgrammatic) {
+		isProgrammaticScrollRef.current = false;
+	}
 	setIsAtBottom(atBottom);
@@
-		} else if (autoScrollAiMode) {
-			if (isProgrammaticScrollRef.current) {
-				// This scroll event was triggered by our own scrollTo() call —
-				// consume the guard flag here inside the throttled handler to avoid
-				// the race where queueMicrotask clears the flag before a deferred
-				// throttled invocation fires (throttle delay is 16ms > microtask).
-				isProgrammaticScrollRef.current = false;
-			} else {
-				// Genuine user scroll away from bottom — pause auto-scroll
-				setAutoScrollPaused(true);
-			}
-		}
+		} else if (autoScrollAiMode) {
+			if (!wasProgrammatic) {
+				// Genuine user scroll away from bottom — pause auto-scroll
+				setAutoScrollPaused(true);
+			}
+		}
 const scrollToBottom = () => {
 	if (!scrollContainerRef.current) return;
 	requestAnimationFrame(() => {
 		if (scrollContainerRef.current) {
 			// Set guard flag BEFORE scrollTo — the throttled scroll handler
 			// checks this flag and consumes it (clears it) when it fires,
 			// preventing the programmatic scroll from being misinterpreted
 			// as a user scroll-up that should pause auto-scroll.
 			isProgrammaticScrollRef.current = true;
 			scrollContainerRef.current.scrollTo({
 				top: scrollContainerRef.current.scrollHeight,
 				behavior: 'auto',
 			});
+			setTimeout(() => {
+				isProgrammaticScrollRef.current = false;
+			}, 32);
 		}
 	});
 };

Also applies to: 1494-1501

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/renderer/components/TerminalOutput.tsx` around lines 1372 - 1382, The
guard isProgrammaticScrollRef.current can remain true if scrollToBottom is a
no-op, so update the scroll handler around autoScrollAiMode (the throttled
handler that currently checks isProgrammaticScrollRef.current and calls
setAutoScrollPaused(true)) to always clear isProgrammaticScrollRef.current on
any scroll event (not only when true) and additionally, in the code path that
performs scrollToBottom, start a short fallback timer (>= the throttle delay,
e.g. 16ms) that clears isProgrammaticScrollRef.current after the delay to cover
no-scroll cases; ensure the timer is cancelled or reset on subsequent
programmatic scrolls to avoid races and reuse the same references
(isProgrammaticScrollRef, scrollToBottom, setAutoScrollPaused) so behavior
remains consistent.

When scrollTo() is called but the container is already at the bottom,
the browser may not fire a scroll event. The guard flag stays true
indefinitely, causing the next genuine user scroll-up to be silently
consumed instead of pausing auto-scroll.

Add a 32ms setTimeout fallback (2x the 16ms throttle window) that
clears the flag if the throttled handler hasn't consumed it. If the
handler does consume it first, the timeout harmlessly sets false again.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@openasocket
Copy link
Contributor Author

Addressed Review Round 2 — a4b74d0e

Fixed: Stale guard on no-op scroll (Finding #4)

When scrollTo() is called but the container is already at the bottom, the browser doesn't fire a scroll event. The isProgrammaticScrollRef guard stays true indefinitely — the next genuine user scroll-up gets silently consumed instead of pausing auto-scroll.

Fix: Added a 32ms setTimeout fallback in scrollToBottom that clears the flag if the throttled handler hasn't consumed it. 32ms is 2x the 16ms throttle window, guaranteeing the handler has had its chance. If the handler does consume it first, the timeout harmlessly sets false again (idempotent).

Skipped: tool_call_id mapping (Finding #3)

The suggestion to replace lastToolName with an ID-based Map<string, string> keyed by tool_call_id is over-engineered for our use case. Codex CLI outputs sequential JSONL — tool_call always immediately precedes its tool_result in the stdout stream. Out-of-order events can't happen with CLI stdout. The existing test at lines 427-443 validates the intended behavior: an orphaned tool_result correctly gets undefined as its toolName.

Tests: All 19 auto-scroll tests pass, TypeScript compiles clean.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

AI chat auto-scroll does not follow streaming updates when Thinking/Tool Calls are visible

2 participants