@ammar-agent
Collaborator

Corrects the file path for the terminal-bench agent CLI entry point and fixes worker resolution when running from source.

Changes

  • benchmarks/terminal_bench/mux-run.sh: Updated path from src/debug/agentSessionCli.ts to src/cli/debug/agentSessionCli.ts.
  • src/node/utils/main/workerPool.ts: Added support for resolving tokenizer.worker.ts when running directly from source (Bun), fixing a crash when dist/ is missing.
  • docs/benchmarking.md: Updated documentation to reflect the correct CLI path.

Verification

  • Verified locally that src/cli/debug/agentSessionCli.ts starts successfully and processes input without worker errors.
  • Fixes the '0 successful trials' issue in nightly benchmarks caused by the agent crashing on startup.

@ammar-agent ammar-agent force-pushed the debug-terminal-bench-trials branch 2 times, most recently from 1a0544e to 4882a8b Compare November 24, 2025 02:06
ammario pushed a commit that referenced this pull request Nov 24, 2025
Generated with `mux`

Fixes flaky integration tests observed in PR #701.

Changes:
- **Runtime Bash Test**: Decoupled tool execution time verification from
total test duration. Now measures strict execution time (<10s) using
event timestamps, while allowing a generous overall timeout (30s) to
accommodate CI network/AI latency.
- **Web Search Test**: Lowered `thinkingLevel` to 'low' to improve speed
and reliability, and increased timeout to 180s.

Verification:
- `tests/ipcMain/runtimeExecuteBash.test.ts` passes locally.
- `tests/ipcMain/openai-web-search.test.ts` passes locally.
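
A minimal sketch of the timing approach described for the Runtime Bash Test, assuming the stream emits timestamped start and end events for each tool call. The `TimedEvent` shape, the event names `tool-call-start` and `tool-call-end`, and the helper `toolExecutionMs` are hypothetical and are not taken from the repository:

```typescript
// Hypothetical event shape; the real stream events in the repository may differ.
interface TimedEvent {
  type: string; // assumed names: "tool-call-start", "tool-call-end"
  timestamp: number; // milliseconds
}

// Elapsed time between the first start event and the first end event at or after it.
function toolExecutionMs(
  events: TimedEvent[],
  startType = "tool-call-start",
  endType = "tool-call-end",
): number {
  const start = events.find((e) => e.type === startType);
  if (!start) throw new Error(`no ${startType} event found`);
  const end = events.find(
    (e) => e.type === endType && e.timestamp >= start.timestamp,
  );
  if (!end) throw new Error(`no ${endType} event found after start`);
  return end.timestamp - start.timestamp;
}

// Illustrative data: the model is slow to respond, but the tool itself runs quickly.
const sampleEvents: TimedEvent[] = [
  { type: "stream-start", timestamp: 0 },
  { type: "tool-call-start", timestamp: 18_000 },
  { type: "tool-call-end", timestamp: 20_500 },
  { type: "stream-end", timestamp: 24_000 },
];

const executionMs = toolExecutionMs(sampleEvents);
// Strict budget applies to the tool execution only, not the whole test.
const withinBudget = executionMs < 10_000;
```

With this split, the strict bound applies only to the span between the two tool events, while the overall test timeout can stay generous enough to absorb CI network and model latency.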
@ammar-agent ammar-agent force-pushed the debug-terminal-bench-trials branch from 4882a8b to ddd8cc6 Compare November 24, 2025 18:32
Updates mux-run.sh to point to the correct src/cli/debug/agentSessionCli.ts path.
Updates workerPool.ts to support running the tokenizer worker from source (Bun) when dist/ is missing.
Updates the benchmarking docs to reflect the correct path.
Adds scripts/check-bench-agent.sh to verify that the agent entry point referenced in mux-run.sh exists and loads without import errors.
Includes this check in the main 'static-check' make target.
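
The existence half of such a check can be sketched as follows. This is an illustration only: the actual scripts/check-bench-agent.sh is a shell script whose contents are not shown here, and the helpers `extractEntryPoints` and `missingEntryPoints`, the regular expression, and the sample script line are all hypothetical. The sketch does not cover the "loads without import errors" part.

```typescript
// Collect unique src/**/*.ts paths mentioned anywhere in a shell script's text.
function extractEntryPoints(scriptText: string): string[] {
  const matches = scriptText.match(/\bsrc\/[\w./-]+\.ts\b/g) ?? [];
  return [...new Set(matches)];
}

// Return the referenced paths that do not exist, according to the supplied predicate.
function missingEntryPoints(
  scriptText: string,
  exists: (path: string) => boolean,
): string[] {
  return extractEntryPoints(scriptText).filter((p) => !exists(p));
}

// Hypothetical script line that still uses the old, pre-fix path.
const staleScript = 'exec bun src/debug/agentSessionCli.ts "$@"';

// Stand-in for the file system: only the corrected path exists.
const knownFiles = new Set(["src/cli/debug/agentSessionCli.ts"]);
const missing = missingEntryPoints(staleScript, (p) => knownFiles.has(p));
```

In a real check the predicate would likely be `fs.existsSync` resolved against the repository root, and a non-empty result would fail the static-check step.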
@ammar-agent ammar-agent force-pushed the debug-terminal-bench-trials branch from ddd8cc6 to 1a518e0 Compare November 24, 2025 20:44
The previous fix used extname(__filename) === '.ts' to detect source
execution, but ts-jest also sets __filename to .ts while running under
Node. Node cannot load .ts workers directly (only Bun can), so we now
check process.isBun to distinguish between:
- Bun running .ts source -> use tokenizer.worker.ts
- Node/ts-jest running .ts source -> use tokenizer.worker.js from dist/
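
The decision described above can be sketched roughly as below. This is not the code in workerPool.ts: the function `resolveTokenizerWorker`, the `RuntimeInfo` shape, and the directory layout (worker file next to the current file, a separate `distDir`) are assumptions made for illustration. In a real module, `isBun` would come from the `process.isBun` check the commit message describes.

```typescript
import * as path from "node:path";

// Hypothetical inputs, passed explicitly so the logic can be exercised without Bun.
interface RuntimeInfo {
  isBun: boolean; // result of the runtime check, e.g. process.isBun
  currentFile: string; // value of __filename in the worker pool module
  distDir: string; // assumed location of the compiled worker
}

function resolveTokenizerWorker(info: RuntimeInfo): string {
  const fromSource = path.extname(info.currentFile) === ".ts";

  if (fromSource && info.isBun) {
    // Bun can load TypeScript workers directly.
    return path.join(path.dirname(info.currentFile), "tokenizer.worker.ts");
  }
  if (fromSource) {
    // Node under ts-jest also reports a .ts __filename but cannot load .ts workers.
    return path.join(info.distDir, "tokenizer.worker.js");
  }
  // Compiled output: assume the worker sits next to the current file.
  return path.join(path.dirname(info.currentFile), "tokenizer.worker.js");
}

const bunFromSource = resolveTokenizerWorker({
  isBun: true,
  currentFile: path.join("src", "node", "utils", "main", "workerPool.ts"),
  distDir: "dist",
});

const jestFromSource = resolveTokenizerWorker({
  isBun: false,
  currentFile: path.join("src", "node", "utils", "main", "workerPool.ts"),
  distDir: "dist",
});
```

The extension check alone cannot separate the first two cases, which is why the runtime flag is consulted before the file extension decides the worker path.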
