Overview
Add a first-class Computer Use pane (compuse) to AgentMux, giving agents the ability to see and control the desktop — screenshots, mouse, keyboard, scroll — using the Anthropic computer_20251124 tool API. Modeled after Claude Code's new built-in computer-use MCP server (shipped v2.1.85, ~2026-03-26).
Background
On March 26, 2026, Anthropic shipped computer-use as a built-in MCP server in Claude Code. It lets Claude take screenshots, move the mouse, click, type, scroll, and switch displays on a real desktop. AgentMux should own this loop natively: the agent pane drives the tool loop, the CEF backend executes actions, and the pane shows a live annotated view.
The Anthropic Computer Use API
Tool definition:
{
  "type": "computer_20251124",
  "name": "computer",
  "display_width_px": 1920,
  "display_height_px": 1080,
  "enable_zoom": true
}
Beta header required: computer-use-2025-11-24 (Opus 4.6, Sonnet 4.6, Opus 4.5).
Full action surface: screenshot, left_click, right_click, double_click, triple_click, mouse_move, left_click_drag, left_mouse_down, left_mouse_up, scroll, type, key, hold_key, wait, zoom
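One way the sidecar could validate incoming action names is a closed enum with a parser, so that an unknown action fails early instead of reaching the input driver. This is a minimal sketch: `ActionKind` and its error type are hypothetical names, and only the action strings themselves come from the list above.

```rust
use std::str::FromStr;

/// One variant per action name in the list above. `ActionKind` is a hypothetical name.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ActionKind {
    Screenshot, LeftClick, RightClick, DoubleClick, TripleClick,
    MouseMove, LeftClickDrag, LeftMouseDown, LeftMouseUp,
    Scroll, Type, Key, HoldKey, Wait, Zoom,
}

impl FromStr for ActionKind {
    type Err = String;

    /// Parses the action string of a tool_use block; unknown names are rejected.
    fn from_str(name: &str) -> Result<Self, Self::Err> {
        Ok(match name {
            "screenshot" => Self::Screenshot,
            "left_click" => Self::LeftClick,
            "right_click" => Self::RightClick,
            "double_click" => Self::DoubleClick,
            "triple_click" => Self::TripleClick,
            "mouse_move" => Self::MouseMove,
            "left_click_drag" => Self::LeftClickDrag,
            "left_mouse_down" => Self::LeftMouseDown,
            "left_mouse_up" => Self::LeftMouseUp,
            "scroll" => Self::Scroll,
            "type" => Self::Type,
            "key" => Self::Key,
            "hold_key" => Self::HoldKey,
            "wait" => Self::Wait,
            "zoom" => Self::Zoom,
            other => return Err(format!("unsupported action: {other}")),
        })
    }
}
```

With this in place, `"left_click".parse::<ActionKind>()` yields `Ok(ActionKind::LeftClick)`, and a misspelled or future action name becomes an error that can be surfaced as a compuse:error event.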
Agent loop:
Pane sends prompt + tools to Claude API
↓
Claude returns tool_use: { action: "screenshot" }
↓
AgentMux executes action → captures screenshot
↓
AgentMux returns tool_result with base64 image
↓
Claude reasons → next action
↓
Loop until task done
AgentMux is the execution layer — Claude is purely the reasoning engine.
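The loop above can be sketched with the model call and the action executor hidden behind two traits. Everything named here (`Model`, `Executor`, `ScriptedModel`, `RecordingExecutor`, the `max_steps` cap) is an illustrative assumption rather than part of the spec; a real implementation would call the Claude API and the capture/input crates where the scripted stand-ins are used.

```rust
use std::collections::VecDeque;

/// Small subset of the action surface, enough to show the loop shape.
#[derive(Debug, Clone, PartialEq)]
pub enum Action {
    Screenshot,
    LeftClick { x: u32, y: u32 },
    Type(String),
}

/// What the model returns on each turn: another tool call, or a final summary.
#[derive(Debug, Clone, PartialEq)]
pub enum ModelTurn {
    ToolUse(Action),
    Done(String),
}

/// Result sent back after an action runs (a fresh screenshot, base64-encoded).
#[derive(Debug, Clone, PartialEq)]
pub struct ToolResult {
    pub image_b64: String,
}

/// Hypothetical seam over the Claude API request.
pub trait Model {
    fn next_turn(&mut self, last: Option<&ToolResult>) -> ModelTurn;
}

/// Hypothetical seam over input simulation plus screen capture.
pub trait Executor {
    fn execute(&mut self, action: &Action) -> ToolResult;
}

/// Runs the loop until the model reports completion or the step cap is hit.
pub fn run_loop(
    model: &mut dyn Model,
    exec: &mut dyn Executor,
    max_steps: usize,
) -> Result<String, String> {
    let mut last: Option<ToolResult> = None;
    for _ in 0..max_steps {
        match model.next_turn(last.as_ref()) {
            ModelTurn::Done(summary) => return Ok(summary),
            ModelTurn::ToolUse(action) => last = Some(exec.execute(&action)),
        }
    }
    Err(format!("no completion after {max_steps} steps"))
}

/// Test double that replays a fixed list of turns.
pub struct ScriptedModel {
    turns: VecDeque<ModelTurn>,
}

impl ScriptedModel {
    pub fn new(turns: Vec<ModelTurn>) -> Self {
        Self { turns: turns.into() }
    }
}

impl Model for ScriptedModel {
    fn next_turn(&mut self, _last: Option<&ToolResult>) -> ModelTurn {
        self.turns
            .pop_front()
            .unwrap_or(ModelTurn::Done("script exhausted".into()))
    }
}

/// Test double that records actions instead of touching the desktop.
#[derive(Default)]
pub struct RecordingExecutor {
    pub log: Vec<Action>,
}

impl Executor for RecordingExecutor {
    fn execute(&mut self, action: &Action) -> ToolResult {
        self.log.push(action.clone());
        ToolResult { image_b64: String::new() }
    }
}
```

The step cap is a design choice of this sketch, not something the spec requires; it keeps a confused model from driving the desktop indefinitely.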
Coordinate scaling: Screenshots are downsampled so that the long edge is at most 1568 px before they are sent to Claude. Claude returns coordinates in that downsampled space, so they must be scaled back up to the actual screen resolution before a click is executed.
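A minimal sketch of the scaling arithmetic, assuming a single uniform scale factor derived from the long edge. The `CoordinateScaler` name comes from the architecture section; its fields and methods, the rounding, and the clamping to the screen bounds are assumptions made for illustration.

```rust
/// Longest screenshot edge sent to the model, per the text above.
const MAX_LONG_EDGE: u32 = 1568;

/// Maps between real screen pixels and the downsampled screenshot space.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct CoordinateScaler {
    screen_w: u32,
    screen_h: u32,
    /// Screenshot pixels per screen pixel; never above 1.0.
    factor: f64,
}

impl CoordinateScaler {
    pub fn new(screen_w: u32, screen_h: u32) -> Self {
        let long_edge = screen_w.max(screen_h).max(1);
        // Never upscale: screens already within the limit are sent as-is.
        let factor = (MAX_LONG_EDGE as f64 / long_edge as f64).min(1.0);
        Self { screen_w, screen_h, factor }
    }

    /// Size of the screenshot that would be sent to the model.
    pub fn scaled_size(&self) -> (u32, u32) {
        let w = (self.screen_w as f64 * self.factor).round() as u32;
        let h = (self.screen_h as f64 * self.factor).round() as u32;
        (w.max(1), h.max(1))
    }

    /// Converts a model coordinate to a real screen coordinate, clamped to the screen.
    pub fn to_screen(&self, x: u32, y: u32) -> (u32, u32) {
        let sx = (x as f64 / self.factor).round() as u32;
        let sy = (y as f64 / self.factor).round() as u32;
        (
            sx.min(self.screen_w.saturating_sub(1)),
            sy.min(self.screen_h.saturating_sub(1)),
        )
    }
}
```

For a 1920x1080 display the factor is 1568/1920, so the screenshot is 1568x882 and a model click at (784, 441) maps back to (960, 540). Under this scheme the display_width_px and display_height_px values in the tool definition would presumably be the scaled size rather than the physical one, so that the model's coordinate space matches the image it sees.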
Architecture (CEF host)
┌─────────────────────────────────────────────────────┐
│ AgentMux Frontend (SolidJS in CEF webview)          │
│  ┌──────────────────────────────────────────────┐   │
│  │ CompUsePane                                  │   │
│  │ ├── ScreenView (annotated screenshot)        │   │
│  │ ├── ActionLog (click/type/scroll history)    │   │
│  │ ├── PromptBar                                │   │
│  │ └── AppApprovalDialog                        │   │
│  └──────────────────────────────────────────────┘   │
└────────────────────────┬────────────────────────────┘
                         │ HTTP POST IPC (cef-api.ts → agentmuxsrv-rs)
┌────────────────────────▼────────────────────────────┐
│ agentmuxsrv-rs (Rust sidecar)                       │
│ ├── AnthropicClient (computer_20251124 tool loop)   │
│ ├── ScreenCapture (xcap crate)                      │
│ ├── InputDriver (enigo crate)                       │
│ ├── AppApprovalStore (session-scoped)               │
│ └── CoordinateScaler                                │
└─────────────────────────────────────────────────────┘
spawned by agentmux-cef (CEF host)
Rust crates
xcap = "0.2" # screen capture — ScreenCaptureKit (macOS), DXGI (Windows)
enigo = "0.2" # mouse + keyboard simulation, cross-platform
accessibility = "0.1" # macOS AXUIElement — structural UI queries (Phase 2)
uiautomation = "0.3" # Windows UIA — structural UI queries (Phase 2)
New IPC commands (agentmuxsrv-rs)
POST /api/compuse/start { pane_id, prompt } → { session_id }
POST /api/compuse/approve { session_id, app_name } → {}
POST /api/compuse/cancel { session_id } → {}
GET /api/compuse/screenshot?session_id=… → { image_b64, width, height }
Push events → frontend (via CEF bridge / CustomEvent)
compuse:action { session_id, action_type, coordinate?, text? }
compuse:screenshot { session_id, image_b64, width, height }
compuse:approval { session_id, app_name, tier }
compuse:done { session_id, summary }
compuse:error { session_id, message }
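On the sidecar side, the five push events could be modeled as one enum so that the event name and payload always travel together. This is a sketch only: `CompuseEvent` is a hypothetical name, and the field types (string session ids, `u32` pixel values, a string tier) are assumptions, since the spec names the fields but not their types.

```rust
/// Push events from the sidecar to the frontend, one variant per event above.
/// Field types are assumptions; the spec only names the fields.
#[derive(Debug, Clone, PartialEq)]
pub enum CompuseEvent {
    Action {
        session_id: String,
        action_type: String,
        coordinate: Option<(u32, u32)>,
        text: Option<String>,
    },
    Screenshot { session_id: String, image_b64: String, width: u32, height: u32 },
    Approval { session_id: String, app_name: String, tier: String },
    Done { session_id: String, summary: String },
    Error { session_id: String, message: String },
}

impl CompuseEvent {
    /// Event name as dispatched to the frontend.
    pub fn name(&self) -> &'static str {
        match self {
            Self::Action { .. } => "compuse:action",
            Self::Screenshot { .. } => "compuse:screenshot",
            Self::Approval { .. } => "compuse:approval",
            Self::Done { .. } => "compuse:done",
            Self::Error { .. } => "compuse:error",
        }
    }

    /// Session the event belongs to, so a pane can ignore events for other sessions.
    pub fn session_id(&self) -> &str {
        match self {
            Self::Action { session_id, .. }
            | Self::Screenshot { session_id, .. }
            | Self::Approval { session_id, .. }
            | Self::Done { session_id, .. }
            | Self::Error { session_id, .. } => session_id.as_str(),
        }
    }
}
```

The optional `coordinate` and `text` fields mirror the `?` markers in the compuse:action line: a click carries a coordinate, a type action carries text, and a screenshot carries neither.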
UI
┌─────────────────────────────────────────────┐
│ [●] Screenshot  [▶ Running]  [✕ Cancel]     │  ← toolbar
├─────────────────────────────────────────────┤
│                                             │
│   [annotated screenshot with action dot]    │  ← ScreenView
│                                             │
├─────────────────────────────────────────────┤
│ ✓ screenshot                                │
│ ✓ left_click (1240, 340)                    │  ← ActionLog
│ ✓ type "search query"                       │
│ ⟳ screenshot                                │
├─────────────────────────────────────────────┤
│ [ Task prompt...                     ]  [▶] │  ← PromptBar
└─────────────────────────────────────────────┘
Action annotations overlaid on screenshot: click → pulsing dot, type → keyboard icon, scroll → arrow.
widgets.json entry
{
  "defwidget@compuse": {
    "view": "compuse",
    "display": { "label": "computer use", "order": 5, "icon": "desktop", "color": "#8b5cf6", "visible": true }
  }
}
Security Model (mirrors Claude Code)
- Session-scoped app approval — user must approve each app before the agent can control it; approvals cleared on pane close
- App tier warnings — Terminal/shell apps flagged as "equivalent to shell access"; Finder as "can read or write any file"; System Settings as "can change system settings"
- Never auto-approve Terminal, Finder, System Settings
- Machine-wide mutex — one active computer-use session at a time (Mutex<Option<SessionId>> in agentmuxsrv-rs)
- No self-screenshot — AgentMux window excluded from captures (prevents prompt injection from its own UI)
- Credential redact toggle — option to black out password fields before sending to Claude
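The single-session lock and the session-scoped approvals from the list above can be sketched with the standard library alone. `SessionGuard`, `AppTier`, `tier_for`, the `u64` session id, and the exact app-name strings are hypothetical; only `Mutex<Option<SessionId>>`, `AppApprovalStore`, and the warning texts come from the spec. Note that an in-process mutex covers one sidecar process; making the lock truly machine-wide would need an additional mechanism, which this sketch does not attempt.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Assumed id type; the spec does not define SessionId.
pub type SessionId = u64;

/// Warning tier shown in the approval dialog.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AppTier {
    Standard,
    /// High-risk apps carry the warning text from the list above.
    Sensitive(&'static str),
}

/// Maps an app name to its tier. The matched names are illustrative.
pub fn tier_for(app_name: &str) -> AppTier {
    match app_name {
        "Terminal" => AppTier::Sensitive("equivalent to shell access"),
        "Finder" => AppTier::Sensitive("can read or write any file"),
        "System Settings" => AppTier::Sensitive("can change system settings"),
        _ => AppTier::Standard,
    }
}

/// Allows at most one active computer-use session per sidecar process.
#[derive(Default)]
pub struct SessionGuard {
    active: Mutex<Option<SessionId>>,
}

impl SessionGuard {
    /// Returns true if `id` became the active session, false if one is already running.
    pub fn try_acquire(&self, id: SessionId) -> bool {
        let mut slot = self.active.lock().unwrap();
        if slot.is_some() {
            return false;
        }
        *slot = Some(id);
        true
    }

    /// Frees the slot, but only when called by the session that holds it.
    pub fn release(&self, id: SessionId) {
        let mut slot = self.active.lock().unwrap();
        if *slot == Some(id) {
            *slot = None;
        }
    }
}

/// Per-session approvals; `clear` is meant to run when the pane closes.
#[derive(Default)]
pub struct AppApprovalStore {
    approved: HashSet<String>,
}

impl AppApprovalStore {
    pub fn approve(&mut self, app: &str) {
        self.approved.insert(app.to_string());
    }

    pub fn is_approved(&self, app: &str) -> bool {
        self.approved.contains(app)
    }

    pub fn clear(&mut self) {
        self.approved.clear();
    }
}
```

The store has no auto-approve path at all: every app, sensitive or not, starts unapproved and only `approve` (driven by the user's answer to the dialog) changes that, which is one simple way to satisfy the "never auto-approve" rule.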
Platform Support
| Feature | macOS | Windows | Linux |
| --- | --- | --- | --- |
| Screenshot | xcap (ScreenCaptureKit) ✓ | xcap (DXGI) ✓ | xcap (X11/PipeWire) ⚠ |
| Mouse simulation | enigo ✓ | enigo ✓ | enigo (X11) ⚠ |
| Keyboard simulation | enigo ✓ | enigo ✓ | enigo (X11) ⚠ |
| Structural UI queries | accessibility (AXUIElement) Ph2 | uiautomation (UIA) Ph2 | AT-SPI2 (future) |
| Permissions | TCC (Accessibility + Screen Recording) | None required | distro-dependent |
macOS: agentmux-cef needs NSAccessibilityUsageDescription and NSScreenCaptureUsageDescription in Info.plist. Because AgentMux ships as a portable ZIP, App Store sandbox constraints do not apply.
Phased Plan
Phase 1 — Windows + macOS pixel-based MVP
- xcap screen capture in agentmuxsrv-rs
- enigo mouse + keyboard in agentmuxsrv-rs
- computer_20251124 tool loop
- AppApprovalStore
- defwidget@compuse in widgets.json
Phase 2 — Structural UI queries
- accessibility crate — AXUIElement on macOS (find button by label, not pixel)
- uiautomation crate — Windows UIA element tree
Phase 3 — Power features
Open Questions
- Should the model be configurable per-pane (Opus 4.6 for complex tasks, Sonnet 4.6 for speed/cost)?
- Do we want to optionally proxy to Claude Code's built-in computer-use MCP server when installed?
- UX for mid-task clarification questions from the agent?
References
- docs/specs/computer-use-pane.md