Linux/X11 computer-use tools for AI agents — Pi, Claude Code, OpenCode, and any MCP-aware client. AT-SPI + xdotool, ~1k LOC.
A Linux port of @injaneity/pi-computer-use. One bridge, three frontends:
- Pi extension for mariozechner/pi-coding-agent
- MCP server for Claude Code (and any other MCP host)
- MCP server for OpenCode
The macOS original uses Apple's Accessibility API + AppleScript + ScreenCaptureKit (~6,800 lines of Swift + TS). This port replaces the entire native layer with AT-SPI 2 + xdotool + scrot, ships a single ~470-line Python bridge, and trims the tool surface from 15 → 8 to keep prompts cheap.
| | upstream macOS | this port |
|---|---|---|
| Total LOC | ~6,866 | ~1,200 (-83%) |
| Tools registered | ~15 | 8 |
| Native helper | 2,065 lines Swift | 471 lines Python |
| Runtime deps | Swift toolchain, codesign | python3-gi, xdotool, wmctrl, scrot |
| Frontends | macOS only | Pi · Claude Code · OpenCode · any MCP client |
# Debian/Ubuntu
sudo apt-get install -y python3 python3-gi gir1.2-atspi-2.0 xdotool wmctrl scrot
# Enable AT-SPI on the desktop session (GNOME)
gsettings set org.gnome.desktop.interface toolkit-accessibility true

X11 only. Wayland sessions cannot capture other-app windows or synthesize input via xdotool. Run a GNOME-on-Xorg, KDE-on-X11, or XFCE session.
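The install step and the X11-only note above can be checked before starting an agent. The following is a minimal preflight sketch, not part of this repository: the function name `preflight` and its messages are hypothetical, and it assumes the desktop exports `XDG_SESSION_TYPE` (common on systemd-logind sessions, but not guaranteed).

```python
import os
import shutil

# Executables named in the apt-get line above.
REQUIRED_TOOLS = ("xdotool", "wmctrl", "scrot")


def preflight(env=None, which=shutil.which):
    """Return a list of problems; an empty list means the session looks usable."""
    env = os.environ if env is None else env
    problems = []

    # Assumption: XDG_SESSION_TYPE is "x11" or "wayland" when it is set at all.
    if env.get("XDG_SESSION_TYPE", "").lower() == "wayland":
        problems.append("Wayland session detected; log in to an X11 session instead")
    if not env.get("DISPLAY"):
        problems.append("DISPLAY is not set; no X server to talk to")

    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"missing executable: {tool}")
    return problems


if __name__ == "__main__":
    for line in preflight() or ["ok"]:
        print(line)
```

Passing `env` and `which` as parameters keeps the check testable without touching the real session.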
pi install git:github.com/tak-uukti/linux-computer-use@v0.2.0

The postinstall script writes a small bash wrapper to ~/.pi/agent/helpers/linux-computer-use/bridge that execs python3 bridge/bridge.py. No build step, no codesign, no native compile.
In a Pi session, call screenshot first — it picks the focused window, returns AT-SPI refs (@e1, @e2, …) plus a PNG, then you can click({ref:"@e3"}), set_text({ref:"@e2", text:"…"}), etc.
Installable as an MCP server straight from GitHub via uvx (no clone, no manual venv):
claude mcp add linux-computer-use -- uvx --from git+https://github.com/tak-uukti/linux-computer-use linux-computer-use-mcp

Or, equivalently, drop this into your Claude Code MCP config file (~/.claude.json under mcpServers):
{
  "mcpServers": {
    "linux-computer-use": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/tak-uukti/linux-computer-use",
        "linux-computer-use-mcp"
      ]
    }
  }
}

Restart Claude Code; the 8 tools (list_windows, screenshot, click, type_text, set_text, keypress, scroll, computer_actions) appear under the linux-computer-use namespace.
Add to ~/.config/opencode/opencode.json:
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "linux-computer-use": {
      "type": "local",
      "command": [
        "uvx",
        "--from",
        "git+https://github.com/tak-uukti/linux-computer-use",
        "linux-computer-use-mcp"
      ],
      "enabled": true
    }
  }
}

Restart OpenCode and the tools become available to the agent.
8 total. Schemas are deliberately terse — see extensions/computer-use.ts (Pi) or mcp_server/server.py (MCP).
| name | purpose |
|---|---|
| list_windows | enumerate visible X11 windows; returns @wN, title, pid, geometry, focus state |
| screenshot | focus a window, capture PNG, walk AT-SPI tree → @eN targets with role / name / bounds / capabilities |
| click | click @eN, @wN, or x,y; supports button and clickCount |
| type_text | xdotool-type literal text at the cursor |
| set_text | replace value of an @eN text/entry via AT-SPI EditableText (falls back to focus + Ctrl+A + type) |
| keypress | press keys/chords: ["Return"], ["Ctrl","A"], ["ctrl+l","Return"], etc. |
| scroll | scroll at ref/coords by pixel delta |
| computer_actions | batch up to 20 actions in a single call |
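The keypress row accepts both split chords (["Ctrl","A"]) and pre-joined ones (["ctrl+l","Return"]). How the bridge tells them apart is not documented here, so the sketch below is only one plausible reading: a list containing a bare modifier is treated as a single chord, otherwise each item is its own press. The names `normalize_keys` and `xdotool_argv`, the modifier set, and the `+` to `plus` alias are all assumptions for illustration.

```python
MODIFIERS = {"ctrl", "control", "alt", "shift", "super", "meta"}
ALIASES = {"+": "plus"}  # "+" is xdotool's chord separator, so use the keysym name


def _clean(part, in_chord):
    text = part.strip()
    if text in ALIASES:
        return ALIASES[text]
    if text.lower() in MODIFIERS:
        return text.lower()
    # Lowercase single letters inside chords so Ctrl+A does not imply Shift.
    return text.lower() if in_chord and len(text) == 1 else text


def normalize_keys(keys):
    """Turn a keypress list into xdotool-style key specs, one per press."""
    if not keys:
        raise ValueError("keys must be a non-empty list")
    # Assumed rule: any bare modifier means the whole list is one chord.
    if any(k.strip().lower() in MODIFIERS for k in keys):
        return ["+".join(_clean(k, True) for k in keys)]
    specs = []
    for item in keys:
        parts = item.split("+") if len(item) > 1 else [item]
        specs.append("+".join(_clean(p, len(parts) > 1) for p in parts))
    return specs


def xdotool_argv(keys):
    """Build the argv for a single `xdotool key ...` invocation."""
    return ["xdotool", "key", *normalize_keys(keys)]


print(xdotool_argv(["Ctrl", "A"]))
print(xdotool_argv(["ctrl+l", "Return"]))
```

Returning an argv list rather than a shell string avoids quoting problems if the result is later passed to `subprocess.run`.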
┌──────────────────────────────────────────────┐
│ Pi Claude Code OpenCode │
└──────┬─────────────┬──────────────────┬──────┘
│ │ │
│ extension │ MCP stdio │ MCP stdio
▼ ▼ ▼
┌──────────────┐ ┌────────────────────────────┐
│ extensions/ │ │ mcp_server/server.py │
│ computer- │ │ FastMCP wrapper (8 tools) │
│ use.ts │ └─────────────┬──────────────┘
└──────┬───────┘ │
│ │
▼ ▼
┌──────────────────────────────────────────────┐
│ bridge/bridge.py newline-JSON over stdio │
│ AT-SPI walk · xdotool · wmctrl · scrot │
└──────────────────────────────────────────────┘
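The "newline-JSON over stdio" framing in the diagram can be sketched from the client side. The request shape (`id` plus `cmd`) matches the `list_windows` example used for the bridge sanity check in this README; the reply shape is an assumption (this sketch only supposes that a reply echoes the request `id`), and the helper names `send_request` and `read_response` are hypothetical.

```python
import io
import json


def send_request(stream, request_id, cmd, **params):
    """Write one request as a single JSON line and flush it."""
    message = {"id": request_id, "cmd": cmd, **params}
    stream.write(json.dumps(message, separators=(",", ":")) + "\n")
    stream.flush()


def read_response(stream, request_id):
    """Read lines until one carries the matching id; skip blanks and other ids."""
    for line in stream:
        line = line.strip()
        if not line:
            continue
        reply = json.loads(line)
        if reply.get("id") == request_id:  # assumption: replies echo the id
            return reply
    raise EOFError(f"stream closed before a reply to id={request_id!r}")


# Demo against in-memory streams; a real client would instead pass the
# stdin/stdout pipes of a `python3 bridge/bridge.py` subprocess opened in text mode.
outgoing = io.StringIO()
send_request(outgoing, "1", "list_windows")
print(outgoing.getvalue(), end="")
```

One JSON object per line keeps the framing trivial: neither side needs a length prefix, as long as payloads never contain a raw newline (`json.dumps` escapes them by default).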
The AT-SPI walker is depth-capped (12) and element-capped (200) to keep prompts lean. Element bounds use SCREEN coords with a fallback to WINDOW coords + window offset (necessary for GTK4 / Xwayland which report SCREEN as 0,0).
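The two ideas in the paragraph above (capped traversal and the bounds fallback) can be illustrated on plain dictionaries instead of live AT-SPI objects. This is a sketch under stated assumptions, not the code in bridge/bridge.py: the breadth-first order, the node layout (`role`, `name`, `children`), and treating a SCREEN origin of (0, 0) as "not reported" are all choices made for the example.

```python
from collections import deque

MAX_DEPTH = 12       # caps quoted in the text above
MAX_ELEMENTS = 200


def resolve_bounds(screen, window, window_origin):
    """Prefer SCREEN extents; fall back to WINDOW extents plus the window offset.

    Each extent is an (x, y, width, height) tuple; window_origin is (x, y).
    """
    x, y, _, _ = screen
    if (x, y) != (0, 0):
        return screen
    wx, wy, ww, wh = window
    ox, oy = window_origin
    return (wx + ox, wy + oy, ww, wh)


def walk(root, max_depth=MAX_DEPTH, max_elements=MAX_ELEMENTS):
    """Breadth-first walk over dict nodes, stopping at either cap."""
    found = []
    queue = deque([(root, 0)])
    while queue and len(found) < max_elements:
        node, depth = queue.popleft()
        found.append({
            "ref": f"@e{len(found) + 1}",
            "role": node["role"],
            "name": node.get("name", ""),
        })
        # Children of a node at the depth cap are never enqueued.
        if depth < max_depth:
            queue.extend((child, depth + 1) for child in node.get("children", []))
    return found


tree = {"role": "frame", "children": [{"role": "push button", "name": "4"}]}
print(walk(tree))
```

A breadth-first order means that when the element cap is hit, shallow widgets (toolbars, top-level buttons) are kept in preference to deeply nested ones; a depth-first walk would make the opposite trade.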
These captures are from the bridge running against an Xvfb :99 + openbox session, driving real Linux apps. Screenshots were taken via scrot after the bridge issued the actions.
keypress: 7, +, 8, Return → display shows 15. 26 AT-SPI elements detected, every push button reports canPress: true and accurate bounds.
computer_actions: [click @e3, click @e7] (which the bridge resolves to push buttons "4" and "5") → display shows 45.
type_text: "Hello sir, … Linux X11 + AT-SPI + xdotool working end-to-end." → 169 characters typed. 190 AT-SPI elements found in gedit's window.
keypress ctrl+a → keypress Delete → type_text "Taksheel". Status bar reads Ln 1, Col 9.
| App | screenshot | AT-SPI refs | input |
|---|---|---|---|
| gnome-calculator | ✅ | ✅ 26 elements, full action metadata | ✅ |
| gedit | ✅ | ✅ 190 elements | ✅ |
| GTK / Qt apps with AT-SPI | ✅ | ✅ | ✅ |
| Google Chrome / Chromium | ✅ | needs --force-renderer-accessibility | ✅ (coords / keypress) |
| Firefox | ✅ | ✅ on a real session (gates on gsettings toolkit-accessibility) | ✅ |
| Electron apps | ✅ | needs --force-renderer-accessibility | ✅ |
| LibreOffice (real Xorg session) | ✅ | ✅ via SAL_USE_COMMON_ONE_ACCESSIBILITY=1 | ✅ |
| Xvfb / nested X | ✅ | partial (some apps misbehave under Xvfb without a real session bus) | ✅ |
- X11 only. Wayland sessions cannot capture other-app windows or synthesize input via xdotool.
- Apps must export AT-SPI for @eN refs to populate. Most GTK / Qt apps do; Electron / Chromium need --force-renderer-accessibility.
- Mouse cursor physically moves; there is no stealth pointer on X11.
- Dropped vs upstream: move_mouse, drag, wait, double_click, arrange_window, navigate_browser, list_apps. Use keypress, type_text, and computer_actions to compose what you need.
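As an illustration of composing a dropped tool from the remaining ones, the sketch below rebuilds double_click from click with clickCount, and splits long action lists to respect the 20-action limit of computer_actions. The `ref` and `clickCount` fields come from the tool table in this README; the `"type"` key and the overall action-dict layout are assumptions, so check extensions/computer-use.ts or mcp_server/server.py for the real schema.

```python
MAX_BATCH = 20  # computer_actions accepts up to 20 actions per call


def double_click(ref):
    """Stand-in for the dropped double_click tool: one click with clickCount=2."""
    # Assumption: actions are dicts tagged with a "type" key.
    return {"type": "click", "ref": ref, "clickCount": 2}


def batches(actions, size=MAX_BATCH):
    """Split an action list into chunks that each fit one computer_actions call."""
    return [actions[i:i + size] for i in range(0, len(actions), size)]


plan = [double_click("@e3")] + [{"type": "click", "ref": "@e7"}] * 24
for chunk in batches(plan):
    print(len(chunk), "actions")
```

Refs may no longer be valid once the UI changes, so it is probably safer to take a fresh screenshot between batches than to reuse refs from an older one.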
git clone https://github.com/tak-uukti/linux-computer-use
cd linux-computer-use
# Pi side (TypeScript)
npm install
npm run typecheck
# Bridge sanity
python3 -c "import ast; ast.parse(open('bridge/bridge.py').read())"
echo '{"id":"1","cmd":"list_windows"}' | python3 bridge/bridge.py
# MCP side
python3 -m venv .venv && .venv/bin/pip install -e .
.venv/bin/linux-computer-use-mcp # speaks MCP over stdio

The Pi extension API surface is stubbed locally in src/types.ts so typecheck runs without @mariozechner/pi-coding-agent installed.
.
├── assets/ logo + screenshots
├── bridge/
│ ├── bridge.py 471-line Python helper (AT-SPI + xdotool + scrot)
│ └── requirements.txt
├── extensions/
│ └── computer-use.ts Pi tool registration + JSON schemas
├── mcp_server/
│ ├── __init__.py
│ └── server.py FastMCP wrapper around the bridge (8 tools)
├── scripts/
│ └── setup-helper.mjs Pi postinstall — writes ~/.pi/.../bridge wrapper
├── skills/computer-use/SKILL.md pi skill — Quick Start + Pitfalls
├── src/
│ ├── bridge.ts Pi-side subprocess manager + JSON-line protocol
│ └── types.ts local stubs for the pi-coding-agent extension API
├── package.json npm metadata (Pi extension)
├── pyproject.toml MCP server packaging (uvx-installable)
├── tsconfig.json
├── CHANGELOG.md
├── LICENSE
└── README.md
- @injaneity/pi-computer-use: macOS original, design and protocol shape.
- @mariozechner/pi-coding-agent: the Pi agent.
- Model Context Protocol: Claude Code / OpenCode interop.
- AT-SPI 2, xdotool, wmctrl, scrot: the Linux building blocks doing all the real work.
MIT © 2026 Tak1tak · built by Tak1tak




