DOMShell — Browse the Web with Filesystem Commands (MCP Server) #693
apireno started this conversation in Show and tell
Discussion Topic
I built an MCP server that maps Chrome's Accessibility Tree to a virtual filesystem. Instead of screenshots and pixel coordinates, agents use ls, cd, grep, click, and type to navigate pages, the same way you'd work in a terminal.

GitHub: github.com/apireno/DOMShell
npm: npx @apireno/domshell

Why the Accessibility Tree?
Most browser automation feeds agents raw HTML or screenshots. The model burns through tool calls just figuring out what's on the page. Chrome's AX tree already solves this: it's a structured, role-annotated representation of the DOM that strips out layout noise and keeps semantics. DOMShell flattens it aggressively and maps it to a filesystem metaphor so agents can scope their work the way you'd cd into a directory.

In controlled testing (Claude, 4 web tasks, 8 trials), this cut average API calls per task from 8.6 to 4.3 compared to screenshot-based browsing. Full experiment data →
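
To make the filesystem metaphor concrete, here's a toy sketch in Python. It is not DOMShell's code: AXNode, flatten, entry_name, and the name.role naming scheme are all made up for illustration, and the real server's flattening rules and path format may differ. It only shows the general idea of dropping unnamed wrapper nodes and then treating the remaining tree like directories.

```python
from dataclasses import dataclass, field


@dataclass
class AXNode:
    """One accessibility-tree node: a role, an accessible name, and children."""
    role: str
    name: str = ""
    children: list["AXNode"] = field(default_factory=list)


def flatten(node: AXNode) -> list[AXNode]:
    """Drop unnamed 'generic' wrappers, promoting their children in place."""
    kept: list[AXNode] = []
    for child in node.children:
        promoted = flatten(child)
        if child.role == "generic" and not child.name:
            kept.extend(promoted)  # wrapper disappears, children move up
        else:
            kept.append(AXNode(child.role, child.name, promoted))
    return kept


def entry_name(node: AXNode) -> str:
    """Build a filename-like label such as 'sign_in.button' (hypothetical scheme)."""
    slug = "_".join(node.name.lower().split()) or "unnamed"
    return f"{slug}.{node.role}"


def ls(node: AXNode) -> list[str]:
    """List a node's children; nodes that have children get a trailing slash."""
    return [entry_name(c) + ("/" if c.children else "") for c in node.children]


def cd(node: AXNode, path: str) -> AXNode:
    """Walk a slash-separated path of entry names down from node."""
    for part in filter(None, path.split("/")):
        node = next(c for c in node.children if entry_name(c) == part)
    return node


def grep(node: AXNode, needle: str, prefix: str = "") -> list[str]:
    """Return paths of all descendants whose entry name contains needle."""
    hits: list[str] = []
    for child in node.children:
        path = f"{prefix}/{entry_name(child)}"
        if needle.lower() in entry_name(child):
            hits.append(path)
        hits.extend(grep(child, needle, path))
    return hits


# A made-up page: two layout wrappers around a nav bar and a login form.
raw = AXNode("document", "Example", [
    AXNode("generic", "", [
        AXNode("navigation", "Main", [
            AXNode("link", "Home"),
            AXNode("link", "Pricing"),
        ]),
    ]),
    AXNode("generic", "", [
        AXNode("form", "Login", [
            AXNode("textbox", "Email"),
            AXNode("button", "Sign in"),
        ]),
    ]),
])
root = AXNode(raw.role, raw.name, flatten(raw))

print(ls(root))                    # ['main.navigation/', 'login.form/']
print(ls(cd(root, "login.form")))  # ['email.textbox', 'sign_in.button']
print(grep(root, "sign"))          # ['/login.form/sign_in.button']
```

Once the wrappers are gone, scoping to the login form is a single cd, and everything after that only sees two entries instead of the whole page.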
What It Looks Like
Five calls. No screenshots. No coordinate math.
Quick Start
{
  "mcpServers": {
    "domshell": {
      "command": "npx",
      "args": ["-y", "@apireno/domshell", "--allow-write"]
    }
  }
}

38 tools across three security tiers (read-only by default; write requires --allow-write, which the config above passes). Architecture and full tool list in the README.

What I'm Working On
Currently testing with smaller local models (Qwen3-4B, Llama3.2-3B) to see how the filesystem metaphor holds up with tighter context windows. Also exploring headless mode for CI pipelines. Would love feedback on the tool design or if anyone's tried a similar AX-tree approach.