feat: add window capture MCP tool for GUI screenshots#279

Open
mokcontoro wants to merge 9 commits into develop from feature/window-capture-tool

@mokcontoro (Contributor) commented Mar 22, 2026

Summary

  • Add capture_window MCP tool that captures X11 GUI windows (TurtleSim, RViz, Gazebo, etc.) and returns them as ImageContent
  • Add list_windows MCP tool to discover available GUI windows
  • Enables AI to see what's displayed in ROS GUI applications without needing a camera topic

How it works

The tool uses python3-xlib to find and capture X11 windows by name. It converts the raw pixel data to JPEG and returns it as ImageContent that displays inline in the AI client.

capture_window(window_name="TurtleSim")
capture_window(window_name="RViz", resize_width=640, resize_height=480)
list_windows()
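
The pixel-conversion step described above could look roughly like the following. This is a minimal sketch, not the code from this PR: the function name `raw_to_jpeg_base64` and its parameters are hypothetical, and it assumes the X server returns 32-bit pixels in B, G, R, padding byte order (common for little-endian ZPixmap captures, but not guaranteed). It relies only on Pillow and the standard library, and it produces a base64 string of the kind an image result would carry.

```python
import base64
import io
from typing import Optional, Tuple

from PIL import Image


def raw_to_jpeg_base64(
    raw: bytes,
    width: int,
    height: int,
    resize: Optional[Tuple[int, int]] = None,
    quality: int = 85,
) -> str:
    """Encode raw 32-bit BGRX pixels as a base64 JPEG string.

    Assumption: 4 bytes per pixel in B, G, R, padding order.
    """
    expected = width * height * 4
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")

    # Pillow's "raw" decoder reorders BGRX input into an RGB image.
    image = Image.frombytes("RGB", (width, height), raw, "raw", "BGRX")

    # Optional downscale to reduce the size of the returned payload.
    if resize is not None:
        image = image.resize(resize)

    buffer = io.BytesIO()
    image.save(buffer, format="JPEG", quality=quality)
    return base64.b64encode(buffer.getvalue()).decode("ascii")
```

Checking the buffer length up front means a mismatch between the reported window geometry and the captured data raises a clear error before any decoding is attempted.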

Use cases

  • AI observes turtlesim after sending movement commands
  • AI reads RViz visualizations for debugging
  • AI monitors Gazebo simulation state
  • Remote monitoring of any ROS GUI application
  • Works with any X11 window (WSLg, native Linux)

Dependencies

  • python3-xlib (optional — tools gracefully return an error message if not installed)
  • pillow, numpy (already in project dependencies)
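
One way to keep python3-xlib optional is to probe for its import name (`Xlib`) before doing any work and return a message instead of raising. The sketch below illustrates that pattern under stated assumptions: `capture_backend_error` is a hypothetical helper, and the message wording is invented for illustration rather than taken from this PR.

```python
import importlib.util
from typing import Optional


def capture_backend_error(module_name: str = "Xlib") -> Optional[str]:
    """Return an error message if the module is missing, otherwise None."""
    # find_spec returns None for a top-level module that is not installed,
    # so nothing is imported and no exception reaches the caller.
    if importlib.util.find_spec(module_name) is None:
        return (
            f"Window capture is unavailable: the '{module_name}' module "
            "could not be found. Install python3-xlib to enable it."
        )
    return None
```

A tool function could call this first and return the message as its result when it is not `None`, so a missing dependency surfaces as readable text in the client.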

Test plan

  • list_windows() returns available windows
  • capture_window(window_name="TurtleSim") returns screenshot as ImageContent
  • capture_window with resize option works
  • Graceful error when window not found (shows available windows)
  • Graceful error when X11 dependencies not installed
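
The "window not found" case in the test plan can be sketched as a lookup that returns either a match or an error listing the windows that are available. The names here (`find_window`, the `name` key) and the case-insensitive substring matching are assumptions for illustration; the PR may match window names differently.

```python
from typing import Dict, List, Optional, Tuple


def find_window(
    windows: List[Dict[str, object]], query: str
) -> Tuple[Optional[Dict[str, object]], Optional[str]]:
    """Return (window, None) on a match, or (None, error message) otherwise.

    Assumption: matching is a case-insensitive substring test on "name".
    """
    needle = query.strip().lower()
    for window in windows:
        if needle and needle in str(window["name"]).lower():
            return window, None

    # No match: list what is available so the caller can retry.
    available = ", ".join(sorted(str(w["name"]) for w in windows)) or "none"
    return None, f"Window '{query}' not found. Available windows: {available}"
```

Returning the error as data rather than raising keeps the failure path the same shape as the success path, which suits a tool whose output is read by an AI client.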

Tested on ROS 2 Jazzy / WSL2 (WSLg) with TurtleSim.

🤖 Generated with Claude Code

@stex2005
Collaborator

That's a very cool feature; it could be useful for integration tests too. I will try it.

@mokcontoro
Contributor Author

woutervhaaften and others added 8 commits March 24, 2026 08:28
New users unfamiliar with MCP or rosbridge need a quick plain-language
explanation before diving into installation. Added:

- "What is this?" section with ASCII architecture diagram
- "What you need" 3-point prerequisites list
- Clearer formatting for key benefits section

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: safe response handling in services.py (#257)

* fix: safe response handling in topics.py (#258)

* fix: safe response handling in nodes.py (#259)

* fix: safe response handling in parameters.py (#260)

* fix: safe response handling in actions.py (#261)

* fix: safe response handling in ros_metadata.py (#251)
Add capture_window and list_windows MCP tools that capture X11 GUI
windows (TurtleSim, RViz, Gazebo, etc.) and return them as ImageContent.
This enables the AI to see what's displayed in ROS GUI applications.

Features:
- capture_window: screenshot any window by name, returns ImageContent
- list_windows: list all available GUI windows with sizes
- Optional resize support for bandwidth control
- Graceful fallback when X11/dependencies not available

Dependencies: python3-xlib (optional, for X11 capture)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Bump version to 3.1.0
- Update restructuring_plan.md with window capture tool category (33 tools)
- Update README.md features list with GUI window capture
- Add unit tests for window capture tools (test_window_capture.py)
- Add conftest.py stub for test environment compatibility

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@stex2005 stex2005 force-pushed the feature/window-capture-tool branch from a0c4e96 to 8aaad81 Compare March 24, 2026 13:30