Feature: Virtual Desktop & noVNC Integration for GUI-based "Computer Use" capabilities. #100
volkermauel wants to merge 4 commits into open-webui:main
…use (open-webui#44)

Introduce an optional virtual desktop environment (Xvfb + x11vnc + noVNC + openbox) that enables programmatic GUI interaction via screenshot, mouse, and keyboard APIs. Feature-gated behind OPEN_TERMINAL_ENABLE_DESKTOP (default false) so existing deployments are unaffected.

When enabled, exposes:

- GET /desktop – status (running, screen size, ports)
- POST /desktop/start – start Xvfb, VNC, noVNC, window manager
- POST /desktop/stop – tear down all desktop processes
- POST /desktop/screenshot – capture PNG (base64 JSON or raw binary)
- POST /desktop/click – mouse click at (x, y)
- POST /desktop/mouse_move – move cursor
- POST /desktop/drag – mouse drag operation
- POST /desktop/type – type text into focused window
- POST /desktop/key – press key / key combo (e.g. ctrl+c)
- POST /desktop/scroll – scroll at position

All endpoints are behind the existing API key auth. Desktop info is added to the system prompt and /api/config when enabled so agents can discover the capability.

Includes integration test script (test_desktop.sh, 27 test cases) that verifies the full lifecycle against a running container.
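As a rough illustration of how a client might call one of the endpoints listed above, here is a minimal sketch that builds (but does not send) a POST request for /desktop/click. The Bearer header scheme, the JSON field names x and y, and the helper name build_desktop_request are assumptions for illustration; the commit message only states that the endpoints sit behind the existing API key auth.

```python
import json
import urllib.request


def build_desktop_request(base_url, path, api_key, payload=None):
    """Build, without sending, a POST request for a desktop endpoint.

    Assumptions (not confirmed by the PR): the API key is passed as a
    Bearer token and the request body is JSON.
    """
    body = json.dumps(payload or {}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical click at the centre of a 1280x720 screen.
click = build_desktop_request(
    "http://localhost:8000", "/desktop/click", "my-api-key", {"x": 640, "y": 360}
)
# Sending it would be: urllib.request.urlopen(click)
```

Keeping request construction separate from sending makes the shape of the call easy to inspect before pointing it at a running container.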
- Backend: WebSocket proxy for noVNC/websockify with JWT auth via query param
- Frontend: DesktopViewer component with start/stop/refresh/fullscreen controls
- API helpers for desktop status, start, stop
- Unit tests for backend router and frontend API helpers

Ref: open-webui/open-terminal#100
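The browser WebSocket API does not let a page set custom request headers, which is a common reason to carry a token in the query string as the backend item above describes. The sketch below shows only the URL handling on both sides; the parameter name token, the example URL, and both helper names are hypothetical, and actual JWT signature validation is out of scope here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def ws_url_with_token(base, token, param="token"):
    """Client side: append the token as a query parameter (name is assumed)."""
    return f"{base}?{urlencode({param: token})}"


def token_from_ws_url(url, param="token"):
    """Server side: pull the token back out, or return None if it is absent."""
    values = parse_qs(urlsplit(url).query).get(param)
    return values[0] if values else None


# Hypothetical proxy URL; the real route is not given in the commit message.
url = ws_url_with_token("ws://example.test/desktop/ws", "abc.def.ghi")
```

A proxy built this way would reject the upgrade when token_from_ws_url returns None or the token fails verification.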
In your testing, did you notice how much the memory footprint changed with these enabled? Just curious.
I did not, to be honest, sorry.
The grounding for clicking seems to be off. I am currently investigating with grid overlays and with instructions to the model returned as part of the desktop_screenshot result. Converting to draft for now, until I feel this is more mature.
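For readers unfamiliar with the grid-overlay idea mentioned in this comment, a minimal sketch of the coordinate side is shown here: it computes where labelled grid lines would sit on a screenshot and snaps a point to its grid cell. The function names and the 100-pixel step are assumptions for illustration, not the approach used in this PR, and drawing the overlay is left out.

```python
def grid_lines(width, height, step=100):
    """Return the x and y pixel positions of grid lines for an overlay."""
    xs = list(range(0, width, step))
    ys = list(range(0, height, step))
    return xs, ys


def grid_cell_origin(x, y, step=100):
    """Return the top-left corner of the grid cell that contains (x, y)."""
    return (x // step) * step, (y // step) * step


xs, ys = grid_lines(300, 200)
```

Labelling each line with its pixel value gives the model visible reference points to compare its predicted click coordinates against.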
…mputer use desktop

- Add desktop_windows and desktop_window_focus endpoints for listing and focusing windows on the virtual desktop
- Add persistent cursor overlay (Xlib) visible in VNC after xdotool moves, hides during clicks to avoid blocking input
- Auto-start desktop on first tool call, hide non-essential tools from harness model when grounding model is configured
- Remove screenshots from action tool results (status-only returns)
- Add dmz-cursor-theme and x11-xserver-utils to Dockerfile
- Remove hardcoded screen dimensions from tool descriptions
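The "auto-start desktop on first tool call" item above can be pictured as a lazy, run-once guard around every tool. This is a minimal sketch under that reading; the class name LazyDesktop and its methods are hypothetical and do not come from the PR.

```python
import threading


class LazyDesktop:
    """Start the desktop once, on the first tool call (hypothetical helper)."""

    def __init__(self, start_fn):
        self._start_fn = start_fn
        self._started = False
        self._lock = threading.Lock()

    def ensure_started(self):
        # The lock keeps two concurrent first calls from starting twice.
        with self._lock:
            if not self._started:
                self._start_fn()
                self._started = True

    def call_tool(self, tool_fn, *args, **kwargs):
        self.ensure_started()
        return tool_fn(*args, **kwargs)


starts = []
desktop = LazyDesktop(lambda: starts.append("started"))
first = desktop.call_tool(lambda: "screenshot taken")
second = desktop.call_tool(lambda: "clicked")
```

The start function runs only for the first call, so later tools skip straight to their own work.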
Great work! For a containerized desktop environment, https://github.com/m1k1o/neko could be a great reference. It uses WebRTC instead of VNC for output, but for a GUI-based AI tool, I think noVNC is enough.
Thanks for your feedback. I've opened a corresponding PR on open-webui to embed noVNC in the frontend. Multi-user support does work in theory, but in my approach I'm using a proxy in between that spawns a per-user pod in k8s and attaches a PVC, so the users are fully separated in terms of data access and resources.
k8s? That sounds more like terminals, whereas I mean within a single container, just as the multi-user support here does. Or at least we should document this.
Summary
Implements #44 — Virtual Desktop & noVNC Integration for GUI-based "Computer Use" capabilities.
Adds an optional virtual desktop environment (Xvfb + x11vnc + noVNC + openbox) that enables programmatic GUI interaction via screenshot, mouse, and keyboard APIs. Feature-gated behind OPEN_TERMINAL_ENABLE_DESKTOP (default false) so existing deployments are unaffected.

Changes
- open_terminal/utils/desktop.py (new): DesktopManager class managing Xvfb, x11vnc, noVNC, and openbox lifecycle, with screenshot (scrot), mouse (xdotool), and keyboard input methods
- open_terminal/env.py: added 5 config vars (ENABLE_DESKTOP, DESKTOP_DISPLAY, DESKTOP_SCREEN_SIZE, DESKTOP_VNC_PORT, DESKTOP_NOVNC_PORT)
- open_terminal/main.py: 10 new API endpoints, desktop info in system prompt and /api/config
- Dockerfile: added xvfb, x11vnc, novnc, openbox, xdotool, scrot, chromium, fonts; exposed port 6080
- entrypoint.sh: DISPLAY setup and stale X11 lock file cleanup when desktop is enabled
- test_desktop.sh (new): 27 integration tests covering full desktop lifecycle

API Endpoints
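| Method | Endpoint | Description |
|---|---|---|
| GET | /desktop | Status (running, screen size, ports) |
| POST | /desktop/start | Start Xvfb, VNC, noVNC, window manager |
| POST | /desktop/stop | Tear down all desktop processes |
| POST | /desktop/screenshot | Capture PNG (base64 JSON or raw binary) |
| POST | /desktop/click | Mouse click at (x, y) |
| POST | /desktop/mouse_move | Move cursor |
| POST | /desktop/drag | Mouse drag operation |
| POST | /desktop/type | Type text into focused window |
| POST | /desktop/key | Press key / key combo (e.g. ctrl+c) |
| POST | /desktop/scroll | Scroll at position |

Usage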
- API: http://localhost:8000
- noVNC viewer: http://localhost:6080/vnc.html

Test Results
All 27 integration tests pass against a locally built container.
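
The Changes section notes that DesktopManager drives the mouse and keyboard through xdotool. As a minimal sketch of what that could look like, and not the PR's actual code, the helpers below build xdotool argument lists for a click, a key combo, and a scroll. The function names and the default display ":99" are assumptions; the xdotool subcommands used (mousemove, click with --repeat, key) are standard, and X11 buttons 4 and 5 are scroll up and scroll down.

```python
import os
import subprocess


def click_argv(x, y, button=1):
    """Move the pointer to (x, y), then click the given button."""
    return ["xdotool", "mousemove", str(x), str(y), "click", str(button)]


def key_argv(combo):
    """Press a key or combo such as 'ctrl+c'."""
    return ["xdotool", "key", combo]


def scroll_argv(x, y, clicks, up=True):
    """Scroll at (x, y); X11 maps wheel up/down to buttons 4 and 5."""
    button = "4" if up else "5"
    return ["xdotool", "mousemove", str(x), str(y),
            "click", "--repeat", str(clicks), button]


def run_on_display(argv, display=":99"):
    """Run a command against the virtual display (display value is assumed)."""
    env = {**os.environ, "DISPLAY": display}
    return subprocess.run(argv, env=env, check=True)


# Example argument lists; nothing is executed until run_on_display is called.
copy_cmd = key_argv("ctrl+c")
click_cmd = click_argv(640, 360)
```

Building argument lists rather than shell strings avoids quoting problems when coordinates or key names come from an untrusted model.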