Real-time screen capture with Vision API for AI agents. GPU-accelerated, change detection, and an HTTP server that lets any AI agent autonomously see your screen.
Built for Windows AI automation workflows — no manual screenshots needed.
- Live Desktop Stream in a VS Code panel at configurable frame rates (1-10 FPS)
- GPU-Accelerated via DXGI Desktop Duplication (
node-screenshots) — minimal CPU usage - Automatic Fallback to PowerShell-based capture when GPU is unavailable
- Dark Theme HUD with live FPS counter, status indicator, pause/resume controls
- HD Capture button for full-resolution PNG screenshots
- HTTP server on
127.0.0.1:7899— any agent or tool can request screen frames - Change Detection — only returns frames when the screen actually changed (saves bandwidth + tokens)
- 5 endpoints:
/frame,/frame/changed,/frame/hd,/status,/capture - Zero user interaction — agents see the screen autonomously
- Works with any language (Python, Node.js, Rust, etc.) via simple HTTP GET
- AI-Ready Screenshots saved to
~/.local-agent/screenshots/ - Compatible with the Local Agent Python framework for full computer-use automation
- Frame data returned as JPEG/PNG for direct LLM vision input
- Open the Command Palette (
Ctrl+Shift+P) - Run Local Agent: Iniciar Screen Viewer
- A panel opens showing your live desktop
- Hover to reveal Pause and Capture HD buttons
Ctrl+Shift+P> Local Agent: Iniciar Vision Server (API para Agente)- The server starts on
http://127.0.0.1:7899 - Any process can now request frames:
# Get current screen frame
curl http://127.0.0.1:7899/frame -o screen.jpg
# Get frame only if screen changed (304 if no change)
curl http://127.0.0.1:7899/frame/changed -o screen.jpg
# Full-resolution PNG
curl http://127.0.0.1:7899/frame/hd -o screen_hd.png
# Server status
curl http://127.0.0.1:7899/status
# Force immediate capture
curl -X POST http://127.0.0.1:7899/capture -o capture.jpg# Python example
import httpx
r = httpx.get("http://127.0.0.1:7899/frame")
with open("screen.jpg", "wb") as f:
f.write(r.content)
# Check if screen changed
r = httpx.get("http://127.0.0.1:7899/frame/changed")
if r.status_code == 200:
print("Screen changed!", len(r.content), "bytes")
elif r.status_code == 304:
print("No change")| Command | Description |
|---|---|
Local Agent: Iniciar Screen Viewer |
Opens viewer panel + starts Vision Server |
Local Agent: Parar Screen Viewer |
Stops capture and closes panel |
Local Agent: Capturar Tela para IA |
Saves high-res PNG to ~/.local-agent/screenshots/ |
Local Agent: Iniciar Vision Server |
Starts HTTP API only (no webview panel) |
Local Agent: Parar Vision Server |
Stops the HTTP Vision Server |
| Setting | Default | Description |
|---|---|---|
localAgent.screenViewer.fps |
5 |
Frames per second (1-10) |
localAgent.screenViewer.quality |
70 |
JPEG stream quality (1-100) |
localAgent.screenViewer.scale |
0.5 |
Image scale factor (0.1-1.0) |
localAgent.visionServer.port |
7899 |
Vision Server HTTP port |
localAgent.visionServer.diffThreshold |
0.02 |
Change detection threshold (0.001-0.5) |
Returns the latest screen frame as JPEG.
Response Headers:
X-Frame-Timestamp— Unix timestamp (ms) of the frameX-Frame-Changed—"true"or"false"(change detection result)
Returns the frame only if the screen changed since the last request. Returns 304 Not Modified if unchanged.
Captures and returns a full-resolution PNG (on demand, not from the stream).
Returns JSON with server and capture state:
{
"server": "vision-server",
"version": "0.2.0",
"running": true,
"capture": {
"active": true,
"backend": "GPU (DXGI)",
"hasFrame": true,
"frameCount": 1234,
"changeDetected": true
}
}Forces an immediate capture and returns the frame as JPEG.
+------------------------------------------+
| VS Code Extension |
| +----------------+ +----------------+ |
| | ScreenCapture |->| ChangeDetector | |
| | (DXGI / PS) | | (pixel sample) | |
| +----------------+ +-------+--------+ |
| | |
| +---------------------------v---------+ |
| | VisionServer :7899 | |
| | GET /frame | /frame/changed | /hd | |
| | GET /status | POST /capture | |
| +---------------------------+---------+ |
+---------------------------- -|------------+
| HTTP
+-------------------v-------------------+
| Any AI Agent (Python, Node, etc.) |
| httpx.get("127.0.0.1:7899/frame") |
+---------------------------------------+
| Backend | Technology | Performance |
|---|---|---|
| GPU (primary) | DXGI Desktop Duplication via node-screenshots |
Fastest, minimal CPU |
| PowerShell (fallback) | System.Drawing screen capture |
Works on all Windows |
Image resizing and JPEG encoding use sharp for optimal performance.
- Windows 10/11 (DXGI + PowerShell capture are Windows-specific)
- VS Code 1.85+
- Windows only — DXGI and PowerShell capture are Windows-specific APIs
- Primary monitor — captures the primary display only
- Vision Server — binds to
127.0.0.1(localhost only, not exposed to network)
Contributions are welcome! Please open an issue or pull request on GitHub.
MIT - Anderson Belem (Otimiza.pro)