BraveCowNoFear/desktop-control-for-windows

Desktop Control for Windows

Desktop Control for Windows is a Codex skill for controlling visible Windows desktop applications through local Python primitives. It can move and click the mouse, type text, paste through the clipboard, take screenshots, inspect pixels, match images, manage windows, coordinate a shared UI lock, and run deterministic multi-step action plans.

This project is intended for situations where no reliable API, DOM, CLI, or app-specific automation surface exists.

What Is Included

  • SKILL.md with the coordinator and UI-worker workflow
  • scripts/ui_control.py with keyboard, mouse, screen, window, clipboard, lock, overlay, and plan commands
  • references/control-api.md with CLI examples
  • references/subagent-workflow.md with worker prompt templates
  • agents/openai.yaml with Codex UI metadata

Safety Model

This skill can read the screen, inspect or modify the clipboard, type into the active window, click, drag, scroll, and close windows. Use it only in sessions where that level of local control is acceptable.

Recommended defaults:

  • Keep PyAutoGUI failsafe enabled.
  • Use the global UI lock for multi-step tasks.
  • Use --dry-run for generated plans.
  • Start the warm overlay when a UI worker begins, then switch it to the cool completion state when the worker finishes.
  • Use --require-approval for risky manual tests.
  • Keep plan JSON limited to action-specific fields. Global safety and lock options belong on the command line, not inside individual plan actions.
  • Avoid secrets, payments, UAC prompts, password managers, banking flows, and destructive actions unless the user explicitly requested the exact action.
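
The rule about keeping global options out of plan JSON can be checked before a plan is run. The following is a minimal sketch, not part of the bundled controller. It assumes a plan is a list of action dictionaries and that a misplaced global option would appear under a key named after its CLI flag. The key names, the `find_global_keys` helper, and the example action fields are all hypothetical; the real plan schema may differ.

```python
# Hypothetical key names derived from the global CLI flags (--lock-token,
# --dry-run, --require-approval). The real plan schema may use other names.
GLOBAL_OPTION_KEYS = frozenset({
    "lock_token", "lock-token",
    "dry_run", "dry-run",
    "require_approval", "require-approval",
})


def find_global_keys(plan_actions):
    """Return (action_index, key) pairs for global options found inside actions."""
    problems = []
    for index, action in enumerate(plan_actions):
        # Sort keys so the report order is stable regardless of dict order.
        for key in sorted(action):
            if key in GLOBAL_OPTION_KEYS:
                problems.append((index, key))
    return problems


if __name__ == "__main__":
    # Illustrative plan; the "action", "keys", and "text" fields are assumptions.
    plan = [
        {"action": "hotkey", "keys": ["ctrl", "l"]},
        {"action": "type", "text": "hello world", "dry_run": True},
    ]
    for index, key in find_global_keys(plan):
        print(f"action {index}: move '{key}' to the command line")
```

An empty result means no action carries a key that looks like a global option; anything reported should move to the command line instead.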

The bundled controller is local-only. scripts/ui_control.py does not call external network services.

Install As A Codex Skill

Clone or copy this repository into your Codex skills directory:

$skills = if ($env:CODEX_HOME) { Join-Path $env:CODEX_HOME "skills" } else { Join-Path $HOME ".codex\skills" }
git clone https://github.com/BraveCowNoFear/desktop-control-for-windows.git (Join-Path $skills "desktop-control-for-windows")
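
If you need the same path from Python (for example, in a setup script), the lookup above can be mirrored as follows. This is a sketch under the same assumptions as the PowerShell snippet: CODEX_HOME wins when set and non-empty, otherwise the skills directory lives under .codex in the home directory. The function names are illustrative and not part of this repository.

```python
import os
from pathlib import PureWindowsPath


def codex_skills_dir(env=None, home=None):
    """Return the Codex skills directory using the same fallback as the PowerShell snippet."""
    env = os.environ if env is None else env
    home = os.path.expanduser("~") if home is None else home
    codex_home = env.get("CODEX_HOME")
    # An unset or empty CODEX_HOME falls back to the .codex folder under home.
    if codex_home:
        return PureWindowsPath(codex_home) / "skills"
    return PureWindowsPath(home) / ".codex" / "skills"


def skill_install_dir(env=None, home=None):
    """Return the directory this skill is cloned into."""
    return codex_skills_dir(env, home) / "desktop-control-for-windows"


if __name__ == "__main__":
    print(skill_install_dir())
```

`PureWindowsPath` is used so the result is formatted with Windows separators even when the script is inspected on another platform.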

Install Python dependencies in the Python environment Codex will use:

python -m pip install -r requirements.txt

Verify the CLI:

cd (Join-Path $skills "desktop-control-for-windows")
python scripts\ui_control.py --help
python scripts\ui_control.py status --windows

Quick CLI Examples

Run commands from the skill directory:

python scripts\ui_control.py overlay --mode start --task "example"
python scripts\ui_control.py lock acquire --owner "example"
python scripts\ui_control.py --lock-token <token> status --windows
python scripts\ui_control.py --lock-token <token> screenshot --out "$env:TEMP\screen.png"
python scripts\ui_control.py --lock-token <token> screenshot --out "$env:TEMP\active.png" --active
python scripts\ui_control.py --lock-token <token> snapshot --out "$env:TEMP\state.png" --windows --active
python scripts\ui_control.py --lock-token <token> find-image C:\path\button.png --window "Chrome"
python scripts\ui_control.py --lock-token <token> hotkey ctrl l
python scripts\ui_control.py --lock-token <token> type "hello world" --method paste
python scripts\ui_control.py lock release --token <token>
python scripts\ui_control.py overlay --mode finish --status success --task "example" --completed "Finished the requested UI task"

Global options such as --lock-token, --dry-run, and --require-approval must appear before the subcommand.
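
When commands are assembled programmatically, the ordering rule is easy to get wrong. A small wrapper can enforce it. The sketch below only builds the argument list; it assumes --dry-run and --require-approval are plain switches that take no value, and `build_command` is an illustrative helper, not part of scripts/ui_control.py.

```python
def build_command(subcommand, *, lock_token=None, dry_run=False,
                  require_approval=False, python="python",
                  script="scripts\\ui_control.py"):
    """Build an argv list with global options placed before the subcommand."""
    argv = [python, script]
    if lock_token is not None:
        argv += ["--lock-token", lock_token]
    # Assumption: these two global options are value-less switches.
    if dry_run:
        argv.append("--dry-run")
    if require_approval:
        argv.append("--require-approval")
    # The subcommand and its own arguments always come last.
    argv += list(subcommand)
    return argv


if __name__ == "__main__":
    print(build_command(["hotkey", "ctrl", "l"], lock_token="<token>"))
```

The resulting list can be passed to `subprocess.run(..., check=True)` from the skill directory.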

Use snapshot when a worker needs both status metadata and a screenshot in one call. When the target app is known, pass --active or --window to screenshot, snapshot, and find-image; scoping the capture to one window avoids full-screen capture and keeps each look-then-act loop smaller.

For the Siri-style status frame, call python scripts\ui_control.py overlay --mode start --task "..." when the UI worker takes control of the screen. This starts a warm-colored, click-through border. When the worker is done, call python scripts\ui_control.py overlay --mode finish --status success|partial|failed ... (pick one status value) to switch the border to a cool completion state with result text. The completion state stays visible until the user clicks anywhere on the screen.
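
The start/finish pairing can be wrapped so that the overlay always reaches a completion state, even when the worker raises. The following is a rough sketch, not the skill's own implementation. `run_with_overlay`, `work`, and `run` are hypothetical names; `run` is whatever callable executes one ui_control.py command, and --completed is treated as optional here, which is an assumption.

```python
VALID_STATUSES = ("success", "partial", "failed")


def overlay_start_args(task):
    """Arguments for the warm, in-progress overlay."""
    return ["overlay", "--mode", "start", "--task", task]


def overlay_finish_args(task, status, completed=None):
    """Arguments for the cool completion overlay."""
    if status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {VALID_STATUSES}, got {status!r}")
    args = ["overlay", "--mode", "finish", "--status", status, "--task", task]
    # Assumption: --completed may be omitted when there is no result text.
    if completed is not None:
        args += ["--completed", completed]
    return args


def run_with_overlay(task, work, run):
    """Show the start overlay, call work(), then always show the finish overlay.

    work() returns a (status, completed_text) pair; run(args) executes one command.
    If work() raises, the overlay finishes as "failed" and the exception propagates.
    """
    run(overlay_start_args(task))
    status, completed = "failed", None
    try:
        status, completed = work()
    finally:
        run(overlay_finish_args(task, status, completed))
    return status


if __name__ == "__main__":
    # Print the commands instead of executing them.
    run_with_overlay("example", lambda: ("success", "Finished the requested UI task"), print)
```

In real use, `run` could be a small function that prepends python scripts\ui_control.py and calls `subprocess.run`.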

Provenance

This project was migrated from the local desktop-control patterns in ClawHub breckengan/control v1.0.0. The original ClawHub listing identified that package as MIT-0 licensed. Upstream demo scripts and rule-based app demos were intentionally not copied. Codex should keep high-level reasoning in the active agent loop and use the plan command only for deterministic local batching.

License

MIT-0. See LICENSE.
