Desktop Control for Windows

Desktop Control for Windows is a Codex skill for controlling visible Windows desktop applications through local Python primitives. It can move and click the mouse, type text, paste through the clipboard, take screenshots, inspect pixels, match images, manage windows, coordinate a shared UI lock, and run deterministic multi-step action plans.

This project is intended for situations where no reliable API, DOM, CLI, or app-specific automation surface exists.

What Is Included

SKILL.md with the coordinator and UI-worker workflow
scripts/ui_control.py with keyboard, mouse, screen, window, clipboard, lock, overlay, and plan commands
references/control-api.md with CLI examples
references/subagent-workflow.md with worker prompt templates
agents/openai.yaml with Codex UI metadata

Safety Model

This skill can read the screen, inspect or modify the clipboard, type into the active window, click, drag, scroll, and close windows. Use it only in sessions where that level of local control is acceptable.

Recommended defaults:

Keep PyAutoGUI failsafe enabled.
Use the global UI lock for multi-step tasks.
Use --dry-run for generated plans.
Start the warm overlay when a UI worker begins, then switch it to the cool completion state when the worker finishes.
Use --require-approval for risky manual tests.
Keep plan JSON limited to action-specific fields. Global safety and lock options belong on the command line, not inside individual plan actions.
Avoid secrets, payments, UAC prompts, password managers, banking flows, and destructive actions unless the user explicitly requested the exact action.
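The rule about keeping global options out of plan JSON can be checked before a plan is run. The sketch below is a hypothetical pre-flight check and is not part of scripts/ui_control.py. It assumes a plan is a JSON list of action objects and that a misplaced global option would appear under a key such as lock_token, dry_run, or require_approval. The plan shape, the key spellings, and the action field names in the demo are all assumptions; the skill itself defines the real schema.

```python
import json

# Assumed key spellings for global options; the real plan schema may differ.
GLOBAL_OPTION_KEYS = {
    "lock_token", "lock-token",
    "dry_run", "dry-run",
    "require_approval", "require-approval",
}


def find_global_keys(plan_json: str) -> list[tuple[int, str]]:
    """Return (action_index, key) pairs for global options found inside actions.

    Assumes the plan is a JSON list of action objects (not confirmed by the source).
    """
    actions = json.loads(plan_json)
    if not isinstance(actions, list):
        raise ValueError("expected a JSON list of action objects")
    problems = []
    for index, action in enumerate(actions):
        if not isinstance(action, dict):
            raise ValueError(f"action {index} is not a JSON object")
        for key in sorted(action):
            if key in GLOBAL_OPTION_KEYS:
                problems.append((index, key))
    return problems


if __name__ == "__main__":
    # Hypothetical plan: the "action", "keys", and "text" fields are illustrative only.
    plan = (
        '[{"action": "hotkey", "keys": ["ctrl", "l"]},'
        ' {"action": "type", "text": "hi", "dry_run": true}]'
    )
    for index, key in find_global_keys(plan):
        print(f"action {index}: move '{key}' to the command line")
```

An empty result means no action carries one of the assumed global keys. It does not validate the plan in any other way.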

The bundled controller is local-only. scripts/ui_control.py does not call external network services.

Install As A Codex Skill

Clone or copy this repository into your Codex skills directory:

$skills = if ($env:CODEX_HOME) { Join-Path $env:CODEX_HOME "skills" } else { Join-Path $HOME ".codex\skills" }
git clone https://github.com/BraveCowNoFear/desktop-control-for-windows.git (Join-Path $skills "desktop-control-for-windows")
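For tooling that needs the same location from Python, the directory rule in the PowerShell snippet above can be mirrored as follows. This is a minimal sketch: skills_dir is a hypothetical helper name, and it only reproduces the rule shown above (use CODEX_HOME when it is set and non-empty, otherwise fall back to .codex\skills under the user's home directory).

```python
import os
from pathlib import Path


def skills_dir(env=None):
    """Resolve the Codex skills directory the same way the PowerShell snippet does.

    `env` is a mapping of environment variables; it defaults to os.environ.
    """
    env = os.environ if env is None else env
    codex_home = env.get("CODEX_HOME")
    if codex_home:  # unset or empty falls through, like PowerShell's truthiness test
        return Path(codex_home) / "skills"
    return Path.home() / ".codex" / "skills"


if __name__ == "__main__":
    print(skills_dir() / "desktop-control-for-windows")
```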

Install Python dependencies in the Python environment Codex will use:

python -m pip install -r requirements.txt

Verify the CLI:

cd (Join-Path $skills "desktop-control-for-windows")
python scripts\ui_control.py --help
python scripts\ui_control.py status --windows

Quick CLI Examples

Run commands from the skill directory, replacing <token> with the token obtained from lock acquire:

python scripts\ui_control.py overlay --mode start --task "example"
python scripts\ui_control.py lock acquire --owner "example"
python scripts\ui_control.py --lock-token <token> status --windows
python scripts\ui_control.py --lock-token <token> screenshot --out "$env:TEMP\screen.png"
python scripts\ui_control.py --lock-token <token> screenshot --out "$env:TEMP\active.png" --active
python scripts\ui_control.py --lock-token <token> snapshot --out "$env:TEMP\state.png" --windows --active
python scripts\ui_control.py --lock-token <token> find-image C:\path\button.png --window "Chrome"
python scripts\ui_control.py --lock-token <token> hotkey ctrl l
python scripts\ui_control.py --lock-token <token> type "hello world" --method paste
python scripts\ui_control.py lock release --token <token>
python scripts\ui_control.py overlay --mode finish --status success --task "example" --completed "Finished the requested UI task"

Global options such as --lock-token, --dry-run, and --require-approval must appear before the subcommand.
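When commands are assembled programmatically, the ordering rule is easy to get wrong. The following sketch shows one way to place global options ahead of the subcommand. build_command is a hypothetical helper, not part of the skill; it uses only the three global flags named above and assumes the script path is given relative to the skill directory.

```python
import sys


def build_command(subcommand, *, lock_token=None, dry_run=False,
                  require_approval=False, script=r"scripts\ui_control.py"):
    """Return an argv list with global options placed before the subcommand.

    `subcommand` is the subcommand name followed by its own arguments,
    for example ["hotkey", "ctrl", "l"].
    """
    argv = [sys.executable, script]
    if lock_token is not None:
        argv += ["--lock-token", lock_token]
    if dry_run:
        argv.append("--dry-run")
    if require_approval:
        argv.append("--require-approval")
    return argv + list(subcommand)


if __name__ == "__main__":
    # "TOKEN" is a placeholder, not a real lock token.
    print(build_command(["hotkey", "ctrl", "l"], lock_token="TOKEN", dry_run=True))
```

The resulting list can be passed to subprocess.run(argv, check=True) from the skill directory. Building a list rather than a single string avoids shell quoting problems with arguments that contain spaces.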

Use snapshot when a worker needs both status metadata and a screenshot in one call. When the target app is known, pass --active or --window to screenshots, snapshots, and image search; scoping the capture to one window avoids full-screen captures and keeps each visual-check loop smaller.

For the Siri-style status frame, call python scripts\ui_control.py overlay --mode start --task "..." when the UI worker takes control of the screen. This starts a warm-colored, click-through border. When the worker is done, call python scripts\ui_control.py overlay --mode finish --status <status> ..., where <status> is success, partial, or failed, to switch the border to a cool completion state with result text. The completion state stays visible until the user clicks anywhere on the screen.
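The order of a full worker session (overlay start, lock acquire, locked commands, lock release, overlay finish) can be sketched as a list of argument lists. session_commands is a hypothetical helper that only assembles arguments and runs nothing. Because the source does not show how lock acquire reports its token, the token is taken as a parameter here. The source shows --completed only together with --status success, so passing it with the other statuses is an assumption.

```python
VALID_STATUSES = ("success", "partial", "failed")


def session_commands(task, owner, token, steps, status, completed):
    """Return ui_control.py argument lists for one UI-worker session, in order.

    `steps` is a list of subcommand argument lists, e.g. [["hotkey", "ctrl", "l"]].
    `token` stands in for the value obtained from `lock acquire`.
    """
    if status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {VALID_STATUSES}")
    commands = [
        ["overlay", "--mode", "start", "--task", task],
        ["lock", "acquire", "--owner", owner],
    ]
    # The global --lock-token option goes before each subcommand.
    commands += [["--lock-token", token, *step] for step in steps]
    commands += [
        ["lock", "release", "--token", token],
        ["overlay", "--mode", "finish", "--status", status,
         "--task", task, "--completed", completed],
    ]
    return commands


if __name__ == "__main__":
    for args in session_commands(
        task="example",
        owner="example",
        token="TOKEN",  # placeholder, not a real token
        steps=[["hotkey", "ctrl", "l"], ["type", "hello world", "--method", "paste"]],
        status="success",
        completed="Finished the requested UI task",
    ):
        print(args)
```

In a real worker the release and finish steps would normally sit in a finally block so the lock is released and the border leaves its warm state even when a step fails.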

Provenance

This project was migrated from the local desktop-control patterns in ClawHub breckengan/control v1.0.0. The original ClawHub listing identified that package as MIT-0 licensed. Upstream demo scripts and rule-based app demos were intentionally not copied. Codex should keep high-level reasoning in the active agent loop and use the plan command for deterministic local batching.

License

MIT-0. See LICENSE.
