Skip to content

feat: native mouse and keyboard control tools#515

Merged
senamakel merged 3 commits intotinyhumansai:mainfrom
senamakel:feat/mouse-keyboard
Apr 12, 2026
Merged

feat: native mouse and keyboard control tools#515
senamakel merged 3 commits intotinyhumansai:mainfrom
senamakel:feat/mouse-keyboard

Conversation

@senamakel
Copy link
Copy Markdown
Member

@senamakel senamakel commented Apr 12, 2026

Summary

  • Adds two new native OS-control tools (mouse and keyboard) that use enigo for platform-native input simulation (Core Graphics on macOS, SendInput on Windows, X11/libxdo on Linux)
  • mouse tool: move, click, double_click, drag, scroll actions with absolute screen coordinates
  • keyboard tool: type (text string), press (single key), hotkey (key combinations like Ctrl+C) with comprehensive key name parser
  • Both tools are PermissionLevel::Dangerous and gated behind [computer_control] enabled = true in config (disabled by default)

Changes

File What
src/openhuman/tools/impl/computer/mouse.rs MouseTool — 5 actions, coordinate validation, button selection
src/openhuman/tools/impl/computer/keyboard.rs KeyboardTool — 3 actions, key name parser (modifiers, nav, arrows, F-keys, single chars), hotkey sequencing
src/openhuman/tools/impl/computer/mod.rs Module exports
src/openhuman/tools/impl/mod.rs Wire computer module
src/openhuman/tools/ops.rs Register tools conditionally when computer_control.enabled
src/openhuman/config/schema/tools.rs ComputerControlConfig struct
src/openhuman/config/schema/types.rs Add computer_control to Config
src/openhuman/config/schema/mod.rs Re-export ComputerControlConfig

Test plan

  • 37 unit tests covering schemas, key parsing, coordinate validation, parameter validation, and error handling
  • cargo check passes clean
  • cargo fmt applied
  • Pre-push hooks (format, lint, typecheck, rust check) all pass
  • Manual test: enable [computer_control] enabled = true, invoke mouse/keyboard tools via agent

Summary by CodeRabbit

  • New Features
    • Added keyboard control tool for typing, key presses, and shortcuts.
    • Added mouse control tool for cursor movement, clicking, dragging, and scrolling.
    • Added a top-level computer_control config toggle to enable/disable these input-control tools.
  • Tests
    • Added coverage verifying tools are present when enabled and absent when disabled.

- Introduced `ComputerControlConfig` to manage mouse and keyboard tool activation.
- Implemented `KeyboardTool` and `MouseTool` for native input control using platform-native APIs.
- Updated configuration schema to include computer control settings.
- Enhanced tool registration logic to conditionally include mouse and keyboard tools based on user configuration.
- Added comprehensive documentation and tests for new functionalities, ensuring robust integration and usability.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 12, 2026

📝 Walkthrough

Walkthrough

Adds native mouse and keyboard tools, a new ComputerControlConfig flag (defaults false) in the config schema, and conditional registration of the new tools based on that flag.

Changes

Cohort / File(s) Summary
Config Schema
src/openhuman/config/schema/mod.rs, src/openhuman/config/schema/tools.rs, src/openhuman/config/schema/types.rs
Added ComputerControlConfig struct with pub enabled: bool (serde default, Default impl false), re-exported it in the schema module, and added computer_control: ComputerControlConfig to top-level Config with #[serde(default)].
Computer Tools Implementation
src/openhuman/tools/impl/computer/keyboard.rs, src/openhuman/tools/impl/computer/mouse.rs
Added KeyboardTool and MouseTool implementing Tool. Each validates parameters, enforces SecurityPolicy (can_act / record_action), runs Enigo operations in spawn_blocking, and returns ToolResult success/error. Includes unit tests and detailed key/button parsing and validation.
Computer Module & Re-exports
src/openhuman/tools/impl/computer/mod.rs, src/openhuman/tools/impl/mod.rs
Created computer module, declared keyboard and mouse submodules, and re-exported KeyboardTool and MouseTool via the computer module and main impl module.
Tool Registration
src/openhuman/tools/ops.rs
Updated all_tools_with_runtime to conditionally append MouseTool and KeyboardTool when root_config.computer_control.enabled is true; added tests asserting presence/absence based on the flag.

Sequence Diagram

sequenceDiagram
    participant Client as Client Request
    participant Registry as Tool Registry
    participant Tool as Mouse/Keyboard Tool
    participant Security as SecurityPolicy
    participant Enigo as Enigo (Native)
    participant Result as ToolResult

    Client->>Registry: request tool execution with params
    Registry->>Tool: execute(args)
    Tool->>Security: can_act()?
    alt denied
        Security-->>Tool: false
        Tool-->>Result: error (permission denied)
    else allowed
        Security-->>Tool: true
        Tool->>Security: record_action()
        Tool->>Tool: parse & validate parameters
        alt invalid
            Tool-->>Result: error (invalid input)
        else valid
            Tool->>Enigo: spawn_blocking { perform native action }
            Enigo-->>Tool: operation complete / errors
            Tool-->>Result: success or error
        end
    end
    Result-->>Client: ToolResult
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 I hopped to the keys and nudged the mouse light,
A toggle kept closed till the day turned right,
With careful checks and gentle paw,
I type and click—respecting the law,
A small rabbit cheers the new control in sight.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 78.79% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and specifically describes the main change: adding native mouse and keyboard control tools to the codebase.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (2)
src/openhuman/tools/ops.rs (1)

157-162: Please lock down the new gate with a small registry test.

This branch is the only thing keeping dangerous native-input tools default-off, but this file doesn’t yet assert either the disabled or enabled case.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/tools/ops.rs` around lines 157 - 162, Add a small
unit/regression test that asserts the native-input gate behavior around
root_config.computer_control: verify that when computer_control.enabled is false
the resulting tools registry/vector does NOT contain MouseTool or KeyboardTool,
and when enabled is true the registry DOES contain MouseTool and KeyboardTool;
locate the code that constructs the tools vector (the block that checks
root_config.computer_control.enabled and pushes MouseTool::new and
KeyboardTool::new) and write tests that toggle that config, call the
tool-construction function, and assert presence/absence of those specific types
by inspecting the tools vector or registry.
src/openhuman/tools/impl/computer/keyboard.rs (1)

145-150: Add debug/trace coverage for blocked and failure paths.

Right now only the request/success path is observable. When can_act / record_action blocks, validation rejects, or enigo fails, there’s no structured trace explaining why the tool did nothing.

As per coding guidelines, src/**/*.rs: Add substantial, development-oriented logs on new/changed flows; include logs at entry/exit points, branch decisions, external calls, retries/timeouts, state transitions, and error handling paths.

Also applies to: 157-161, 182-196, 210-224, 254-283

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/tools/impl/computer/keyboard.rs` around lines 145 - 150, Add
development-oriented trace/debug logs for the blocked/failure branches in
keyboard handling: when self.security.can_act() returns false and when
self.security.record_action() returns false, log a structured message (including
agent id/session/state if available), the reason ("autonomy read-only" or "rate
limit exceeded") and early-exit via ToolResult::error; similarly add trace/debug
entries around validation failures, enigo calls, and their error branches (the
call sites referenced around lines 157-161, 182-196, 210-224, 254-283) so each
entry/exit, branch decision, external enigo invocation, and error path emits
context-rich logs. Use the existing logging/tracing facility used in the crate
(e.g., trace!/debug!/warn! with key fields) and ensure logs include unique
identifiers so blocked actions are observable in traces.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/openhuman/tools/impl/computer/keyboard.rs`:
- Around line 254-271: The current spawn_blocking block using Enigo::new and
iterating over keys stops on the first Press/Release error and can leave
modifier keys held; change it to track which keys were successfully pressed
(e.g., a Vec pressed_keys) as you iterate in the Press loop, and if a Press
fails immediately jump to a cleanup section that attempts to Release all
pressed_keys in reverse order (ignoring or logging any Release errors so cleanup
is best-effort). Also wrap the normal end-of-block Release loop in the same
best-effort pattern (release each key and log but do not return early on Release
errors). Use the existing symbols Enigo, keys, HOTKEY_INTER_KEY_DELAY and the
spawn_blocking closure to locate where to implement the pressed_keys tracking
and the guaranteed reverse-order release cleanup.
- Around line 94-97: The hotkey handler currently drops non-string entries and
doesn't enforce modifier-first plus one final non-modifier pattern; update the
hotkey validation to (1) reject any non-string entries up-front, (2) map strings
to Key values and fail if any mapping fails, (3) use is_modifier(&Key) to
require at least one modifier and exactly one trailing non-modifier with all
modifiers preceding it (e.g., modifiers... + final non-modifier), and (4) return
an error/false without sending any key events for malformed arrays. Locate the
hotkey parsing/firing logic near the code that currently filters or iterates
over the array (the block around where strings are converted to Key and keys are
fired) and add this validation before any calls that press/release keys so
malformed inputs like ["Ctrl", 1], ["Ctrl"], or ["a", "Ctrl"] are rejected.

In `@src/openhuman/tools/impl/computer/mouse.rs`:
- Around line 225-239: The drag closure in tokio::task::spawn_blocking calls
enigo.button(Direction::Press) then may return early on subsequent errors
leaving the button pressed; refactor so the press is followed by a best-effort
cleanup that always calls enigo.button(button, Direction::Release) on error or
panic. Concretely: after Enigo::new and the initial move_mouse and
button(...Press) call, perform the remaining operations inside a fallible block
(e.g., let result = (|| -> Result<_, _> { ... })();), and if result.is_err() or
a panic is caught, call enigo.button(button, Direction::Release) (ignoring its
errors) before propagating the original error; alternatively implement a small
RAII/guard type tied to the enigo instance that calls Release in Drop unless
explicitly disarmed. Ensure you reference the existing enigo variable, the
button press/release calls, and the move_mouse invocations when applying the
fix.
- Around line 28-34: parse_button currently returns Button and maps any unknown
string to Button::Left; change its signature to return Result<Button,
anyhow::Error> (or your crate's common error type) and return Err (e.g.,
anyhow::bail!("invalid mouse button: {}", value)) for unknown or non-string
"button" values instead of defaulting to Left, and update all call sites to use
parse_button(&args)? so the error propagates (references: parse_button and the
Button enum).
- Around line 253-255: The "scroll" match arm currently narrows scroll_x and
scroll_y using `as i32`, which can silently wrap out-of-range i64 values; update
the handling for `scroll_x` and `scroll_y` to validate the i64 range before
conversion (e.g., use i32::try_from(...) and handle the Err case or clamp values
to i32::MIN..=i32::MAX) and return or propagate an error when inputs are
invalid; locate the variables `scroll_x` and `scroll_y` in the "scroll" arm of
the match in mouse.rs and replace the `as i32` narrowing with a range-safe
conversion and proper error handling.

---

Nitpick comments:
In `@src/openhuman/tools/impl/computer/keyboard.rs`:
- Around line 145-150: Add development-oriented trace/debug logs for the
blocked/failure branches in keyboard handling: when self.security.can_act()
returns false and when self.security.record_action() returns false, log a
structured message (including agent id/session/state if available), the reason
("autonomy read-only" or "rate limit exceeded") and early-exit via
ToolResult::error; similarly add trace/debug entries around validation failures,
enigo calls, and their error branches (the call sites referenced around lines
157-161, 182-196, 210-224, 254-283) so each entry/exit, branch decision,
external enigo invocation, and error path emits context-rich logs. Use the
existing logging/tracing facility used in the crate (e.g., trace!/debug!/warn!
with key fields) and ensure logs include unique identifiers so blocked actions
are observable in traces.

In `@src/openhuman/tools/ops.rs`:
- Around line 157-162: Add a small unit/regression test that asserts the
native-input gate behavior around root_config.computer_control: verify that when
computer_control.enabled is false the resulting tools registry/vector does NOT
contain MouseTool or KeyboardTool, and when enabled is true the registry DOES
contain MouseTool and KeyboardTool; locate the code that constructs the tools
vector (the block that checks root_config.computer_control.enabled and pushes
MouseTool::new and KeyboardTool::new) and write tests that toggle that config,
call the tool-construction function, and assert presence/absence of those
specific types by inspecting the tools vector or registry.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 21ecf1b5-f16c-4381-a4d9-2e6513ea21ea

📥 Commits

Reviewing files that changed from the base of the PR and between ef9c117 and 3cd7180.

📒 Files selected for processing (8)
  • src/openhuman/config/schema/mod.rs
  • src/openhuman/config/schema/tools.rs
  • src/openhuman/config/schema/types.rs
  • src/openhuman/tools/impl/computer/keyboard.rs
  • src/openhuman/tools/impl/computer/mod.rs
  • src/openhuman/tools/impl/computer/mouse.rs
  • src/openhuman/tools/impl/mod.rs
  • src/openhuman/tools/ops.rs

… gate tests

- keyboard hotkey: track pressed keys and always release in reverse on
  error (prevents stuck modifiers); validate modifier-first pattern
  (reject ["a","Ctrl"], ["Ctrl"], ["Ctrl","Shift"])
- mouse drag: guarantee button release via best-effort cleanup after
  press, even when move_mouse fails
- mouse parse_button: return Result, reject unknown/non-string values
  instead of silently defaulting to left
- mouse scroll: use i32::try_from instead of silent `as i32` truncation
- Add debug logs on security block branches (can_act / record_action)
- Add ops.rs regression tests for computer_control enabled/disabled gate
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/openhuman/tools/impl/computer/mouse.rs (1)

155-161: Use debug/trace for routine action-success logs.

These success-path diagnostics are currently info!; switching to debug! aligns better with repo logging policy for development-oriented traces.

Suggested log-level adjustment
-                    info!(
+                    debug!(
...
-                    info!(
+                    debug!(
...
-                    info!(
+                    debug!(
...
-                    info!(
+                    debug!(
...
-                    info!(
+                    debug!(

As per coding guidelines: “In Rust, use log/tracing at debug or trace level for development-oriented diagnostics in Rust.”

Also applies to: 179-183, 206-210, 265-270, 316-322

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/tools/impl/computer/mouse.rs` around lines 155 - 161, Change
the success-path tracing calls from info! to debug! for the mouse tool so
routine, development-oriented diagnostics are emitted at debug level;
specifically replace the info! invocation that logs tool="mouse", action="move",
x/x, y/y and the other similar info! blocks (the ones around the reported ranges
179-183, 206-210, 265-270, 316-322) with debug! while keeping the same
structured fields and message text, and ensure the tracing/log crate is imported
or available for debug! macros where these changes are made.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/openhuman/tools/impl/computer/mouse.rs`:
- Around line 91-110: Update the JSON schema descriptions for the "x" and "y"
properties to include that they are required for the "drag" action as well as
move/click/double_click; locate the schema entries for "x" and "y" (keys "x",
"y" in the mouse action schema) and change their description text to mention
"Required for move, click, double_click, and drag" so runtime expectations for
the "drag" action (which uses x/y as end coordinates) match the schema.

---

Nitpick comments:
In `@src/openhuman/tools/impl/computer/mouse.rs`:
- Around line 155-161: Change the success-path tracing calls from info! to
debug! for the mouse tool so routine, development-oriented diagnostics are
emitted at debug level; specifically replace the info! invocation that logs
tool="mouse", action="move", x/x, y/y and the other similar info! blocks (the
ones around the reported ranges 179-183, 206-210, 265-270, 316-322) with debug!
while keeping the same structured fields and message text, and ensure the
tracing/log crate is imported or available for debug! macros where these changes
are made.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1fe9cc88-e5a3-4f2f-aa40-d99496bd1467

📥 Commits

Reviewing files that changed from the base of the PR and between 3cd7180 and 02202eb.

📒 Files selected for processing (3)
  • src/openhuman/tools/impl/computer/keyboard.rs
  • src/openhuman/tools/impl/computer/mouse.rs
  • src/openhuman/tools/ops.rs
✅ Files skipped from review due to trivial changes (1)
  • src/openhuman/tools/impl/computer/keyboard.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/openhuman/tools/ops.rs

Comment on lines +91 to +110
"x": {
"type": "integer",
"description": "Target X coordinate (absolute screen pixels). Required for move, click, double_click."
},
"y": {
"type": "integer",
"description": "Target Y coordinate (absolute screen pixels). Required for move, click, double_click."
},
"button": {
"type": "string",
"enum": ["left", "right", "middle"],
"description": "Mouse button for click/double_click/drag. Default: left."
},
"start_x": {
"type": "integer",
"description": "Drag start X coordinate (absolute). Required for drag."
},
"start_y": {
"type": "integer",
"description": "Drag start Y coordinate (absolute). Required for drag."
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Schema text under-documents drag end coordinates.

At runtime, drag requires end coordinates via x/y (Line 229), but the schema descriptions at Line 93 and Line 97 say those are only for move/click/double_click. This mismatch can cause avoidable tool-call errors.

Suggested schema text fix
- "description": "Target X coordinate (absolute screen pixels). Required for move, click, double_click."
+ "description": "Target X coordinate (absolute screen pixels). Required for move, click, double_click, and drag (end position)."

- "description": "Target Y coordinate (absolute screen pixels). Required for move, click, double_click."
+ "description": "Target Y coordinate (absolute screen pixels). Required for move, click, double_click, and drag (end position)."

Also applies to: 229-229

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/tools/impl/computer/mouse.rs` around lines 91 - 110, Update the
JSON schema descriptions for the "x" and "y" properties to include that they are
required for the "drag" action as well as move/click/double_click; locate the
schema entries for "x" and "y" (keys "x", "y" in the mouse action schema) and
change their description text to mention "Required for move, click,
double_click, and drag" so runtime expectations for the "drag" action (which
uses x/y as end coordinates) match the schema.

@senamakel senamakel merged commit 0a09d72 into tinyhumansai:main Apr 12, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant