Releases: askui/python-sdk

v0.10.0

30 Jul 15:03

What's Changed

🚀 New Features

  • Google Gemini API Support: The askui model now uses gemini-2.5-flash by default and falls back to the original askui model (the Inference API's VQA endpoint) if the Google GenAI API fails, e.g., because it does not support the requested schema or for an unknown reason. For example, the Google GenAI API does not currently support recursive schemas.
  • New Model Options: askui/gemini-2.5-flash and askui/gemini-2.5-pro are now supported as model choices.
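
The fallback behaviour described above can be sketched as a plain try/except wrapper. This is only an illustration of the pattern, not the SDK's implementation: get_with_fallback, gemini_stub and vqa_stub are hypothetical names, and the stubs stand in for real API calls.

```python
from typing import Callable


def get_with_fallback(
    query: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> str:
    """Ask the primary backend first; use the fallback if it raises."""
    try:
        return primary(query)
    except Exception:
        # Any failure (unsupported schema, network error, unknown reason)
        # triggers the fallback, mirroring the behaviour described above.
        return fallback(query)


def gemini_stub(query: str) -> str:
    # Hypothetical stand-in for a Gemini-backed call that fails.
    raise ValueError("recursive schemas are not supported")


def vqa_stub(query: str) -> str:
    # Hypothetical stand-in for the original VQA-backed call.
    return f"VQA answer to: {query}"


print(get_with_fallback("What is the page title?", gemini_stub, vqa_stub))
```

Because a fallback of this kind is silent, the same call may be answered by either backend, which is one reason the default model change is listed as breaking.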

🚨 Breaking Changes

  • Default Model Change: The askui default model for AgentBase.get() (and, therefore, VisionAgent.get() etc.) has changed, which may affect the behavior of existing implementations.

Full Changelog: v0.9.7...v0.10.0

v0.9.7

30 Jul 08:27

What's Changed

Rerelease of v0.9.6 due to a problem while releasing.

Full Changelog: v0.9.6...v0.9.7

v0.9.6

30 Jul 08:20
e0dce94

What's Changed

Rerelease of v0.9.5 due to a dependency problem.

Full Changelog: v0.9.5...v0.9.6

v0.9.5

29 Jul 16:10

What's Changed

🚀 New Features

  • Computer Tool support for the "cursor_position" action: VisionAgent can now retrieve the current cursor position, e.g., to answer questions like "Is the cursor currently hovering over the star icon button?"

  • Multiple Display Support: Comprehensive multi-display functionality for computer vision agents

    • New Display Management Tools (e.g., VisionAgent now searches through all available displays if something cannot be found on one):
      • ListDisplaysTool: List all available displays with their properties
      • SetActiveDisplayTool: Set the active display for screenshots and actions
      • RetrieveActiveDisplayTool: Get information about the currently active display
  • AskUI Controller Path Configuration: Enhanced flexibility in controller setup

    • Direct Path Setting: New ASKUI_CONTROLLER_PATH environment variable for direct controller executable specification
    • Priority-Based Resolution: Controller path resolution with precedence: direct path > component registry > installation directory
    • Cross-Platform Support: Improved path resolution for Windows, macOS, and Linux
    • Better Error Handling: Clear error messages for missing or invalid controller paths
  • Modernized GRPC Controller Architecture: Major overhaul of the controller communication system

    • JSON Schema Integration: New JSON schema definitions for AgentOS-Send-Request-2501 and AgentOS-Send-Response-2501
    • Automated Code Generation:
      • New grpc:gen and json:gen PDM scripts for automated code generation
      • Generated Python classes from JSON schemas using datamodel-code-generator
      • Updated GRPC bindings with enhanced type safety
    • Enhanced Command Helpers: New command_helpers.py module with functions for:
      • Mouse position management (create_get_mouse_position_command, create_set_mouse_position_command)
      • Render object management (quad, line, image, text commands)
      • Styling system with create_style function supporting CSS-like properties
    • Improved Proto Definitions: Updated Controller_V1.proto with expanded command support
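
The controller path precedence listed under AskUI Controller Path Configuration (direct path > component registry > installation directory) can be sketched as a first-match lookup. Only the ASKUI_CONTROLLER_PATH variable name comes from the release notes. The function name, the two lookup callables and the example paths are hypothetical, and the SDK's actual validation and error handling may differ.

```python
from typing import Callable, Mapping, Optional, Tuple

Lookup = Callable[[], Optional[str]]


def resolve_controller_path(
    env: Mapping[str, str],
    from_component_registry: Lookup,
    from_installation_directory: Lookup,
) -> Tuple[str, str]:
    """Return (source, path) for the first source that yields a path."""
    candidates = [
        # 1. Direct path set via the environment variable.
        ("ASKUI_CONTROLLER_PATH", lambda: env.get("ASKUI_CONTROLLER_PATH")),
        # 2. Path looked up in the component registry (hypothetical callable).
        ("component registry", from_component_registry),
        # 3. Path derived from the installation directory (hypothetical callable).
        ("installation directory", from_installation_directory),
    ]
    for source, lookup in candidates:
        path = lookup()
        if path:
            return source, path
    raise ValueError(
        "No controller path found: set ASKUI_CONTROLLER_PATH or install the controller."
    )


# Usage with made-up paths: the direct path wins over the other sources.
source, path = resolve_controller_path(
    {"ASKUI_CONTROLLER_PATH": "/opt/askui/controller"},
    lambda: "/registry/controller",
    lambda: None,
)
print(source, path)
```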

🔧 Improvements

  • Development Workflow Enhancements:
    • New Dev Dependency Group: Separated development dependencies (datamodel-code-generator, grpcio-tools) into dedicated group
    • Automated Generation Scripts: New scripts/grpc-gen.sh for GRPC code generation

🐞 Bug Fixes

  • Unicode Encoding: Fixed encoding in the Chat API persistence layer

🔄 Dependencies

  • Moved to Dev Dependencies:
    • grpcio-tools>=1.73.1: Moved from core to dev dependencies (was >=1.67.0)
    • datamodel-code-generator>=0.31.2: Added for automated code generation

Full Changelog: v0.9.4...v0.9.5

v0.9.4

22 Jul 10:00

What's Changed

🚀 New Features

  • Web Testing Agent: We've introduced the WebTestingAgent for simple exploratory testing. Given a URL, it explores the features of a website or web app, creates testing scenarios, and executes them.
    • Main Limitations:
      • Features, scenarios and executions are currently not scoped to a particular URL, so if you test multiple apps (across different chats/conversations), it may get confused.
      • It can go off the rails, e.g., if it encounters a link to another website or web app on the one it should test, it may test the other one as well.
      • As the number of features, scenarios and executions grows, it may get more and more confused, as it does not currently scale.
      • It currently lacks focus in what to test, so it may sometimes test things that are not really important.
      • It shares the current issues of the WebVisionAgent.

🐞 Bug Fixes

  • Performance Optimization: Fixed slow typing in the Playwright agent OS integration, which was caused by incorrect units

🔧 Improvements

  • Python 3.13 Compatibility: Enhanced NOT_GIVEN implementation now works correctly as dataclass field defaults in Python 3.13
  • Configuration Management:
    • Extracted mypy configuration to separate mypy.ini file for better import handling and module-specific settings
    • Improved VS Code debugger configuration for chat API module path
  • Enhanced Utility Modules: New utility modules have been added:
    • api_utils: Streamlined API interaction utilities
    • datetime_utils: Enhanced datetime handling capabilities
    • id_utils: Improved ID generation and validation
    • not_given: Better handling of optional parameters with immutable NOT_GIVEN implementation
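
To illustrate why a NOT_GIVEN sentinel is useful as a dataclass field default, here is a minimal sketch. It is not the SDK's not_given module: the NotGiven class and the ClickOptions dataclass are hypothetical. The sentinel is a hashable, state-free singleton, which dataclasses accept as a default, and it lets code tell "argument omitted" apart from an explicit None.

```python
from dataclasses import dataclass
from typing import Optional, Union


class NotGiven:
    """Singleton sentinel meaning "no value was passed"."""

    __slots__ = ()  # no per-instance state, so the sentinel cannot be mutated
    _instance: Optional["NotGiven"] = None

    def __new__(cls) -> "NotGiven":
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __bool__(self) -> bool:
        return False

    def __repr__(self) -> str:
        return "NOT_GIVEN"

    def __eq__(self, other: object) -> bool:
        return isinstance(other, NotGiven)

    def __hash__(self) -> int:
        # Defining __eq__ alone would make the class unhashable, and
        # dataclasses reject unhashable field defaults as mutable.
        return hash(NotGiven)


NOT_GIVEN = NotGiven()


@dataclass
class ClickOptions:  # hypothetical example class
    button: Union[str, NotGiven] = NOT_GIVEN
    timeout: Union[float, None, NotGiven] = NOT_GIVEN


omitted = ClickOptions()
explicit_none = ClickOptions(timeout=None)
print(omitted.timeout is NOT_GIVEN, explicit_none.timeout is None)
```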

🔄 Dependencies

  • Added:
    • jsonref>=1.1.0 for JSON reference handling in testing tools

Full Changelog: v0.9.3...v0.9.4

v0.9.3

14 Jul 13:50

What's Changed

🐞 Bug Fixes

  • allow overriding the betas flag with an empty list
  • override the betas flag with an empty list in AndroidVisionAgent so that it does not use the computer beta flag (the AskUI Inference API default)
  • fix serialization issues in telemetry module for AgentBase.act()
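
The release notes do not say what prevented an empty list from overriding the betas flag. A common cause of this kind of bug is a truthiness check that treats an empty list like "nothing passed". The sketch below shows that pitfall and one way around it; the function names and the default value are hypothetical.

```python
from typing import List, Optional, Sequence

DEFAULT_BETAS: List[str] = ["computer-use"]  # hypothetical default value


def resolve_betas_truthy(betas: Optional[Sequence[str]]) -> List[str]:
    # Pitfall: an empty list is falsy, so the default silently wins.
    return list(betas or DEFAULT_BETAS)


def resolve_betas(betas: Optional[Sequence[str]]) -> List[str]:
    # Only fall back to the default when nothing was passed at all.
    return list(DEFAULT_BETAS if betas is None else betas)


print(resolve_betas_truthy([]))  # the default is used despite the override
print(resolve_betas([]))         # the empty override is respected
```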

Full Changelog: v0.9.2...v0.9.3

v0.9.2

11 Jul 12:11

What's Changed

🐞 Bug Fixes

  • make askui token optional if ASKUI__AUTHORIZATION is set when using AskUI Inference API

Full Changelog: v0.9.1...v0.9.2

v0.9.1

11 Jul 11:54

What's Changed

🚀 New Features

  • AskUI Inference API: Configure the authorization header using the ASKUI__AUTHORIZATION environment variable (takes precedence over constructing the authorization header from ASKUI_TOKEN)
  • Chat API: Allow configuring the workspace ID and the AskUI Inference API authorization header via request headers so that clients, e.g., https://hub.askui.com, can set them
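
The precedence between ASKUI__AUTHORIZATION and ASKUI_TOKEN can be sketched as follows. The two variable names come from the release notes. The function name is hypothetical, and the "Basic" plus base64 encoding of the token is an assumption made for illustration, not a statement about how the SDK builds the header.

```python
import base64
from typing import Mapping


def build_authorization_header(env: Mapping[str, str]) -> str:
    """Prefer an explicit header; otherwise derive one from the token."""
    explicit = env.get("ASKUI__AUTHORIZATION")
    if explicit:
        # The explicit header wins, so ASKUI_TOKEN is not required here.
        return explicit
    token = env.get("ASKUI_TOKEN")
    if not token:
        raise ValueError("Set either ASKUI__AUTHORIZATION or ASKUI_TOKEN.")
    # Assumption: the token is sent base64-encoded with the Basic scheme.
    encoded = base64.b64encode(token.encode("utf-8")).decode("ascii")
    return f"Basic {encoded}"


print(build_authorization_header({"ASKUI__AUTHORIZATION": "Bearer abc", "ASKUI_TOKEN": "t"}))
```

In practice the mapping passed in would be os.environ; a plain dict is used here to keep the example deterministic.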

Other

  • Allow importing OnMessageCbParam from askui

Full Changelog: v0.9.0...v0.9.1

v0.9.0

10 Jul 13:09

What's Changed

🚀 New Features

  • Web Automation Support: We've introduced the WebVisionAgent for browser automation, powered by Playwright. This new agent allows you to automate tasks directly within web browsers. You can install the required dependencies using pip install askui[web].
  • Chat API is Now Part of the Package: The AskUI Chat API has been integrated into the askui package under the askui.chat module. You can now run it directly using python -m askui.chat.
  • Enhanced Agent Capabilities:
    • The act method across all agents now accepts tools and settings parameters, allowing for more fine-grained control over agent execution.
    • The AndroidVisionAgent can now leverage the Claude 4 model for its operations.
    • Agents now support new actions like scroll and wait for more complex interactions.

🔧 Improvements

  • Composable Agent Architecture: Agents have been significantly refactored for better composability and extensibility. A new base class, AgentBase, has been introduced, from which VisionAgent, AndroidVisionAgent, and the new WebVisionAgent inherit.
  • Refactored Settings Management: The settings for the AskUI Inference API and the Anthropic API have been refactored to be more consistent and easier to use, and they now expose all API settings. For example, when using the AskUI Inference API, you can set the model used by the act method by running export ASKUI__MESSAGES__MODEL=anthropic-claude-3-5-sonnet-20241022. See inference_api.py or messages_api.py for more details.
  • Chat API Enhancements:
    • The Chat API now uses a consistent default port of 9261 for easier testing and setup.
    • The Chat UI has been moved and is now hosted on the [AskUI Hub](https://hub.askui.com/).
  • Refined Keyboard Tooling: The internal keyboard tooling has been improved to better support a wider range of keys and modifier combinations.
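
The double-underscore naming in ASKUI__MESSAGES__MODEL suggests a nested-settings convention, where each "__" separates one level of nesting. The sketch below shows how such names could map to a nested structure using only the standard library. It illustrates the naming convention and is not the SDK's settings code; nest_env is a hypothetical helper.

```python
from typing import Any, Dict, Mapping


def nest_env(
    env: Mapping[str, str],
    prefix: str = "ASKUI__",
    delimiter: str = "__",
) -> Dict[str, Any]:
    """Turn PREFIX__A__B=value entries into {"a": {"b": value}}."""
    settings: Dict[str, Any] = {}
    for name, value in env.items():
        if not name.startswith(prefix):
            continue  # e.g. ASKUI_TOKEN has a single underscore and is skipped
        keys = name[len(prefix):].lower().split(delimiter)
        node = settings
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return settings


example_env = {
    "ASKUI__MESSAGES__MODEL": "anthropic-claude-3-5-sonnet-20241022",
    "ASKUI_TOKEN": "not-part-of-the-nested-settings",
}
print(nest_env(example_env))
```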

🚨 Breaking Changes

  • Optional Dependencies: Core dependencies have been made optional to provide a more lightweight installation. You now need to install extras based on your needs. For example:
    • For web automation: pip install askui[web]
    • For Android automation: pip install askui[android]
    • For using the Chat API: pip install askui[chat]
    • To install everything: pip install askui[all]
  • Removed APIs and Exceptions:
    • The unused exceptions AskUiApiError and AskUiApiRequestFailedError have been removed. Please use more specific exceptions.
    • Older methods of configuring Inference and Anthropic APIs via environment variables and settings classes have been removed in favor of the new Pydantic-based settings management.
  • ActModel.act Method Signature: The signature for ActModel.act has been extended with new tools and settings parameters. If you have implemented custom models, you will need to update your method signatures accordingly.
  • Configuration Changes: All environment variables for configuring the AskUI Inference API or the Anthropic API have been replaced by more consistent ones, except for ANTHROPIC_API_KEY, ASKUI_TOKEN, ASKUI_WORKSPACE_ID, and ASKUI_INFERENCE_ENDPOINT, which are still supported.
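
Since core dependencies are now optional, code that relies on an extra may want to check for it up front and give an actionable message. The sketch below does this with the standard library only. missing_extra_hint is a hypothetical helper, and the pairing of the playwright module with the web extra follows the release notes.

```python
import importlib.util
from typing import Optional


def missing_extra_hint(module_name: str, extra: str) -> Optional[str]:
    """Return an install hint if a top-level module is not importable."""
    if importlib.util.find_spec(module_name) is not None:
        return None
    return (
        f"Optional dependency '{module_name}' is not installed. "
        f"Install it with: pip install askui[{extra}]"
    )


hint = missing_extra_hint("playwright", "web")
if hint is not None:
    print(hint)
```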

📜 Documentation

  • The README.md has been significantly updated to reflect the new architecture, installation procedures, and features.
  • Updated installation instructions to use extras like pip install askui[chat].
  • Updated usage examples for running the chat API with python -m askui.chat.
  • Clarified that the Chat UI is now hosted on the [AskUI Hub](https://hub.askui.com/).

🔄 Dependencies

  • Added:
    • playwright>=1.41.0 for web automation support.
    • greenlet>=3.1.1 and pyee<14,>=13 as dependencies for playwright.
  • Restructured:
    • Dependencies are now managed in optional groups (android, chat, mcp, pynput, web) to reduce the size of the default installation. You must now install the extras you need.
  • Removed from Core Dependencies:
    • fastmcp, mcp, openapi-pydantic, python-multipart, and typer have been moved to the [mcp] optional dependency group.
    • httpx-sse and sse-starlette were removed as they are no longer needed.

🧪 Experimental

  • AskUI Chat: The AskUI Chat feature remains in an experimental stage. We welcome your feedback as we continue to improve its functionality and user experience.

Full Changelog: v0.8.0...v0.9.0

v0.8.0

26 Jun 13:56

What's Changed

🚀 Features

  • add support for clicking/focusing and clearing to VisionAgent.type() so that consumers don't have to use custom solutions
  • add support for Claude 4 + thinking (new computer tool only supported partially for now)

🐞 Bug Fixes

  • Fixed exceptions being hidden by a serialization exception raised in the telemetry module; the serialization of (exception) classes has been fixed

🚨 Breaking Changes

  • default model used for VisionAgent.act() has changed, which may make these calls behave differently than before

Full Changelog: v0.7.0...v0.8.0