Changes from all commits
25 commits
1c3910b Update README.md (Lumpin-askui, Nov 18, 2025)
8687152 web-agent (Lumpin-askui, Nov 29, 2025)
ba0ff52 Rework demo to test macOS Calculator (20-10=10) (Lumpin-askui, Feb 8, 2026)
2ad68e8 Add example reports, enable caching, update README (Lumpin-askui, Feb 8, 2026)
f9bb0db Update requirements.txt (Lumpin-askui, Feb 10, 2026)
2b9652c Update act.py (Lumpin-askui, Feb 16, 2026)
1e91bef Update act.py (Lumpin-askui, Feb 16, 2026)
1cb8bde Update calculator_test.csv (Lumpin-askui, Feb 16, 2026)
5d9f4c5 refactor: update to new repo structure (philipph-askui, Mar 10, 2026)
c7bc67d Merge pull request #2 from askui/chore/new_repo_structure (philipph-askui, Mar 10, 2026)
7fb5abb Update README.md (philipph-askui, Mar 10, 2026)
365b47b Update README.md (philipph-askui, Mar 10, 2026)
585e0c5 Update README.md (philipph-askui, Mar 10, 2026)
9027e65 Update README.md (philipph-askui, Mar 10, 2026)
7380642 Update README.md (philipph-askui, Mar 10, 2026)
4e02ace Update README.md (philipph-askui, Mar 10, 2026)
7dd3dd2 Update README.md (philipph-askui, Mar 10, 2026)
24117dd Update README.md (philipph-askui, Mar 10, 2026)
e635627 Update README.md (philipph-askui, Mar 10, 2026)
8f1b4eb fix: add missing env template (philipph-askui, Mar 12, 2026)
4a8ee83 add .env to gitignore (philipph-askui, Mar 12, 2026)
dc0b94c chore: update requirements.txt to latest askui version (philipph-askui, Mar 12, 2026)
2e815c4 Update requirements.txt (Lumpin-askui, Mar 23, 2026)
28900dc Update README.md (Lumpin-askui, Mar 24, 2026)
824b838 Merge branch 'main' into chore/new_repo_structure (programminx-askui, Mar 24, 2026)
Empty file added .env.template
Empty file.
2 changes: 0 additions & 2 deletions .gitattributes

This file was deleted.

10 changes: 10 additions & 0 deletions .gitignore
@@ -0,0 +1,10 @@
+.DS_Store
+.pdm-python
+
+.venv
+.mypy_cache
+.ruff_cache
+__pycache__
+
+.askui_cache
+.env
304 changes: 223 additions & 81 deletions README.md
@@ -1,112 +1,254 @@
-## Description
+# AskUI Demo Project

-This is a proof of concept demonstrating the use of **AskUI** to automate desktop devices.
-As an example, we tested a grocery cashier game and sent a quote request to Bahn Business.
-The goal is to showcase AskUI's capabilities for desktop automation (Android and Web are supported too).
+A task-driven automation framework built on AskUI Agent that reads tasks from the `tasks/` folder, performs UI interactions, and generates per-task reports with screenshots in a timestamped workspace. Tasks are organized in a hierarchical folder structure with support for rules, setup, and teardown.

-## What is AskUI Agent?
+## Overview

-AskUI Agent acts as a "brain" that follows instructions and uses a provided set of tools to perform tasks.
+This project automates UI tasks defined in text-based files under the `tasks/` directory. The AskUI Agent:

-AskUI provides a default toolset for each agent type. For example:
+- Reads tasks from the **Task Folder** (`tasks/`) — supports `.txt`, `.md`, `.csv`, `.json`, and `.pdf`
+- Supports **hierarchical task organization** with rules, setup, and teardown per folder
+- Executes each task step-by-step via UI automation
+- Writes a summary report per task (what was done, result, issues, conclusion)
+- Saves screenshots of system interactions and includes them in reports
+- Writes all outputs into a timestamped **Agent Workspace** directory
+- Supports **custom tools** via the `helpers/` module
+- Supports **caching** for repeated task runs

-* **Default Computer Agent toolset**:
+## Project Structure

-* Full control of mouse and keyboard
-* Take and analyze screenshots
-* Uses the default display but can list and switch displays
+```
+solution-delivery-template/
+├── main.py # Entry point - hierarchical folder runner
+├── system_prompt.py # System prompt builder (reads from prompts/)
+├── requirements.txt # Python dependencies
+├── ruff.toml # Linting/formatting configuration
+├── .vscode/settings.json # Editor & AskUI Shell terminal profile
+├── helpers/ # Custom tools and utilities
+│ ├── __init__.py
+│ ├── get_tools.py # Tool factory function
+│ └── tools/
+│ ├── __init__.py
+│ └── greeting_tool.py # Example custom tool
+├── prompts/ # Prompt parts for the system prompt (MD files)
+│ ├── system_capabilities.md # Agent capabilities description
+│ ├── device_information.md # Desktop device context
+│ └── report_format.md # Report formatting guidelines
+├── tasks/ # Task definitions (hierarchical)
+│ └── demo/
+│ ├── rules.md # Rules for this task group
+│ ├── calculator.csv # CSV test case
+│ ├── clock_demo.txt # Text task
+│ ├── notepad_hello.md # Markdown task
+│ └── webbrowser.json # JSON task
+├── agent_workspace/ # Generated per run (timestamped)
+├── .gitignore
+└── README.md # This file
+```
+
+## Task Hierarchy
+
+Tasks are organized in folders under `tasks/`. Each folder can contain:

-The toolset can be expanded with custom tools.
-For this POC, we added custom tools in [tools.py](./helpers/tools.py).
+| File | Purpose |
+|------|---------|
+| `rules.(md\|txt\|csv\|json\|pdf)` | Context/rules injected as system prompt for all tasks in folder |
+| `setup.(md\|txt\|csv\|json\|pdf)` | Executed before tasks in folder |
+| `teardown.(md\|txt\|csv\|json\|pdf)` | Executed after all tasks in folder |
+| `*.csv`, `*.md`, `*.txt`, `*.json`, `*.pdf` | Task files (executed in sorted order) |
+| Subdirectories | Subgroups that inherit parent rules |

-* Example tools: **FileWriteTool** and **FileReadTool** allow the agent to read a CSV and write a `.md` report.
+Rules cascade from parent to child folders, so subgroups inherit their parent's context.

-**ℹ️ Remark:** Tools can be anything: sending emails, making API calls, interacting with devices, etc.
+### Example: setup.md

-## Requirements
+A setup file runs before any tasks in the folder. Use it to prepare the environment:

-* [AskUI Suite Installed](https://docs.askui.com/01-tutorials/00-installation)
-* Chrome browser
+```markdown
+## Setup Steps

-## Example Agent Output
+1. Open the Settings application.
+2. Navigate to the "Network" section.
+3. Ensure WiFi is enabled before proceeding with tests.
+```

-All outputs are saved under [reports](./reports).
+### Example: teardown.md

-### Cashier Game
+A teardown file runs after all tasks in the folder complete. Use it to clean up:

-Prompt used to play the game:
+```markdown
+## Teardown Steps

-```python
-desktop_agent.act(
-goal="""
-Open "https://www.mortgagecalculator.org/money-games/grocery-cashier/"
-Play the first level of the game.
-Save a screenshot after each interaction.
-Write a detailed report of the gameplay.
-Use only the mouse for interaction.
-Use tools in parallel to speed up gameplay.
-""",
-tools=custom_tools,
-)
+1. Close the Settings application.
+2. Return to the home screen.
+3. Clear any temporary files created during testing.
 ```

-After execution, the agent generated this
-[Execution Report.](./reports/20250923_grocery_cashier_game/grocery_cashier_game_test_report.md)
-
-### Automating GK Software Website to make a contact request.
-
-The test plan is defined in a [CSV file](./csv_files/demo.csv).
-
-Code snippet used:
-
-```python
-desktop_agent.act(
-goal="""
-You are an AI UI Automation Engineer created with AskUI Agent.
-You are running in a controlled test environment with full control of the computer, browser, and UI.
-You can analyze and solve image-based questions as part of test execution.
-Analyze the icons and click on the correct one based on the question.
-Execute the test case from the CSV file step by step.
-After each interaction, capture and save a screenshot.
-Write a detailed report of every interaction, including visual findings if images are involved.
-Use tools in parallel to optimize and speed up execution.
-Do Not raise any Exception and do not ask the user for any input.
-""",
-tools=[
-FileReadTool(absolute_csv_file_path),
-] + custom_tools,
-)
+## Prerequisites
+
+Before you begin, ensure you have:
+
+- **AskUI Shell** installed on your system
+- Python 3.12 or higher
+- Access to the AskUI platform with valid credentials
+
+### Installing AskUI Shell
+
+If you haven't already, install AskUI Shell following the [official installation guide](https://docs.askui.com/).
+
+## Installation
+
+### Step 1: Open AskUI Shell
+
+Launch the AskUI Shell environment:
+
+```bash
+askui-shell
 ```

-After execution, the agent generated this [Execution Report.](./reports/20250923_submit_contact_request_via_GKSoftware_website/TC001_Test_Report.md)
+### Step 2: Configure AskUI Credentials (First Time Only)

-## Setup Steps
+1. **Create an Access Token**
+Follow the [Access Token Guide](https://docs.askui.com/02-how-to-guides/01-account-management/04-tokens#create-access-token).
+
+2. **Set Up Your Credentials**
+Follow the [Credentials Setup Guide](https://docs.askui.com/04-reference/02-askui-suite/02-askui-suite/ADE/Public/AskUI-SetSettings#askui-setsettings).
+
+### Step 3: Set Up Python Environment
+
+Activate the virtual environment (run this each time you start a new terminal):
+
+```powershell
+AskUI-EnablePythonEnvironment -name 'AskUI-POC' -CreateIfNotExists
+```
+
+### Step 4: Install Dependencies
+
+Install required packages (only needed the first time or when `requirements.txt` is updated):
+
+```powershell
+pip install -r requirements.txt
+```
+
+### Step 5: Configure Environment Variables
+
+```bash
+cp .env.template .env
+# Edit .env file with your API keys
+```
+
+## Configuration
+
+Key paths are defined in `main.py`:
+
+- **`TASK_FOLDER`** (`tasks/`): Folder containing task files the agent reads and executes.
+- **`AGENT_WORKSPACE`** (`agent_workspace/YYYY-MM-DD_HH-MM-SS/`): Where the agent can write reports and screenshots (timestamped per run).
+
+You can customize the system prompt by editing the markdown files in `prompts/`:
+- `system_capabilities.md` — Agent capabilities and behavior rules
+- `device_information.md` — Information about the device being controlled
+- `report_format.md` — Report formatting guidelines
+
+## Usage
+
+### Running Tasks
+
+```bash
+# Run all tasks from the default tasks/ folder
+python main.py
+
+# Run tasks from a specific subfolder
+python main.py tasks/demo
+
+# Run a single task file (with setup/teardown from its folder hierarchy)
+python main.py tasks/demo/calculator.csv
+
+# Custom caching options
+python main.py tasks/demo --cache-strategy auto --cache-dir .askui_cache
+```
+
+### Output Structure
+
+Each run creates a new workspace directory:
+
+```
+agent_workspace/YYYY-MM-DD_HH-MM-SS/
+├── <task_name>/
+│ ├── <task_name>_report.md
+│ └── <task_name>_screenshot.png
+└── ... (HTML report artifacts from SimpleHtmlReporter)
+```
+
+### Adding New Tasks
+
+1. Create a new folder under `tasks/` for your task group
+2. Add a `rules.md` with context and rules for the group
+3. Optionally add `setup` and `teardown` files
+4. Add task files (CSV, Markdown, etc.) — they execute in sorted order
+
+### Adding Custom Tools
+
+1. Create new tool classes in `helpers/tools/`
+2. Inherit from `askui.models.shared.tools.Tool`
+3. Register tools in `helpers/get_tools.py`
+
+See `helpers/tools/greeting_tool.py` for an example.
+
+## Task Formats
+
+Tasks can be provided in several formats. The agent reads files from `tasks/` and interprets them as tasks to execute.
+
+### Plain text (`.txt`)
+
+Short step-by-step instructions, e.g. open an app, read and report information, include a screenshot.
+
+### Markdown (`.md`)
+
+Structured task with objective, steps, and deliverables.
+
+### CSV
+
+Table format with test case ID, name, preconditions, step number, step description, and expected result — suitable for test-case style automation.
+
+**Example columns:** `Test case ID`, `Test case name`, `Precondition`, `Step number`, `Step description`, `Expected result`
+
+### JSON
+
+Structured task with `id`, `name`, `description`, `precondition`, `steps` (array of `number`, `action`, `expectedResult`), and optional `deliverables`.
+
+### PDF
+
+PDF files are supported as task references. The agent will note the PDF path for processing.

-1. **Open AskUI Shell**
+## Agent Tools

-```bash
-askui-shell
-```
+The **AskUI Agent** comes with built-in computer tools for UI automation, including:

-2. **Configure AskUI Credentials** (first-time setup only)
+- Mouse control (move, click, press, drag)
+- Keyboard input (typing, key presses)
+- Taking screenshots
+- Other desktop interaction capabilities

-1. Create an access token: [Access Token Guide](https://docs.askui.com/02-how-to-guides/01-account-management/04-tokens)
-2. Set up credentials: [Credentials Setup Guide](https://docs.askui.com/04-reference/02-askui-suite/02-askui-suite/ADE/Public/AskUI-SetSettings)
+**In addition**, this project adds the following tools:

-3. **Enable Python Environment**
+- **ReadFromFileTool** (base: Task Folder): Read task file contents
+- **ListFilesTool** (Task Folder & Agent Workspace): List files in those directories
+- **WriteToFileTool** (base: Agent Workspace): Write reports and other files
+- **ComputerSaveScreenshotTool** (base: Agent Workspace): Capture and save screenshots to disk
+- **PrintToConsoleTool**: Print messages to the console
+- **Window management tools**: Virtual display, process/window listing, focus control
+- **Custom tools**: Registered via `helpers/get_tools.py` (e.g., GreetingTool)

-```bash
-AskUI-EnablePythonEnvironment -name 'AskUIDemo' -CreateIfNotExists
-```
+Reporting is enhanced by **SimpleHtmlReporter**, which writes HTML reports into the agent workspace.

-4. **Install Dependencies** (re-run if `requirements.txt` changes)
+## Available VLMs
+The project supports multiple Claude models:

-```bash
-pip install -r requirements.txt
-```
+claude-opus-4-6
+claude-sonnet-4-6 (default)
+claude-haiku-4-5-20251001
+claude-sonnet-4-5-20250929

-5. **Run the Agent**
+## License

-```bash
-python ./act.py
-```
+This project is provided as an AskUI solution delivery template.
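The README's Task Hierarchy, Running Tasks, and Output Structure sections describe how the runner orders work (setup first, task files in sorted order, teardown after the folder's tasks), how rules cascade from parent folders, and that each run writes into a timestamped `agent_workspace/YYYY-MM-DD_HH-MM-SS/` directory. The following is a minimal sketch of those rules, not the project's `main.py`. `Step`, `find_special`, `plan`, `workspace_for`, and `report_path` are hypothetical names, and two details the README leaves open are assumptions here: sub-folders run after a folder's own tasks but before its teardown, and `<task_name>` is the task file's stem.

```python
from __future__ import annotations

import tempfile
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path

# File extensions listed in the README's Task Hierarchy table.
TASK_EXTENSIONS = {".md", ".txt", ".csv", ".json", ".pdf"}
SPECIAL_STEMS = {"rules", "setup", "teardown"}


@dataclass(frozen=True)
class Step:
    kind: str                # "setup", "task", or "teardown"
    path: Path               # file the agent would execute
    rules: tuple[Path, ...]  # rules files in effect, outermost folder first


def find_special(folder: Path, stem: str) -> Path | None:
    """Return the first folder/<stem>.<ext> file in sorted order, or None."""
    matches = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.stem == stem and p.suffix in TASK_EXTENSIONS
    )
    return matches[0] if matches else None


def plan(folder: Path, inherited: tuple[Path, ...] = ()) -> list[Step]:
    """Flatten a task folder into an ordered list of steps (hypothetical helper)."""
    own_rules = find_special(folder, "rules")
    rules = inherited + ((own_rules,) if own_rules else ())
    steps: list[Step] = []

    if setup := find_special(folder, "setup"):
        steps.append(Step("setup", setup, rules))

    # Task files run in sorted order; rules/setup/teardown files are not tasks.
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.suffix in TASK_EXTENSIONS and path.stem not in SPECIAL_STEMS:
            steps.append(Step("task", path, rules))

    # ASSUMPTION: sub-folders run after this folder's tasks, before its teardown.
    for sub in sorted(p for p in folder.iterdir() if p.is_dir()):
        steps.extend(plan(sub, rules))  # children inherit the parent's rules

    if teardown := find_special(folder, "teardown"):
        steps.append(Step("teardown", teardown, rules))
    return steps


def workspace_for(started: datetime, base: Path = Path("agent_workspace")) -> Path:
    """One directory per run, named agent_workspace/YYYY-MM-DD_HH-MM-SS/."""
    return base / started.strftime("%Y-%m-%d_%H-%M-%S")


def report_path(workspace: Path, task: Path) -> Path:
    # ASSUMPTION: <task_name> is the task file's stem.
    return workspace / task.stem / f"{task.stem}_report.md"


if __name__ == "__main__":
    # Build a throwaway tree resembling tasks/demo and print the resulting plan.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "tasks" / "demo"
        demo.mkdir(parents=True)
        for name in ("rules.md", "setup.md", "calculator.csv", "clock_demo.txt", "teardown.md"):
            (demo / name).write_text("placeholder\n")
        ws = workspace_for(datetime.now())
        for step in plan(demo.parent):
            print(step.kind, step.path.name, "->", report_path(ws, step.path))
```

Returning a flat plan instead of executing while walking the tree keeps the ordering and inheritance rules testable without starting an agent session.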
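The README's Task Formats section lists the fields carried by CSV task files (`Test case ID`, `Test case name`, `Precondition`, `Step number`, `Step description`, `Expected result`) and JSON task files (`id`, `name`, `description`, `precondition`, `steps` with `number`/`action`/`expectedResult`, optional `deliverables`). A hypothetical pre-flight loader that normalizes both layouts into one structure could catch a malformed task file before an agent run. It is only a sketch and not part of this project, whose agent reads task file contents itself via ReadFromFileTool. `TaskStep`, `TaskCase`, `load_csv_cases`, and `load_json_case` are invented names, and the sketch assumes one CSV row per step with the test case ID repeated on every row, a string `precondition`, and `deliverables` given as a list of strings.

```python
import csv
import io
import json
from dataclasses import dataclass, field


@dataclass
class TaskStep:
    number: int
    action: str
    expected: str


@dataclass
class TaskCase:
    id: str
    name: str
    description: str = ""
    precondition: str = ""
    steps: list[TaskStep] = field(default_factory=list)
    deliverables: list[str] = field(default_factory=list)


def load_csv_cases(text: str) -> list[TaskCase]:
    """Group CSV rows into cases by `Test case ID`.

    ASSUMPTION: one row per step, ID repeated on every row; the case's
    name and precondition are taken from its first row.
    """
    cases: dict[str, TaskCase] = {}
    for row in csv.DictReader(io.StringIO(text)):
        case_id = row["Test case ID"].strip()
        if case_id not in cases:
            cases[case_id] = TaskCase(
                id=case_id,
                name=row["Test case name"].strip(),
                precondition=row["Precondition"].strip(),
            )
        cases[case_id].steps.append(TaskStep(
            number=int(row["Step number"]),
            action=row["Step description"].strip(),
            expected=row["Expected result"].strip(),
        ))
    for case in cases.values():
        case.steps.sort(key=lambda s: s.number)  # tolerate out-of-order rows
    return list(cases.values())


def load_json_case(text: str) -> TaskCase:
    """Read the JSON layout described in the README into a TaskCase."""
    data = json.loads(text)
    return TaskCase(
        id=str(data["id"]),
        name=data["name"],
        description=data.get("description", ""),
        precondition=data.get("precondition", ""),  # ASSUMPTION: a plain string
        steps=[
            TaskStep(int(s["number"]), s["action"], s["expectedResult"])
            for s in data["steps"]
        ],
        deliverables=list(data.get("deliverables", [])),  # optional field
    )
```

For example, a task CSV missing the `Expected result` column would raise a `KeyError` here instead of surfacing partway through an agent run.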