diff --git a/README.md b/README.md
index 63e405b..0e0d16b 100644
--- a/README.md
+++ b/README.md
@@ -16,13 +16,21 @@
 ## Overview
-**Computer Use OOTB** is an out-of-the-box (OOTB) solution for Desktop GUI Agent, including API-based (**Claude 3.5 Computer Use**) and locally-running models (**ShowUI**, **UI-TARS**).
+**Computer Use OOTB** is an out-of-the-box (OOTB) solution for Desktop GUI Agent, including API-based (**Claude 3.5 Computer Use**, **OpenRouter**) and locally-running models (**ShowUI**, **UI-TARS**).
 
 **No Docker** is required, and it supports both **Windows** and **macOS**. OOTB provides a user-friendly interface based on Gradio.🎨
 
+### ⚡ **Key Optimizations & Features**
+- 🔄 **Smart Model Routing**: Automatically select optimal models via OpenRouter
+- 💰 **Cost Optimization**: Reduced token costs with intelligent model selection
+- 🚀 **Enhanced Performance**: Improved inference speed with 4-bit quantization
+- 📊 **Multi-Provider Support**: Seamless switching between OpenAI, Anthropic, Qwen, and OpenRouter
+- 🛠️ **Flexible Architecture**: Unified & modular planner-actor configurations
+
 Visit our study on GUI Agent of Claude 3.5 Computer Use [[project page]](https://computer-use-ootb.github.io). 🌐
 
 ## Update
+- **[2025/01/22]** 🚀 **OpenRouter Integration** & **Performance Optimizations** are now live! Access 100+ AI models through a single API with [**OpenRouter**](https://openrouter.ai) - including GPT-4o, Claude, Qwen-VL, and more. Enjoy **cost-efficient routing**, **automatic failover**, and **competitive pricing** 💰!
 - **[2025/02/08]** We've added the support for [**UI-TARS**](https://github.com/bytedance/UI-TARS). Follow [Cloud Deployment](https://github.com/bytedance/UI-TARS?tab=readme-ov-file#cloud-deployment) or [VLLM deployment](https://github.com/bytedance/UI-TARS?tab=readme-ov-file#local-deployment-vllm) to implement UI-TARS and run it locally in OOTB.
 - **Major Update! [2024/12/04]** **Local Run🔥** is now live! Say hello to [**ShowUI**](https://github.com/showlab/ShowUI), an open-source 2B vision-language-action (VLA) model for GUI Agent. Now compatible with `"gpt-4o + ShowUI" (~200x cheaper)`* & `"Qwen2-VL + ShowUI" (~30x cheaper)`* for only few cents for each task💰! *compared to Claude Computer Use.
 - **[2024/11/20]** We've added some examples to help you get hands-on experience with Claude 3.5 Computer Use.
@@ -87,7 +95,36 @@ pip install -r requirements.txt
 
 2. Test your UI-TARS sever with the script `.\install_tools\test_ui-tars_server.py`.
 
-### 2.4 (Optional) If you want to deploy Qwen model as planner on ssh server
+### 2.4 (Optional) Get Prepared for **OpenRouter** Integration 🌐
+
+[OpenRouter](https://openrouter.ai) provides unified access to 100+ AI models through a single API, offering cost-efficient routing and competitive pricing.
+
+**Benefits:**
+- 🔄 **Automatic failover** between models
+- 💰 **Cost optimization** with smart routing
+- 🚀 **100+ models** including GPT-4o, Claude, Gemini, and more
+- 📊 **Transparent pricing** and usage analytics
+
+**Setup:**
+1. Sign up at [OpenRouter](https://openrouter.ai/)
+2. Get your API key from the [Keys page](https://openrouter.ai/keys)
+3. Set your environment variable:
+   ```bash
+   # Windows PowerShell
+   $env:OPENROUTER_API_KEY="sk-or-xxxxx"
+
+   # macOS/Linux
+   export OPENROUTER_API_KEY="sk-or-xxxxx"
+   ```
+
+**Popular Models Available:**
+- `openrouter/auto` - Automatically route to the best available model
+- GPT-4o, GPT-4o-mini
+- Claude 3.5 Sonnet, Claude 3 Haiku
+- Gemini Pro, PaLM 2
+- And many more...
+
+### 2.5 (Optional) If you want to deploy Qwen model as planner on ssh server
 1. git clone this project on your ssh server
 
 2. python computer_use_demo/remote_inference.py
@@ -104,13 +141,14 @@ If you successfully start the interface, you will see two URLs in the terminal:
 ```
 
-> For convenience, we recommend running one or more of the following command to set API keys to the environment variables before starting the interface. Then you don’t need to manually pass the keys each run. On Windows Powershell (via the `set` command if on cmd):
+> For convenience, we recommend running one or more of the following command to set API keys to the environment variables before starting the interface. Then you don't need to manually pass the keys each run. On Windows Powershell (via the `set` command if on cmd):
 > ```bash
 > $env:ANTHROPIC_API_KEY="sk-xxxxx" (Replace with your own key)
 > $env:QWEN_API_KEY="sk-xxxxx"
 > $env:OPENAI_API_KEY="sk-xxxxx"
+> $env:OPENROUTER_API_KEY="sk-xxxxx" # For OpenRouter integration
 > ```
-> On macOS/Linux, replace `$env:ANTHROPIC_API_KEY` with `export ANTHROPIC_API_KEY` in the above command.
+> On macOS/Linux, replace `$env:ANTHROPIC_API_KEY` with `export ANTHROPIC_API_KEY` in the above command.
 
 ### 4. Control Your Computer with Any Device can Access the Internet
@@ -173,6 +211,7 @@ Now, OOTB supports customizing the GUI Agent via the following models: