diff --git a/README.md b/README.md
index 63e405b..0e0d16b 100644
--- a/README.md
+++ b/README.md
@@ -16,13 +16,21 @@
## Overview
-**Computer Use OOTB** is an out-of-the-box (OOTB) solution for Desktop GUI Agent, including API-based (**Claude 3.5 Computer Use**) and locally-running models (**ShowUI**, **UI-TARS**).
+**Computer Use OOTB** is an out-of-the-box (OOTB) solution for Desktop GUI Agent, including API-based (**Claude 3.5 Computer Use**, **OpenRouter**) and locally-running models (**ShowUI**, **UI-TARS**).
**No Docker** is required, and it supports both **Windows** and **macOS**. OOTB provides a user-friendly interface based on Gradio.
+### **Key Optimizations & Features**
+- **Smart Model Routing**: Automatically select optimal models via OpenRouter
+- **Cost Optimization**: Reduced token costs with intelligent model selection
+- **Enhanced Performance**: Improved inference speed with 4-bit quantization
+- **Multi-Provider Support**: Seamless switching between OpenAI, Anthropic, Qwen, and OpenRouter
+- **Flexible Architecture**: Unified & modular planner-actor configurations
+
Visit our study on the GUI Agent of Claude 3.5 Computer Use [[project page]](https://computer-use-ootb.github.io).
## Update
+- **[2025/01/22]** **OpenRouter Integration** & **Performance Optimizations** are now live! Access 100+ AI models through a single API with [**OpenRouter**](https://openrouter.ai) - including GPT-4o, Claude, Qwen-VL, and more. Enjoy **cost-efficient routing**, **automatic failover**, and **competitive pricing**!
- **[2025/02/08]** We've added the support for [**UI-TARS**](https://github.com/bytedance/UI-TARS). Follow [Cloud Deployment](https://github.com/bytedance/UI-TARS?tab=readme-ov-file#cloud-deployment) or [VLLM deployment](https://github.com/bytedance/UI-TARS?tab=readme-ov-file#local-deployment-vllm) to implement UI-TARS and run it locally in OOTB.
- **Major Update! [2024/12/04]** **Local Run** is now live! Say hello to [**ShowUI**](https://github.com/showlab/ShowUI), an open-source 2B vision-language-action (VLA) model for GUI Agent. Now compatible with `"gpt-4o + ShowUI" (~200x cheaper)`* & `"Qwen2-VL + ShowUI" (~30x cheaper)`* for only a few cents per task! *compared to Claude Computer Use.
- **[2024/11/20]** We've added some examples to help you get hands-on experience with Claude 3.5 Computer Use.
@@ -87,7 +95,36 @@ pip install -r requirements.txt
2. Test your UI-TARS server with the script `.\install_tools\test_ui-tars_server.py`.
-### 2.4 (Optional) If you want to deploy Qwen model as planner on ssh server
+### 2.4 (Optional) Prepare for **OpenRouter** Integration
+
+[OpenRouter](https://openrouter.ai) provides unified access to 100+ AI models through a single API, offering cost-efficient routing and competitive pricing.
+
+**Benefits:**
+- **Automatic failover** between models
+- **Cost optimization** with smart routing
+- **100+ models** including GPT-4o, Claude, Gemini, and more
+- **Transparent pricing** and usage analytics
+
+**Setup:**
+1. Sign up at [OpenRouter](https://openrouter.ai/)
+2. Get your API key from the [Keys page](https://openrouter.ai/keys)
+3. Set your environment variable:
+ ```bash
+ # Windows PowerShell
+ $env:OPENROUTER_API_KEY="sk-or-xxxxx"
+
+ # macOS/Linux
+ export OPENROUTER_API_KEY="sk-or-xxxxx"
+ ```
+
+**Popular Models Available:**
+- `openrouter/auto` - Automatically route to the best available model
+- GPT-4o, GPT-4o-mini
+- Claude 3.5 Sonnet, Claude 3 Haiku
+- Gemini Pro, PaLM 2
+- And many more...
+
+### 2.5 (Optional) If you want to deploy a Qwen model as planner on an SSH server
1. git clone this project on your ssh server
2. python computer_use_demo/remote_inference.py
@@ -104,13 +141,14 @@ If you successfully start the interface, you will see two URLs in the terminal:
```
-> For convenience, we recommend running one or more of the following command to set API keys to the environment variables before starting the interface. Then you donโt need to manually pass the keys each run. On Windows Powershell (via the `set` command if on cmd):
+> For convenience, we recommend running one or more of the following commands to set API keys as environment variables before starting the interface, so you don't need to pass the keys manually on each run. On Windows PowerShell (use the `set` command on cmd):
> ```bash
> $env:ANTHROPIC_API_KEY="sk-xxxxx" (Replace with your own key)
> $env:QWEN_API_KEY="sk-xxxxx"
> $env:OPENAI_API_KEY="sk-xxxxx"
+> $env:OPENROUTER_API_KEY="sk-xxxxx" # For OpenRouter integration
> ```
-> On macOS/Linux, replace `$env:ANTHROPIC_API_KEY` with `export ANTHROPIC_API_KEY` in the above command.
+> On macOS/Linux, replace `$env:ANTHROPIC_API_KEY` with `export ANTHROPIC_API_KEY` in the above command.
### 4. Control Your Computer with Any Device That Can Access the Internet
@@ -173,6 +211,7 @@ Now, OOTB supports customizing the GUI Agent via the following models:
- GPT-4o
- Qwen2-VL-Max
+ - OpenRouter (100+ models)
- Qwen2-VL-2B(ssh)
- Qwen2-VL-7B(ssh)
- Qwen2.5-VL-7B(ssh)
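The README additions above describe the new provider; the request the integration ultimately sends is an OpenAI-compatible chat completion against OpenRouter's endpoint. A minimal sketch (the actual POST is commented out so the snippet stays offline; the endpoint URL and payload shape follow OpenRouter's API, and `openrouter/auto` asks the router to pick a model):

```python
# Sketch of the OpenRouter chat-completions request this integration sends.
import os

API_URL = "https://openrouter.ai/api/v1/chat/completions"

headers = {
    "Content-Type": "application/json",
    # Key comes from the OPENROUTER_API_KEY env var set in section 2.4
    "Authorization": f"Bearer {os.environ.get('OPENROUTER_API_KEY', '')}",
}
payload = {
    "model": "openrouter/auto",  # let OpenRouter route to the best available model
    "messages": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Describe this desktop screenshot."},
    ],
    "max_tokens": 256,
    "temperature": 0,
}
# import requests
# response = requests.post(API_URL, headers=headers, json=payload, timeout=120)
# text = response.json()["choices"][0]["message"]["content"]
print(payload["model"])  # openrouter/auto
```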
diff --git a/app.py b/app.py
index 635c78f..3a4ff4a 100644
--- a/app.py
+++ b/app.py
@@ -61,6 +61,8 @@ def setup_state(state):
state["anthropic_api_key"] = os.getenv("ANTHROPIC_API_KEY", "")
if "qwen_api_key" not in state:
state["qwen_api_key"] = os.getenv("QWEN_API_KEY", "")
+ if "openrouter_api_key" not in state:
+ state["openrouter_api_key"] = os.getenv("OPENROUTER_API_KEY", "")
if "ui_tars_url" not in state:
state["ui_tars_url"] = ""
@@ -72,6 +74,8 @@ def setup_state(state):
state["planner_api_key"] = state["anthropic_api_key"]
elif state["planner_provider"] == "qwen":
state["planner_api_key"] = state["qwen_api_key"]
+ elif state["planner_provider"] == "openrouter":
+ state["planner_api_key"] = state["openrouter_api_key"]
else:
state["planner_api_key"] = ""
@@ -278,7 +282,7 @@ def process_input(user_input, state):
label="API Provider",
choices=[option.value for option in APIProvider],
value="openai",
- interactive=False,
+ interactive=True,
)
with gr.Column():
planner_api_key = gr.Textbox(
@@ -393,9 +397,9 @@ def update_planner_model(model_selection, state):
logger.info(f"Model updated to: {state['planner_model']}")
if model_selection == "qwen2-vl-max":
- provider_choices = ["qwen"]
+ provider_choices = ["qwen", "openrouter"]
provider_value = "qwen"
- provider_interactive = False
+ provider_interactive = True
api_key_interactive = True
api_key_placeholder = "qwen API key"
actor_model_choices = ["ShowUI", "UI-TARS"]
@@ -432,10 +436,10 @@ def update_planner_model(model_selection, state):
state["api_key"] = ""
elif model_selection == "gpt-4o" or model_selection == "gpt-4o-mini":
- # Set provider to "openai", make it unchangeable
- provider_choices = ["openai"]
+ # Allow OpenAI or OpenRouter as provider
+ provider_choices = ["openai", "openrouter"]
provider_value = "openai"
- provider_interactive = False
+ provider_interactive = True
api_key_interactive = True
api_key_type = "password" # Display API key in password form
@@ -470,6 +474,8 @@ def update_planner_model(model_selection, state):
state["api_key"] = state.get("anthropic_api_key", "")
elif provider_value == "qwen":
state["api_key"] = state.get("qwen_api_key", "")
+ elif provider_value == "openrouter":
+ state["api_key"] = state.get("openrouter_api_key", "")
elif provider_value == "local":
state["api_key"] = ""
# The SSH case was already handled above; no need to repeat it here
@@ -502,19 +508,44 @@ def update_actor_model(actor_model_selection, state):
logger.info(f"Actor model updated to: {state['actor_model']}")
def update_api_key_placeholder(provider_value, model_selection):
+ # Persist provider selection into state for use in sampling loop
+ state.value["planner_provider"] = provider_value
+ # Choose placeholder and value based on provider/model
if model_selection == "claude-3-5-sonnet-20241022":
if provider_value == "anthropic":
- return gr.update(placeholder="anthropic API key")
+ placeholder = "anthropic API key"
+ value = state.value.get("anthropic_api_key", "")
elif provider_value == "bedrock":
- return gr.update(placeholder="bedrock API key")
+ placeholder = "bedrock API key"
+ value = "" # credentials via environment
elif provider_value == "vertex":
- return gr.update(placeholder="vertex API key")
+ placeholder = "vertex API key"
+ value = "" # credentials via environment
else:
- return gr.update(placeholder="")
- elif model_selection == "gpt-4o + ShowUI":
- return gr.update(placeholder="openai API key")
+ placeholder = ""
+ value = ""
else:
- return gr.update(placeholder="")
+ if provider_value == "openai":
+ placeholder = "openai API key"
+ value = state.value.get("openai_api_key", "")
+ elif provider_value == "openrouter":
+ placeholder = "openrouter API key"
+ value = state.value.get("openrouter_api_key", "")
+ elif provider_value == "qwen":
+ placeholder = "qwen API key"
+ value = state.value.get("qwen_api_key", "")
+ elif provider_value == "ssh":
+ placeholder = "ssh host and port (e.g. localhost:8000)"
+ value = state.value.get("planner_api_key", "")
+ elif provider_value == "local":
+ placeholder = "not required"
+ value = ""
+ else:
+ placeholder = ""
+ value = ""
+ # Update state mirrored key used by loop
+ state.value["planner_api_key"] = value
+ return gr.update(placeholder=placeholder, value=value, type="password", interactive=True)
def update_system_prompt_suffix(system_prompt_suffix, state):
state["custom_system_prompt"] = system_prompt_suffix
diff --git a/computer_use_demo/gui_agent/llm_utils/oai.py b/computer_use_demo/gui_agent/llm_utils/oai.py
index a7359ff..3b426de 100644
--- a/computer_use_demo/gui_agent/llm_utils/oai.py
+++ b/computer_use_demo/gui_agent/llm_utils/oai.py
@@ -1,3 +1,71 @@
+def run_openrouter_interleaved(messages: list, system: str, llm: str, api_key: str, max_tokens=256, temperature=0):
+
+ api_key = api_key or os.environ.get("OPENROUTER_API_KEY")
+ if not api_key:
+ raise ValueError("OPENROUTER_API_KEY is not set")
+
+ headers = {"Content-Type": "application/json",
+ "Authorization": f"Bearer {api_key}"}
+
+ final_messages = [{"role": "system", "content": system}]
+
+    if isinstance(messages, list):
+ for item in messages:
+ print(f"item: {item}")
+ contents = []
+            if isinstance(item, dict):
+                for cnt in item["content"]:
+                    if isinstance(cnt, str) and is_image_path(cnt):
+                        base64_image = encode_image(cnt)
+                        content = {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}}
+                    else:
+                        # Non-image strings (and any other types) are sent as text
+                        content = {"type": "text", "text": cnt if isinstance(cnt, str) else str(cnt)}
+                    contents.append(content)
+                message = {"role": item["role"], "content": contents}
+
+ elif isinstance(item, str):
+ if is_image_path(item):
+ base64_image = encode_image(item)
+ contents.append({"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}})
+ message = {"role": "user", "content": contents}
+ else:
+ contents.append({"type": "text", "text": item})
+ message = {"role": "user", "content": contents}
+
+            else:
+                # Fallback for any other item type: send it as text
+                contents.append({"type": "text", "text": str(item)})
+                message = {"role": "user", "content": contents}
+
+ final_messages.append(message)
+
+
+ elif isinstance(messages, str):
+ final_messages.append({"role": "user", "content": messages})
+
+    print("[openrouter] sending messages:", [f"{m['role']}: {str(m['content'])[:100]}" for m in final_messages])
+
+ payload = {
+ "model": llm,
+ "messages": final_messages,
+ "max_tokens": max_tokens,
+ "temperature": temperature,
+ }
+
+    response = requests.post(
+        "https://openrouter.ai/api/v1/chat/completions", headers=headers, json=payload, timeout=120
+    )
+
+ try:
+ text = response.json()['choices'][0]['message']['content']
+ token_usage = int(response.json()['usage']['total_tokens'])
+ return text, token_usage
+
+ except Exception as e:
+        print(f"Error in run_openrouter_interleaved: {e}. This may be due to an invalid OPENROUTER_API_KEY. Please check the response: {response.json()}")
+ return response.json()
+
import os
import logging
import base64
@@ -214,17 +282,16 @@ def encode_image(image_path: str, max_size=1024) -> str:
# temperature=0)
# print(text, token_usage)
- text, token_usage = run_ssh_llm_interleaved(
- messages= [{"content": [
- "What is in the screenshot?",
- "tmp/outputs/screenshot_5a26d36c59e84272ab58c1b34493d40d.png"],
- "role": "user"
- }],
- llm="Qwen2.5-VL-7B-Instruct",
- ssh_host="10.245.92.68",
- ssh_port=9192,
+ text, token_usage = run_openrouter_interleaved(
+ messages=[{"content": [
+ "What is in the screenshot?",
+ "tmp/outputs/screenshot_5a26d36c59e84272ab58c1b34493d40d.png"],
+ "role": "user"
+ }],
+ llm="openrouter/auto",
+ system="You are a helpful assistant",
+ api_key=api_key,
max_tokens=256,
- temperature=0.7
- )
+ temperature=0)
+
print(text, token_usage)
- # There is an introduction describing the Calyx... 36986
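The core of `run_openrouter_interleaved` is the conversion from the repo's interleaved `[text, image_path, ...]` message format into OpenAI-style content parts. A standalone sketch of that conversion (a simple extension check stands in for the repo's `is_image_path`/`encode_image` helpers, which base64-encode the file into a `data:` URL):

```python
# Standalone sketch of the interleaved-to-OpenAI message conversion.
def to_openai_messages(messages: list, system: str) -> list:
    final = [{"role": "system", "content": system}]
    for item in messages:
        parts = []
        for cnt in item["content"]:
            if isinstance(cnt, str) and cnt.lower().endswith((".png", ".jpg", ".jpeg")):
                # Real helper base64-encodes the image into a data: URL here
                parts.append({"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,<{cnt}>"}})
            else:
                parts.append({"type": "text", "text": str(cnt)})
        final.append({"role": item["role"], "content": parts})
    return final

msgs = to_openai_messages(
    [{"role": "user", "content": ["What is in the screenshot?", "shot.png"]}],
    system="You are a helpful assistant",
)
print([p["type"] for p in msgs[1]["content"]])  # ['text', 'image_url']
```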
diff --git a/computer_use_demo/gui_agent/planner/api_vlm_planner.py b/computer_use_demo/gui_agent/planner/api_vlm_planner.py
index 7ab537f..71bd791 100644
--- a/computer_use_demo/gui_agent/planner/api_vlm_planner.py
+++ b/computer_use_demo/gui_agent/planner/api_vlm_planner.py
@@ -11,7 +11,7 @@
from anthropic.types.beta import BetaMessage, BetaTextBlock, BetaToolUseBlock, BetaMessageParam
from computer_use_demo.tools.screen_capture import get_screenshot
-from computer_use_demo.gui_agent.llm_utils.oai import run_oai_interleaved, run_ssh_llm_interleaved
+from computer_use_demo.gui_agent.llm_utils.oai import run_oai_interleaved, run_ssh_llm_interleaved, run_openrouter_interleaved
from computer_use_demo.gui_agent.llm_utils.qwen import run_qwen
from computer_use_demo.gui_agent.llm_utils.llm_utils import extract_data, encode_image
from computer_use_demo.tools.colorful_text import colorful_text_showui, colorful_text_vlm
@@ -43,9 +43,10 @@ def __init__(
self.model = "Qwen2-VL-7B-Instruct"
elif model == "qwen2.5-vl-7b (ssh)":
self.model = "Qwen2.5-VL-7B-Instruct"
+ elif model == "openrouter/auto":
+ self.model = "openrouter/auto"
else:
raise ValueError(f"Model {model} not supported")
-
self.provider = provider
self.system_prompt_suffix = system_prompt_suffix
self.api_key = api_key
@@ -92,7 +93,23 @@ def __call__(self, messages: list):
print(f"Sending messages to VLMPlanner: {planner_messages}")
- if self.model == "gpt-4o-2024-11-20":
+ # If provider is explicitly OpenRouter, route via OpenRouter regardless of model string
+ provider_str = self.provider.value if hasattr(self.provider, "value") else str(self.provider)
+ if provider_str == "openrouter":
+ # Use a generic auto model on OpenRouter unless a specific compatible ID is set elsewhere
+ or_model = "openrouter/auto"
+ vlm_response, token_usage = run_openrouter_interleaved(
+ messages=planner_messages,
+ system=self.system_prompt,
+ llm=or_model,
+ api_key=self.api_key,
+ max_tokens=self.max_tokens,
+ temperature=0,
+ )
+ print(f"openrouter token usage: {token_usage}")
+ self.total_token_usage += token_usage
+ self.total_cost += (token_usage * 0.15 / 1000000) # Placeholder cost
+ elif self.model == "gpt-4o-2024-11-20":
vlm_response, token_usage = run_oai_interleaved(
messages=planner_messages,
system=self.system_prompt,
@@ -117,6 +134,18 @@ def __call__(self, messages: list):
print(f"qwen token usage: {token_usage}")
self.total_token_usage += token_usage
self.total_cost += (token_usage * 0.02 / 7.25 / 1000) # 1USD=7.25CNY, https://help.aliyun.com/zh/dashscope/developer-reference/tongyi-qianwen-vl-plus-api
+ elif self.model == "openrouter/auto":
+ vlm_response, token_usage = run_openrouter_interleaved(
+ messages=planner_messages,
+ system=self.system_prompt,
+ llm=self.model,
+ api_key=self.api_key,
+ max_tokens=self.max_tokens,
+ temperature=0,
+ )
+ print(f"openrouter token usage: {token_usage}")
+ self.total_token_usage += token_usage
+ self.total_cost += (token_usage * 0.15 / 1000000) # Placeholder cost
elif "Qwen" in self.model:
# Parse host and port from api_key
try:
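The planner change above coerces `self.provider`, which may arrive either as an `APIProvider` enum member or as a raw dropdown string, into a comparable string. A minimal sketch of that coercion:

```python
# Sketch of the provider coercion used in APIVLMPlanner.__call__.
from enum import Enum

class APIProvider(str, Enum):
    OPENAI = "openai"
    OPENROUTER = "openrouter"

def provider_str(provider) -> str:
    # Enum members expose .value; plain strings pass through unchanged.
    return provider.value if hasattr(provider, "value") else str(provider)

print(provider_str(APIProvider.OPENROUTER))  # openrouter
print(provider_str("openrouter"))            # openrouter
```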
diff --git a/computer_use_demo/loop.py b/computer_use_demo/loop.py
index 5a511d9..2e8e60b 100644
--- a/computer_use_demo/loop.py
+++ b/computer_use_demo/loop.py
@@ -24,6 +24,7 @@ class APIProvider(StrEnum):
OPENAI = "openai"
QWEN = "qwen"
SSH = "ssh"
+ OPENROUTER = "openrouter"
PROVIDER_TO_DEFAULT_MODEL_NAME: dict[APIProvider, str] = {
@@ -33,6 +34,7 @@ class APIProvider(StrEnum):
APIProvider.OPENAI: "gpt-4o",
APIProvider.QWEN: "qwen2vl",
APIProvider.SSH: "qwen2-vl-2b",
+ APIProvider.OPENROUTER: "openrouter/auto",
}
PLANNER_MODEL_CHOICES_MAPPING = {
@@ -106,7 +108,7 @@ def sampling_loop_sync(
loop_mode = "unified"
- elif planner_model in ["gpt-4o", "gpt-4o-mini", "qwen2-vl-max"]:
+ elif planner_model in ["gpt-4o", "gpt-4o-mini", "qwen2-vl-max", "openrouter/auto"]:
from computer_use_demo.gui_agent.planner.api_vlm_planner import APIVLMPlanner
@@ -134,7 +136,6 @@ def sampling_loop_sync(
model=planner_model,
provider=planner_provider,
system_prompt_suffix=system_prompt_suffix,
- api_key=api_key,
api_response_callback=api_response_callback,
selected_screen=selected_screen,
output_callback=output_callback,
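The `loop.py` hunks wire the new provider into the `APIProvider` enum and its default-model mapping. Sketched here with a `(str, Enum)` base so it also runs on Python < 3.11 (`loop.py` itself uses `StrEnum`):

```python
# Sketch of the APIProvider / default-model wiring added in loop.py.
from enum import Enum

class APIProvider(str, Enum):
    ANTHROPIC = "anthropic"
    OPENAI = "openai"
    QWEN = "qwen"
    SSH = "ssh"
    OPENROUTER = "openrouter"

PROVIDER_TO_DEFAULT_MODEL_NAME = {
    APIProvider.OPENAI: "gpt-4o",
    APIProvider.OPENROUTER: "openrouter/auto",
}

# Dropdown strings round-trip back into the enum, so "openrouter" from the
# Gradio UI resolves to its default model.
provider = APIProvider("openrouter")
print(PROVIDER_TO_DEFAULT_MODEL_NAME[provider])  # openrouter/auto
```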