Merged
38 changes: 5 additions & 33 deletions AGENTS.md
@@ -39,8 +39,7 @@ OpenBrowser/
| Highlight tool | `server/agent/tools/highlight_tool.py` | HighlightTool for element discovery |
| Element interaction | `server/agent/tools/element_interaction_tool.py` | ElementInteractionTool with 2PC flow |
| Dialog tool | `server/agent/tools/dialog_tool.py` | DialogTool for dialog handling |
| JavaScript tool | `server/agent/tools/javascript_tool.py` | JavaScriptTool for fallback execution |
| ToolSet aggregator | `server/agent/tools/toolset.py` | OpenBrowserToolSet aggregates all 5 tools |
| ToolSet aggregator | `server/agent/tools/toolset.py` | OpenBrowserToolSet aggregates all 4 tools |
| Extension entry | `extension/src/background/index.ts` | Command handler, dialog processing |
| Dialog manager | `extension/src/commands/dialog.ts` | CDP dialog events, cascading |
| JavaScript execution | `extension/src/commands/javascript.ts` | CDP Runtime.evaluate, dialog race |
@@ -154,26 +153,15 @@ OpenBrowser uses Jinja2 templates for agent prompts, enabling dynamic content in
### Template Structure
- **Location**: `server/agent/prompts/` directory
- **Format**: `.j2` extension with Jinja2 syntax
- **5 Tool Templates**: Each of the 5 focused tools has its own template:
- **4 Tool Templates**: Each of the 4 focused tools has its own template:
- `tab_tool.j2` - Tab management documentation
- `highlight_tool.j2` - Element discovery with color coding
- `element_interaction_tool.j2` - 2PC flow with orange confirmations
- `dialog_tool.j2` - Dialog handling
- `javascript_tool.j2` - JavaScript fallback

### Dynamic JavaScript Control
The `javascript_execute` command can be disabled via environment variable:
```bash
export OPEN_BROWSER_DISABLE_JAVASCRIPT_EXECUTE=1
```
When disabled:
- Template removes all `javascript_execute` references using `{% if not disable_javascript %}` conditionals
- `OpenBrowserAction.type` description excludes `'javascript_execute'`
- Command execution returns error if attempted

### Template Features
- **Conditional rendering**: Use `{% if %}` blocks for configurable sections
- **Variable injection**: Pass context variables like `disable_javascript` at render time
- **Variable injection**: Pass context variables like model profile flags at render time
- **Clean output**: `trim_blocks=True` and `lstrip_blocks=True` remove extra whitespace
- **Caching**: Templates are cached after first load for performance
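
The template features listed above can be sketched as follows. This is a minimal illustration, not the project's actual loader: the in-memory template body, the `verbose` flag, and the helper names `load_template` and `render_prompt` are all hypothetical, and the real templates live in `server/agent/prompts/`.

```python
from functools import lru_cache

from jinja2 import DictLoader, Environment

# Hypothetical in-memory stand-in for a file such as dialog_tool.j2.
TEMPLATES = {
    "dialog_tool.j2": (
        "Dialog handling\n"
        "  {% if verbose %}\n"
        "Use handle_dialog to accept or dismiss.\n"
        "  {% endif %}\n"
        "End\n"
    )
}

# trim_blocks drops the newline after a block tag; lstrip_blocks drops the
# indentation in front of it, so the tags leave no stray whitespace behind.
env = Environment(
    loader=DictLoader(TEMPLATES),
    trim_blocks=True,
    lstrip_blocks=True,
)


@lru_cache(maxsize=None)
def load_template(name):
    """Load a template once and reuse it on later calls (hypothetical helper)."""
    return env.get_template(name)


def render_prompt(name, **context):
    """Render a cached template with context variables injected at render time."""
    return load_template(name).render(**context)


print(render_prompt("dialog_tool.j2", verbose=True))
print(render_prompt("dialog_tool.j2", verbose=False))
```

With `verbose=False` the conditional section disappears entirely and no blank line is left where the `{% if %}` block used to be, which is the effect the two whitespace options are meant to provide.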

@@ -246,34 +234,18 @@ Elements are identified by a 6-character hash string:
| `scroll_element` | Scroll by element ID | `{element_id: "m5k2p8", direction: "down"}` |
| `keyboard_input` | Type into element | `{element_id: "j4n7q1", text: "hello"}` |

### Tool Mapping (5-Tool Architecture)
The visual interaction workflow is implemented across 5 focused tools:
### Tool Mapping (4-Tool Architecture)
The visual interaction workflow is implemented across 4 focused tools:

| Tool | Commands | Purpose |
|------|----------|---------|
| `tab` | `tab init`, `tab open`, `tab close`, `tab switch`, `tab list`, `tab refresh`, `tab view`, `tab back`, `tab forward` | Session and tab management |
| `highlight` | `highlight_elements` | Element discovery with blue overlays |
| `element_interaction` | `click_element`, `confirm_click_element`, `hover_element`, `scroll_element`, `keyboard_input`, `confirm_keyboard_input`, `select_element` | Element interaction with 2PC only for click and keyboard input |
| `dialog` | `handle_dialog` | Dialog handling (accept/dismiss) |
| `javascript` | `javascript_execute` | JavaScript fallback execution |
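
As a rough sketch of the 4-tool mapping that remains after this change, the table can be expressed as a lookup from command to tool. The tool and command names are taken from the table; the `TOOL_COMMANDS` dictionary and the `tool_for_command` helper are hypothetical and are not part of the OpenBrowser codebase.

```python
from typing import Optional

# Commands per tool, copied from the 4-tool mapping table.
TOOL_COMMANDS = {
    "tab": [
        "tab init", "tab open", "tab close", "tab switch", "tab list",
        "tab refresh", "tab view", "tab back", "tab forward",
    ],
    "highlight": ["highlight_elements"],
    "element_interaction": [
        "click_element", "confirm_click_element", "hover_element",
        "scroll_element", "keyboard_input", "confirm_keyboard_input",
        "select_element",
    ],
    "dialog": ["handle_dialog"],
}


def tool_for_command(command: str) -> Optional[str]:
    """Return the tool that owns a command, or None if no tool handles it."""
    for tool, commands in TOOL_COMMANDS.items():
        if command in commands:
            return tool
    return None


print(tool_for_command("click_element"))
print(tool_for_command("javascript_execute"))
```

Under this mapping `javascript_execute` resolves to no tool, which mirrors the removal of the `javascript` row.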

## UNIQUE PATTERNS

### JavaScript-First Automation (Fallback)
For complex interactions not covered by visual commands:
```javascript
// Click by visible text (universal pattern)
(() => {
const text = 'YOUR_TEXT';
const leaf = Array.from(document.querySelectorAll('*'))
.find(el => el.children.length === 0 && el.textContent.includes(text));
if (!leaf) return 'not found';
const target = leaf.closest('a, button, [role="button"]') || leaf;
target.click();
return 'clicked: ' + target.tagName;
})()
```

### Multi-Session Tab Isolation
- `tab init <url>` creates managed session with tab group
- `conversation_id` ties all commands to session
14 changes: 10 additions & 4 deletions eval/evaluate_browser_agent.py
@@ -546,6 +546,7 @@ def cleanup_managed_tabs(self, conversation_id: str) -> bool:

return all_closed


class EvalServerClient:
"""Client for evaluation server tracking API"""

@@ -609,15 +610,17 @@ def start_openbrowser(self) -> bool:
return True

root_dir = EVAL_DIR.parent
logger.error(f"""
logger.error(
f"""
❌ OpenBrowser server is not running!
Please start the OpenBrowser server manually with:

cd {root_dir}
uv run local-chrome-server serve

The server should start on port 8765 (REST API) and 8766 (WebSocket).
""")
"""
)
return False

except Exception as e:
@@ -634,7 +637,8 @@ def start_eval_server(self) -> bool:

eval_dir = EVAL_DIR
root_dir = EVAL_DIR.parent
logger.error(f"""
logger.error(
f"""
❌ Eval server is not running!
Please start the eval server manually with:

@@ -646,7 +650,8 @@ def start_eval_server(self) -> bool:
uv run python eval/server.py

The server should start on port 16605.
""")
"""
)
return False

except Exception as e:
@@ -1397,6 +1402,7 @@ def _check_count_min_condition(

def _event_matches_expected(self, event: Dict, expected: Dict) -> bool:
"""Check if a track event matches expected criteria"""

def normalize_text(value: Any) -> str:
return str(value or "").casefold()
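
The nested `normalize_text` helper above can be exercised on its own. The sketch below copies it verbatim and adds a hypothetical `text_matches` comparison for illustration only; how `_event_matches_expected` actually uses the helper is not visible in this hunk.

```python
from typing import Any


def normalize_text(value: Any) -> str:
    # Same body as the helper in the diff: falsy values collapse to "",
    # everything else is stringified and case-folded.
    return str(value or "").casefold()


def text_matches(actual: Any, expected: Any) -> bool:
    """Hypothetical comparison: equal after normalization."""
    return normalize_text(actual) == normalize_text(expected)


print(text_matches("Submit", "SUBMIT"))
print(text_matches("Straße", "STRASSE"))
print(repr(normalize_text(None)))
print(repr(normalize_text(0)))
```

Note that `value or ""` treats every falsy value the same way, so `0` and `None` both normalize to an empty string. `casefold()` is more aggressive than `lower()`, which is why "Straße" and "STRASSE" compare equal here.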
