From 2afbff0e5f15cf7537d0ca36fa0583ed89b78f89 Mon Sep 17 00:00:00 2001 From: jackwener Date: Sun, 19 Apr 2026 02:14:53 +0800 Subject: [PATCH] =?UTF-8?q?docs:=20rewrite=20browser=20sections=20?= =?UTF-8?q?=E2=80=94=20browser=20is=20for=20AI=20Agents,=20not=20manual=20?= =?UTF-8?q?use?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit From first principles, `opencli browser` commands exist for AI Agents to operate websites through the browser skill. Reframe both READMEs to reflect this: show users how to install the skill into their AI agent and describe tasks in natural language, rather than listing raw CLI commands. --- README.md | 47 ++++++++++++++++++++++++++++++++--------------- README.zh-CN.md | 47 ++++++++++++++++++++++++++++++++--------------- 2 files changed, 64 insertions(+), 30 deletions(-) diff --git a/README.md b/README.md index 38be48fd..d59fe84f 100644 --- a/README.md +++ b/README.md @@ -11,7 +11,7 @@ OpenCLI gives you one surface for three different kinds of automation: - **Use built-in adapters** for sites like Bilibili, Zhihu, Xiaohongshu, Reddit, HackerNews, Twitter/X, and [many more](#built-in-commands). -- **Drive a live browser directly** with `opencli browser` when an AI agent needs to click, type, extract, or inspect a page in real time. +- **Let AI Agents operate any website** — install the `opencli-browser` skill in your AI agent (Claude Code, Cursor, etc.), and it can navigate, click, type, extract, and inspect any page using your logged-in browser. - **Generate new adapters** from real browser behavior with `explore`, `synthesize`, `generate`, and `cascade`. It also works as a **CLI hub** for local tools such as `gh`, `docker`, and other binaries you register yourself, plus **desktop app adapters** for Electron apps like Cursor, Codex, Antigravity, ChatGPT, and Notion. @@ -19,10 +19,10 @@ It also works as a **CLI hub** for local tools such as `gh`, `docker`, and other ## Highlights - **Desktop App Control** — Drive Electron apps (Cursor, Codex, ChatGPT, Notion, etc.) directly from the terminal via CDP. -- **Browser Automation** — `browser` gives AI agents direct browser control: click, type, extract, screenshot — fully scriptable. +- **Browser Automation for AI Agents** — Install the `opencli-browser` skill, and your AI agent can operate any website: navigate, click, type, extract, screenshot — all through your logged-in Chrome session. - **Website → CLI** — Turn any website into a deterministic CLI: 90+ pre-built adapters, or generate your own with `opencli generate`. - **Account-safe** — Reuses Chrome/Chromium logged-in state; your credentials never leave the browser. -- **AI Agent ready** — `explore` discovers APIs, `synthesize` generates adapters, `cascade` finds auth strategies, `browser` controls the browser directly. +- **AI Agent ready** — `explore` discovers APIs, `synthesize` generates adapters, `cascade` finds auth strategies. The `opencli-browser` skill gives any AI agent full browser control. - **CLI Hub** — Discover, auto-install, and passthrough commands to any external CLI (gh, docker, obsidian, etc). - **Zero LLM cost** — No tokens consumed at runtime. Run 10,000 times and pay nothing. - **Deterministic** — Same command, same output schema, every time. Pipeable, scriptable, CI-friendly. @@ -70,12 +70,9 @@ Use OpenCLI directly when you want a reliable command instead of a live browser ## For AI Agents -Use two different entry points depending on the task: +OpenCLI's browser commands are designed to be used by AI Agents — not run manually. Install skills into your AI agent (Claude Code, Cursor, etc.), and the agent operates websites on your behalf using your logged-in Chrome session. -- [`skills/opencli-explorer/SKILL.md`](./skills/opencli-explorer/SKILL.md): the entry point for creating new adapters — supports both fully automated generation (`opencli generate `) and manual exploration workflows. -- [`skills/opencli-browser/SKILL.md`](./skills/opencli-browser/SKILL.md): the low-level control surface for live browsing, debugging, and manual intervention. - -Install the packaged skills with: +### Install skills ```bash npx skills add jackwener/opencli @@ -90,22 +87,42 @@ npx skills add jackwener/opencli --skill opencli-explorer npx skills add jackwener/opencli --skill opencli-oneshot ``` -In practice: +### Which skill to use + +| Skill | When to use | Example prompt to your AI agent | +|-------|------------|-------------------------------| +| **opencli-browser** | Operate any website in real time | "Help me post this content on Xiaohongshu" / "Check my Twitter notifications and summarize them" | +| **opencli-explorer** | Create a reusable CLI for a site | "Generate an adapter for douyin trending" | +| **opencli-oneshot** | Quick one-off: URL + goal → adapter | "Make a command that grabs the top posts from this page" | +| **opencli-usage** | Use existing built-in adapters | "Get the top 5 Bilibili trending videos" | + +### How it works -- start with `opencli-explorer` when the agent needs a reusable command for a site (it covers both automated and manual flows) -- use `opencli-browser` when the agent needs to inspect or steer the page directly +Once the `opencli-browser` skill is installed, your AI agent can: -Available browser commands include `open`, `state`, `click`, `type`, `select`, `keys`, `wait`, `get`, `screenshot`, `scroll`, `back`, `eval`, `network`, `init`, `verify`, and `close`. +1. **Navigate** to any URL using your logged-in browser +2. **Read** page content via structured DOM snapshots (not screenshots) +3. **Interact** — click buttons, fill forms, select options, press keys +4. **Extract** data from the page or intercept network API responses +5. **Wait** for elements, text, or page transitions + +The agent handles all the `opencli browser` commands internally — you just describe what you want done in natural language. + +**Skill references:** +- [`skills/opencli-browser/SKILL.md`](./skills/opencli-browser/SKILL.md) — real-time browser operation +- [`skills/opencli-explorer/SKILL.md`](./skills/opencli-explorer/SKILL.md) — adapter creation workflow ## Core Concepts -### `browser`: live control +### `browser`: AI Agent browser control + +`opencli browser` commands are the low-level primitives that AI Agents use to operate websites. You don't run these manually — instead, install the `opencli-browser` skill into your AI agent, describe what you want in natural language, and the agent handles the browser operations. -Use `opencli browser` when the task is inherently interactive and the agent needs to operate the page directly. +For example, tell your agent: *"Help me check my Xiaohongshu notifications"* — the agent will use `opencli browser open`, `state`, `click`, etc. under the hood. ### Built-in adapters: stable commands -Use site-specific commands such as `opencli hackernews top` or `opencli reddit hot` when the capability already exists and you want deterministic output. +Use site-specific commands such as `opencli hackernews top` or `opencli reddit hot` when the capability already exists. These are deterministic and work without browser — ideal for both humans and AI agents. ### `explore` / `synthesize` / `generate`: create new CLIs diff --git a/README.zh-CN.md b/README.zh-CN.md index 298ba8b1..fb1a44ca 100644 --- a/README.zh-CN.md +++ b/README.zh-CN.md @@ -11,7 +11,7 @@ OpenCLI 可以用同一套 CLI 做三类事情: - **直接使用现成适配器**:B站、知乎、小红书、Twitter/X、Reddit、HackerNews 等 [90+ 站点](#内置命令) 开箱即用。 -- **直接驱动浏览器**:用 `opencli browser` 让 AI Agent 实时点击、输入、提取、截图、检查页面状态。 +- **让 AI Agent 操作任意网站**:在你的 AI Agent(Claude Code、Cursor 等)中安装 `opencli-browser` skill,Agent 就能用你的已登录浏览器导航、点击、输入、提取任意网页内容。 - **把新网站生成成 CLI**:通过 `explore`、`synthesize`、`generate`、`cascade` 从真实页面行为推导出新的适配器。 除了网站能力,OpenCLI 还是一个 **CLI 枢纽**:你可以把 `gh`、`docker` 等本地工具统一注册到 `opencli` 下,也可以通过桌面端适配器控制 Cursor、Codex、Antigravity、ChatGPT、Notion 等 Electron 应用。 @@ -19,10 +19,10 @@ OpenCLI 可以用同一套 CLI 做三类事情: ## 亮点 - **桌面应用控制** — 通过 CDP 直接在终端驱动 Electron 应用(Cursor、Codex、ChatGPT、Notion 等)。 -- **浏览器自动化** — `browser` 让 AI Agent 直接控制浏览器:点击、输入、提取、截图,完全可编程。 +- **AI Agent 浏览器自动化** — 安装 `opencli-browser` skill,你的 AI Agent 就能操作任意网站:导航、点击、输入、提取、截图——全部通过你的已登录 Chrome 会话完成。 - **网站 → CLI** — 把任何网站变成确定性 CLI:90+ 内置适配器,或用 `opencli generate` 生成新的。 - **账号安全** — 复用 Chrome/Chromium 登录态,凭证永远不会离开浏览器。 -- **面向 AI Agent** — `explore` 发现 API,`synthesize` 生成适配器,`cascade` 探测认证策略,`browser` 直接控制浏览器。 +- **面向 AI Agent** — `explore` 发现 API,`synthesize` 生成适配器,`cascade` 探测认证策略。`opencli-browser` skill 让任何 AI Agent 都能完全控制浏览器。 - **CLI 枢纽** — 统一发现、自动安装、纯透传任何外部 CLI(gh、docker、obsidian 等)。 - **零 LLM 成本** — 运行时不消耗模型 token,跑 10,000 次也不花一分钱。 - **确定性输出** — 相同命令,相同输出结构,每次一致。可管道、可脚本、CI 友好。 @@ -68,12 +68,9 @@ opencli bilibili hot --limit 5 ## 给 AI Agent -按任务类型,AI Agent 有两个不同入口: +OpenCLI 的 browser 命令是给 AI Agent 用的——不是手动执行的。把 skill 安装到你的 AI Agent(Claude Code、Cursor 等)中,Agent 就能用你的已登录 Chrome 会话替你操作网站。 -- [`skills/opencli-explorer/SKILL.md`](./skills/opencli-explorer/SKILL.md):适配器创建入口,支持全自动生成(`opencli generate `)和手动探索两种流程。 -- [`skills/opencli-browser/SKILL.md`](./skills/opencli-browser/SKILL.md):底层控制入口,适合实时操作页面、debug 和人工介入。 - -安装全部 OpenCLI skills: +### 安装 skill ```bash npx skills add jackwener/opencli @@ -88,22 +85,42 @@ npx skills add jackwener/opencli --skill opencli-explorer npx skills add jackwener/opencli --skill opencli-oneshot ``` -实际使用上: +### 选择哪个 skill + +| Skill | 适用场景 | 你对 AI Agent 说的话 | +|-------|---------|-------------------| +| **opencli-browser** | 实时操作任意网站 | "帮我在小红书上发布这篇内容" / "看看我的 Twitter 通知并总结" | +| **opencli-explorer** | 为某个站点生成可复用 CLI | "帮我做一个抖音热门的适配器" | +| **opencli-oneshot** | 快速一次性:URL + 目标 → 适配器 | "帮我做一个抓取这个页面热帖的命令" | +| **opencli-usage** | 使用已有的内置适配器 | "获取 B 站热搜前 5" | + +### 工作原理 -- 需要把某个站点收成可复用命令时,优先走 `opencli-explorer`(涵盖自动和手动两种路径) -- 需要直接检查页面、操作页面时,再走 `opencli-browser` +安装 `opencli-browser` skill 后,你的 AI Agent 可以: -`browser` 可用命令包括:`open`、`state`、`click`、`type`、`select`、`keys`、`wait`、`get`、`screenshot`、`scroll`、`back`、`eval`、`network`、`init`、`verify`、`close`。 +1. **导航**到任意 URL,使用你的已登录浏览器 +2. **读取**页面内容——通过结构化 DOM 快照(不是截图) +3. **交互**——点击按钮、填写表单、选择选项、按键 +4. **提取**页面数据或拦截网络 API 响应 +5. **等待**元素、文本或页面跳转 + +Agent 在内部自动处理所有 `opencli browser` 命令——你只需用自然语言描述想做的事。 + +**Skill 参考文档:** +- [`skills/opencli-browser/SKILL.md`](./skills/opencli-browser/SKILL.md) — 实时浏览器操作 +- [`skills/opencli-explorer/SKILL.md`](./skills/opencli-explorer/SKILL.md) — 适配器创建工作流 ## 核心概念 -### `browser`:实时操作 +### `browser`:AI Agent 的浏览器控制层 + +`opencli browser` 命令是 AI Agent 操作网站的底层原语。你不需要手动运行这些命令——把 `opencli-browser` skill 安装到你的 AI Agent 中,用自然语言描述你想做的事,Agent 会自动处理浏览器操作。 -当任务本身就是交互式页面操作时,使用 `opencli browser` 直接驱动浏览器。 +比如你告诉 Agent:*"帮我看看小红书的通知"*——Agent 会在底层调用 `opencli browser open`、`state`、`click` 等命令。 ### 内置适配器:稳定命令 -当某个站点能力已经存在时,优先使用 `opencli hackernews top`、`opencli reddit hot` 这类稳定命令,而不是重新走一遍浏览器操作。 +当某个站点能力已经存在时,优先使用 `opencli hackernews top`、`opencli reddit hot` 这类稳定命令。这些命令是确定性的,无需浏览器——人类和 AI Agent 都可以直接使用。 ### `explore` / `synthesize` / `generate`:生成新的 CLI