From 2afbff0e5f15cf7537d0ca36fa0583ed89b78f89 Mon Sep 17 00:00:00 2001
From: jackwener <jakevingoo@gmail.com>
Date: Sun, 19 Apr 2026 02:14:53 +0800
Subject: [PATCH] =?UTF-8?q?docs:=20rewrite=20browser=20sections=20?=
 =?UTF-8?q?=E2=80=94=20browser=20is=20for=20AI=20Agents,=20not=20manual=20?=
 =?UTF-8?q?use?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

From first principles, `opencli browser` commands exist for AI Agents to
operate websites through the browser skill. Reframe both READMEs to reflect
this: show users how to install the skill into their AI agent and describe
tasks in natural language, rather than listing raw CLI commands.
---
 README.md       | 47 ++++++++++++++++++++++++++++++++---------------
 README.zh-CN.md | 47 ++++++++++++++++++++++++++++++++---------------
 2 files changed, 64 insertions(+), 30 deletions(-)
diff --git a/README.md b/README.md
index 38be48fd..d59fe84f 100644
--- a/README.md
+++ b/README.md
@@ -11,7 +11,7 @@
 OpenCLI gives you one surface for three different kinds of automation:
 
 - **Use built-in adapters** for sites like Bilibili, Zhihu, Xiaohongshu, Reddit, HackerNews, Twitter/X, and [many more](#built-in-commands).
-- **Drive a live browser directly** with `opencli browser` when an AI agent needs to click, type, extract, or inspect a page in real time.
+- **Let AI Agents operate any website** — install the `opencli-browser` skill in your AI agent (Claude Code, Cursor, etc.), and it can navigate, click, type, extract, and inspect any page using your logged-in browser.
 - **Generate new adapters** from real browser behavior with `explore`, `synthesize`, `generate`, and `cascade`.
 
 It also works as a **CLI hub** for local tools such as `gh`, `docker`, and other binaries you register yourself, plus **desktop app adapters** for Electron apps like Cursor, Codex, Antigravity, ChatGPT, and Notion.
@@ -19,10 +19,10 @@ It also works as a **CLI hub** for local tools such as `gh`, `docker`, and other
 ## Highlights
 
 - **Desktop App Control** — Drive Electron apps (Cursor, Codex, ChatGPT, Notion, etc.) directly from the terminal via CDP.
-- **Browser Automation** — `browser` gives AI agents direct browser control: click, type, extract, screenshot — fully scriptable.
+- **Browser Automation for AI Agents** — Install the `opencli-browser` skill, and your AI agent can operate any website: navigate, click, type, extract, screenshot — all through your logged-in Chrome session.
 - **Website → CLI** — Turn any website into a deterministic CLI: 90+ pre-built adapters, or generate your own with `opencli generate`.
 - **Account-safe** — Reuses Chrome/Chromium logged-in state; your credentials never leave the browser.
-- **AI Agent ready** — `explore` discovers APIs, `synthesize` generates adapters, `cascade` finds auth strategies, `browser` controls the browser directly.
+- **AI Agent ready** — `explore` discovers APIs, `synthesize` generates adapters, `cascade` finds auth strategies. The `opencli-browser` skill gives any AI agent full browser control.
 - **CLI Hub** — Discover, auto-install, and passthrough commands to any external CLI (gh, docker, obsidian, etc).
 - **Zero LLM cost** — No tokens consumed at runtime. Run 10,000 times and pay nothing.
 - **Deterministic** — Same command, same output schema, every time. Pipeable, scriptable, CI-friendly.
@@ -70,12 +70,9 @@ Use OpenCLI directly when you want a reliable command instead of a live browser
 
 ## For AI Agents
 
-Use two different entry points depending on the task:
+OpenCLI's browser commands are designed to be used by AI Agents — not run manually. Install skills into your AI agent (Claude Code, Cursor, etc.), and the agent operates websites on your behalf using your logged-in Chrome session.
 
-- [`skills/opencli-explorer/SKILL.md`](./skills/opencli-explorer/SKILL.md): the entry point for creating new adapters — supports both fully automated generation (`opencli generate <url>`) and manual exploration workflows.
-- [`skills/opencli-browser/SKILL.md`](./skills/opencli-browser/SKILL.md): the low-level control surface for live browsing, debugging, and manual intervention.
-
-Install the packaged skills with:
+### Install skills
 
 ```bash
 npx skills add jackwener/opencli
@@ -90,22 +87,42 @@ npx skills add jackwener/opencli --skill opencli-explorer
 npx skills add jackwener/opencli --skill opencli-oneshot
 ```
 
-In practice:
+### Which skill to use
+
+| Skill | When to use | Example prompt to your AI agent |
+|-------|------------|-------------------------------|
+| **opencli-browser** | Operate any website in real time | "Help me post this content on Xiaohongshu" / "Check my Twitter notifications and summarize them" |
+| **opencli-explorer** | Create a reusable CLI for a site | "Generate an adapter for douyin trending" |
+| **opencli-oneshot** | Quick one-off: URL + goal → adapter | "Make a command that grabs the top posts from this page" |
+| **opencli-usage** | Use existing built-in adapters | "Get the top 5 Bilibili trending videos" |
+
+### How it works
 
-- start with `opencli-explorer` when the agent needs a reusable command for a site (it covers both automated and manual flows)
-- use `opencli-browser` when the agent needs to inspect or steer the page directly
+Once the `opencli-browser` skill is installed, your AI agent can:
 
-Available browser commands include `open`, `state`, `click`, `type`, `select`, `keys`, `wait`, `get`, `screenshot`, `scroll`, `back`, `eval`, `network`, `init`, `verify`, and `close`.
+1. **Navigate** to any URL using your logged-in browser
+2. **Read** page content via structured DOM snapshots (not screenshots)
+3. **Interact** — click buttons, fill forms, select options, press keys
+4. **Extract** data from the page or intercept network API responses
+5. **Wait** for elements, text, or page transitions
+
+The agent handles all the `opencli browser` commands internally — you just describe what you want done in natural language.
+
+**Skill references:**
+- [`skills/opencli-browser/SKILL.md`](./skills/opencli-browser/SKILL.md) — real-time browser operation
+- [`skills/opencli-explorer/SKILL.md`](./skills/opencli-explorer/SKILL.md) — adapter creation workflow
 
 ## Core Concepts
 
-### `browser`: live control
+### `browser`: AI Agent browser control
+
+`opencli browser` commands are the low-level primitives that AI Agents use to operate websites. You don't run these manually — instead, install the `opencli-browser` skill into your AI agent, describe what you want in natural language, and the agent handles the browser operations.
 
-Use `opencli browser` when the task is inherently interactive and the agent needs to operate the page directly.
+For example, tell your agent: *"Help me check my Xiaohongshu notifications"* — the agent will use `opencli browser open`, `state`, `click`, etc. under the hood.
 
 ### Built-in adapters: stable commands
 
-Use site-specific commands such as `opencli hackernews top` or `opencli reddit hot` when the capability already exists and you want deterministic output.
+Use site-specific commands such as `opencli hackernews top` or `opencli reddit hot` when the capability already exists. These are deterministic and work without browser — ideal for both humans and AI agents.
 
 ### `explore` / `synthesize` / `generate`: create new CLIs
 
diff --git a/README.zh-CN.md b/README.zh-CN.md
index 298ba8b1..fb1a44ca 100644
--- a/README.zh-CN.md
+++ b/README.zh-CN.md
@@ -11,7 +11,7 @@
 OpenCLI 可以用同一套 CLI 做三类事情：
 
 - **直接使用现成适配器**：B站、知乎、小红书、Twitter/X、Reddit、HackerNews 等 [90+ 站点](#内置命令) 开箱即用。
-- **直接驱动浏览器**：用 `opencli browser` 让 AI Agent 实时点击、输入、提取、截图、检查页面状态。
+- **让 AI Agent 操作任意网站**：在你的 AI Agent（Claude Code、Cursor 等）中安装 `opencli-browser` skill，Agent 就能用你的已登录浏览器导航、点击、输入、提取任意网页内容。
 - **把新网站生成成 CLI**：通过 `explore`、`synthesize`、`generate`、`cascade` 从真实页面行为推导出新的适配器。
 
 除了网站能力，OpenCLI 还是一个 **CLI 枢纽**：你可以把 `gh`、`docker` 等本地工具统一注册到 `opencli` 下，也可以通过桌面端适配器控制 Cursor、Codex、Antigravity、ChatGPT、Notion 等 Electron 应用。
@@ -19,10 +19,10 @@ OpenCLI 可以用同一套 CLI 做三类事情：
 ## 亮点
 
 - **桌面应用控制** — 通过 CDP 直接在终端驱动 Electron 应用（Cursor、Codex、ChatGPT、Notion 等）。
-- **浏览器自动化** — `browser` 让 AI Agent 直接控制浏览器：点击、输入、提取、截图，完全可编程。
+- **AI Agent 浏览器自动化** — 安装 `opencli-browser` skill，你的 AI Agent 就能操作任意网站：导航、点击、输入、提取、截图——全部通过你的已登录 Chrome 会话完成。
 - **网站 → CLI** — 把任何网站变成确定性 CLI：90+ 内置适配器，或用 `opencli generate` 生成新的。
 - **账号安全** — 复用 Chrome/Chromium 登录态，凭证永远不会离开浏览器。
-- **面向 AI Agent** — `explore` 发现 API，`synthesize` 生成适配器，`cascade` 探测认证策略，`browser` 直接控制浏览器。
+- **面向 AI Agent** — `explore` 发现 API，`synthesize` 生成适配器，`cascade` 探测认证策略。`opencli-browser` skill 让任何 AI Agent 都能完全控制浏览器。
 - **CLI 枢纽** — 统一发现、自动安装、纯透传任何外部 CLI（gh、docker、obsidian 等）。
 - **零 LLM 成本** — 运行时不消耗模型 token，跑 10,000 次也不花一分钱。
 - **确定性输出** — 相同命令，相同输出结构，每次一致。可管道、可脚本、CI 友好。
@@ -68,12 +68,9 @@ opencli bilibili hot --limit 5
 
 ## 给 AI Agent
 
-按任务类型，AI Agent 有两个不同入口：
+OpenCLI 的 browser 命令是给 AI Agent 用的——不是手动执行的。把 skill 安装到你的 AI Agent（Claude Code、Cursor 等）中，Agent 就能用你的已登录 Chrome 会话替你操作网站。
 
-- [`skills/opencli-explorer/SKILL.md`](./skills/opencli-explorer/SKILL.md)：适配器创建入口，支持全自动生成（`opencli generate <url>`）和手动探索两种流程。
-- [`skills/opencli-browser/SKILL.md`](./skills/opencli-browser/SKILL.md)：底层控制入口，适合实时操作页面、debug 和人工介入。
-
-安装全部 OpenCLI skills：
+### 安装 skill
 
 ```bash
 npx skills add jackwener/opencli
@@ -88,22 +85,42 @@ npx skills add jackwener/opencli --skill opencli-explorer
 npx skills add jackwener/opencli --skill opencli-oneshot
 ```
 
-实际使用上：
+### 选择哪个 skill
+
+| Skill | 适用场景 | 你对 AI Agent 说的话 |
+|-------|---------|-------------------|
+| **opencli-browser** | 实时操作任意网站 | "帮我在小红书上发布这篇内容" / "看看我的 Twitter 通知并总结" |
+| **opencli-explorer** | 为某个站点生成可复用 CLI | "帮我做一个抖音热门的适配器" |
+| **opencli-oneshot** | 快速一次性：URL + 目标 → 适配器 | "帮我做一个抓取这个页面热帖的命令" |
+| **opencli-usage** | 使用已有的内置适配器 | "获取 B 站热搜前 5" |
+
+### 工作原理
 
-- 需要把某个站点收成可复用命令时，优先走 `opencli-explorer`（涵盖自动和手动两种路径）
-- 需要直接检查页面、操作页面时，再走 `opencli-browser`
+安装 `opencli-browser` skill 后，你的 AI Agent 可以：
 
-`browser` 可用命令包括：`open`、`state`、`click`、`type`、`select`、`keys`、`wait`、`get`、`screenshot`、`scroll`、`back`、`eval`、`network`、`init`、`verify`、`close`。
+1. **导航**到任意 URL，使用你的已登录浏览器
+2. **读取**页面内容——通过结构化 DOM 快照（不是截图）
+3. **交互**——点击按钮、填写表单、选择选项、按键
+4. **提取**页面数据或拦截网络 API 响应
+5. **等待**元素、文本或页面跳转
+
+Agent 在内部自动处理所有 `opencli browser` 命令——你只需用自然语言描述想做的事。
+
+**Skill 参考文档：**
+- [`skills/opencli-browser/SKILL.md`](./skills/opencli-browser/SKILL.md) — 实时浏览器操作
+- [`skills/opencli-explorer/SKILL.md`](./skills/opencli-explorer/SKILL.md) — 适配器创建工作流
 
 ## 核心概念
 
-### `browser`：实时操作
+### `browser`：AI Agent 的浏览器控制层
+
+`opencli browser` 命令是 AI Agent 操作网站的底层原语。你不需要手动运行这些命令——把 `opencli-browser` skill 安装到你的 AI Agent 中，用自然语言描述你想做的事，Agent 会自动处理浏览器操作。
 
-当任务本身就是交互式页面操作时，使用 `opencli browser` 直接驱动浏览器。
+比如你告诉 Agent：*"帮我看看小红书的通知"*——Agent 会在底层调用 `opencli browser open`、`state`、`click` 等命令。
 
 ### 内置适配器：稳定命令
 
-当某个站点能力已经存在时，优先使用 `opencli hackernews top`、`opencli reddit hot` 这类稳定命令，而不是重新走一遍浏览器操作。
+当某个站点能力已经存在时，优先使用 `opencli hackernews top`、`opencli reddit hot` 这类稳定命令。这些命令是确定性的，无需浏览器——人类和 AI Agent 都可以直接使用。
 
 ### `explore` / `synthesize` / `generate`：生成新的 CLI