From 1f02bfc7a3b079ee105b0db8f9b930bcdf885704 Mon Sep 17 00:00:00 2001
From: 0XFANGO
Date: Wed, 8 Apr 2026 10:15:08 +0800
Subject: [PATCH 01/14] docs: add CLI migration design spec

Design spec for migrating skills repo from curl/API to ListenHub CLI,
adding slides and music skills, and rebranding as ListenHub CLI Skills.

Part of marswaveai/listenhub-ralph#44
---
 docs/specs/cli-migration.md | 380 ++++++++++++++++++++++++++++++++++++
 1 file changed, 380 insertions(+)
 create mode 100644 docs/specs/cli-migration.md

diff --git a/docs/specs/cli-migration.md b/docs/specs/cli-migration.md
new file mode 100644
index 0000000..2ddfde3
--- /dev/null
+++ b/docs/specs/cli-migration.md
@@ -0,0 +1,380 @@
+# Skills Repository CLI Migration Design Document
+
+Part of marswaveai/listenhub-ralph#44
+
+## Background
+
+ListenHub CLI (`@marswave/listenhub-cli`) has been released and provides a complete command-line tool covering all content-creation features. Every skill in the skills repository currently calls the backend through curl plus API docs and needs to migrate fully to CLI calls.
+
+## Goals
+
+1. Add two new skills: `slides` and `music`
+2. Migrate all skills from curl/API calls to `listenhub` CLI calls
+3. Upgrade the project brand from "ListenHub Skills" to "ListenHub CLI Skills"
+4. Add a `listenhub-cli` umbrella skill to replace the deprecated `listenhub` skill
+
+## Non-Goals
+
+- Do not migrate `asr` (purely local SpeechBrain/Whisper; no remote API involved)
+- Do not migrate `content-parser` (the CLI has no content-extract command yet; keep the curl approach)
+- Do not change the user interaction flow (the AskUserQuestion pattern of collect parameters → confirm → execute stays the same)
+
+---
+
+## 1. Execution Model Changes
+
+### 1.1 Authentication: API Key → OAuth
+
+| Aspect | Before | After |
+|------|--------|--------|
+| Auth method | `LISTENHUB_API_KEY` environment variable | `listenhub auth login` OAuth |
+| Check | Read `$LISTENHUB_API_KEY` | `listenhub auth status --json` |
+| Credential storage | User configures `.zshrc` manually | CLI manages `~/.config/listenhub/credentials.json` automatically |
+| Token refresh | None (permanent key) | CLI refreshes automatically |
+
+**Step -1 after migration (identical for all skills):**
+
+```bash
+# Check whether the CLI is installed
+if ! command -v listenhub &>/dev/null; then
+  echo "ListenHub CLI is required: npm install -g @marswave/listenhub-cli"
+  exit 1
+fi
+
+# Check whether the user is logged in
+AUTH=$(listenhub auth status --json 2>/dev/null)
+if [ "$(echo "$AUTH" | jq -r '.authenticated')" != "true" ]; then
+  echo "Please log in first: listenhub auth login"
+  exit 1
+fi
+```
+
+### 1.2 API Calls: curl → CLI Commands
+
+| Aspect | Before | After |
+|------|--------|--------|
+| Submit task | Build the full request with `curl -sS -X POST ...` | `listenhub create --flag ...` |
+| Poll status | `run_in_background` bash loop + jq parsing | Built-in CLI polling (default behavior) |
+| Async mode | None | `--no-wait` returns the ID immediately |
+| Result parsing | jq extracts fields from the curl response | `--json` outputs structured JSON |
+| Timeout control | Hard-coded `seq 1 30` + `sleep 10` | `--timeout ` |
+
+**Example: podcast generation**
+
+Before:
+```bash
+curl -sS -X POST "https://api.marswave.ai/openapi/v1/podcast/episodes" \
+  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
+  -H "Content-Type: application/json" \
+  -H "X-Source: skills" \
+  -d '{"sources": [...], "speakers": [...], "language": "zh", "mode": "quick"}'
+
+# then poll with a background polling loop...
+```
+
+After:
+```bash
+listenhub podcast create \
+  --query "2026年AI趋势" \
+  --mode quick \
+  --lang zh \
+  --speaker "原野" \
+  --json
+```
+
+### 1.3 Speaker Query
+
+Before:
+```bash
+curl -sS "https://api.marswave.ai/openapi/v1/speakers/list?language=zh" \
+  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
+  -H "X-Source: skills"
+```
+
+After:
+```bash
+listenhub speakers list --lang zh --json
+```
+
+---
+
+## 2. New Skills
+
+### 2.1 `/slides`: Slide Generation
+
+Based on the `slides` mode of the storybook API; the CLI command is `listenhub slides create`.
+
+**Positioning**: generates slides from a topic, URL, or text, with optional voice narration. It shares the storybook backend with `/explainer` but has different interaction semantics: slides leans toward presentation decks, explainer toward video explanations.
+
+**CLI mapping**:
+
+| Parameter | CLI flag | Default |
+|------|----------|--------|
+| Topic | `--query` | - |
+| Reference URL | `--source-url` (repeatable) | - |
+| Reference text | `--source-text` (repeatable) | - |
+| Language | `--lang` | Auto-detected |
+| Speaker | `--speaker` / `--speaker-id` | Built-in default |
+| Image size | `--image-size` | 2K |
+| Aspect ratio | `--aspect-ratio` | 16:9 |
+| Visual style | `--style` | - |
+| Skip audio | Skipped by default; `--no-skip-audio` enables audio | Skipped |
+
+**Interaction flow**:
+
+1. Topic/content (free text)
+2. Language (inferred from the input; can be overridden)
+3. Whether voice narration is needed (default: no)
+4. If narration is needed → speaker selection
+5. Visual style (optional)
+6. Confirm & generate
+
+**SKILL.md reference**: use `explainer/SKILL.md` as the template, change the mode to slides, and skip audio by default.
+
+### 2.2 `/music`: AI Music Generation
+
+A brand-new feature; the CLI commands are `listenhub music generate` and `listenhub music cover`.
+
+**Positioning**: generates original music from a text description, or creates a cover version from reference audio.
+
+**CLI mapping**:
+
+| Subcommand | Function | Key parameters |
+|--------|------|----------|
+| `music generate` | Text-to-music | `--prompt` (required), `--style`, `--title`, `--instrumental` |
+| `music cover` | Cover | `--audio` (required; local file or URL), `--prompt`, `--style`, `--title`, `--instrumental` |
+| `music list` | List | `--page`, `--page-size`, `--status` |
+| `music get ` | Details | - |
+
+**Interaction flow**:
+
+1. Creation mode: original / cover
+2. (Original) music description prompt
+3. (Cover) reference audio file or URL
+4. Style (optional)
+5. Title (optional; can be auto-generated)
+6. Whether instrumental only (no vocals)
+7. Confirm & generate
+
+**Note**: music has a longer timeout (600s by default) and a polling interval of 10s.
+
+---
+
+## 3. Migrating Existing Skills
+
+The following skills need to migrate from curl/API to CLI calls:
+
+### 3.1 `/podcast`
+
+| Item | Change |
+|------|------|
+| Authentication | API Key → CLI auth |
+| Creation | curl POST → `listenhub podcast create` |
+| Polling | bash loop → built-in CLI wait |
+| Speaker | curl speakers API → `listenhub speakers list` |
+| Reference | `shared/api-podcast.md` → CLI `--help` |
+
+**Corresponding CLI command**:
+```bash
+listenhub podcast create \
+  --query "{topic}" \
+  --source-url "{url}" \           # repeatable
+  --source-text "{text}" \         # repeatable
+  --mode {quick|deep|debate} \
+  --lang {en|zh|ja} \
+  --speaker "{name}" \             # repeatable, up to 2
+  --json
+```
+
+Two-step flow: submit with `--no-wait`, poll for the text with `listenhub creation get --json`, and after confirmation submit the audio through the direct API (the CLI does not yet support the second step of two-step, so curl is kept).
+
+### 3.2 `/tts`
+
+```bash
+listenhub tts create \
+  --text "{text}" \
+  --source-url "{url}" \
+  --source-text "{text}" \
+  --mode {smart|direct} \
+  --lang {en|zh|ja} \
+  --speaker "{name}" \
+  --json
+```
+
+### 3.3 `/explainer`
+
+```bash
+listenhub explainer create \
+  --query "{topic}" \
+  --source-url "{url}" \
+  --mode {info|story} \
+  --lang {en|zh|ja} \
+  --speaker "{name}" \
+  --image-size {2K|4K} \
+  --aspect-ratio {16:9|9:16|1:1} \
+  --style "{style}" \
+  --skip-audio \                   # text script only
+  --json
+```
+
+Note: explainer's `--skip-audio` is for the "text script only" mode and maps to the original "Text script only" option.
+
+### 3.4 `/image-gen`
+
+```bash
+listenhub image create \
+  --prompt "{description}" \
+  --model "{model}" \
+  --aspect-ratio {16:9|9:16|1:1} \
+  --size {1K|2K|4K} \
+  --reference "{path-or-url}" \    # repeatable, up to 5
+  --json
+```
+
+### 3.5 Skills Not Migrated
+
+| Skill | Reason |
+|-------|------|
+| `/asr` | Purely local (SpeechBrain/Whisper); does not call a remote API |
+| `/content-parser` | The CLI has no content-extract command yet |
+| `/creator` | Orchestration layer that calls sub-skills; it benefits automatically once the sub-skills migrate, but the content-parser used inside creator still goes through curl |
+
+---
+
+## 4. shared/ Directory Changes
+
+### 4.1 New Files
+
+| File | Content |
+|------|------|
+| `shared/cli-authentication.md` | CLI install check + `listenhub auth login/status` |
+| `shared/cli-patterns.md` | CLI execution patterns: `--json` output parsing, `--no-wait` async, `--timeout` control, error handling |
+| `shared/cli-speakers.md` | `listenhub speakers list --json` replaces the curl speaker API |
+
+### 4.2 Files Kept Unchanged (still used by content-parser / creator)
+
+| File | Reason |
+|------|------|
+| `shared/authentication.md` | content-parser still needs the API Key |
+| `shared/api-content-extract.md` | Used by content-parser |
+| `shared/api-speakers.md` | Kept as reference, but CLI skills no longer reference it directly |
+| `shared/config-pattern.md` | Config management pattern unchanged (outputMode/language/defaultSpeakers) |
+| `shared/output-mode.md` | Output mode selection unchanged |
+| `shared/common-patterns.md` | Still needed for content-parser polling |
+| `shared/speaker-selection.md` | Interaction flow unchanged; only the underlying query switches to the CLI |
+
+### 4.3 Files Kept but No Longer Referenced by CLI Skills
+
+These API docs stay in the repository as reference, but the migrated SKILL.md files no longer reference them:
+
+- `shared/api-podcast.md`
+- `shared/api-tts.md`
+- `shared/api-image.md`
+- `shared/api-storybook.md`
+
+---
+
+## 5. `listenhub-cli` Umbrella Skill
+
+Add `listenhub-cli/SKILL.md` to replace the deprecated `listenhub/SKILL.md`.
+
+**Purpose**: when the user triggers a generic ListenHub keyword, route to the specific sub-skill.
+
+```yaml
+---
+name: listenhub-cli
+description: |
+  Entry point for ListenHub CLI skills. Routes to the matching skill whenever the user triggers a ListenHub-related action.
+  Triggers: "make a podcast", "explainer video", "read aloud", "TTS",
+  "generate image", "做播客", "解说视频", "朗读", "生成图片",
+  "幻灯片", "slides", "音乐", "music", "generate music".
+---
+```
+
+**Routing table**:
+
+| User intent | Route to |
+|---------|--------|
+| Podcast | `/podcast` |
+| Explainer video | `/explainer` |
+| Read aloud / TTS | `/tts` |
+| Generate image | `/image-gen` |
+| Slides | `/slides` |
+| Music | `/music` |
+| Extract URL content | `/content-parser` |
+
+Also update the deprecated `listenhub/SKILL.md` to add slides and music to its routing table.
+
+---
+
+## 6. SKILL.md Metadata Changes
+
+Adjust the frontmatter of every migrated SKILL.md uniformly:
+
+```yaml
+# Before
+metadata:
+  openclaw:
+    emoji: "🎙️"
+    requires:
+      env: ["LISTENHUB_API_KEY"]
+    primaryEnv: "LISTENHUB_API_KEY"
+
+# After
+metadata:
+  openclaw:
+    emoji: "🎙️"
+    requires:
+      bin: ["listenhub"]
+    primaryBin: "listenhub"
+```
+
+- `requires.env` → `requires.bin`: changes from an environment-variable dependency to a binary dependency
+- `primaryEnv` → `primaryBin`: primary dependency identifier
+
+---
+
+## 7. README Updates
+
+- Upgrade the project name in `README.md` / `README.zh.md` to "ListenHub CLI Skills"
+- Add a CLI installation step to the install instructions: `npm install -g @marswave/listenhub-cli`
+- Add slides and music to the skill list
+- Change the authentication instructions from API Key to OAuth
+
+---
+
+## 8. Config Pattern Retained
+
+The config management pattern of `.listenhub//config.json` stays the same:
+
+- `outputMode` (inline/download/both)
+- `language`
+- `defaultSpeakers`
+- skill-specific defaults
+
+Changes:
+- No more "Step -1: API Key Check" → replaced by "Step -1: CLI Auth Check"
+- Speaker query changes from curl to `listenhub speakers list --json`
+- All other config read/write, Zero-Question Boot, and Setup Flow logic is unchanged
+
+---
+
+## 9. Implementation Order
+
+Implement in stages to reduce the risk of a one-shot change:
+
+| Step | Content | Depends on |
+|------|------|------|
+| 1 | Add `shared/cli-authentication.md`, `shared/cli-patterns.md`, `shared/cli-speakers.md` | None |
+| 2 | Add `/slides` SKILL.md | Step 1 |
+| 3 | Add `/music` SKILL.md | Step 1 |
+| 4 | Migrate `/podcast` SKILL.md | Step 1 |
+| 5 | Migrate `/tts` SKILL.md | Step 1 |
+| 6 | Migrate `/explainer` SKILL.md | Step 1 |
+| 7 | Migrate `/image-gen` SKILL.md | Step 1 |
+| 8 | Add `listenhub-cli/SKILL.md` + update `listenhub/SKILL.md` | Steps 2-7 |
+| 9 | Update the query method in `shared/speaker-selection.md` | Step 1 |
+| 10 | Update README | Steps 2-8 |
+| 11 | Update references in the `creator/` templates (content-parser unchanged; other sub-skill references updated) | Steps 4-7 |
+
+Steps 2-7 can run in parallel.

From b927b0e006cbc09af986de143c5bbcc4b37be8b3 Mon Sep 17 00:00:00 2001
From: 0XFANGO
Date: Wed, 8 Apr 2026 10:58:48 +0800
Subject: [PATCH 02/14] docs: fix spec review findings - auth model, cleanup strategy, consistency

- Add API auth isolation section (OpenAPI API-Key vs CLI OAuth/JWT)
- Confirm podcast two-step is OpenAPI-only, remove from CLI migration
- Delete all shared/ API docs, inline content-parser dependencies
- Fix skip-audio semantic asymmetry between slides and explainer
- Add --speaker-id to all speaker-supporting skills consistently
- Add --lang to image-gen CLI mapping
- Add music --status enum values
- Clarify creator/ migration: auto-benefits + explicit template updates
- Update implementation steps with content-parser inlining and shared/ cleanup
---
 docs/specs/cli-migration.md | 71 +++++++++++++++++++++++++------------
 1 file changed, 49 insertions(+), 22 deletions(-)

diff --git a/docs/specs/cli-migration.md b/docs/specs/cli-migration.md
index 2ddfde3..39891fc 100644
--- a/docs/specs/cli-migration.md
+++ b/docs/specs/cli-migration.md
@@ -16,9 +16,25 @@ ListenHub CLI (`@marswave/listenhub-cli`) has been released and provides a complete
 ## Non-Goals
 
 - Do not migrate `asr` (purely local SpeechBrain/Whisper; no remote API involved)
-- Do not migrate `content-parser` (the CLI has no content-extract command yet; keep the curl approach)
+- Do not migrate `content-parser` (the CLI has no content-extract command yet; keep the curl + API Key approach)
 - Do not change the user interaction flow (the AskUserQuestion pattern of collect parameters → confirm → execute stays the same)
 
+## Key Constraint: API Authentication Model Isolation
+
+An investigation of `listenhub-api-server` shows that the OpenAPI routes and the CLI/Regular routes use **completely isolated** authentication systems:
+
+| Aspect | OpenAPI routes | CLI/Regular routes |
+|------|-------------|-----------------|
+| Auth method | API Key (`lh_sk_*`) | JWT (OAuth token) |
+| Token format | `Bearer lh_sk_keyId_secret` | `Bearer ` |
+| Verification | bcrypt hash comparison | JWT signature verification |
+| Endpoint access | `/openapi/v1/*` only | `/api/*` only |
+
+**Impact**:
+- Podcast two-step (`POST /v1/podcast/episodes/text-content` + `POST /v1/podcast/episodes/{episodeId}/audio`) is **registered only on the OpenAPI routes**, so CLI users cannot access it
+- The CLI does not send the `X-Source` header (the header is used only for analytics tracking and does not affect authentication)
+- The parts of `content-parser` and `creator` that use curl still require `LISTENHUB_API_KEY`
+
 ---
 
 ## 1. Execution Model Changes
@@ -131,6 +147,8 @@ listenhub speakers list --lang zh --json
 
 **SKILL.md reference**: use `explainer/SKILL.md` as the template, change the mode to slides, and skip audio by default.
 
+> **skip-audio semantics compared**: slides skips audio by default (add `--no-skip-audio` to enable narration), while explainer generates audio by default (add `--skip-audio` to skip it). In the interaction flow, slides asks "Need voice narration? (default: no)" and explainer asks "Output type? (text script / text + video)".
+
 ### 2.2 `/music`: AI Music Generation
 
 A brand-new feature; the CLI commands are `listenhub music generate` and `listenhub music cover`.
@@ -143,7 +161,7 @@ listenhub speakers list --lang zh --json
 |--------|------|----------|
 | `music generate` | Text-to-music | `--prompt` (required), `--style`, `--title`, `--instrumental` |
 | `music cover` | Cover | `--audio` (required; local file or URL), `--prompt`, `--style`, `--title`, `--instrumental` |
-| `music list` | List | `--page`, `--page-size`, `--status` |
+| `music list` | List | `--page`, `--page-size`, `--status` (allowed values: `pending` / `generating` / `uploading` / `success` / `failed`) |
 | `music get ` | Details | - |
 
 **Interaction flow**:
@@ -183,10 +201,11 @@ listenhub podcast create \
   --mode {quick|deep|debate} \
   --lang {en|zh|ja} \
   --speaker "{name}" \             # repeatable, up to 2
+  --speaker-id "{id}" \            # repeatable; specifies the speaker inner ID directly
   --json
 ```
 
-Two-step flow: submit with `--no-wait`, poll for the text with `listenhub creation get --json`, and after confirmation submit the audio through the direct API (the CLI does not yet support the second step of two-step, so curl is kept).
+**Two-step flow removed**: Podcast two-step (generate the text first, then the audio) is an OpenAPI-only feature. Its routes are registered only in `openapi-controllers/episode.ts` and accept only API Key authentication, so CLI users (OAuth/JWT) cannot access these endpoints. After migration the podcast skill supports only one-step mode, and all interaction steps tied to the original two-step flow (Generation Method selection, draft preview, script editing) are removed.
 
 ### 3.2 `/tts`
 
@@ -198,6 +217,7 @@ listenhub tts create \
   --mode {smart|direct} \
   --lang {en|zh|ja} \
   --speaker "{name}" \
+  --speaker-id "{id}" \
   --json
 ```
 
@@ -210,6 +230,7 @@ listenhub explainer create \
   --mode {info|story} \
   --lang {en|zh|ja} \
   --speaker "{name}" \
+  --speaker-id "{id}" \
   --image-size {2K|4K} \
   --aspect-ratio {16:9|9:16|1:1} \
   --style "{style}" \
@@ -217,7 +238,7 @@ listenhub explainer create \
   --json
 ```
 
-Note: explainer's `--skip-audio` is for the "text script only" mode and maps to the original "Text script only" option.
+**skip-audio semantics**: explainer **generates audio** by default (add `--skip-audio` to skip it). This is the opposite of slides, which **skips audio** by default (add `--no-skip-audio` to enable it). The skill interaction must make this difference in defaults explicit.
 
 ### 3.4 `/image-gen`
 
@@ -225,9 +246,10 @@ listenhub explainer create \
 listenhub image create \
   --prompt "{description}" \
   --model "{model}" \
+  --lang "{lang}" \                # language hint for the prompt
   --aspect-ratio {16:9|9:16|1:1} \
   --size {1K|2K|4K} \
-  --reference "{path-or-url}" \    # repeatable, up to 5
+  --reference "{path-or-url}" \    # repeatable, up to 5; supports local files and URLs
   --json
 ```
 
@@ -237,7 +259,7 @@ listenhub image create \
 |-------|------|
 | `/asr` | Purely local (SpeechBrain/Whisper); does not call a remote API |
 | `/content-parser` | The CLI has no content-extract command yet |
-| `/creator` | Orchestration layer that calls sub-skills; it benefits automatically once the sub-skills migrate, but the content-parser used inside creator still goes through curl |
+| `/creator` | Orchestration layer: it reads each sub-skill's SKILL.md when calling it, so creator automatically uses the new CLI execution once the sub-skills migrate. Places in the creator templates that explicitly reference shared/ docs need updating (step 12). The parts where creator calls content-parser directly use content-parser's inlined curl approach |
 
 ---
 
@@ -251,26 +273,28 @@ listenhub image create \
 | `shared/cli-patterns.md` | CLI execution patterns: `--json` output parsing, `--no-wait` async, `--timeout` control, error handling |
 | `shared/cli-speakers.md` | `listenhub speakers list --json` replaces the curl speaker API |
 
-### 4.2 Files Kept Unchanged (still used by content-parser / creator)
+### 4.2 Files Kept Unchanged
 
 | File | Reason |
 |------|------|
-| `shared/authentication.md` | content-parser still needs the API Key |
-| `shared/api-content-extract.md` | Used by content-parser |
-| `shared/api-speakers.md` | Kept as reference, but CLI skills no longer reference it directly |
-| `shared/config-pattern.md` | Config management pattern unchanged (outputMode/language/defaultSpeakers) |
+| `shared/config-pattern.md` | Config management pattern unchanged (outputMode/language/defaultSpeakers); remove its API Key Check section |
 | `shared/output-mode.md` | Output mode selection unchanged |
-| `shared/common-patterns.md` | Still needed for content-parser polling |
-| `shared/speaker-selection.md` | Interaction flow unchanged; only the underlying query switches to the CLI |
+| `shared/speaker-selection.md` | Interaction flow unchanged; the underlying query switches to `listenhub speakers list --json` |
 
-### 4.3 Files Kept but No Longer Referenced by CLI Skills
+### 4.3 Delete Old API Docs
 
-These API docs stay in the repository as reference, but the migrated SKILL.md files no longer reference them:
+Delete all of the following files:
 
 - `shared/api-podcast.md`
 - `shared/api-tts.md`
 - `shared/api-image.md`
 - `shared/api-storybook.md`
+- `shared/api-content-extract.md`
+- `shared/api-speakers.md`
+- `shared/authentication.md` (API Key authentication; replaced by `shared/cli-authentication.md`)
+- `shared/common-patterns.md` (curl polling pattern; replaced by `shared/cli-patterns.md`)
+
+**content-parser inlining**: `content-parser` still uses curl + API Key (the CLI has no content-extract command yet). Inline all the API information it depends on (auth header, endpoint address, request/response format, polling pattern) into `content-parser/SKILL.md` so that it no longer references the `shared/` directory. That leaves `shared/` cleanly containing only CLI-related docs.
 
 ---
 
@@ -368,13 +392,16 @@ metadata:
 | 1 | Add `shared/cli-authentication.md`, `shared/cli-patterns.md`, `shared/cli-speakers.md` | None |
 | 2 | Add `/slides` SKILL.md | Step 1 |
 | 3 | Add `/music` SKILL.md | Step 1 |
-| 4 | Migrate `/podcast` SKILL.md | Step 1 |
+| 4 | Migrate `/podcast` SKILL.md (remove two-step; one-step only) | Step 1 |
 | 5 | Migrate `/tts` SKILL.md | Step 1 |
 | 6 | Migrate `/explainer` SKILL.md | Step 1 |
 | 7 | Migrate `/image-gen` SKILL.md | Step 1 |
-| 8 | Add `listenhub-cli/SKILL.md` + update `listenhub/SKILL.md` | Steps 2-7 |
-| 9 | Update the query method in `shared/speaker-selection.md` | Step 1 |
-| 10 | Update README | Steps 2-8 |
-| 11 | Update references in the `creator/` templates (content-parser unchanged; other sub-skill references updated) | Steps 4-7 |
-
-Steps 2-7 can run in parallel.
+| 8 | Inline `/content-parser` SKILL.md (inline the shared/ API doc information) | None |
+| 9 | Add `listenhub-cli/SKILL.md` + update `listenhub/SKILL.md` | Steps 2-7 |
+| 10 | Update the query method in `shared/speaker-selection.md` | Step 1 |
+| 11 | Delete the old `shared/` API docs (api-*.md, authentication.md, common-patterns.md) | Step 8 |
+| 12 | Update references to shared/ in the `creator/` templates | Steps 4-7, 11 |
+| 13 | Update `shared/config-pattern.md` (remove the API Key Check section) | Step 1 |
+| 14 | Update README | Steps 2-9 |
+
+Steps 2-8 can run in parallel.

From a6310f2377b80b21455638b9c42f0d30549f5d86 Mon Sep 17 00:00:00 2001
From: 0XFANGO
Date: Wed, 8 Apr 2026 11:12:38 +0800
Subject: [PATCH 03/14] docs: fix branding and listenhub skill sync

- Keep project name as "ListenHub Skills" (no rename)
- listenhub/SKILL.md becomes identical to listenhub-cli/SKILL.md (not deprecated, just different name field)
- Update implementation steps accordingly
---
 docs/specs/cli-migration.md | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/docs/specs/cli-migration.md b/docs/specs/cli-migration.md
index 39891fc..c36bae7 100644
--- a/docs/specs/cli-migration.md
+++ b/docs/specs/cli-migration.md
@@ -10,8 +10,7 @@ ListenHub CLI (`@marswave/listenhub-cli`) has been released and provides a complete
 
 1. Add two new skills: `slides` and `music`
 2. Migrate all skills from curl/API calls to `listenhub` CLI calls
-3. Upgrade the project brand from "ListenHub Skills" to "ListenHub CLI Skills"
-4. Add a `listenhub-cli` umbrella skill to replace the deprecated `listenhub` skill
+3. Add a `listenhub-cli` umbrella skill, and restore the deprecated `listenhub` skill with content identical to `listenhub-cli`
 
 ## Non-Goals
 
@@ -298,15 +297,17 @@ listenhub image create \
 
 ---
 
-## 5. `listenhub-cli` Umbrella Skill
+## 5. `listenhub-cli` Umbrella Skill + `listenhub` Sync
 
-Add `listenhub-cli/SKILL.md` to replace the deprecated `listenhub/SKILL.md`.
+Add `listenhub-cli/SKILL.md`, and update `listenhub/SKILL.md` to **identical** content (it is no longer deprecated).
+
+The two skills have the same content and differ only in the frontmatter `name` field (`listenhub-cli` vs `listenhub`). Whichever skill name the user has installed, they get the same routing capability.
 
 **Purpose**: when the user triggers a generic ListenHub keyword, route to the specific sub-skill.
 
 ```yaml
 ---
-name: listenhub-cli
+name: listenhub-cli  # in listenhub/SKILL.md this is name: listenhub
 description: |
   Entry point for ListenHub CLI skills. Routes to the matching skill whenever the user triggers a ListenHub-related action.
   Triggers: "make a podcast", "explainer video", "read aloud", "TTS",
@@ -327,8 +328,6 @@ description: |
 | Music | `/music` |
 | Extract URL content | `/content-parser` |
 
-Also update the deprecated `listenhub/SKILL.md` to add slides and music to its routing table.
-
 ---
 
 ## 6. SKILL.md Metadata Changes
@@ -360,7 +359,7 @@ metadata:
 
 ## 7. README Updates
 
-- Upgrade the project name in `README.md` / `README.zh.md` to "ListenHub CLI Skills"
+- Keep the project name in `README.md` / `README.zh.md` as "ListenHub Skills"
 - Add a CLI installation step to the install instructions: `npm install -g @marswave/listenhub-cli`
 - Add slides and music to the skill list
 - Change the authentication instructions from API Key to OAuth
@@ -397,11 +396,11 @@ metadata:
 | 6 | Migrate `/explainer` SKILL.md | Step 1 |
 | 7 | Migrate `/image-gen` SKILL.md | Step 1 |
 | 8 | Inline `/content-parser` SKILL.md (inline the shared/ API doc information) | None |
-| 9 | Add `listenhub-cli/SKILL.md` + update `listenhub/SKILL.md` | Steps 2-7 |
+| 9 | Add `listenhub-cli/SKILL.md` + update `listenhub/SKILL.md` to identical content | Steps 2-7 |
 | 10 | Update the query method in `shared/speaker-selection.md` | Step 1 |
 | 11 | Delete the old `shared/` API docs (api-*.md, authentication.md, common-patterns.md) | Step 8 |
 | 12 | Update references to shared/ in the `creator/` templates | Steps 4-7, 11 |
 | 13 | Update `shared/config-pattern.md` (remove the API Key Check section) | Step 1 |
-| 14 | Update README | Steps 2-9 |
+| 14 | Update README (brand unchanged; add CLI install and new skills) | Steps 2-9 |
 
 Steps 2-8 can run in parallel.

From 724e6bae6af1b64605afcb4bd2eea20b900c9736 Mon Sep 17 00:00:00 2001
From: 0XFANGO
Date: Wed, 8 Apr 2026 11:27:28 +0800
Subject: [PATCH 04/14] docs: add CLI migration implementation plan

14-task plan covering shared CLI docs, new slides/music skills,
podcast/tts/explainer/image-gen migration, content-parser inlining,
umbrella skill, shared/ cleanup, creator update, and README update.

Part of marswaveai/listenhub-ralph#44
---
 docs/plans/2026-04-08-cli-migration.md | 1442 ++++++++++++++++++++++++
 1 file changed, 1442 insertions(+)
 create mode 100644 docs/plans/2026-04-08-cli-migration.md

diff --git a/docs/plans/2026-04-08-cli-migration.md b/docs/plans/2026-04-08-cli-migration.md
new file mode 100644
index 0000000..31f2bb2
--- /dev/null
+++ b/docs/plans/2026-04-08-cli-migration.md
@@ -0,0 +1,1442 @@
+# Skills CLI Migration Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Migrate all ListenHub skills from curl/API to CLI commands, add slides and music skills, and clean up shared/ docs.
+
+**Architecture:** Replace curl + API Key + polling loops with `listenhub` CLI commands that handle auth (OAuth), polling, and JSON output natively. New shared/ CLI docs replace old API docs. content-parser stays curl-based with inlined API info.
+
+**Tech Stack:** Markdown (SKILL.md files), ListenHub CLI (`@marswave/listenhub-cli`), bash (for CLI invocations in skill workflows)
+
+**Spec:** `docs/specs/cli-migration.md`
+
+---
+
+## File Map
+
+### New files
+
+| File | Responsibility |
+|------|---------------|
+| `shared/cli-authentication.md` | CLI install check + `listenhub auth login/status` |
+| `shared/cli-patterns.md` | CLI execution: `--json`, `--no-wait`, `--timeout`, error handling |
+| `shared/cli-speakers.md` | `listenhub speakers list --json` speaker query |
+| `slides/SKILL.md` | Slides generation skill (storybook mode=slides) |
+| `music/SKILL.md` | Music generation + cover skill |
+| `listenhub-cli/SKILL.md` | Umbrella router skill |
+
+### Modified files
+
+| File | Change |
+|------|--------|
+| `podcast/SKILL.md` | curl → CLI, remove two-step, remove API Key |
+| `tts/SKILL.md` | curl → CLI, remove API Key |
+| `explainer/SKILL.md` | curl → CLI, remove API Key |
+| `image-gen/SKILL.md` | curl → CLI, remove API Key |
+| `content-parser/SKILL.md` | Inline API docs (auth, endpoints, polling) |
+| `listenhub/SKILL.md` | Undeprecate, match listenhub-cli content |
+| `shared/speaker-selection.md` | Speaker fetch via CLI instead of curl |
+| `shared/config-pattern.md` | Remove API Key Check section, add CLI Auth Check |
+| `creator/SKILL.md` | Update shared/ references |
+| `creator/templates/narration/template.md` | Update speaker-selection + TTS references |
+| `creator/templates/wechat/template.md` | Update api-image.md reference |
+| `creator/templates/xiaohongshu/template.md` | Update api-image.md reference |
+| `README.md` | Add slides, music, CLI install, OAuth auth |
+| `README.zh.md` | Same changes in Chinese |
+
+### Deleted files
+
+| File | Reason |
+|------|--------|
+| `shared/api-podcast.md` | Replaced by CLI |
+| `shared/api-tts.md` | Replaced by CLI |
+| `shared/api-image.md` | Replaced by CLI |
+| `shared/api-storybook.md` | Replaced by CLI |
+| `shared/api-content-extract.md` | Inlined into content-parser |
+| `shared/api-speakers.md` | Replaced by CLI |
+| `shared/authentication.md` | Replaced by cli-authentication.md |
+| `shared/common-patterns.md` | Replaced by cli-patterns.md |
+| `listenhub/DEPRECATED.md` | No longer deprecated |
+
+---
+
+## Task 1: Create shared CLI docs
+
+**Files:**
+- Create: `shared/cli-authentication.md`
+- Create: `shared/cli-patterns.md`
+- Create: `shared/cli-speakers.md`
+
+- [ ] **Step 1: Write `shared/cli-authentication.md`**
+
+```markdown
+# CLI Authentication
+
+## Prerequisites
+
+ListenHub CLI must be installed:
+
+```bash
+npm install -g @marswave/listenhub-cli
+```
+
+Requires Node.js >= 20.
+
+## Auth Check
+
+Run this **before Step 0** in every CLI-based skill.
+
+```bash
+if ! command -v listenhub &>/dev/null; then
+  echo "MISSING_CLI"
+else
+  AUTH=$(listenhub auth status --json 2>/dev/null)
+  echo "$AUTH" | jq -r '.authenticated // false'
+fi
+```
+
+**If `true`**: proceed to Step 0 silently.
+
+**If `false` or `MISSING_CLI`**: run the interactive setup below.
+
+### Interactive Setup
+
+1. If CLI not installed, tell the user:
+   > ListenHub CLI is not installed. Please run:
+   > ```bash
+   > npm install -g @marswave/listenhub-cli
+   > ```
+
+2. If CLI installed but not logged in, tell the user:
+   > Please log in to ListenHub first:
+   > ```bash
+   > listenhub auth login
+   > ```
+   > The browser will open the authorization page automatically.
+
+3. Wait for the user to confirm login is complete.
+
+4. Verify: `listenhub auth status --json` → `authenticated: true`
+
+5. **Continue**: proceed to Step 0 and the skill's Interaction Flow.
+
+## Security Notes
+
+- Credentials stored at `~/.config/listenhub/credentials.json` (mode 0600)
+- Tokens auto-refresh before expiry
+- Never log or display tokens in output
+```
+
+- [ ] **Step 2: Write `shared/cli-patterns.md`**
+
+```markdown
+# CLI Patterns
+
+Reusable patterns for all skills that use the `listenhub` CLI.
+
+
+**Language Adaptation**: Always respond in the user's language. Chinese input → Chinese output. English input → English output. Mixed → follow dominant language. This applies to all UI text, questions, confirmations, and error messages.
+
+
+## Command Execution
+
+All generation commands follow the same pattern:
+
+```bash
+listenhub create [options] --json
+```
+
+The CLI handles polling internally: it submits the task and waits until completion by default. No background polling loops needed.
+
+### Synchronous (default)
+
+```bash
+RESULT=$(listenhub podcast create --query "topic" --mode quick --lang zh --json)
+```
+
+The command blocks until the task completes, then prints the JSON result to stdout. Use this for most cases.
+
+### Asynchronous (--no-wait)
+
+```bash
+RESULT=$(listenhub podcast create --query "topic" --no-wait --json)
+ID=$(echo "$RESULT" | jq -r '.episodeId')
+# Later check status:
+listenhub creation get "$ID" --json
+```
+
+Use `--no-wait` only when you need to do work between submission and completion.
+
+### Timeout
+
+Each command has a default timeout. Override with `--timeout `:
+
+| Command | Default timeout |
+|---------|----------------|
+| `podcast create` | 300s |
+| `tts create` | 300s |
+| `explainer create` | 300s |
+| `slides create` | 300s |
+| `image create` | 120s |
+| `music generate` | 600s |
+| `music cover` | 600s |
+
+### Background Execution
+
+For long-running commands (music especially), use Bash `run_in_background: true` with `timeout: 660000`:
+
+```bash
+listenhub music generate --prompt "..." --style "..." --json
+```
+
+You will be notified when the command completes.
+
+## JSON Output
+
+All commands with `--json` output structured JSON. Parse with `jq`:
+
+```bash
+RESULT=$(listenhub podcast create --query "topic" --json)
+AUDIO_URL=$(echo "$RESULT" | jq -r '.audioUrl')
+DURATION=$(echo "$RESULT" | jq -r '.audioDuration')
+CREDITS=$(echo "$RESULT" | jq -r '.credits')
+```
+
+## Error Handling
+
+CLI exits with non-zero codes on failure:
+
+| Exit code | Meaning |
+|-----------|---------|
+| 0 | Success |
+| 1 | General error (bad params, API error) |
+| 2 | Auth error (not logged in, token expired) |
+| 3 | Timeout |
+
+On error with `--json`, stderr contains the error message. Handle:
+
+```bash
+RESULT=$(listenhub podcast create --query "topic" --json 2>/tmp/lh-err.txt)
+if [ $? -ne 0 ]; then
+  ERROR=$(cat /tmp/lh-err.txt)
+  # Report error to user
+fi
+```
+
+### Common Errors
+
+| Error | Action |
+|-------|--------|
+| "Not authenticated" | Run `listenhub auth login` |
+| "Insufficient credits" | Inform user to recharge at listenhub.ai |
+| "Rate limited" | CLI retries automatically (2 retries on 429) |
+| Timeout | Increase `--timeout` or use `--no-wait` + poll |
+
+## Interactive Parameter Collection
+
+Same as before: use the `AskUserQuestion` tool for enumerable parameters, free text for topics/prompts. One question at a time. Confirm before executing.
+
+## Long Text Input
+
+For long content, write to a temp file and use shell substitution:
+
+```bash
+cat > /tmp/lh-content.txt << 'EOF'
+Long text content here...
+EOF
+
+listenhub tts create --text "$(cat /tmp/lh-content.txt)" --json
+rm /tmp/lh-content.txt
+```
+```
+
+- [ ] **Step 3: Write `shared/cli-speakers.md`**
+
+```markdown
+# CLI Speaker Query
+
+## Listing Speakers
+
+```bash
+listenhub speakers list --json
+listenhub speakers list --lang zh --json
+listenhub speakers list --lang en --json
+```
+
+Returns JSON array of speaker objects. Parse with `jq`:
+
+```bash
+SPEAKERS=$(listenhub speakers list --lang zh --json)
+echo "$SPEAKERS" | jq -r '.[] | "\(.name)\t\(.speakerId)\t\(.gender)"'
+```
+
+## Speaker Selection by Name
+
+Use `--speaker "name"` in create commands:
+
+```bash
+listenhub podcast create --query "topic" --speaker "原野" --json
+listenhub tts create --text "hello" --speaker "Mars" --json
+```
+
+For exact ID matching, use `--speaker-id`:
+
+```bash
+listenhub podcast create --query "topic" --speaker-id "CN-Man-Beijing-V2" --json
+```
+
+## Multi-Speaker Commands
+
+Podcast supports up to 2 speakers (repeat the flag):
+
+```bash
+listenhub podcast create --query "topic" --speaker "原野" --speaker "高晴" --json
+```
+
+## Integration with speaker-selection.md
+
+The interaction flow in `shared/speaker-selection.md` remains the same: present text table, accept free-text input. The only change is the underlying query:
+
+- **Before**: `curl -sS "https://api.marswave.ai/openapi/v1/speakers/list?language=zh" -H "Authorization: Bearer $LISTENHUB_API_KEY" -H "X-Source: skills"`
+- **After**: `listenhub speakers list --lang zh --json`
+```
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add shared/cli-authentication.md shared/cli-patterns.md shared/cli-speakers.md
+git commit -m "feat: add shared CLI docs for authentication, patterns, and speakers"
+```
+
+---
+
+## Task 2: Create `/slides` skill
+
+**Files:**
+- Create: `slides/SKILL.md`
+- Reference: `explainer/SKILL.md` (template), `shared/cli-authentication.md`, `shared/cli-patterns.md`
+
+- [ ] **Step 1: Write `slides/SKILL.md`**
+
+Use `explainer/SKILL.md` as template. Key differences:
+- mode = slides (not info/story)
+- Default: skip audio (`--no-skip-audio` to enable narration)
+- Interaction asks "need narration?" (default no) instead of "output type?"
+- CLI command: `listenhub slides create`
+
+Full content:
+
+```markdown
+---
+name: slides
+description: |
+  Create slide decks from topics, URLs, or text. Triggers on: "幻灯片", "PPT",
+  "slides", "slide deck", "做幻灯片", "create slides", "presentation".
+metadata:
+  openclaw:
+    emoji: "📊"
+    requires:
+      bin: ["listenhub"]
+    primaryBin: "listenhub"
+---
+
+## When to Use
+
+- User wants to create a slide deck or presentation
+- User asks for "slides", "PPT", "幻灯片", "presentation"
+- User wants visual content pages from a topic
+
+## When NOT to Use
+
+- User wants a narrated video (use `/explainer`)
+- User wants audio-only content (use `/podcast` or `/tts`)
+- User wants to generate a standalone image (use `/image-gen`)
+
+## Purpose
+
+Generate slide decks with AI-generated visuals from topics, URLs, or text. Optionally add voice narration. Ideal for presentations, teaching materials, and visual summaries.
+
+## Hard Constraints
+
+- Always check CLI auth following `shared/cli-authentication.md`
+- Follow `shared/cli-patterns.md` for command execution and error handling
+- Always read config following `shared/config-pattern.md` before any interaction
+- Follow `shared/speaker-selection.md` for speaker selection when narration is enabled
+- Slides use exactly 1 speaker (when narration is enabled)
+- Never save files to `~/Downloads/` or `.listenhub/`; save artifacts to the current working directory with friendly topic-based names (see `shared/config-pattern.md` § Artifact Naming)
+
+
+Use the AskUserQuestion tool for every multiple-choice step; do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After all parameters are collected, summarize the choices and ask the user to confirm. Do NOT call any generation command until the user has explicitly confirmed.
+
+
+## Step -1: CLI Auth Check
+
+Follow `shared/cli-authentication.md` § Auth Check. If CLI is not installed or not logged in, guide the user through setup.
+
+## Step 0: Config Setup
+
+Follow `shared/config-pattern.md` Step 0 (Zero-Question Boot).
+ +**If file doesn't exist** — silently create with defaults and proceed: +```bash +mkdir -p ".listenhub/slides" +echo '{"outputMode":"inline","language":null,"defaultSpeakers":{}}' > ".listenhub/slides/config.json" +CONFIG_PATH=".listenhub/slides/config.json" +CONFIG=$(cat "$CONFIG_PATH") +``` +**Do NOT ask any setup questions.** Proceed directly to the Interaction Flow. + +**If file exists** — read config silently and proceed: +```bash +CONFIG_PATH=".listenhub/slides/config.json" +[ ! -f "$CONFIG_PATH" ] && CONFIG_PATH="$HOME/.listenhub/slides/config.json" +CONFIG=$(cat "$CONFIG_PATH") +``` + +### Setup Flow (user-initiated reconfigure only) + +Only run when the user explicitly asks to reconfigure. Display current settings: +``` +当前配置 (slides): + 输出方式:{inline / download / both} + 语言偏好:{zh / en / 未设置} + 默认主播:{speakerName / 使用内置默认} +``` + +Then ask: + +1. **outputMode**: Follow `shared/output-mode.md` § Setup Flow Question. + +2. **Language** (optional): "默认语言?" + - "中文 (zh)" + - "English (en)" + - "每次手动选择" → keep `null` + +After collecting answers, save immediately: +```bash +NEW_CONFIG=$(echo "$CONFIG" | jq --arg m "$OUTPUT_MODE" '. + {"outputMode": $m}') +echo "$NEW_CONFIG" > "$CONFIG_PATH" +CONFIG=$(cat "$CONFIG_PATH") +``` + +## Interaction Flow + +### Step 1: Topic / Content + +Free text input. Ask the user: + +> What would you like to create slides about? + +Accept: topic description, text content, or concept. + +### Step 2: Language + +If `config.language` is set, pre-fill and show in summary — skip this question. +Otherwise, detect from the user's input language: +- Chinese input → `zh` +- English input → `en` + +Show in the confirmation summary. Never ask explicitly. + +### Step 3: Narration + +``` +Question: "需要语音旁白吗?" +Options: + - "不需要(默认)" — Skip audio, slides only + - "需要" — Generate narration with speaker +``` + +Default: no narration (slides default skips audio). 
+ +### Step 4: Speaker Selection (only if narration enabled) + +Follow `shared/speaker-selection.md`: +- If `config.defaultSpeakers.{language}` is set → use saved speaker silently +- If not set → use **built-in default** from `shared/speaker-selection.md` for the language +- Show the speaker in the confirmation summary — user can change from there +- Only show the full speaker list if the user explicitly asks to change voice + +Only 1 speaker is supported. + +### Step 5: Visual Options (optional) + +``` +Question: "图片尺寸?" +Options: + - "2K(推荐)" — 2K resolution + - "4K" — Ultra high quality +``` + +``` +Question: "宽高比?" +Options: + - "16:9(推荐)" — Landscape, presentation + - "9:16" — Portrait + - "1:1" — Square +``` + +Visual style: ask only if user mentions a specific style. Otherwise skip. + +### Step 6: Confirm & Generate + +Summarize all choices: + +``` +Ready to generate slides: + + Topic: {topic} + Language: {language} + Narration: {yes (speaker name) / no} + Resolution: {2K / 4K} + Aspect ratio: {ratio} + Style: {style / default} + + Proceed? +``` + +Wait for explicit confirmation. + +## Workflow + +1. **Build CLI command**: + ```bash + listenhub slides create \ + --query "{topic}" \ + --lang {en|zh|ja} \ + --image-size {2K|4K} \ + --aspect-ratio {16:9|9:16|1:1} \ + --json + ``` + + If narration enabled, add: `--no-skip-audio --speaker "{name}"` + If style specified, add: `--style "{style}"` + If source URLs provided, add: `--source-url "{url}"` (repeatable) + +2. **Execute**: Run the CLI command. It handles polling internally. + Use `run_in_background: true` with `timeout: 360000`: + + ```bash + listenhub slides create --query "{topic}" --lang zh --json + ``` + +3. When notified, **parse and present result**: + + Read `OUTPUT_MODE` from config. Follow `shared/output-mode.md` for behavior. + + **`inline` or `both`**: Display links. + + Present: + ``` + 幻灯片已生成! 
+ + 在线查看:https://listenhub.ai/app/slides/{episodeId} + 页数:{pageCount} + 消耗积分:{credits} + ``` + + **`download` or `both`**: Generate a topic slug following `shared/config-pattern.md` § Artifact Naming. + - Create `{slug}-slides/` folder (dedup if exists) + - Write `script.md` inside (narration text per page) + - If narration audio exists, download: `curl -sS -o "{slug}-slides/audio.mp3" "{audioUrl}"` + - Present the save path. + +### After Successful Generation + +Update config: + +```bash +NEW_CONFIG=$(echo "$CONFIG" | jq \ + --arg lang "{language}" \ + '. + {"language": $lang}') +echo "$NEW_CONFIG" > "$CONFIG_PATH" +``` + +If narration was used with a new speaker, also update `defaultSpeakers`. + +**Estimated time**: 2-4 minutes. + +## Composability + +- **Invokes**: speakers API via CLI (for speaker selection when narration enabled) +- **Invoked by**: none currently + +## Example + +**User**: "Create slides about quantum computing basics" + +**Agent workflow**: +1. Topic: "quantum computing basics" +2. Language: en (detected from input) +3. Narration: no (default) +4. Resolution: 2K, Ratio: 16:9 (defaults) +5. Confirm → proceed + +```bash +listenhub slides create \ + --query "quantum computing basics" \ + --lang en \ + --image-size 2K \ + --aspect-ratio 16:9 \ + --json +``` + +Poll until complete, then present the result. +``` + +- [ ] **Step 2: Commit** + +```bash +git add slides/SKILL.md +git commit -m "feat: add /slides skill for slide deck generation via CLI" +``` + +--- + +## Task 3: Create `/music` skill + +**Files:** +- Create: `music/SKILL.md` +- Reference: `shared/cli-authentication.md`, `shared/cli-patterns.md` + +- [ ] **Step 1: Write `music/SKILL.md`** + +```markdown +--- +name: music +description: | + Generate AI music or create covers from reference audio. Triggers on: "音乐", + "music", "生成音乐", "generate music", "翻唱", "cover", "作曲", "compose", + "create a song", "做一首歌". 
+metadata: + openclaw: + emoji: "🎵" + requires: + bin: ["listenhub"] + primaryBin: "listenhub" +--- + +## When to Use + +- User wants to generate original music from a text description +- User wants to create a cover from reference audio +- User says "music", "generate music", "create a song" +- User says "音乐", "生成音乐", "翻唱", "作曲" + +## When NOT to Use + +- User wants text-to-speech (use `/tts`) +- User wants a podcast discussion (use `/podcast`) +- User wants to transcribe audio to text (use `/asr`) + +## Purpose + +Generate original AI music from text prompts, or create cover versions from reference audio. Supports style control, titles, and instrumental-only generation. + +## Hard Constraints + +- Always check CLI auth following `shared/cli-authentication.md` +- Follow `shared/cli-patterns.md` for command execution and error handling +- Always read config following `shared/config-pattern.md` before any interaction +- Never save files to `~/Downloads/` or `.listenhub/` — save artifacts to the current working directory with friendly topic-based names (see `shared/config-pattern.md` § Artifact Naming) +- Music generation has longer timeouts (default 600s) — always use `run_in_background: true` + + +Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After all parameters are collected, summarize the choices and ask the user to confirm. Do NOT call any generation command until the user has explicitly confirmed. + + +## Step -1: CLI Auth Check + +Follow `shared/cli-authentication.md` § Auth Check. + +## Step 0: Config Setup + +Follow `shared/config-pattern.md` Step 0 (Zero-Question Boot). 
+ +**If file doesn't exist** — silently create with defaults and proceed: +```bash +mkdir -p ".listenhub/music" +echo '{"outputMode":"download","language":null}' > ".listenhub/music/config.json" +CONFIG_PATH=".listenhub/music/config.json" +CONFIG=$(cat "$CONFIG_PATH") +``` +**Do NOT ask any setup questions.** Proceed directly to the Interaction Flow. + +**If file exists** — read config silently and proceed: +```bash +CONFIG_PATH=".listenhub/music/config.json" +[ ! -f "$CONFIG_PATH" ] && CONFIG_PATH="$HOME/.listenhub/music/config.json" +CONFIG=$(cat "$CONFIG_PATH") +``` + +### Setup Flow (user-initiated reconfigure only) + +Only run when the user explicitly asks to reconfigure. Display current settings: +``` +当前配置 (music): + 输出方式:{inline / download / both} +``` + +Then ask: + +1. **outputMode**: Follow `shared/output-mode.md` § Setup Flow Question. + +Save immediately: +```bash +NEW_CONFIG=$(echo "$CONFIG" | jq --arg m "$OUTPUT_MODE" '. + {"outputMode": $m}') +echo "$NEW_CONFIG" > "$CONFIG_PATH" +CONFIG=$(cat "$CONFIG_PATH") +``` + +## Interaction Flow + +### Step 1: Creation Mode + +``` +Question: "创作模式?" +Options: + - "原创 — 从文字描述生成音乐" — Generate original music + - "翻唱 — 从参考音频创建翻唱" — Create cover from reference audio +``` + +### Step 2a: Music Description (generate mode) + +Free text input. Ask: + +> 描述你想要的音乐(风格、情绪、场景等) + +Example: "轻松的 lo-fi 节拍,适合深夜学习" + +### Step 2b: Reference Audio (cover mode) + +Free text input. Ask: + +> 提供参考音频文件路径或 URL + +Accept: local file path (mp3, wav, flac, m4a, ogg, aac; max 20MB) or URL. + +Optionally ask for a description prompt to guide the cover style. + +### Step 3: Style (optional) + +Free text input. Ask: + +> 音乐风格?(可选,直接回车跳过) + +Examples: "lo-fi", "EDM", "classical piano", "jazz" + +### Step 4: Title (optional) + +Free text input. Ask: + +> 曲名?(可选,不填则自动生成) + +### Step 5: Instrumental + +``` +Question: "纯音乐(无人声)?" 
+Options: + - "有人声(默认)" — Include vocals + - "纯音乐" — Instrumental only, no vocals +``` + +### Step 6: Confirm & Generate + +Summarize all choices: + +**Generate mode:** +``` +Ready to generate music: + + Mode: Original + Prompt: {prompt} + Style: {style / not set} + Title: {title / auto} + Instrumental: {yes / no} + + Proceed? +``` + +**Cover mode:** +``` +Ready to create cover: + + Mode: Cover + Reference: {audio path or URL} + Prompt: {prompt / not set} + Style: {style / not set} + Title: {title / auto} + Instrumental: {yes / no} + + Proceed? +``` + +Wait for explicit confirmation. + +## Workflow + +### Generate Mode + +1. **Build CLI command**: + ```bash + listenhub music generate \ + --prompt "{description}" \ + --json + ``` + If style set: add `--style "{style}"` + If title set: add `--title "{title}"` + If instrumental: add `--instrumental` + +2. **Execute** with `run_in_background: true` and `timeout: 660000` + +3. When notified, **parse and present result**: + + Read `OUTPUT_MODE` from config. Follow `shared/output-mode.md` for behavior. + + **`inline` or `both`**: Display audio URL as clickable link. + + Present: + ``` + 音乐已生成! + + 在线收听:{audioUrl} + 标题:{title} + 时长:{duration}s + 消耗积分:{credits} + ``` + + **`download` or `both`**: Generate a topic slug following `shared/config-pattern.md` § Artifact Naming. + ```bash + SLUG="{title-slug}" # e.g. "late-night-lofi" + NAME="${SLUG}.mp3" + BASE="${NAME%.*}"; EXT="${NAME##*.}"; i=2 + while [ -e "$NAME" ]; do NAME="${BASE}-${i}.${EXT}"; i=$((i+1)); done + curl -sS -o "$NAME" "{audioUrl}" + ``` + Present: + ``` + 已保存到当前目录: + {NAME} + ``` + +### Cover Mode + +1. **Build CLI command**: + ```bash + listenhub music cover \ + --audio "{path-or-url}" \ + --json + ``` + If prompt set: add `--prompt "{description}"` + If style set: add `--style "{style}"` + If title set: add `--title "{title}"` + If instrumental: add `--instrumental` + +2. Same execution and presentation as generate mode. 
+ +**Estimated times**: 3-8 minutes (music generation is slower). + +## Composability + +- **Invokes**: nothing (direct CLI call) +- **Invoked by**: none currently + +## Example + +**User**: "生成一段轻松的 lo-fi 音乐" + +**Agent workflow**: +1. Mode: generate (detected from "生成") +2. Prompt: "轻松的 lo-fi 音乐" +3. Style: "lo-fi" (inferred from prompt, confirm with user) +4. Title: auto +5. Instrumental: no (default) +6. Confirm → proceed + +```bash +listenhub music generate \ + --prompt "轻松的lo-fi音乐" \ + --style "lo-fi" \ + --json +``` + +Poll until complete, then present result. +``` + +- [ ] **Step 2: Commit** + +```bash +git add music/SKILL.md +git commit -m "feat: add /music skill for AI music generation and covers via CLI" +``` + +--- + +## Task 4: Migrate `/podcast` to CLI + +**Files:** +- Modify: `podcast/SKILL.md` + +- [ ] **Step 1: Rewrite `podcast/SKILL.md`** + +Key changes from current file: +1. **Frontmatter**: `requires.env` → `requires.bin`, `primaryEnv` → `primaryBin` +2. **Hard Constraints**: Remove "No shell scripts. Construct curl commands from API reference" → "Always check CLI auth following `shared/cli-authentication.md`" + "Follow `shared/cli-patterns.md`" +3. **Step -1**: API Key Check → CLI Auth Check +4. **Step 0**: Same config pattern (unchanged) +5. **Interaction Flow**: Remove Step 6 (Generation Method — two-step is gone). Keep Steps 1-5 and 7 (renumber 7→6). +6. **Workflow**: Remove Two-Step Generation entirely. One-Step becomes the only workflow: + - Replace curl POST with `listenhub podcast create --query ... --json` + - Replace background polling loop with CLI built-in polling (run CLI with `run_in_background: true`) + - Replace `jq` parsing of curl response with `jq` parsing of CLI JSON output +7. 
**API Reference section**: Replace `shared/api-podcast.md` → `shared/cli-patterns.md`, `shared/api-speakers.md` → `shared/cli-speakers.md`, `shared/authentication.md` → `shared/cli-authentication.md`, `shared/common-patterns.md` → `shared/cli-patterns.md` +8. **Example**: Replace curl example with CLI command + +The full rewrite should follow the exact same structure as the current file but with CLI commands. The interaction flow (AskUserQuestion steps) stays identical except removing the Generation Method step. + +- [ ] **Step 2: Verify no references to deleted files** + +```bash +grep -n "shared/api-\|shared/authentication\|shared/common-patterns\|LISTENHUB_API_KEY" podcast/SKILL.md +``` + +Expected: no matches. + +- [ ] **Step 3: Commit** + +```bash +git add podcast/SKILL.md +git commit -m "feat: migrate /podcast to CLI, remove two-step workflow" +``` + +--- + +## Task 5: Migrate `/tts` to CLI + +**Files:** +- Modify: `tts/SKILL.md` + +- [ ] **Step 1: Rewrite `tts/SKILL.md`** + +Key changes: +1. **Frontmatter**: `requires.env` → `requires.bin`, `primaryEnv` → `primaryBin` +2. **Hard Constraints**: curl refs → CLI refs +3. **Step -1**: API Key → CLI Auth +4. **Quick Mode**: Replace curl `POST /v1/tts` with `listenhub tts create --text "..." --mode direct --speaker "..." --json`. Note: TTS Quick maps to CLI `--mode direct`, Script maps to `--mode smart`. +5. **Script Mode**: Replace curl `POST /v1/speech` with `listenhub tts create --text "..." --mode smart --speaker "..." --json`. For multi-speaker scripts, the CLI handles this with a single `--text` containing the formatted script. +6. **Polling**: Quick mode is sync in both old (curl returns MP3 stream) and new (CLI waits). Script mode: remove background polling loop, CLI handles it. +7. **Output handling**: `inline` mode — CLI returns JSON with `audioUrl`, use that directly instead of saving to `/tmp/`. For `download`/`both` — download from `audioUrl` same as before but using URL from CLI JSON output. +8. 
**API Reference**: Update all shared/ references. + +- [ ] **Step 2: Verify no references to deleted files** + +```bash +grep -n "shared/api-\|shared/authentication\|shared/common-patterns\|LISTENHUB_API_KEY" tts/SKILL.md +``` + +Expected: no matches. + +- [ ] **Step 3: Commit** + +```bash +git add tts/SKILL.md +git commit -m "feat: migrate /tts to CLI commands" +``` + +--- + +## Task 6: Migrate `/explainer` to CLI + +**Files:** +- Modify: `explainer/SKILL.md` + +- [ ] **Step 1: Rewrite `explainer/SKILL.md`** + +Key changes: +1. **Frontmatter**: `requires.env` → `requires.bin`, `primaryEnv` → `primaryBin` +2. **Hard Constraints**: curl refs → CLI refs. Keep "Mode must be `info` or `story` — never `slides`" +3. **Step -1**: API Key → CLI Auth +4. **Workflow**: Replace curl `POST /storybook/episodes` + polling loop with `listenhub explainer create --query ... --mode info --json`. Drop the separate video-generation curl call and its polling loop: the CLI handles both text and video in one command. If `--skip-audio` is passed, only the text script is generated. +5. **Output**: Parse CLI JSON output instead of curl response. The `episodeId`, `audioUrl`, `videoUrl`, `credits` fields come from CLI JSON. +6. **API Reference**: Update all shared/ references. + +- [ ] **Step 2: Verify no references to deleted files** + +```bash +grep -n "shared/api-\|shared/authentication\|shared/common-patterns\|LISTENHUB_API_KEY" explainer/SKILL.md +``` + +Expected: no matches. + +- [ ] **Step 3: Commit** + +```bash +git add explainer/SKILL.md +git commit -m "feat: migrate /explainer to CLI commands" +``` + +--- + +## Task 7: Migrate `/image-gen` to CLI + +**Files:** +- Modify: `image-gen/SKILL.md` + +- [ ] **Step 1: Rewrite `image-gen/SKILL.md`** + +Key changes: +1. **Frontmatter**: `requires.env` → `requires.bin`, `primaryEnv` → `primaryBin` +2. **Hard Constraints**: curl refs → CLI refs +3. **Step -1**: API Key → CLI Auth +4.
**Reference images**: CLI handles local file upload natively (`--reference ./image.png`). Remove base64 encoding logic entirely. Just pass `--reference "{path-or-url}"` (repeatable, max 5). +5. **Workflow**: Replace curl `POST /images/generation` with `listenhub image create --prompt "..." --model "..." --json`. CLI returns JSON with image data. +6. **New**: Add `--lang` flag for prompt language hint. +7. **API Reference**: Update all shared/ references. + +- [ ] **Step 2: Verify no references to deleted files** + +```bash +grep -n "shared/api-\|shared/authentication\|shared/common-patterns\|LISTENHUB_API_KEY" image-gen/SKILL.md +``` + +Expected: no matches. + +- [ ] **Step 3: Commit** + +```bash +git add image-gen/SKILL.md +git commit -m "feat: migrate /image-gen to CLI commands" +``` + +--- + +## Task 8: Inline content-parser API docs + +**Files:** +- Modify: `content-parser/SKILL.md` + +- [ ] **Step 1: Rewrite `content-parser/SKILL.md`** + +This skill stays curl-based (CLI has no content-extract command). Changes: +1. Remove all `shared/` references from Hard Constraints and API Reference sections +2. Inline the following into the SKILL.md itself: + - **Authentication** (from `shared/authentication.md`): API Key env var, base URL, required headers, curl template + - **API endpoints** (from `shared/api-content-extract.md`): POST /v1/content/extract request/response, GET /v1/content/extract/{taskId} request/response + - **Polling pattern** (from `shared/common-patterns.md`): background polling loop (5s interval, 60 polls), error handling, retry strategy + - **Config pattern**: Keep reference to `shared/config-pattern.md` (it's being retained) +3. 
The SKILL.md should be fully self-contained for API usage — no external shared/ references except `shared/config-pattern.md` and `shared/output-mode.md` + +Structure the inlined content as collapsed sections at the bottom: + +```markdown +## API Reference (Inlined) + +### Authentication + +[content from shared/authentication.md] + +### POST /v1/content/extract + +[content from shared/api-content-extract.md — create endpoint] + +### GET /v1/content/extract/{taskId} + +[content from shared/api-content-extract.md — poll endpoint] + +### Async Polling Pattern + +[content from shared/common-patterns.md § Async Polling, adapted for content-parser's 5s interval] + +### Error Handling + +[content from shared/common-patterns.md § Error Handling] +``` + +- [ ] **Step 2: Verify only allowed shared/ refs remain** + +```bash +grep -n "shared/" content-parser/SKILL.md +``` + +Expected: only `shared/config-pattern.md` and `shared/output-mode.md` references. + +- [ ] **Step 3: Commit** + +```bash +git add content-parser/SKILL.md +git commit -m "refactor: inline API docs into content-parser (no more shared/ deps)" +``` + +--- + +## Task 9: Create `listenhub-cli` + update `listenhub` umbrella skills + +**Files:** +- Create: `listenhub-cli/SKILL.md` +- Modify: `listenhub/SKILL.md` +- Delete: `listenhub/DEPRECATED.md` + +- [ ] **Step 1: Write `listenhub-cli/SKILL.md`** + +```markdown +--- +name: listenhub-cli +description: | + ListenHub CLI skills router. Routes to the correct skill based on user intent. + Triggers on: "make a podcast", "explainer video", "read aloud", "TTS", + "generate image", "做播客", "解说视频", "朗读", "生成图片", "幻灯片", + "slides", "音乐", "music", "generate music", "翻唱", "cover song", + "parse URL", "解析链接", "提取内容". +metadata: + openclaw: + emoji: "🎧" + requires: + bin: ["listenhub"] + primaryBin: "listenhub" +--- + +## Purpose + +This is a router skill. 
When users trigger a general ListenHub action, this skill identifies the intent and delegates to the appropriate specialized skill. + +## Routing Table + +| User intent | Keywords | Route to | +|-------------|----------|----------| +| Podcast | "podcast", "播客", "debate", "dialogue" | `/podcast` | +| Explainer video | "explainer", "解说视频", "tutorial video" | `/explainer` | +| Slides / PPT | "slides", "幻灯片", "PPT", "presentation" | `/slides` | +| TTS / Read aloud | "TTS", "read aloud", "朗读", "配音", "语音合成" | `/tts` | +| Image generation | "generate image", "画一张", "生成图片", "AI图" | `/image-gen` | +| Music | "music", "音乐", "生成音乐", "翻唱", "cover" | `/music` | +| Content extraction | "parse URL", "extract content", "解析链接" | `/content-parser` | +| Audio transcription | "transcribe", "ASR", "语音转文字" | `/asr` | +| Creator workflow | "创作", "写公众号", "小红书", "口播" | `/creator` | + +## How to Route + +1. Read the user's message and identify which category it falls into +2. Tell the user which skill you're routing to +3. Follow that skill's SKILL.md completely + +If the intent is ambiguous, ask the user to clarify: + +``` +Question: "What would you like to create?" +Options: + - "Podcast" — Audio discussion on a topic + - "Explainer Video" — Narrated video with AI visuals + - "Slides" — Slide deck / presentation + - "Music" — AI-generated music or cover +``` + +## Prerequisites + +Most skills require the ListenHub CLI. Check: + +```bash +listenhub auth status --json +``` + +If not installed or not logged in, guide the user: + +1. Install: `npm install -g @marswave/listenhub-cli` +2. Login: `listenhub auth login` + +Exception: `/asr` runs locally and needs no CLI or API key. +``` + +- [ ] **Step 2: Copy to `listenhub/SKILL.md` with name change** + +Copy the exact same content to `listenhub/SKILL.md`, changing only `name: listenhub-cli` → `name: listenhub` in the frontmatter. Everything else is identical. 
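
The copy-with-rename in Step 2 could be scripted rather than done by hand. This is a minimal sketch, assuming the frontmatter contains a line that reads exactly `name: listenhub-cli`; the helper name `sync_router_skill` is hypothetical and not part of the plan.

```bash
# Sketch (hypothetical helper): derive listenhub/SKILL.md from
# listenhub-cli/SKILL.md, rewriting only the frontmatter `name:` line.
sync_router_skill() {
  src="$1"   # e.g. listenhub-cli/SKILL.md
  dst="$2"   # e.g. listenhub/SKILL.md
  mkdir -p "$(dirname "$dst")"
  # Only a line that is exactly "name: listenhub-cli" is rewritten;
  # other mentions of "listenhub-cli" in the body are left untouched.
  sed 's/^name: listenhub-cli$/name: listenhub/' "$src" > "$dst"
}
```

Usage would be `sync_router_skill listenhub-cli/SKILL.md listenhub/SKILL.md`; running `diff` on the two files afterwards should report a single changed line.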
+ +- [ ] **Step 3: Delete `listenhub/DEPRECATED.md`** + +```bash +rm listenhub/DEPRECATED.md +``` + +- [ ] **Step 4: Commit** + +```bash +git add listenhub-cli/SKILL.md listenhub/SKILL.md +git rm listenhub/DEPRECATED.md +git commit -m "feat: add listenhub-cli router skill, sync listenhub skill" +``` + +--- + +## Task 10: Update `shared/speaker-selection.md` + +**Files:** +- Modify: `shared/speaker-selection.md` + +- [ ] **Step 1: Update speaker fetch command** + +Replace the "Fetching Speakers" section. Change: + +```markdown +## Fetching Speakers + +Always call the speakers API before presenting options (when user requests to change voice): + +``` +GET /speakers/list?language={language} +``` +``` + +To: + +```markdown +## Fetching Speakers + +Always query the speaker list before presenting options (when user requests to change voice): + +```bash +listenhub speakers list --lang {language} --json +``` + +See `shared/cli-speakers.md` for full query patterns. +``` + +No other changes — the built-in defaults table, selection UI, and persistence logic stay the same. + +- [ ] **Step 2: Verify no curl/API Key references remain** + +```bash +grep -n "curl\|LISTENHUB_API_KEY\|Authorization" shared/speaker-selection.md +``` + +Expected: no matches. + +- [ ] **Step 3: Commit** + +```bash +git add shared/speaker-selection.md +git commit -m "refactor: update speaker-selection to use CLI query" +``` + +--- + +## Task 11: Update `shared/config-pattern.md` + +**Files:** +- Modify: `shared/config-pattern.md` + +- [ ] **Step 1: Replace API Key Check with CLI Auth Check** + +Replace the entire "## API Key Check" section (from `## API Key Check` through the end of "### Interactive Key Setup" including step 6) with: + +```markdown +## CLI Auth Check + +Run this **before Step 0** in every skill that uses the ListenHub CLI. + +Follow `shared/cli-authentication.md` § Auth Check. 
+ +If CLI is not installed or not logged in, guide the user through setup as described in `shared/cli-authentication.md`. +``` + +No other changes to the file. + +- [ ] **Step 2: Commit** + +```bash +git add shared/config-pattern.md +git commit -m "refactor: replace API Key Check with CLI Auth Check in config-pattern" +``` + +--- + +## Task 12: Delete old shared/ API docs + +**Files:** +- Delete: `shared/api-podcast.md` +- Delete: `shared/api-tts.md` +- Delete: `shared/api-image.md` +- Delete: `shared/api-storybook.md` +- Delete: `shared/api-content-extract.md` +- Delete: `shared/api-speakers.md` +- Delete: `shared/authentication.md` +- Delete: `shared/common-patterns.md` + +- [ ] **Step 1: Verify no remaining references** + +```bash +grep -rn "shared/api-\|shared/authentication\.md\|shared/common-patterns\.md" \ + --include="*.md" \ + --exclude-dir=".git" \ + --exclude-dir="docs" \ + . +``` + +Expected: no matches outside `docs/` (specs/plans are documentation, not runtime references). + +If any matches found in SKILL.md files, fix them first before proceeding. 
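
The verification above can also be wrapped as a guard that stops the deletion when stale references remain. This is a sketch only: the function name `check_stale_refs` is hypothetical, and it uses `grep -E` instead of the escaped alternation in the command above, which should match the same strings.

```bash
# Sketch (hypothetical helper): return non-zero when any Markdown file
# outside .git/ and docs/ still references a shared/ doc slated for deletion.
check_stale_refs() {
  root="${1:-.}"
  if grep -rnE 'shared/api-|shared/authentication\.md|shared/common-patterns\.md' \
       --include='*.md' --exclude-dir='.git' --exclude-dir='docs' "$root"; then
    echo "Stale shared/ references found; fix them before deleting." >&2
    return 1
  fi
  echo "No stale references."
}
```

Note that `shared/cli-authentication.md` does not match the `shared/authentication\.md` pattern, so references to the new docs pass the check.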
+ +- [ ] **Step 2: Delete files** + +```bash +git rm shared/api-podcast.md shared/api-tts.md shared/api-image.md \ + shared/api-storybook.md shared/api-content-extract.md shared/api-speakers.md \ + shared/authentication.md shared/common-patterns.md +``` + +- [ ] **Step 3: Commit** + +```bash +git commit -m "chore: remove old shared/ API docs (replaced by CLI + inlined)" +``` + +--- + +## Task 13: Update creator/ templates + +**Files:** +- Modify: `creator/SKILL.md` +- Modify: `creator/templates/narration/template.md` +- Modify: `creator/templates/wechat/template.md` +- Modify: `creator/templates/xiaohongshu/template.md` + +- [ ] **Step 1: Update `creator/SKILL.md` references** + +Replace all `shared/` references that point to deleted files: +- `shared/authentication.md` → `shared/cli-authentication.md` +- `shared/common-patterns.md` → `shared/cli-patterns.md` +- `shared/api-image.md` → inline the image CLI command: `listenhub image create --prompt "..." --json` +- `shared/api-content-extract.md` → reference `content-parser/SKILL.md` § API Reference (Inlined) +- `shared/api-tts.md` → inline the TTS CLI command: `listenhub tts create --text "..." --json` + +In the Hard Constraints section, change: +- "No shell scripts. Construct curl commands from the API reference files in `shared/`" → "Use `listenhub` CLI commands for image-gen and TTS. Use curl for content-parser (see `content-parser/SKILL.md` § API Reference)." + +In the API Key Check at Confirmation Gate, change: +- Check `LISTENHUB_API_KEY` → Check `listenhub auth status --json` for CLI-based calls. For content-parser calls, still check `LISTENHUB_API_KEY`. 
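
The two-part Confirmation Gate check described above might look like the following. It is a sketch under stated assumptions: `listenhub auth status --json` is taken to print an object with an `.authenticated` boolean (as in `shared/cli-authentication.md`), and the function name, its argument, and its return codes are hypothetical.

```bash
# Sketch (hypothetical helper): CLI auth is always required; the API key is
# required only when the pipeline will also call content-parser via curl.
check_creator_auth() {
  needs_content_parser="$1"   # "yes" when content-parser will be used

  if ! command -v listenhub >/dev/null 2>&1; then
    echo "ListenHub CLI not found. Install: npm install -g @marswave/listenhub-cli" >&2
    return 1
  fi

  # Assumption: the JSON output carries an `.authenticated` boolean.
  auth=$(listenhub auth status --json 2>/dev/null) || auth='{}'
  if [ "$(printf '%s' "$auth" | jq -r '.authenticated // false')" != "true" ]; then
    echo "Not logged in. Run: listenhub auth login" >&2
    return 1
  fi

  if [ "$needs_content_parser" = "yes" ] && [ -z "${LISTENHUB_API_KEY:-}" ]; then
    echo "LISTENHUB_API_KEY is required for content-parser calls." >&2
    return 2
  fi
  return 0
}
```

Keeping the two failures on separate return codes lets the caller decide whether to prompt for login or for the API key.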
+ +In the API Reference section at the bottom, update: +- `shared/authentication.md` → `shared/cli-authentication.md` +- `shared/api-image.md` → "Use `listenhub image create` (see `shared/cli-patterns.md`)" +- `shared/api-content-extract.md` → `content-parser/SKILL.md` § API Reference (Inlined) +- `shared/api-tts.md` → "Use `listenhub tts create` (see `shared/cli-patterns.md`)" +- `shared/common-patterns.md` → `shared/cli-patterns.md` + +Also update the image generation curl commands in the workflow to use CLI: +```bash +# Before: +curl -sS -X POST "https://api.marswave.ai/openapi/v1/images/generation" ... + +# After: +listenhub image create --prompt "{prompt}" --aspect-ratio "1:1" --size "2K" --json +``` + +And TTS commands: +```bash +# Before: +curl -sS -X POST "https://api.marswave.ai/openapi/v1/tts" ... + +# After: +listenhub tts create --text "{text}" --speaker "{speaker}" --json +``` + +- [ ] **Step 2: Update `creator/templates/narration/template.md`** + +- Replace `shared/speaker-selection.md` built-in defaults reference → keep as-is (speaker-selection.md is retained) +- Replace TTS API curl command with CLI: + ```bash + listenhub tts create --text "$(cat /tmp/lh-content.txt)" --speaker "{speaker}" --json + ``` + +- [ ] **Step 3: Update `creator/templates/wechat/template.md`** + +Replace `shared/api-image.md` reference with CLI command note: +- `(per shared/api-image.md)` → `(use listenhub image create --json)` + +- [ ] **Step 4: Update `creator/templates/xiaohongshu/template.md`** + +Same change as wechat template. + +- [ ] **Step 5: Verify no deleted shared/ references remain in creator/** + +```bash +grep -rn "shared/api-\|shared/authentication\.md\|shared/common-patterns\.md" creator/ +``` + +Expected: no matches. 
+ +- [ ] **Step 6: Commit** + +```bash +git add creator/SKILL.md creator/templates/narration/template.md \ + creator/templates/wechat/template.md creator/templates/xiaohongshu/template.md +git commit -m "refactor: update creator/ to use CLI commands and new shared/ docs" +``` + +--- + +## Task 14: Update READMEs + +**Files:** +- Modify: `README.md` +- Modify: `README.zh.md` + +- [ ] **Step 1: Update `README.md`** + +Changes: +1. **Skills table**: Add slides and music rows: + ``` + | `/slides` | "slides", "幻灯片" | Create slide decks with AI visuals | + | `/music` | "music", "音乐" | AI music generation and covers | + ``` +2. **Setup section**: Replace API Key with CLI: + ``` + **ListenHub CLI** — Install and login: + ```bash + npm install -g @marswave/listenhub-cli + listenhub auth login + ``` + ``` +3. **Directory structure**: Add `slides/`, `music/`, `listenhub-cli/`. Change `listenhub/` description from "Deprecated" to "Router skill (alias for listenhub-cli)". +4. **Supported Inputs**: Add "Music prompts" and "Reference audio" to the list. + +- [ ] **Step 2: Update `README.zh.md`** + +Same changes as README.md but in Chinese: +1. Skills table: + ``` + | `/slides` | "幻灯片"、"slides" | 幻灯片生成 | + | `/music` | "音乐"、"music" | AI 音乐生成、翻唱 | + ``` +2. Setup: CLI install + login +3. Directory structure: add new dirs +4. 
Supported inputs: add music items + +- [ ] **Step 3: Commit** + +```bash +git add README.md README.zh.md +git commit -m "docs: update READMEs with slides, music, CLI auth" +``` + +--- + +## Dependency Graph + +``` +Task 1 (shared CLI docs) +├── Task 2 (slides) ────────────┐ +├── Task 3 (music) ─────────────┤ +├── Task 4 (podcast) ───────────┤ +├── Task 5 (tts) ───────────────┤ +├── Task 6 (explainer) ─────────┼── Task 9 (listenhub-cli + listenhub) ── Task 14 (README) +├── Task 7 (image-gen) ─────────┤ +├── Task 10 (speaker-selection) ┘ +├── Task 11 (config-pattern) +│ +Task 8 (content-parser inline) ─── Task 12 (delete old shared/) ── Task 13 (creator/ update) +``` + +Tasks 2-8, 10, 11 can run in parallel after Task 1. +Task 9 requires Tasks 2-7. +Task 12 requires Task 8. +Task 13 requires Tasks 4-7 and 12. +Task 14 requires Tasks 2-9. From 42b78e213634eea9abdee60dd5afb3cf019badcc Mon Sep 17 00:00:00 2001 From: 0XFANGO Date: Wed, 8 Apr 2026 11:32:46 +0800 Subject: [PATCH 05/14] feat: add shared CLI docs for authentication, patterns, and speakers --- shared/cli-authentication.md | 49 +++++++++ shared/cli-patterns.md | 187 +++++++++++++++++++++++++++++++++++ shared/cli-speakers.md | 78 +++++++++++++++ 3 files changed, 314 insertions(+) create mode 100644 shared/cli-authentication.md create mode 100644 shared/cli-patterns.md create mode 100644 shared/cli-speakers.md diff --git a/shared/cli-authentication.md b/shared/cli-authentication.md new file mode 100644 index 0000000..a492046 --- /dev/null +++ b/shared/cli-authentication.md @@ -0,0 +1,49 @@ +# CLI Authentication + +## Prerequisites + +- **Node.js >= 20** +- **ListenHub CLI**: `npm install -g @marswave/listenhub-cli` + +## Auth Check + +Run this before any CLI operation: + +```bash +listenhub auth status --json +``` + +Parse the `.authenticated` field: + +```bash +AUTH=$(listenhub auth status --json 2>/dev/null) +AUTHED=$(echo "$AUTH" | jq -r '.authenticated // false') +``` + +### If CLI not installed + +If 
`listenhub` command is not found, tell the user: + +> ListenHub CLI is not installed. Please install it: +> ``` +> npm install -g @marswave/listenhub-cli +> ``` +> Requires Node.js 20 or later. + +### If not logged in + +If `.authenticated` is `false`, tell the user: + +> You're not logged in. Please run: +> ``` +> listenhub auth login +> ``` +> This will open your browser for OAuth authentication. + +Then wait for the user to complete login and re-check. + +## Security + +- Credentials are stored at `~/.config/listenhub/credentials.json` (file mode `0600`) +- Tokens refresh automatically -- no manual rotation needed +- Never log or display tokens in output diff --git a/shared/cli-patterns.md b/shared/cli-patterns.md new file mode 100644 index 0000000..27381a8 --- /dev/null +++ b/shared/cli-patterns.md @@ -0,0 +1,187 @@ +# CLI Patterns + +Reusable patterns for all skills that use the ListenHub CLI. + + +**Language Adaptation**: Always respond in the user's language. Chinese input -> Chinese output. English input -> English output. Mixed -> follow dominant language. This applies to all UI text, questions, confirmations, and error messages. + + +## Command Pattern + +```bash +listenhub create [options] --json +``` + +All creation commands follow this shape. The `--json` flag ensures machine-readable output for parsing with jq. + +## Execution Modes + +### Synchronous (default) + +The CLI blocks until the task completes and returns the final result: + +```bash +RESULT=$(listenhub podcast create --topic "AI trends" --lang zh --json) +echo "$RESULT" | jq -r '.audioUrl' +``` + +This is the simplest approach. Use it when the expected duration is short or when you want to wait for the result. 
+ +### Async with `--no-wait` + +Returns a creation ID immediately without waiting for completion: + +```bash +RESULT=$(listenhub podcast create --topic "AI trends" --lang zh --no-wait --json) +ID=$(echo "$RESULT" | jq -r '.id') +echo "Submitted: $ID" +``` + +Check status later: + +```bash +listenhub creation get "$ID" --json +``` + +The `.status` field will be one of: `processing`, `completed`, `failed`. + +### Timeout Reference + +| Content type | Default timeout | +|-------------|----------------| +| podcast | 300s | +| tts | 300s | +| explainer | 300s | +| slides | 300s | +| image | 120s | +| music | 600s | + +### Background Execution + +For long-running commands, use the Bash tool's `run_in_background: true` parameter. This keeps the terminal responsive while the CLI waits for completion. + +**Two-step pattern:** + +1. **Submit (foreground)** with `--no-wait` to get the ID. Tell the user the task is submitted. +2. **Poll (background)** with `run_in_background: true`: + +```bash +# Run with run_in_background: true +ID="" +for i in $(seq 1 60); do + RESULT=$(listenhub creation get "$ID" --json 2>/dev/null) + STATUS=$(echo "$RESULT" | jq -r '.status // "processing"') + + case "$STATUS" in + completed) echo "$RESULT"; exit 0 ;; + failed) echo "FAILED: $RESULT" >&2; exit 1 ;; + *) sleep 10 ;; + esac +done +echo "TIMEOUT" >&2; exit 2 +``` + +When the background task finishes, you will be notified with the output. Parse the result and present it to the user. If the task failed or timed out, report the error. + +## JSON Output Parsing + +All CLI commands with `--json` produce structured JSON. 
Parse with jq: + +```bash +RESULT=$(listenhub tts create --text "Hello" --lang en --json) +AUDIO_URL=$(echo "$RESULT" | jq -r '.audioUrl') +STATUS=$(echo "$RESULT" | jq -r '.status') +``` + +## Error Handling + +### Exit Codes + +| Code | Meaning | Action | +|------|---------|--------| +| 0 | Success | Parse JSON output | +| 1 | General error | Check stderr for details | +| 2 | Auth error | Run `listenhub auth login` | +| 3 | Timeout | Retry or use `--no-wait` | + +### Common Errors + +| Error | Cause | Fix | +|-------|-------|-----| +| `command not found: listenhub` | CLI not installed | `npm install -g @marswave/listenhub-cli` | +| `Not authenticated` | Not logged in | `listenhub auth login` | +| `Insufficient credits` | Account has no credits | Tell user to recharge at listenhub.ai | +| `Rate limited` | Too many requests | Wait and retry | +| `Invalid speaker` | Speaker ID not found | Re-query speakers list | +| `Request timeout` | Generation took too long | Retry or use `--no-wait` for async | + +### Error Checking Pattern + +```bash +RESULT=$(listenhub podcast create --topic "AI" --lang zh --json 2>/tmp/lh-err) +EXIT_CODE=$? + +if [ $EXIT_CODE -ne 0 ]; then + ERROR=$(cat /tmp/lh-err) + case $EXIT_CODE in + 2) echo "Auth error: run 'listenhub auth login'" ;; + 3) echo "Timeout: try --no-wait" ;; + *) echo "Error: $ERROR" ;; + esac + rm -f /tmp/lh-err + # Handle error appropriately +fi +rm -f /tmp/lh-err +``` + +## Interactive Parameter Collection + +Skills must use the **AskUserQuestion tool** for all enumerable parameters, following a **conversational, step-by-step** approach. This renders an interactive picker in the terminal that users can navigate with arrow keys. + +### Conversation Behavior (mandatory) + +1. **One question at a time.** Ask a single question, then STOP and wait for the user's answer before proceeding to the next step. 
Do not batch multiple steps into one message unless the parameters are explicitly independent (e.g., resolution + aspect ratio). +2. **Wait for the answer.** Never assume a default and skip ahead. If the user hasn't answered, do not proceed. +3. **Confirm before executing.** After all parameters are collected, summarize the choices and ask the user to confirm before running any CLI command. This is the final gate. +4. **Be ready to go back.** If the user changes their mind or says something doesn't look right, revise and re-ask instead of pushing forward. + +### How to Ask + +**Always use the AskUserQuestion tool** -- do NOT print questions as plain text. Each step's `Question` and `Options` map directly to AskUserQuestion parameters: + +``` +Step definition in SKILL.md: -> AskUserQuestion tool call: + +Question: "What language?" -> question: "What language?" + - "Chinese (zh)" -- Mandarin -> options: [{label: "Chinese (zh)", description: "Mandarin"} + - "English (en)" -- English -> {label: "English (en)", description: "English"}] +``` + +For **free text** steps (topic, URL, prompt), just ask the question in a normal text message and wait for the user to type their answer. + +### Parameter Types + +- **Multiple-choice -> AskUserQuestion**: language, mode, speaker count, generation style, resolution, aspect ratio +- **Free text -> normal message**: topic, content body, URL, image prompt +- **Sequential when dependent**: e.g., speaker list depends on language choice -- ask language first, then fetch speakers and present list +- **Batch when independent**: e.g., resolution + aspect ratio can be asked together in one AskUserQuestion call (multiple questions) +- **Options include descriptions**: not just labels -- explain what each choice means + +## Long Text Input + +When text content is long (e.g., a full article for TTS), quoting and escaping it inline is error-prone. A temp file fixes the quoting, though not the OS argument-size limit, since `$(cat file)` still expands into a single argument. 
Write to a temp file and use shell substitution: + +```bash +# Write content to temp file +cat > /tmp/lh-content.txt << 'ENDCONTENT' +Very long text content goes here... +ENDCONTENT + +# Use shell substitution to pass the file content +listenhub tts create --text "$(cat /tmp/lh-content.txt)" --lang zh --json + +# Clean up +rm -f /tmp/lh-content.txt +``` + +**When to use temp files**: Always use this approach when text content exceeds a few KB. diff --git a/shared/cli-speakers.md b/shared/cli-speakers.md new file mode 100644 index 0000000..1e021d3 --- /dev/null +++ b/shared/cli-speakers.md @@ -0,0 +1,78 @@ +# CLI Speakers + +Query and use voice speakers via the ListenHub CLI. + +## Listing Speakers + +```bash +# All speakers +listenhub speakers list --json + +# Chinese speakers only +listenhub speakers list --lang zh --json + +# English speakers only +listenhub speakers list --lang en --json +``` + +### Parsing the Response + +```bash +SPEAKERS=$(listenhub speakers list --lang en --json) + +# List all speaker names +echo "$SPEAKERS" | jq -r '.[].name' + +# Get a specific speaker's ID +echo "$SPEAKERS" | jq -r '.[] | select(.name == "Mars") | .speakerId' + +# Count available speakers +echo "$SPEAKERS" | jq 'length' +``` + +### Speaker Fields + +| Field | Type | Description | +|-------|------|-------------| +| name | string | Display name (e.g., "Mars") | +| speakerId | string | ID to use in create commands | +| gender | string | `male` or `female` | +| language | string | `zh` or `en` | +| demoAudioUrl | string | Preview audio URL | + +## Using Speakers in Create Commands + +### By name + +```bash +listenhub podcast create --topic "AI" --speaker "Mars" --lang en --json +``` + +### By ID + +```bash +listenhub podcast create --topic "AI" --speaker-id "cozy-man-english" --lang en --json +``` + +### Multi-speaker (podcast, up to 2) + +Repeat the `--speaker` flag: + +```bash +listenhub podcast create --topic "AI" --speaker "Mars" --speaker "Mia" --lang en --json +``` + 
+Or with IDs: + +```bash +listenhub podcast create --topic "AI" --speaker-id "cozy-man-english" --speaker-id "travel-girl-english" --lang en --json +``` + +## Integration with Speaker Selection + +The interactive speaker selection flow in [speaker-selection.md](./speaker-selection.md) remains unchanged. The only difference is the underlying query mechanism: + +- **Before**: `GET /speakers/list?language={language}` via curl +- **Now**: `listenhub speakers list --lang {language} --json` + +The selection UI, default speakers, input matching, and config persistence all work the same way. From 3f0f2e095d4b60fd535cbdf62853578a52882f64 Mon Sep 17 00:00:00 2001 From: 0XFANGO Date: Wed, 8 Apr 2026 11:36:54 +0800 Subject: [PATCH 06/14] refactor: inline content-parser API docs, update speaker-selection and config-pattern - content-parser/SKILL.md: inline all API docs (auth, endpoints, polling, errors) - shared/speaker-selection.md: replace curl speaker fetch with CLI command - shared/config-pattern.md: replace API Key Check with CLI Auth Check --- content-parser/SKILL.md | 234 ++++++++++++++++++++++++++++++++++-- shared/config-pattern.md | 35 +----- shared/speaker-selection.md | 10 +- 3 files changed, 235 insertions(+), 44 deletions(-) diff --git a/content-parser/SKILL.md b/content-parser/SKILL.md index e484aa8..acea1c7 100644 --- a/content-parser/SKILL.md +++ b/content-parser/SKILL.md @@ -31,9 +31,9 @@ Extract and normalize content from URLs across supported platforms. Returns stru ## Hard Constraints -- No shell scripts. Construct curl commands from the API reference files listed in Resources -- Always read `shared/authentication.md` for API key and headers -- Follow `shared/common-patterns.md` for polling, errors, and interaction patterns +- No shell scripts. 
Construct curl commands from the API Reference (Inlined) section below +- See § API Reference (Inlined) below for API key and headers +- See § API Reference (Inlined) below for polling, errors, and interaction patterns - URL must be a valid HTTP(S) URL - Always read config following `shared/config-pattern.md` before any interaction - Never save files to `~/Downloads/` or `.listenhub/` — save to the current working directory @@ -200,13 +200,229 @@ Wait for explicit confirmation before calling the API. **Estimated time**: 10-30 seconds depending on content size and platform. -## API Reference +## API Reference (Inlined) -- Content extract: `shared/api-content-extract.md` -- Supported platforms: `references/supported-platforms.md` -- Polling: `shared/common-patterns.md` § Async Polling -- Error handling: `shared/common-patterns.md` § Error Handling -- Config pattern: `shared/config-pattern.md` +### Authentication + +**Environment variable**: `LISTENHUB_API_KEY` (format: `lh_sk_...`) + +Store in `~/.zshrc` (macOS) or `~/.bashrc` (Linux): + +```bash +export LISTENHUB_API_KEY="lh_sk_..." +``` + +**How to obtain**: Visit https://listenhub.ai/settings/api-keys (Pro plan required). + +**Base URL**: `https://api.marswave.ai/openapi/v1` + +**Required headers** (every request): + +``` +Authorization: Bearer $LISTENHUB_API_KEY +Content-Type: application/json +X-Source: skills +``` + +The `X-Source: skills` header identifies requests as coming from Claude Code skills (CLI tool). + +**curl template:** + +```bash +curl -sS -X POST "https://api.marswave.ai/openapi/v1/{endpoint}" \ + -H "Authorization: Bearer $LISTENHUB_API_KEY" \ + -H "Content-Type: application/json" \ + -H "X-Source: skills" \ + -d '{ ... }' +``` + +For GET requests, omit `-d` and change `-X POST` to `-X GET`. 
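As an illustration of the GET variant, a status check against the extraction endpoint documented in this reference might look like the sketch below. The base URL, path, and headers come from this reference; the helper names `extract_status_url` and `get_extract_status` are hypothetical, and the lowercase-hex check is an assumption based on the documented 24-character task ID format.

```bash
# Base URL as documented in this reference.
BASE_URL="https://api.marswave.ai/openapi/v1"

# Hypothetical helper: print the status URL for a content-extract task.
extract_status_url() {
  # Task IDs are documented as 24-character hex strings
  # (lowercase is assumed here; adjust if the API returns uppercase).
  case "$1" in
    ""|*[!0-9a-f]*) echo "invalid taskId: $1" >&2; return 1 ;;
  esac
  if [ "${#1}" -ne 24 ]; then
    echo "invalid taskId: $1" >&2
    return 1
  fi
  printf '%s/content/extract/%s\n' "$BASE_URL" "$1"
}

# Hypothetical helper: GET request with the same headers as the POST
# template, and no -d body.
get_extract_status() {
  url="$(extract_status_url "$1")" || return 1
  curl -sS -X GET "$url" \
    -H "Authorization: Bearer $LISTENHUB_API_KEY" \
    -H "Content-Type: application/json" \
    -H "X-Source: skills"
}
```

Calling `get_extract_status "$TASK_ID"` prints the raw JSON response. Rejecting malformed IDs locally avoids spending a request on something the API would refuse as a validation error.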
+ +**Security notes:** +- Never log or display full API keys in output +- API keys are transmitted via HTTPS only +- Do not pass sensitive or confidential information as content input — it is sent to external APIs for processing + +--- + +### POST /v1/content/extract + +Create a content extraction task for a URL. Returns a `taskId` for polling. + +**Request body:** + +| Field | Required | Type | Description | +|-------|----------|------|-------------| +| source | **Yes** | object | Source to extract from | +| source.type | **Yes** | string | Must be `"url"` | +| source.uri | **Yes** | string | Valid HTTP(S) URL to extract content from | +| options | No | object | Extraction options | +| options.summarize | No | boolean | Whether to generate a summary | +| options.maxLength | No | integer | Maximum content length | +| options.twitter | No | object | Twitter/X specific options | +| options.twitter.count | No | integer | Number of tweets to fetch (1-100, default 20) | + +**Response:** + +```json +{ + "code": 0, + "message": "success", + "data": { + "taskId": "69a7dac700cf95938f86d9bb" + } +} +``` + +**Error codes:** + +| Code | Meaning | +|------|---------| +| 29003 | Validation error (`"source.uri" is required`, `"source.uri" must be a valid uri`) | +| 21007 | Invalid API key | + +--- + +### GET /v1/content/extract/{taskId} + +Get extraction task status and results. 
+ +**Path params:** + +| Param | Type | Description | +|-------|------|-------------| +| taskId | string | 24-char hex task ID | + +**Response states:** + +- **processing** — Task is still running +- **completed** — Extraction finished, data available +- **failed** — Extraction failed, check `failCode` and `message` + +**Response (processing):** + +```json +{ + "code": 0, + "message": "success", + "data": { + "taskId": "69a7dac700cf95938f86d9bb", + "status": "processing", + "createdAt": "2025-04-09T12:00:00Z", + "data": null, + "credits": 0, + "failCode": null, + "message": null + } +} +``` + +**Response (completed):** + +```json +{ + "code": 0, + "message": "success", + "data": { + "taskId": "69a7dac700cf95938f86d9bb", + "status": "completed", + "createdAt": "2025-04-09T12:00:00Z", + "data": { + "content": "Extracted text content...", + "metadata": { + "title": "Article Title", + "author": "Author Name", + "publishedAt": "2025-04-01T08:00:00Z" + }, + "references": [ + "https://example.com/related-article" + ] + }, + "credits": 5, + "failCode": null, + "message": null + } +} +``` + +**Response (failed):** + +```json +{ + "code": 0, + "message": "success", + "data": { + "taskId": "69a7dac700cf95938f86d9bb", + "status": "failed", + "createdAt": "2025-04-09T12:00:00Z", + "data": null, + "credits": 0, + "failCode": "EXTRACT_FAILED", + "message": "Unable to extract content from the provided URL" + } +} +``` + +**Key fields:** + +| Field | Type | Description | +|-------|------|-------------| +| status | string | `processing`, `completed`, or `failed` | +| data.data.content | string | Extracted text content | +| data.data.metadata | object | Page metadata (title, author, publishedAt) | +| data.data.references | array | Referenced URLs (array of strings) | +| credits | integer | Credits consumed | +| failCode | string | Error code (null on success) | +| message | string | Error message (null on success) | + +**Error codes:** + +| Code | Meaning | +|------|---------| +| 
29003 | Invalid taskId format | +| 25002 | Task not found | + +--- + +### Polling Pattern + +5-second interval, 60 polls max. Run with `run_in_background: true` and `timeout: 300000`. + +**Two-step pattern:** + +1. **Submit (foreground)**: POST the creation request, extract `taskId` from the response. +2. **Poll (background)**: Run the polling loop with `run_in_background: true`. You will be notified automatically when it completes. + +The exact polling bash command is already specified in the Workflow section (Step 5). + +--- + +### Error Handling + +**HTTP status codes:** + +| Code | Meaning | Action | +|------|---------|--------| +| 200 | Success | Parse response body | +| 400 | Bad request | Check parameters | +| 401 | Invalid API key | Re-check `LISTENHUB_API_KEY` | +| 402 | Insufficient credits | Inform user to recharge | +| 403 | Forbidden | No permission for this resource | +| 429 | Rate limited | Exponential backoff, retry after delay | +| 500/502/503/504 | Server error | Retry up to 3 times | + +**Retry strategy:** + +- **429 rate limit**: Wait 15 seconds, then retry (exponential backoff) +- **5xx server errors**: Retry up to 3 times with 5-second intervals +- **Network errors**: Retry up to 3 times + +**Application error codes:** + +| Code | Meaning | +|------|---------| +| 21007 | Invalid user API key | +| 25429 | Rate limited (application-level) | ## Example diff --git a/shared/config-pattern.md b/shared/config-pattern.md index 02f2a20..21a9e14 100644 --- a/shared/config-pattern.md +++ b/shared/config-pattern.md @@ -2,40 +2,13 @@ Reusable pattern for per-skill config lookup, creation, and update. -## API Key Check +## CLI Auth Check -Run this **before Step 0** in every skill that requires `LISTENHUB_API_KEY`. +Run this **before Step 0** in every skill that uses the ListenHub CLI. -```bash -[ -z "$LISTENHUB_API_KEY" ] && echo "MISSING" || echo "OK" -``` - -**If `OK`**: proceed to Step 0 silently. Do NOT display or confirm the key. 
- -**If `MISSING`**: run the interactive setup below. Do NOT stop — guide the user through configuration and then continue. - -### Interactive Key Setup - -1. Tell the user: - > `LISTENHUB_API_KEY` 未配置。请前往 https://listenhub.ai/settings/api-keys 获取 API Key(需要 Pro 订阅)。 - -2. Use `AskUserQuestion` to collect the key: - > 请粘贴你的 API Key(以 `lh_sk_` 开头): - -3. Validate format — must start with `lh_sk_`. If not, re-prompt. - -4. Write to shell profile and source: - ```bash - echo '' >> ~/.zshrc - echo 'export LISTENHUB_API_KEY="lh_sk_..."' >> ~/.zshrc - source ~/.zshrc - ``` - On Linux, use `~/.bashrc` instead. - -5. Confirm to the user: - > API Key 已保存到 `~/.zshrc`,后续会话无需重复配置。 +Follow `shared/cli-authentication.md` § Auth Check. -6. **Continue** — proceed to Step 0 and the skill's Interaction Flow. Do NOT ask the user to re-run. +If CLI is not installed or not logged in, guide the user through setup as described in `shared/cli-authentication.md`. ## Config Location diff --git a/shared/speaker-selection.md b/shared/speaker-selection.md index 4f6cde4..b24e6cd 100644 --- a/shared/speaker-selection.md +++ b/shared/speaker-selection.md @@ -22,13 +22,15 @@ When no user preference is saved, use these built-in defaults. This eliminates t ## Fetching Speakers -Always call the speakers API before presenting options (when user requests to change voice): +Always query the speaker list before presenting options (when user requests to change voice): +```bash +listenhub speakers list --lang {language} --json ``` -GET /speakers/list?language={language} -``` -Never hardcode speaker IDs in API calls — use the defaults above only as fallback when no user preference exists. +See `shared/cli-speakers.md` for full query patterns. + +Never hardcode speaker IDs — use the defaults above only as fallback when no user preference exists. 
## Speaker Properties From 451f9b43905fb41f9f771ae727c5e3b807e49f60 Mon Sep 17 00:00:00 2001 From: 0XFANGO Date: Wed, 8 Apr 2026 11:38:55 +0800 Subject: [PATCH 07/14] feat: add /slides and /music skills via CLI - slides/SKILL.md: slide deck generation (storybook mode=slides) - music/SKILL.md: AI music generation and covers --- music/SKILL.md | 432 ++++++++++++++++++++++++++++++++++++++++++++++++ slides/SKILL.md | 376 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 808 insertions(+) create mode 100644 music/SKILL.md create mode 100644 slides/SKILL.md diff --git a/music/SKILL.md b/music/SKILL.md new file mode 100644 index 0000000..129adb9 --- /dev/null +++ b/music/SKILL.md @@ -0,0 +1,432 @@ +--- +name: music +metadata: + openclaw: + emoji: "🎵" + requires: + bin: ["listenhub"] + primaryBin: "listenhub" +description: | + AI music generation and covers via CLI. Triggers on: "音乐", "music", + "生成音乐", "generate music", "翻唱", "cover", "作曲", "compose", + "create a song", "做一首歌". +--- + +## When to Use + +- User wants to generate an original song from a text prompt +- User wants to create a cover version from reference audio +- User says "音乐", "music", "生成音乐", "generate music", "翻唱", "cover", "作曲", "compose", "create a song", "做一首歌" + +## When NOT to Use + +- User wants text-to-speech reading (use `/tts`) +- User wants a podcast discussion (use `/podcast`) +- User wants an explainer video with narration (use `/explainer`) +- User wants to transcribe audio to text (use `/asr`) + +## Purpose + +Generate original AI music or create cover versions from reference audio using the ListenHub CLI. Two modes: + +1. **Generate** (original): Create a new song from a text prompt, with optional style, title, and instrumental-only options. +2. **Cover**: Transform a reference audio file into a new version, with optional style modifications. 
+ +## Hard Constraints + +- Always check CLI authentication via `shared/cli-authentication.md` before any operation +- Follow `shared/cli-patterns.md` for execution modes, error handling, and interaction patterns +- Follow `shared/config-pattern.md` for config lookup, creation, and update +- No speakers involved — this is music generation, not speech +- Audio file constraints for cover mode: mp3, wav, flac, m4a, ogg, aac; max 20 MB +- Long timeout: 600s default. Use `run_in_background: true` with `timeout: 660000` +- Never save files to `~/Downloads/` or `.listenhub/` — save artifacts to the current working directory with friendly topic-based names (see `shared/config-pattern.md` § Artifact Naming) + + +Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After all parameters are collected, summarize the choices and ask the user to confirm. Do NOT call any generation CLI command until the user has explicitly confirmed. + + + +## CLI Commands + +### Generate (original) + +```bash +listenhub music generate --prompt "..." [--style "..."] [--title "..."] [--instrumental] --json +``` + +### Cover (from reference audio) + +```bash +listenhub music cover --audio "{path-or-url}" [--prompt "..."] [--style "..."] [--title "..."] [--instrumental] --json +``` + +### List tasks + +```bash +listenhub music list --page 1 --page-size 20 [--status pending|generating|uploading|success|failed] --json +``` + +### Get task status + +```bash +listenhub music get --json +``` + +## Step -1: CLI Auth Check + +Follow `shared/cli-authentication.md`: + +```bash +AUTH=$(listenhub auth status --json 2>/dev/null) +AUTHED=$(echo "$AUTH" | jq -r '.authenticated // false') +``` + +- If `listenhub` command is not found: tell the user to install it (`npm install -g @marswave/listenhub-cli`). Stop here. 
+- If `.authenticated` is `false`: tell the user to run `listenhub auth login`. Wait for completion, then re-check. +- If `.authenticated` is `true`: proceed silently. + +## Step 0: Config Setup + +Follow `shared/config-pattern.md` Step 0 (Zero-Question Boot). + +**If file doesn't exist** — silently create with defaults and proceed: +```bash +mkdir -p ".listenhub/music" +echo '{"outputMode":"download","language":null}' > ".listenhub/music/config.json" +CONFIG_PATH=".listenhub/music/config.json" +CONFIG=$(cat "$CONFIG_PATH") +``` +**Do NOT ask any setup questions.** Proceed directly to the Interaction Flow. + +**If file exists** — read config silently and proceed: +```bash +CONFIG_PATH=".listenhub/music/config.json" +[ ! -f "$CONFIG_PATH" ] && CONFIG_PATH="$HOME/.listenhub/music/config.json" +CONFIG=$(cat "$CONFIG_PATH") +``` + +### Setup Flow (user-initiated reconfigure only) + +Only run when the user explicitly asks to reconfigure. Display current settings: +``` +当前配置 (music): + 输出方式:{inline / download / both} + 语言偏好:{zh / en / 未设置} +``` + +Then ask: + +1. **outputMode**: Follow `shared/output-mode.md` § Setup Flow Question. + +2. **Language** (optional): "默认语言?" + - "中文 (zh)" + - "English (en)" + - "每次手动选择" -> keep `null` + +After collecting answers, save immediately: +```bash +NEW_CONFIG=$(echo "$CONFIG" | jq --arg m "$OUTPUT_MODE" '. + {"outputMode": $m}') +if [ "$LANGUAGE" != "null" ]; then + NEW_CONFIG=$(echo "$NEW_CONFIG" | jq --arg lang "$LANGUAGE" '. + {"language": $lang}') +fi +echo "$NEW_CONFIG" > "$CONFIG_PATH" +CONFIG=$(cat "$CONFIG_PATH") +``` + +## Interaction Flow + +### Step 1: Mode + +Ask the user which mode they want, unless the intent is already clear from their message (e.g., "翻唱" or "cover" implies cover mode; "作曲" or "compose" implies generate mode). 
+ +``` +Question: "选择音乐生成模式:" +Options: + - "原创 (Generate)" — 从文字描述生成全新歌曲 + - "翻唱 (Cover)" — 基于参考音频生成新版本 +``` + +### Step 2a: Prompt (generate mode) + +If the user chose **Generate**, ask for the song description: + +> "请描述你想要的歌曲(主题、情绪、歌词片段等):" + +Accept free text. This maps to `--prompt`. + +### Step 2b: Reference Audio (cover mode) + +If the user chose **Cover**, ask for the reference audio: + +> "请提供参考音频文件路径或 URL:" + +Accept a local file path or URL. This maps to `--audio`. + +**Validate the input:** + +- If a local path: verify the file exists and check the extension is one of: `mp3`, `wav`, `flac`, `m4a`, `ogg`, `aac` +- If a URL: accept as-is (the CLI will validate) +- Check file size does not exceed 20 MB for local files: + ```bash + FILE_SIZE=$(stat -f%z "{path}" 2>/dev/null || stat -c%s "{path}" 2>/dev/null) + if [ "$FILE_SIZE" -gt 20971520 ]; then + echo "File exceeds 20 MB limit" + fi + ``` + +If validation fails, inform the user and re-ask. + +Optionally, the user may also provide a prompt to guide the cover style. If not provided in this step, it will be asked in Step 3. + +### Step 3: Style (optional) + +Ask for an optional style descriptor: + +> "指定音乐风格?(如 pop、rock、jazz、电子、古风等,留空则由 AI 自动选择)" + +Accept free text or empty. This maps to `--style`. + +### Step 4: Title (optional) + +Ask for an optional title: + +> "歌曲标题?(留空则自动生成)" + +Accept free text or empty. This maps to `--title`. + +### Step 5: Instrumental + +``` +Question: "是否纯音乐(无人声)?" +Options: + - "否,带人声(默认)" + - "是,纯音乐" +``` + +Default is "no" (with vocals). If the user selects "是", add `--instrumental` flag. + +### Step 6: Confirm & Generate + +Summarize all choices: + +**Generate mode:** +``` +准备生成音乐: + + 模式:原创 (Generate) + 描述:{prompt, first 80 chars}... + 风格:{style / 自动} + 标题:{title / 自动} + 人声:{带人声 / 纯音乐} + + 确认? +``` + +**Cover mode:** +``` +准备生成音乐: + + 模式:翻唱 (Cover) + 参考音频:{audio path or URL} + 描述:{prompt / 无} + 风格:{style / 自动} + 标题:{title / 自动} + 人声:{带人声 / 纯音乐} + + 确认? 
+``` + +Wait for explicit confirmation before proceeding. + +## Workflow + +### Generate Mode + +1. **Submit (foreground)** with `--no-wait` to get the task ID: + + ```bash + RESULT=$(listenhub music generate \ + --prompt "{prompt}" \ + ${STYLE:+--style "$STYLE"} \ + ${TITLE:+--title "$TITLE"} \ + ${INSTRUMENTAL:+--instrumental} \ + --no-wait --json 2>/tmp/lh-music-err) + EXIT_CODE=$? + + if [ $EXIT_CODE -ne 0 ]; then + ERROR=$(cat /tmp/lh-music-err) + echo "Error: $ERROR" + rm -f /tmp/lh-music-err + exit $EXIT_CODE + fi + rm -f /tmp/lh-music-err + + TASK_ID=$(echo "$RESULT" | jq -r '.id') + echo "Submitted: $TASK_ID" + ``` + +2. Tell the user the task is submitted. + +3. **Poll (background)**: Run with `run_in_background: true` and `timeout: 660000`: + + ```bash + TASK_ID="" + for i in $(seq 1 60); do + RESULT=$(listenhub music get "$TASK_ID" --json 2>/dev/null) + STATUS=$(echo "$RESULT" | jq -r '.status // "pending"') + + case "$STATUS" in + success|completed) echo "$RESULT"; exit 0 ;; + failed) echo "FAILED: $RESULT" >&2; exit 1 ;; + *) sleep 10 ;; + esac + done + echo "TIMEOUT" >&2; exit 2 + ``` + +4. When notified of completion, **present the result** (see Result Presentation below). + +### Cover Mode + +1. **Submit (foreground)** with `--no-wait`: + + ```bash + RESULT=$(listenhub music cover \ + --audio "{path-or-url}" \ + ${PROMPT:+--prompt "$PROMPT"} \ + ${STYLE:+--style "$STYLE"} \ + ${TITLE:+--title "$TITLE"} \ + ${INSTRUMENTAL:+--instrumental} \ + --no-wait --json 2>/tmp/lh-music-err) + EXIT_CODE=$? + + if [ $EXIT_CODE -ne 0 ]; then + ERROR=$(cat /tmp/lh-music-err) + echo "Error: $ERROR" + rm -f /tmp/lh-music-err + exit $EXIT_CODE + fi + rm -f /tmp/lh-music-err + + TASK_ID=$(echo "$RESULT" | jq -r '.id') + echo "Submitted: $TASK_ID" + ``` + +2. Tell the user the task is submitted. + +3. **Poll (background)**: Same polling loop as Generate mode, with `run_in_background: true` and `timeout: 660000`. + +4. 
When notified of completion, **present the result**. + +## Result Presentation + +Read `OUTPUT_MODE` from config: + +```bash +OUTPUT_MODE=$(echo "$CONFIG" | jq -r '.outputMode // "download"') +``` + +Parse the completed result: + +```bash +AUDIO_URL=$(echo "$RESULT" | jq -r '.audioUrl') +TITLE=$(echo "$RESULT" | jq -r '.title // "Untitled"') +DURATION=$(echo "$RESULT" | jq -r '.duration // 0') +CREDITS=$(echo "$RESULT" | jq -r '.credits // 0') +``` + +### `inline` or `both` + +Display the audio URL as a clickable link: + +``` +音乐已生成! + +标题:{title} +在线收听:{audioUrl} +时长:{duration}s +消耗积分:{credits} +``` + +### `download` or `both` + +Generate a slug from the title following `shared/config-pattern.md` § Artifact Naming. + +```bash +SLUG="{title-slug}" # e.g. "summer-breeze", "夜空中最亮的星" +NAME="${SLUG}.mp3" +# Dedup: if file exists, append -2, -3, etc. +BASE="${NAME%.*}"; EXT="${NAME##*.}"; i=2 +while [ -e "$NAME" ]; do NAME="${BASE}-${i}.${EXT}"; i=$((i+1)); done +curl -sS -o "$NAME" "{audioUrl}" +``` + +Present: +``` +已保存到当前目录: + {NAME} +``` + +## Updating Config + +After successful generation, merge the language used this session into config if the user explicitly specified one: + +```bash +if [ -n "$LANGUAGE" ]; then + NEW_CONFIG=$(echo "$CONFIG" | jq --arg lang "$LANGUAGE" '. + {"language": $lang}') + echo "$NEW_CONFIG" > "$CONFIG_PATH" +fi +``` + +## API Reference + +- CLI authentication: `shared/cli-authentication.md` +- CLI patterns: `shared/cli-patterns.md` +- Config pattern: `shared/config-pattern.md` +- Output mode: `shared/output-mode.md` + +## Composability + +- **Invokes**: nothing +- **Invoked by**: nothing (standalone) + +## Examples + +**Generate original:** + +> "帮我做一首关于夏天海边的歌" + +1. Detect: generate mode ("做一首歌") +2. Read config (first run: create defaults with `outputMode: "download"`) +3. Infer: mode = generate, prompt = "夏天海边的歌" +4. Ask: style? title? instrumental? +5. Confirm summary -> user confirms +6. 
Submit `listenhub music generate --prompt "关于夏天海边的歌" --no-wait --json` +7. Poll in background +8. On completion: download `夏天海边.mp3` to cwd + +**Cover from file:** + +> "用这个音频翻唱一下 demo.mp3,jazz 风格" + +1. Detect: cover mode ("翻唱") +2. Validate: `demo.mp3` exists, is a supported format, under 20 MB +3. Infer: style = "jazz" from user input +4. Ask: title? instrumental? +5. Confirm summary -> user confirms +6. Submit `listenhub music cover --audio "demo.mp3" --style "jazz" --no-wait --json` +7. Poll in background +8. On completion: download `demo-cover.mp3` to cwd + +**Generate instrumental:** + +> "Create an instrumental electronic track for a game intro" + +1. Detect: generate mode ("Create ... track") +2. Infer: style = "electronic", instrumental = yes, from user input +3. Ask: title? +4. Confirm summary -> user confirms +5. Submit with `--style "electronic" --instrumental` +6. Poll in background +7. On completion: download `game-intro.mp3` to cwd diff --git a/slides/SKILL.md b/slides/SKILL.md new file mode 100644 index 0000000..d353216 --- /dev/null +++ b/slides/SKILL.md @@ -0,0 +1,376 @@ +--- +name: slides +description: | + Create slide decks with AI-generated visuals and optional narration. Triggers on: + "幻灯片", "PPT", "slides", "slide deck", "做幻灯片", "create slides", + "presentation". +metadata: + openclaw: + emoji: "📊" + requires: + bin: ["listenhub"] + primaryBin: "listenhub" +--- + +## When to Use + +- User wants to create a slide deck or presentation +- User asks to make "slides", "幻灯片", or "PPT" +- User wants a visual presentation with optional narration + +## When NOT to Use + +- User wants a narrated video without slides (use `/explainer`) +- User wants audio-only content (use `/speech` or `/podcast`) +- User wants a podcast-style discussion (use `/podcast`) +- User wants to generate a standalone image (use `/image-gen`) + +## Purpose + +Generate slide decks that combine structured visual pages with optional voice narration. 
Ideal for business presentations, educational content, and topic overviews. By default, slides are generated without audio — narration can be enabled on request. + +## Hard Constraints + +- Always check CLI authentication following `shared/cli-authentication.md` +- Follow `shared/cli-patterns.md` for command structure, execution, errors, and interaction patterns +- Always read config following `shared/config-pattern.md` before any interaction +- Follow `shared/speaker-selection.md` when narration is enabled +- Never save files to `~/Downloads/` or `.listenhub/` — save artifacts to the current working directory with friendly topic-based names (see `shared/config-pattern.md` § Artifact Naming) +- Mode is always `slides` — never `info` or `story` (those are for `/explainer`) +- Only 1 speaker supported (when narration is enabled) +- Default behavior: skip audio (no narration). User must opt in with `--no-skip-audio` + + +Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After all parameters are collected, summarize the choices and ask the user to confirm. Do NOT call any generation CLI command until the user has explicitly confirmed. + + + +## Step -1: CLI Auth Check + +Follow `shared/cli-authentication.md`. If the CLI is not installed or the user is not logged in, stop and guide them through setup. + +## Step 0: Config Setup + +Follow `shared/config-pattern.md` Step 0 (Zero-Question Boot). + +**If file doesn't exist** — silently create with defaults and proceed: +```bash +mkdir -p ".listenhub/slides" +echo '{"outputMode":"inline","language":null,"defaultSpeakers":{}}' > ".listenhub/slides/config.json" +CONFIG_PATH=".listenhub/slides/config.json" +CONFIG=$(cat "$CONFIG_PATH") +``` +**Do NOT ask any setup questions.** Proceed directly to the Interaction Flow. 
+ +**If file exists** — read config silently and proceed: +```bash +CONFIG_PATH=".listenhub/slides/config.json" +[ ! -f "$CONFIG_PATH" ] && CONFIG_PATH="$HOME/.listenhub/slides/config.json" +CONFIG=$(cat "$CONFIG_PATH") +``` + +### Setup Flow (user-initiated reconfigure only) + +Only run when the user explicitly asks to reconfigure. Display current settings: +``` +当前配置 (slides): + 输出方式:{inline / download / both} + 语言偏好:{zh / en / 未设置} + 默认主播:{speakerName / 使用内置默认} +``` + +Then ask: + +1. **outputMode**: Follow `shared/output-mode.md` § Setup Flow Question. + +2. **Language** (optional): "默认语言?" + - "中文 (zh)" + - "English (en)" + - "每次手动选择" → keep `null` + +After collecting answers, save immediately: +```bash +NEW_CONFIG=$(echo "$CONFIG" | jq --arg m "$OUTPUT_MODE" '. + {"outputMode": $m}') +echo "$NEW_CONFIG" > "$CONFIG_PATH" +CONFIG=$(cat "$CONFIG_PATH") +``` + +## Interaction Flow + +### Step 1: Topic / Content + +Free text input. Ask the user: + +> What would you like to create a slide deck about? + +Accept: topic description, text content, URLs as source material. + +### Step 2: Source URLs (optional) + +If the user provided URLs in Step 1, collect them. Otherwise ask: + +> Do you have any reference URLs to include as source material? (optional — type "skip" to proceed without) + +Each URL will be passed as a `--source-url` flag (repeatable). + +### Step 3: Language + +If `config.language` is set, pre-fill and show in summary — skip this question. +Otherwise ask: + +``` +Question: "What language?" +Options: + - "Chinese (zh)" — Content in Mandarin Chinese + - "English (en)" — Content in English +``` + +### Step 4: Narration + +Ask the user: + +``` +Question: "需要语音旁白吗?(默认否)" +Options: + - "不需要" — Slides only, no narration + - "需要旁白" — Add voice narration to slides +``` + +Default is no narration. 
+ +### Step 5: Speaker Selection (only if narration enabled) + +**Skip this step entirely if narration is not enabled.** + +Follow `shared/speaker-selection.md`: +- If `config.defaultSpeakers.{language}` is set → use saved speaker silently +- If not set → use **built-in default** from `shared/speaker-selection.md` for the language +- Show the speaker in the confirmation summary (Step 7) — user can change from there if desired +- Only show the full speaker list if the user explicitly asks to change voice + +Only 1 speaker is supported. + +### Step 6: Style (optional) + +If the user mentioned a specific visual style, capture it. Otherwise skip — do not ask. + +Style is passed as `--style "{style}"` when specified. + +### Step 7: Confirm & Generate + +Summarize all choices: + +**Without narration:** +``` +Ready to generate slides: + + Topic: {topic} + Language: {language} + Narration: None + Sources: {urls or "none"} + + Proceed? +``` + +**With narration:** +``` +Ready to generate slides: + + Topic: {topic} + Language: {language} + Narration: Yes + Speaker: {speaker name} + Sources: {urls or "none"} + + Proceed? +``` + +Wait for explicit confirmation before running any CLI command. + +## Workflow + +1. **Submit (foreground)** with `--no-wait` to get the creation ID: + + **Base command:** + ```bash + RESULT=$(listenhub slides create \ + --query "{topic}" \ + --lang {language} \ + --image-size 2K \ + --aspect-ratio 16:9 \ + --no-wait \ + --json) + ID=$(echo "$RESULT" | jq -r '.id') + ``` + + **If narration enabled**, add: + ``` + --no-skip-audio --speaker "{speakerName}" + ``` + + **If style specified**, add: + ``` + --style "{style}" + ``` + + **If source URLs provided**, add for each URL: + ``` + --source-url "{url}" + ``` + +2. Tell the user the task is submitted. + +3. 
**Poll (background)**: Run the following with `run_in_background: true` and `timeout: 360000`: + + ```bash + ID="" + for i in $(seq 1 60); do + RESULT=$(listenhub creation get "$ID" --json 2>/dev/null) + STATUS=$(echo "$RESULT" | jq -r '.status // "processing"') + + case "$STATUS" in + completed) echo "$RESULT"; exit 0 ;; + failed) echo "FAILED: $RESULT" >&2; exit 1 ;; + *) sleep 10 ;; + esac + done + echo "TIMEOUT" >&2; exit 2 + ``` + +4. When notified, **parse and present the result**: + + Read `OUTPUT_MODE` from config. Follow `shared/output-mode.md` for behavior. + + Extract from the completed result: + - `episodeId` — for the online link + - `pageCount` — number of slides generated + - `credits` — credits consumed + + **`inline` or `both`**: Present the result inline. + + ``` + 幻灯片已生成! + + 「{title}」 + + 在线查看:https://listenhub.ai/app/slides/{episodeId} + 页数:{pageCount} + 消耗积分:{credits} + ``` + + **If narration was enabled**, also show: + ``` + 音频链接:{audioUrl} + ``` + + **`download` or `both`**: Also save files locally. Generate a topic slug following `shared/config-pattern.md` § Artifact Naming. + + Create `{slug}-slides/` folder (dedup if exists): + - Write `script.md` inside (the slide script/outline) + - If narration was enabled: download `audio.mp3` inside + + ```bash + DIR="{slug}-slides" + i=2; while [ -d "$DIR" ]; do DIR="{slug}-slides-${i}"; i=$((i+1)); done + mkdir -p "$DIR" + + # Save script + echo "$RESULT" | jq -r '.script // .content // ""' > "$DIR/script.md" + + # If narration enabled, download audio + AUDIO_URL=$(echo "$RESULT" | jq -r '.audioUrl // empty') + [ -n "$AUDIO_URL" ] && curl -sS -o "$DIR/audio.mp3" "$AUDIO_URL" + ``` + + Present: + ``` + 已保存到当前目录: + {slug}-slides/ + script.md + audio.mp3 (if narration enabled) + ``` + +### After Successful Generation + +Update config with the choices made this session: + +```bash +NEW_CONFIG=$(echo "$CONFIG" | jq \ + --arg lang "{language}" \ + '. 
+ {"language": $lang}') +echo "$NEW_CONFIG" > "$CONFIG_PATH" +``` + +If narration was enabled and a speaker was used: +```bash +NEW_CONFIG=$(echo "$NEW_CONFIG" | jq \ + --arg lang "{language}" \ + --argjson ids '["speakerId"]' \ + '.defaultSpeakers[$lang] = $ids') +echo "$NEW_CONFIG" > "$CONFIG_PATH" +``` + +**Estimated time**: 3-6 minutes + +## API Reference + +- CLI authentication: `shared/cli-authentication.md` +- CLI patterns: `shared/cli-patterns.md` +- Speaker list (CLI): `shared/cli-speakers.md` +- Speaker selection guide: `shared/speaker-selection.md` +- Config pattern: `shared/config-pattern.md` +- Output mode: `shared/output-mode.md` + +## Composability + +- **Invokes**: speakers CLI (for speaker selection when narration enabled) +- **Invoked by**: content-planner (Phase 3) + +## Example + +**User**: "帮我做一个关于量子计算的幻灯片" + +**Agent workflow**: +1. Topic: "量子计算" +2. Source URLs: skip (none provided) +3. Language: pre-filled from config or ask → "zh" +4. Narration: ask → "不需要" +5. Confirm and generate + +```bash +RESULT=$(listenhub slides create \ + --query "量子计算" \ + --lang zh \ + --image-size 2K \ + --aspect-ratio 16:9 \ + --no-wait \ + --json) +ID=$(echo "$RESULT" | jq -r '.id') +``` + +Poll until completed, then present the online link and page count. + +**User**: "Create slides about React hooks with narration" + +**Agent workflow**: +1. Topic: "React hooks" +2. Source URLs: skip +3. Language: ask → "en" +4. Narration: ask → "需要旁白" +5. Speaker: use built-in default "Mars" (cozy-man-english) +6. Confirm and generate + +```bash +RESULT=$(listenhub slides create \ + --query "React hooks" \ + --lang en \ + --image-size 2K \ + --aspect-ratio 16:9 \ + --no-skip-audio \ + --speaker "Mars" \ + --no-wait \ + --json) +ID=$(echo "$RESULT" | jq -r '.id') +``` + +Poll until completed, then present the online link, page count, and audio link. 
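Editorial sketch, not part of the patch: the flag rules in the slides workflow above (narration is opt-in, style is passed only when the user mentioned one, `--source-url` repeats once per URL) could be assembled roughly as follows. The function name `build_slides_args` and its positional parameters are hypothetical. The flags are the ones the skill text lists, and the helper only prints the argument list instead of invoking the CLI, so it makes no claim about how `listenhub` behaves.

```bash
#!/usr/bin/env bash
# Sketch only: prints the arguments for `listenhub slides create`, one per line.
# build_slides_args and its parameter order are hypothetical, for illustration.
# Usage: build_slides_args TOPIC LANG NARRATION SPEAKER STYLE [URL...]
#   NARRATION is "yes" or "no"; pass "" for SPEAKER or STYLE when unused.
#   All five leading arguments must be supplied, since the function shifts past them.

build_slides_args() {
  local topic="$1"
  local lang="$2"
  local narration="$3"
  local speaker="$4"
  local style="$5"
  local url
  shift 5   # whatever remains is the list of source URLs

  # Base command as shown in the workflow.
  printf '%s\n' slides create --query "$topic" --lang "$lang" \
    --image-size 2K --aspect-ratio 16:9

  # Narration is opt-in: without these flags the deck has no audio.
  if [ "$narration" = "yes" ]; then
    printf '%s\n' --no-skip-audio --speaker "$speaker"
  fi

  # Style is passed only when the user mentioned one.
  if [ -n "$style" ]; then
    printf '%s\n' --style "$style"
  fi

  # --source-url is repeatable: one flag per URL.
  for url in "$@"; do
    printf '%s\n' --source-url "$url"
  done

  # Submit without waiting and request JSON, so the ID can be polled later.
  printf '%s\n' --no-wait --json
}
```

For instance, `build_slides_args "React hooks" en yes Mars ""` lists the same flags as the narrated example above, with `--no-wait --json` placed last. Printing rather than executing keeps the sketch runnable on a machine without the CLI installed; a real skill run would pass the same values directly to `listenhub`.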
From d5fe1fba6a4ba86a3673656cb7a9e90f876ea6ac Mon Sep 17 00:00:00 2001 From: 0XFANGO Date: Wed, 8 Apr 2026 11:38:55 +0800 Subject: [PATCH 08/14] feat: migrate podcast, tts, explainer, image-gen to CLI - podcast: remove two-step (OpenAPI-only), one-step via CLI - tts: quick + script modes via CLI - explainer: single CLI command replaces two polling loops - image-gen: CLI handles file upload natively, remove base64 logic --- explainer/SKILL.md | 199 +++++++++++++++++++++---------------- image-gen/SKILL.md | 241 +++++++++++++++++++++------------------------ podcast/SKILL.md | 136 ++++++++++--------------- tts/SKILL.md | 193 ++++++++++++++++++++++-------------- 4 files changed, 396 insertions(+), 373 deletions(-) diff --git a/explainer/SKILL.md b/explainer/SKILL.md index 543f5b9..0caad7b 100644 --- a/explainer/SKILL.md +++ b/explainer/SKILL.md @@ -8,8 +8,8 @@ metadata: openclaw: emoji: "🎬" requires: - env: ["LISTENHUB_API_KEY"] - primaryEnv: "LISTENHUB_API_KEY" + bin: ["listenhub"] + primaryBin: "listenhub" --- ## When to Use @@ -32,23 +32,22 @@ Generate explainer videos that combine a single narrator's voiceover with AI-gen ## Hard Constraints -- No shell scripts. 
Construct curl commands from the API reference files listed in Resources -- Always read `shared/authentication.md` for API key and headers -- Follow `shared/common-patterns.md` for polling, errors, and interaction patterns - Always read config following `shared/config-pattern.md` before any interaction -- Never hardcode speaker IDs — always fetch from the speakers API +- Follow `shared/cli-patterns.md` for execution modes, error handling, and interaction patterns +- Always follow `shared/cli-authentication.md` for auth checks +- Never hardcode speaker IDs — always fetch from the speakers CLI when the user wants to change voice - Never save files to `~/Downloads/` or `.listenhub/` — save artifacts to the current working directory with friendly topic-based names (see `shared/config-pattern.md` § Artifact Naming) - Explainer uses exactly 1 speaker - Mode must be `info` (for Info style) or `story` (for Story style) — never `slides` (use `/slides` skill instead) -Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After all parameters are collected, summarize the choices and ask the user to confirm. Do NOT call any generation API until the user has explicitly confirmed. +Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After all parameters are collected, summarize the choices and ask the user to confirm. Do NOT call any CLI command until the user has explicitly confirmed. -## Step -1: API Key Check +## Step -1: CLI Auth Check -Follow `shared/config-pattern.md` § API Key Check. If the key is missing, stop immediately. +Follow `shared/config-pattern.md` § CLI Auth Check. If the CLI is not installed or the user is not logged in, stop immediately and guide them through setup. 
## Step 0: Config Setup @@ -122,6 +121,7 @@ Question: "What language?" Options: - "Chinese (zh)" — Content in Mandarin Chinese - "English (en)" — Content in English + - "Japanese (ja)" — Content in Japanese ``` ### Step 3: Style @@ -144,6 +144,8 @@ Follow `shared/speaker-selection.md`: - Show the speaker in the confirmation summary (Step 6) — user can change from there if desired - Only show the full speaker list if the user explicitly asks to change voice +Speaker query: see `shared/cli-speakers.md` for listing and filtering speakers. + Only 1 speaker is supported for explainer videos. ### Step 5: Output Type @@ -171,34 +173,71 @@ Ready to generate explainer: Proceed? ``` -Wait for explicit confirmation before calling any API. +Wait for explicit confirmation before running any CLI command. ## Workflow -1. **Submit (foreground)**: `POST /storybook/episodes` with content, speaker, language, mode → extract `episodeId` -2. Tell the user the task is submitted -3. **Poll (background)**: Run the following **exact** bash command with `run_in_background: true` and `timeout: 600000`. Do NOT use python3, awk, or any other JSON parser — use `jq` as shown: +1. **Submit (foreground)**: Run with `--no-wait` to get the creation ID immediately: + + ```bash + RESULT=$(listenhub explainer create \ + --query "{topic}" \ + --mode {info|story} \ + --lang {en|zh|ja} \ + --speaker "{name}" \ + --speaker-id "{id}" \ + --no-wait \ + --json) + + if [ $? -ne 0 ]; then + echo "Error: $RESULT" >&2 + exit 1 + fi + + ID=$(echo "$RESULT" | jq -r '.id') + echo "Submitted: $ID" + ``` + + **Optional flags** (add when applicable): + - `--source-url "{url}"` — if the user provided a reference URL + - `--skip-audio` — if text-only output (no video) + - `--image-size {2K|4K}` — image resolution (default: 2K) + - `--aspect-ratio {16:9|9:16|1:1}` — video aspect ratio (default: 16:9) + - `--style "{style}"` — visual style for AI-generated images + +2. Tell the user the task is submitted. + +3. 
**Poll (background)**: Run the following with `run_in_background: true` and `timeout: 660000`: ```bash - EPISODE_ID="" - for i in $(seq 1 30); do - RESULT=$(curl -sS "https://api.marswave.ai/openapi/v1/storybook/episodes/$EPISODE_ID" \ - -H "Authorization: Bearer $LISTENHUB_API_KEY" \ - -H "X-Source: skills" 2>/dev/null) - STATUS=$(echo "$RESULT" | tr -d '\000-\037\177' | jq -r '.data.processStatus // "pending"') + ID="" + for i in $(seq 1 60); do + RESULT=$(listenhub creation get "$ID" --json 2>/dev/null) + STATUS=$(echo "$RESULT" | jq -r '.status // "processing"') + case "$STATUS" in - success|completed) echo "$RESULT"; exit 0 ;; - failed|error) echo "FAILED: $RESULT" >&2; exit 1 ;; + completed) echo "$RESULT"; exit 0 ;; + failed) echo "FAILED: $RESULT" >&2; exit 1 ;; *) sleep 10 ;; esac done echo "TIMEOUT" >&2; exit 2 ``` -4. When notified, **download and present script**: +4. When notified, **parse and present result**: + + Parse the CLI JSON output for key fields: + ```bash + EPISODE_ID=$(echo "$RESULT" | jq -r '.episodeId') + AUDIO_URL=$(echo "$RESULT" | jq -r '.audioUrl // empty') + VIDEO_URL=$(echo "$RESULT" | jq -r '.videoUrl // empty') + CREDITS=$(echo "$RESULT" | jq -r '.credits // empty') + ``` Read `OUTPUT_MODE` from config. Follow `shared/output-mode.md` for behavior. + **If text-only output**: + **`inline` or `both`**: Present the script inline. Present: @@ -211,56 +250,36 @@ Wait for explicit confirmation before calling any API. ``` **`download` or `both`**: Also save the script file. Generate a topic slug following `shared/config-pattern.md` § Artifact Naming. - - If text-only output: save as `{slug}-explainer.md` in cwd (dedup if exists) - - If text+video output: create `{slug}-explainer/` folder (dedup if exists), write `script.md` inside + - Save as `{slug}-explainer.md` in cwd (dedup if exists) - Present the save path in addition to the above summary. -5. 
**If video requested**: `POST /storybook/episodes/{episodeId}/video` (foreground) → **poll again (background)** using the **exact** bash command below with `run_in_background: true` and `timeout: 600000`. Poll for `videoStatus`, not `processStatus`: + **If text + video output**: - ```bash - EPISODE_ID="" - for i in $(seq 1 30); do - RESULT=$(curl -sS "https://api.marswave.ai/openapi/v1/storybook/episodes/$EPISODE_ID" \ - -H "Authorization: Bearer $LISTENHUB_API_KEY" \ - -H "X-Source: skills" 2>/dev/null) - STATUS=$(echo "$RESULT" | tr -d '\000-\037\177' | jq -r '.data.videoStatus // "pending"') - case "$STATUS" in - success|completed) echo "$RESULT"; exit 0 ;; - failed|error) echo "FAILED: $RESULT" >&2; exit 1 ;; - *) sleep 10 ;; - esac - done - echo "TIMEOUT" >&2; exit 2 - ``` -6. When notified, **download and present result**: - -**Present result** + **`inline` or `both`**: Display video URL and audio URL as clickable links. -Read `OUTPUT_MODE` from config. Follow `shared/output-mode.md` for behavior. - -**`inline` or `both`**: Display video URL and audio URL as clickable links. - -Present: -``` -解说视频已生成! + Present: + ``` + 解说视频已生成! -视频链接:{videoUrl} -音频链接:{audioUrl} -时长:{duration}s -消耗积分:{credits} -``` + 视频链接:{videoUrl} + 音频链接:{audioUrl} + 消耗积分:{credits} + ``` -**`download` or `both`**: Also download the audio file into the `{slug}-explainer/` folder. -```bash -curl -sS -o "{slug}-explainer/audio.mp3" "{audioUrl}" -``` -Present: -``` -已保存到当前目录: - {slug}-explainer/ - script.md - audio.mp3 -``` + **`download` or `both`**: Also save files. Generate a topic slug following `shared/config-pattern.md` § Artifact Naming. 
+ - Create `{slug}-explainer/` folder (dedup if exists) + - Write `script.md` inside + - Download audio: + ```bash + curl -sS -o "{slug}-explainer/audio.mp3" "{audioUrl}" + ``` + - Present: + ``` + 已保存到当前目录: + {slug}-explainer/ + script.md + audio.mp3 + ``` ### After Successful Generation @@ -277,19 +296,20 @@ echo "$NEW_CONFIG" > "$CONFIG_PATH" **Estimated times**: - Text script only: 2-3 minutes -- Text + Video: 3-5 minutes +- Text + Video: 5-10 minutes -## API Reference +## Resources -- Speaker list: `shared/api-speakers.md` +- CLI authentication: `shared/cli-authentication.md` +- CLI patterns: `shared/cli-patterns.md` +- Speaker query: `shared/cli-speakers.md` - Speaker selection guide: `shared/speaker-selection.md` -- Episode creation: `shared/api-storybook.md` -- Polling: `shared/common-patterns.md` § Async Polling - Config pattern: `shared/config-pattern.md` +- Output mode: `shared/output-mode.md` ## Composability -- **Invokes**: speakers API (for speaker selection); may invoke `/speech` for voiceover +- **Invokes**: speakers CLI (for speaker selection); may invoke `/speech` for voiceover - **Invoked by**: content-planner (Phase 3) ## Example @@ -300,20 +320,31 @@ echo "$NEW_CONFIG" > "$CONFIG_PATH" 1. Topic: "Claude Code introduction" 2. Ask language → "English" 3. Ask style → "Info" -4. Fetch speakers, user picks "cozy-man-english" +4. Use default speaker "Mars" (cozy-man-english) 5. 
Ask output → "Text + Video" ```bash -curl -sS -X POST "https://api.marswave.ai/openapi/v1/storybook/episodes" \ - -H "Authorization: Bearer $LISTENHUB_API_KEY" \ - -H "Content-Type: application/json" \ - -H "X-Source: skills" \ - -d '{ - "sources": [{"type": "text", "content": "Introduce Claude Code: what it is, key features, and how to get started"}], - "speakers": [{"speakerId": "cozy-man-english"}], - "language": "en", - "mode": "info" - }' +# Submit +RESULT=$(listenhub explainer create \ + --query "Introduce Claude Code: what it is, key features, and how to get started" \ + --mode info \ + --lang en \ + --speaker "Mars" \ + --speaker-id "cozy-man-english" \ + --no-wait \ + --json) +ID=$(echo "$RESULT" | jq -r '.id') + +# Poll (run_in_background: true, timeout: 660000) +for i in $(seq 1 60); do + RESULT=$(listenhub creation get "$ID" --json 2>/dev/null) + STATUS=$(echo "$RESULT" | jq -r '.status // "processing"') + case "$STATUS" in + completed) echo "$RESULT"; exit 0 ;; + failed) echo "FAILED: $RESULT" >&2; exit 1 ;; + *) sleep 10 ;; + esac +done ``` -Poll until text is ready, then generate video if requested. +Parse result for `episodeId`, `audioUrl`, `videoUrl`, `credits`, and present to user. diff --git a/image-gen/SKILL.md b/image-gen/SKILL.md index cfeee08..9d60e0f 100644 --- a/image-gen/SKILL.md +++ b/image-gen/SKILL.md @@ -8,8 +8,8 @@ metadata: openclaw: emoji: "🖼️" requires: - env: ["LISTENHUB_API_KEY"] - primaryEnv: "LISTENHUB_API_KEY" + bin: ["listenhub"] + primaryBin: "listenhub" --- ## When to Use @@ -28,24 +28,22 @@ metadata: ## Purpose -Generate AI images using the Labnana API. Supports text prompts with optional reference images, multiple resolutions, and aspect ratios. Images are saved as local files. +Generate AI images using the ListenHub CLI. Supports text prompts with optional reference images (local files or URLs), multiple resolutions, and aspect ratios. Images are saved as local files. ## Hard Constraints -- No shell scripts. 
Construct curl commands from the API reference files listed in Resources -- Always read `shared/authentication.md` for API key and headers -- Follow `shared/common-patterns.md` for error handling -- Image generation uses a **different base URL**: `https://api.marswave.ai/openapi/v1` +- Always check CLI auth following `shared/cli-authentication.md` +- Follow `shared/cli-patterns.md` for command execution and error handling - Always read config following `shared/config-pattern.md` before any interaction - Output saved to `.listenhub/image-gen/YYYY-MM-DD-{jobId}/` — never `~/Downloads/` -Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After all parameters are collected, summarize the choices and ask the user to confirm. Do NOT call the image generation API until the user has explicitly confirmed. +Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After all parameters are collected, summarize the choices and ask the user to confirm. Do NOT call the image generation command until the user has explicitly confirmed. -## Step -1: API Key Check +## Step -1: CLI Auth Check -Follow `shared/config-pattern.md` § API Key Check. If the key is missing, stop immediately. +Follow `shared/cli-authentication.md` § Auth Check. If CLI is not installed or not logged in, guide the user through setup. ## Step 0: Config Setup @@ -135,29 +133,17 @@ If flash model was selected, also offer: `1:4` (narrow portrait), `4:1` (wide la ``` Question: "Any reference images for style guidance?" 
Options: - - "Yes, I have URL(s)" — Provide reference image URLs - - "Yes, I have local file(s)" — Provide local file paths (base64 mode) + - "Yes" — Provide file paths or URLs - "No references" — Generate from prompt only ``` -**If URL mode**: Collect URLs (comma-separated, max 14). For each URL, infer mimeType from suffix and build: -```json -{ "fileData": { "fileUri": "", "mimeType": "" } } -``` -Suffix mapping: `.jpg`/`.jpeg` → `image/jpeg`, `.png` → `image/png`, `.webp` → `image/webp`, `.gif` → `image/gif` +**If yes**: Collect reference image paths or URLs (comma-separated). The CLI handles both local files and URLs natively — no need to distinguish between them. -**If local file (base64) mode**: Collect file paths (comma-separated, max 14). For each file, encode to base64 and infer mimeType from suffix: -```bash -# macOS -BASE64_REF=$(base64 -i /path/to/image.png) -# Linux -BASE64_REF=$(base64 -w 0 /path/to/image.png) -``` -Build: -```json -{ "inlineData": { "data": "", "mimeType": "" } } -``` -Suffix mapping: `.jpg`/`.jpeg` → `image/jpeg`, `.png` → `image/png`, `.webp` → `image/webp`, `.heic` → `image/heic`, `.heif` → `image/heif` +- Max 5 references +- Supported formats: jpg, png, webp, gif +- Max 10MB per file + +Each reference will be passed as a `--reference` flag to the CLI. ### Step 5: Confirm & Generate @@ -170,67 +156,83 @@ Ready to generate image: Model: {pro / flash} Resolution: {1K / 2K / 4K} Aspect ratio: {ratio} - References: {yes — N URL(s) / yes — N local file(s) / no} + References: {yes — N image(s) / no} Proceed? ``` -Wait for explicit confirmation before calling the API. +Wait for explicit confirmation before running the CLI command. ## Workflow -1. **Build request**: Construct JSON with provider, model, prompt, imageConfig, and optional referenceImages (URL-based via `fileData` or base64 via `inlineData`) -2. **Encode local files** (if base64 mode): For each local file path, encode to base64 and build `inlineData` objects -3. 
**Submit**: `POST https://api.marswave.ai/openapi/v1/images/generation` with timeout of 600s -4. **Extract image**: Parse base64 data from response -5. **Decode and present result** - -Read `OUTPUT_MODE` from config. Follow `shared/output-mode.md` for behavior. - -**`inline` or `both`**: Decode base64 to a temp file, then use the Read tool. - -```bash -JOB_ID=$(date +%s) -echo "$BASE64_DATA" | base64 -D > /tmp/image-gen-${JOB_ID}.jpg -``` -Then use the Read tool on `/tmp/image-gen-{jobId}.jpg`. The image displays inline in the conversation. - -Present: -``` -图片已生成! -``` - -**`download` or `both`**: Save to the artifact directory. - -```bash -JOB_ID=$(date +%s) -DATE=$(date +%Y-%m-%d) -JOB_DIR=".listenhub/image-gen/${DATE}-${JOB_ID}" -mkdir -p "$JOB_DIR" -echo "$BASE64_DATA" | base64 -D > "${JOB_DIR}/${JOB_ID}.jpg" -``` - -Present: -``` -图片已生成! - -已保存到 .listenhub/image-gen/{YYYY-MM-DD}-{jobId}/: - {jobId}.jpg -``` - -**Base64 decoding** (cross-platform): - -```bash -# Linux -echo "$BASE64_DATA" | base64 -d > output.jpg - -# macOS -echo "$BASE64_DATA" | base64 -D > output.jpg -# or -echo "$BASE64_DATA" | base64 --decode > output.jpg -``` - -**Retry logic**: On 429 (rate limit), wait 15 seconds and retry. Max 3 retries. +1. **Build CLI command**: Construct the `listenhub image create` command with all collected parameters. + +2. **Execute**: Run the command with `run_in_background: true` and `timeout: 180000`: + + ```bash + listenhub image create \ + --prompt "{description}" \ + --model "{model}" \ + --lang "{lang}" \ + --aspect-ratio {16:9|9:16|1:1} \ + --size {1K|2K|4K} \ + --json + ``` + + If reference images were provided, add `--reference` for each: + ```bash + listenhub image create \ + --prompt "{description}" \ + --model "{model}" \ + --lang "{lang}" \ + --aspect-ratio 16:9 \ + --size 2K \ + --reference ./sketch.png \ + --reference ./photo.jpg \ + --json + ``` + + The `--lang` flag provides a language hint for the prompt. 
Detect from the user's prompt language (e.g., Chinese prompt → `zh`, English prompt → `en`). + +3. **Parse result and present** + + Read `OUTPUT_MODE` from config. Follow `shared/output-mode.md` for behavior. + + Parse the CLI JSON output to extract the image URL: + ```bash + IMAGE_URL=$(echo "$RESULT" | jq -r '.imageUrl') + ``` + + **`inline` or `both`**: Download to a temp file, then use the Read tool. + + ```bash + JOB_ID=$(date +%s) + curl -sS -o /tmp/image-gen-${JOB_ID}.jpg "$IMAGE_URL" + ``` + Then use the Read tool on `/tmp/image-gen-{jobId}.jpg`. The image displays inline in the conversation. + + Present: + ``` + 图片已生成! + ``` + + **`download` or `both`**: Save to the artifact directory. + + ```bash + JOB_ID=$(date +%s) + DATE=$(date +%Y-%m-%d) + JOB_DIR=".listenhub/image-gen/${DATE}-${JOB_ID}" + mkdir -p "$JOB_DIR" + curl -sS -o "${JOB_DIR}/${JOB_ID}.jpg" "$IMAGE_URL" + ``` + + Present: + ``` + 图片已生成! + + 已保存到 .listenhub/image-gen/{YYYY-MM-DD}-{jobId}/: + {jobId}.jpg + ``` ## Prompt Handling @@ -253,12 +255,14 @@ echo "$BASE64_DATA" | base64 --decode > output.jpg ## API Reference -- Image generation: `shared/api-image.md` -- Error handling: `shared/common-patterns.md` § Error Handling +- CLI authentication: `shared/cli-authentication.md` +- CLI execution patterns: `shared/cli-patterns.md` +- Config pattern: `shared/config-pattern.md` +- Output mode: `shared/output-mode.md` ## Composability -- **Invokes**: nothing (direct API call) +- **Invokes**: nothing (direct CLI call) - **Invoked by**: platform skills for cover images (Phase 2) ## Example @@ -273,61 +277,38 @@ echo "$BASE64_DATA" | base64 --decode > output.jpg 5. 
No references ```bash -RESPONSE=$(curl -sS -X POST "https://api.marswave.ai/openapi/v1/images/generation" \ - -H "Authorization: Bearer $LISTENHUB_API_KEY" \ - -H "Content-Type: application/json" \ - -H "X-Source: skills" \ - --max-time 600 \ - -d '{ - "provider": "google", - "model": "gemini-3-pro-image-preview", - "prompt": "cyberpunk city at night", - "imageConfig": {"imageSize": "2K", "aspectRatio": "16:9"} - }') - -BASE64_DATA=$(echo "$RESPONSE" | jq -r '.candidates[0].content.parts[0].inlineData.data // .data') -JOB_ID=$(date +%s) -DATE=$(date +%Y-%m-%d) -JOB_DIR=".listenhub/image-gen/${DATE}-${JOB_ID}" -mkdir -p "$JOB_DIR" -echo "$BASE64_DATA" | base64 -D > "${JOB_DIR}/${JOB_ID}.jpg" +listenhub image create \ + --prompt "cyberpunk city at night" \ + --model "gemini-3-pro-image-preview" \ + --lang en \ + --aspect-ratio 16:9 \ + --size 2K \ + --json ``` -Decode the base64 data per `outputMode` (see `shared/output-mode.md`). +Parse CLI JSON output per `outputMode` (see `shared/output-mode.md`). -### Example 2 — With Local Reference Image (base64) +### Example 2 — With Reference Images -**User**: "Generate an image in this style" (provides a local file path) +**User**: "Generate an image in this style" (provides local files and a URL) **Agent workflow**: 1. Ask prompt → "a serene mountain lake at dawn" 2. Ask model → "pro" 3. Ask resolution → "2K" 4. Ask ratio → "16:9" -5. References → local file → `/path/to/style-reference.png` +5. 
References → `/path/to/style-reference.png`, `https://example.com/photo.jpg` ```bash -# Encode local reference image -BASE64_REF=$(base64 -i /path/to/style-reference.png) - -RESPONSE=$(curl -sS -X POST "https://api.marswave.ai/openapi/v1/images/generation" \ - -H "Authorization: Bearer $LISTENHUB_API_KEY" \ - -H "Content-Type: application/json" \ - --max-time 600 \ - -d "{ - \"provider\": \"google\", - \"model\": \"gemini-3-pro-image-preview\", - \"prompt\": \"a serene mountain lake at dawn\", - \"imageConfig\": {\"imageSize\": \"2K\", \"aspectRatio\": \"16:9\"}, - \"referenceImages\": [{\"inlineData\": {\"data\": \"$BASE64_REF\", \"mimeType\": \"image/png\"}}] - }") - -BASE64_DATA=$(echo "$RESPONSE" | jq -r '.candidates[0].content.parts[0].inlineData.data // .data') -JOB_ID=$(date +%s) -DATE=$(date +%Y-%m-%d) -JOB_DIR=".listenhub/image-gen/${DATE}-${JOB_ID}" -mkdir -p "$JOB_DIR" -echo "$BASE64_DATA" | base64 -D > "${JOB_DIR}/${JOB_ID}.jpg" +listenhub image create \ + --prompt "a serene mountain lake at dawn" \ + --model "gemini-3-pro-image-preview" \ + --lang en \ + --aspect-ratio 16:9 \ + --size 2K \ + --reference /path/to/style-reference.png \ + --reference https://example.com/photo.jpg \ + --json ``` -Decode the base64 data per `outputMode` (see `shared/output-mode.md`). +Parse CLI JSON output per `outputMode` (see `shared/output-mode.md`). diff --git a/podcast/SKILL.md b/podcast/SKILL.md index e201d36..a9758cf 100644 --- a/podcast/SKILL.md +++ b/podcast/SKILL.md @@ -8,8 +8,8 @@ metadata: openclaw: emoji: "🎙️" requires: - env: ["LISTENHUB_API_KEY"] - primaryEnv: "LISTENHUB_API_KEY" + bin: ["listenhub"] + primaryBin: "listenhub" --- ## When to Use @@ -32,11 +32,10 @@ Generate podcast episodes with 1-2 AI speakers discussing a topic. Supports quic ## Hard Constraints -- No shell scripts. 
Construct curl commands from the API reference files listed in Resources -- Always read `shared/authentication.md` for API key and headers -- Follow `shared/common-patterns.md` for polling, errors, and interaction patterns +- Always check CLI auth following `shared/cli-authentication.md` +- Follow `shared/cli-patterns.md` for command execution and error handling - Never hardcode speaker IDs in API calls — use built-in defaults from `shared/speaker-selection.md` as fallback only; fetch from the speakers API when the user wants to change voice -- Never fabricate API endpoints or parameters +- Never fabricate CLI commands or parameters - Always read config following `shared/config-pattern.md` before any interaction - Always follow `shared/speaker-selection.md` for speaker selection (text table + free-text input) - Never save files to `~/Downloads/` or `.listenhub/` — save artifacts to the current working directory with friendly topic-based names (see `shared/config-pattern.md` § Artifact Naming) @@ -46,9 +45,9 @@ Use the AskUserQuestion tool for every multiple-choice step — do NOT print opt -## Step -1: API Key Check +## Step -1: CLI Auth Check -Follow `shared/config-pattern.md` § API Key Check. If the key is missing, stop immediately. +Follow `shared/cli-authentication.md` § Auth Check. If the CLI is not installed or the user is not logged in, stop immediately and guide them. ## Step 0: Config Setup @@ -57,7 +56,7 @@ Follow `shared/config-pattern.md` Step 0 (Zero-Question Boot). 
**If file doesn't exist** — silently create with defaults and proceed: ```bash mkdir -p ".listenhub/podcast" -echo '{"outputMode":"inline","language":null,"defaultMode":"quick","defaultMethod":"one-step","defaultSpeakers":{}}' > ".listenhub/podcast/config.json" +echo '{"outputMode":"inline","language":null,"defaultMode":"quick","defaultSpeakers":{}}' > ".listenhub/podcast/config.json" CONFIG_PATH=".listenhub/podcast/config.json" CONFIG=$(cat "$CONFIG_PATH") ``` @@ -78,7 +77,6 @@ Only run when the user explicitly asks to reconfigure. Display current settings: 输出方式:{inline / download / both} 语言偏好:{zh / en / 未设置} 默认模式:{quick / deep / debate / 未设置} - 默认生成方式:{one-step / two-step} 默认主播:{speakerName(s) / 使用内置默认} ``` @@ -97,11 +95,6 @@ Then ask these questions in order and save: - "Debate — 辩论对话" - "每次手动选择" → keep `null` -4. **Method** (optional): "默认生成方式?" - - "一步生成(推荐)" → `defaultMethod: "one-step"` - - "两步生成(先预览文本)" → `defaultMethod: "two-step"` - - "每次手动选择" → keep `null` - After collecting answers, save immediately: ```bash NEW_CONFIG=$(echo "$CONFIG" | jq --arg m "$OUTPUT_MODE" '. + {"outputMode": $m}') @@ -113,10 +106,6 @@ fi if [ "$MODE" != "null" ]; then NEW_CONFIG=$(echo "$NEW_CONFIG" | jq --arg mode "$MODE" '. + {"defaultMode": $mode}') fi -# Save method if user chose one -if [ "$METHOD" != "null" ]; then - NEW_CONFIG=$(echo "$NEW_CONFIG" | jq --arg method "$METHOD" '. + {"defaultMethod": $method}') -fi echo "$NEW_CONFIG" > "$CONFIG_PATH" CONFIG=$(cat "$CONFIG_PATH") ``` @@ -171,13 +160,7 @@ Follow `shared/speaker-selection.md`: For 2-speaker mode (dialogue/debate): use Primary + Secondary defaults for the language. 
-### Step 6: Generation Method - -**Default: "one-step"** — skip this question unless: -- `config.defaultMethod` is set → use that value silently -- User explicitly asks to review text first → use "two-step" - -### Step 7: Confirm & Generate +### Step 6: Confirm & Generate Summarize all choices: @@ -189,37 +172,47 @@ Ready to generate podcast: Language: {language} Speakers: {speaker name(s)} References: {yes/no + brief description} - Method: {one-step/two-step} Proceed? ``` -Wait for explicit confirmation before calling any API. The user can adjust any parameter here before confirming. +Wait for explicit confirmation before calling any CLI command. The user can adjust any parameter here before confirming. ## Workflow -### One-Step Generation +### Generation -1. **Submit (foreground)**: `POST /podcast/episodes` with collected parameters → extract `episodeId` -2. Tell the user the task is submitted -3. **Poll (background)**: Run the following **exact** bash command with `run_in_background: true` and `timeout: 600000`. Do NOT use python3, awk, or any other JSON parser — use `jq` as shown: +1. **Submit (background)**: Run the CLI command with `run_in_background: true` and `timeout: 360000`: ```bash - EPISODE_ID="" - for i in $(seq 1 30); do - RESULT=$(curl -sS "https://api.marswave.ai/openapi/v1/podcast/episodes/$EPISODE_ID" \ - -H "Authorization: Bearer $LISTENHUB_API_KEY" \ - -H "X-Source: skills" 2>/dev/null) - STATUS=$(echo "$RESULT" | tr -d '\000-\037\177' | jq -r '.data.processStatus // "pending"') - case "$STATUS" in - success|completed) echo "$RESULT"; exit 0 ;; - failed|error) echo "FAILED: $RESULT" >&2; exit 1 ;; - *) sleep 10 ;; - esac - done - echo "TIMEOUT" >&2; exit 2 + listenhub podcast create \ + --query "{topic}" \ + --source-url "{url}" \ + --source-text "{text}" \ + --mode {quick|deep|debate} \ + --lang {en|zh|ja} \ + --speaker "{name}" \ + --speaker "{name2}" \ + --json ``` -4. 
When notified of completion, **Step 6: Present result** + + Flag notes: + - `--query` — the topic or question to discuss + - `--source-url` — repeatable, one per URL reference + - `--source-text` — repeatable, one per text block reference + - `--mode` — one of `quick`, `deep`, `debate` + - `--lang` — language code + - `--speaker` — repeatable (max 2); use speaker display names + - `--speaker-id` — alternative to `--speaker`; use speaker IDs instead of names + - Omit `--source-url` / `--source-text` if the user provided no references + + The CLI handles polling internally and returns the final result when generation completes. + +2. Tell the user the task is submitted and that they will be notified when it finishes. + +3. When notified of completion, **Present result**: + + Parse the CLI JSON output to extract fields: `audioUrl`, `subtitlesUrl`, `audioDuration`, `credits`. Read `OUTPUT_MODE` from config. Follow `shared/output-mode.md` for behavior. @@ -249,26 +242,7 @@ Wait for explicit confirmation before calling any API. The user can adjust any p 已保存到当前目录: {NAME} ``` -5. Offer to show transcript or provide download URL on request - -### Two-Step Generation - -1. **Step 1 — Submit text (foreground)**: `POST /podcast/episodes/text-content` → extract `episodeId` -2. **Poll text (background)**: Use the exact `jq`-based polling loop above (substitute endpoint `podcast/episodes/text-content/{episodeId}` if needed), with `run_in_background: true` and `timeout: 600000` -3. When notified, **save draft to a topic-based folder in cwd**: - - Generate a topic slug following `shared/config-pattern.md` § Artifact Naming - - Create `{slug}-podcast/` folder (dedup if exists) - - Write `draft.md` (human-readable: `**{speakerName}**: {content}` per line) - - Write `draft.json` (raw `scripts` array) - - Present the draft location and content preview -4. **STOP**: Present the draft and wait for explicit user approval -5. 
**Step 2 — Submit audio (foreground, after approval)**: - - No changes: `POST /podcast/episodes/{episodeId}/audio` with `{}` - - With edits: `POST /podcast/episodes/{episodeId}/audio` with modified `{scripts: [...]}` -6. **Poll audio (background)**: Same exact `jq`-based loop, `run_in_background: true`, `timeout: 600000` -7. When notified, **download audio to the same folder**: - - `curl -sS -o {slug}-podcast/podcast.mp3 {audioUrl}` - - Present final result (same format as one-step, folder now has draft + final files) +4. Offer to show transcript or provide download URL on request ### After Successful Generation @@ -278,18 +252,17 @@ Update config with the choices made this session: NEW_CONFIG=$(echo "$CONFIG" | jq \ --arg lang "{language}" \ --arg mode "{mode}" \ - --arg method "{one-step/two-step}" \ --argjson speakers '{"{language}": ["{speakerId}"]}' \ - '. + {"language": $lang, "defaultMode": $mode, "defaultMethod": $method, "defaultSpeakers": (.defaultSpeakers + $speakers)}') + '. + {"language": $lang, "defaultMode": $mode, "defaultSpeakers": (.defaultSpeakers + $speakers)}') echo "$NEW_CONFIG" > "$CONFIG_PATH" ``` ## API Reference -- Speaker list: `shared/api-speakers.md` +- Speaker list: `shared/cli-speakers.md` - Speaker selection guide: `shared/speaker-selection.md` -- Episode creation: `shared/api-podcast.md` -- Polling: `shared/common-patterns.md` § Async Polling +- CLI patterns: `shared/cli-patterns.md` +- CLI authentication: `shared/cli-authentication.md` - Config pattern: `shared/config-pattern.md` ## Composability @@ -303,20 +276,17 @@ echo "$NEW_CONFIG" > "$CONFIG_PATH" **Agent workflow**: 1. Detect: podcast request, topic = "latest AI developments", no references -2. Infer: mode = "quick" (default), language = "en" (user wrote in English), 2 speakers (default), one-step (default) +2. Infer: mode = "quick" (default), language = "en" (user wrote in English), 2 speakers (default) 3. 
Show confirmation summary → user confirms ```bash -curl -sS -X POST "https://api.marswave.ai/openapi/v1/podcast/episodes" \ - -H "Authorization: Bearer $LISTENHUB_API_KEY" \ - -H "Content-Type: application/json" \ - -H "X-Source: skills" \ - -d '{ - "sources": [{"type": "text", "content": "The latest AI developments"}], - "speakers": [{"speakerId": "cozy-man-english"}], - "language": "en", - "mode": "deep" - }' +listenhub podcast create \ + --query "The latest AI developments" \ + --mode deep \ + --lang en \ + --speaker "Mars" \ + --speaker "Mia" \ + --json ``` -Poll until complete, then present the result with title and listen link. +Wait for CLI to return result, then present with title and listen link. diff --git a/tts/SKILL.md b/tts/SKILL.md index 35e7707..13c787e 100644 --- a/tts/SKILL.md +++ b/tts/SKILL.md @@ -4,8 +4,8 @@ metadata: openclaw: emoji: "🔊" requires: - env: ["LISTENHUB_API_KEY"] - primaryEnv: "LISTENHUB_API_KEY" + bin: ["listenhub"] + primaryBin: "listenhub" description: | Text-to-speech and voice narration. Triggers on: "朗读这段", "配音", "TTS", "语音合成", "text to speech", "read this aloud", "convert to speech", @@ -29,21 +29,20 @@ description: | Convert text into natural-sounding speech audio. Two paths: -1. **Quick mode** (`/v1/tts`): Single voice, low-latency, sync MP3 stream. For casual chat, reading snippets, instant audio. -2. **Script mode** (`/v1/speech`): Multi-speaker, per-segment voice assignment. For dialogue, audiobooks, scripted content. +1. **Quick mode** (`--mode direct`): Single voice, low-latency, sync. For casual chat, reading snippets, instant audio. +2. **Script mode** (`--mode smart`): Multi-speaker, per-segment voice assignment. For dialogue, audiobooks, scripted content. ## Hard Constraints -- No shell scripts. 
Construct curl commands from the API reference files listed in Resources -- Always read `shared/authentication.md` for API key and headers -- Follow `shared/common-patterns.md` for errors and interaction patterns -- Never hardcode speaker IDs in API calls — use built-in defaults from `shared/speaker-selection.md` as fallback only; fetch from the speakers API when the user wants to change voice +- Always check CLI auth following `shared/cli-authentication.md` +- Follow `shared/cli-patterns.md` for CLI execution, errors, and interaction patterns +- Never hardcode speaker IDs in CLI calls — use built-in defaults from `shared/speaker-selection.md` as fallback only; fetch from the speakers CLI when the user wants to change voice - Always read config following `shared/config-pattern.md` before any interaction - Always follow `shared/speaker-selection.md` for speaker selection (text table + free-text input) - Never save files to `~/Downloads/` or `/tmp/` as primary output — save artifacts to the current working directory with friendly topic-based names (see `shared/config-pattern.md` § Artifact Naming) -Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After all parameters are collected, summarize the choices and ask the user to confirm. Do NOT call any generation API until the user has explicitly confirmed. +Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After all parameters are collected, summarize the choices and ask the user to confirm. Do NOT call any generation CLI command until the user has explicitly confirmed. 
@@ -62,9 +61,9 @@ Determine the mode from the user's input **automatically** before asking any que ## Interaction Flow -### Step -1: API Key Check +### Step -1: CLI Auth Check -Follow `shared/config-pattern.md` § API Key Check. If the key is missing, stop immediately. +Follow `shared/cli-authentication.md`. If the CLI is not installed or the user is not logged in, stop and guide them through setup. ### Step 0: Config Setup @@ -116,7 +115,7 @@ echo "$NEW_CONFIG" > "$CONFIG_PATH" CONFIG=$(cat "$CONFIG_PATH") ``` -### Quick Mode — `POST /v1/tts` +### Quick Mode — `listenhub tts create --mode direct` **Step 1: Extract text** @@ -154,51 +153,58 @@ Proceed? **Step 5: Generate** +For short text, pass inline: ```bash -curl -sS -X POST "https://api.marswave.ai/openapi/v1/tts" \ - -H "Authorization: Bearer $LISTENHUB_API_KEY" \ - -H "Content-Type: application/json" \ - -H "X-Source: skills" \ - -d '{"input": "...", "voice": "..."}' \ - --output /tmp/tts-output.mp3 +RESULT=$(listenhub tts create --text "{text}" --mode direct --speaker "{name}" --lang {lang} --json 2>/tmp/lh-err) +EXIT_CODE=$? + +if [ $EXIT_CODE -ne 0 ]; then + ERROR=$(cat /tmp/lh-err) + case $EXIT_CODE in + 2) echo "Auth error: run 'listenhub auth login'" ;; + 3) echo "Timeout: try --no-wait" ;; + *) echo "Error: $ERROR" ;; + esac + rm -f /tmp/lh-err +fi +rm -f /tmp/lh-err + +AUDIO_URL=$(echo "$RESULT" | jq -r '.audioUrl') +``` + +For long text, write to a temp file first (see `shared/cli-patterns.md` § Long Text Input): +```bash +cat > /tmp/lh-content.txt << 'ENDCONTENT' +Long text content goes here... +ENDCONTENT + +RESULT=$(listenhub tts create --text "$(cat /tmp/lh-content.txt)" --mode direct --speaker "{name}" --lang {lang} --json) +AUDIO_URL=$(echo "$RESULT" | jq -r '.audioUrl') + +rm -f /tmp/lh-content.txt ``` **Step 6: Present result** Read `OUTPUT_MODE` from config. Follow `shared/output-mode.md` for behavior. 
-Use a timestamped jobId: `$(date +%s)` - -**`inline` or `both`** (TTS quick returns a sync audio stream — no `audioUrl`): -```bash -JOB_ID=$(date +%s) -curl -sS -X POST "https://api.marswave.ai/openapi/v1/tts" \ - -H "Authorization: Bearer $LISTENHUB_API_KEY" \ - -H "Content-Type: application/json" \ - -H "X-Source: skills" \ - -d '{"input": "...", "voice": "..."}' \ - --output /tmp/tts-${JOB_ID}.mp3 -``` -Then use the Read tool on `/tmp/tts-{jobId}.mp3`. +**`inline` or `both`**: Display the `audioUrl` as a clickable link. Present: ``` Audio generated! + +在线收听:{audioUrl} ``` -**`download` or `both`**: Generate a topic slug from the text content following `shared/config-pattern.md` § Artifact Naming. +**`download` or `both`**: Also download the file. Generate a topic slug from the text content following `shared/config-pattern.md` § Artifact Naming. ```bash SLUG="{topic-slug}" # e.g. "server-maintenance-notice" NAME="${SLUG}.mp3" # Dedup: if file exists, append -2, -3, etc. BASE="${NAME%.*}"; EXT="${NAME##*.}"; i=2 while [ -e "$NAME" ]; do NAME="${BASE}-${i}.${EXT}"; i=$((i+1)); done -curl -sS -X POST "https://api.marswave.ai/openapi/v1/tts" \ - -H "Authorization: Bearer $LISTENHUB_API_KEY" \ - -H "Content-Type: application/json" \ - -H "X-Source: skills" \ - -d '{"input": "...", "voice": "..."}' \ - --output "$NAME" +curl -sS -o "$NAME" "$AUDIO_URL" ``` Present: ``` @@ -210,7 +216,7 @@ Audio generated! --- -### Script Mode — `POST /v1/speech` +### Script Mode — `listenhub tts create --mode smart` **Step 1: Get scripts** @@ -221,7 +227,7 @@ Determine whether the user already has a scripts array: > "Please provide the script with speaker assignments. Format: each line as `SpeakerName: text content`. I'll convert it." - Once the user provides the script, parse it into the `scripts` JSON format. + Once the user provides the script, parse it into speaker-annotated text. **Step 2: Assign voices per character** @@ -258,31 +264,55 @@ Proceed? 
**Step 5: Generate** -Write the request body to a temp file, then submit: +Format the script text with speaker markers and submit. For multi-speaker scripts, include speaker names inline in the text. Run with `run_in_background: true` since script mode may take longer. +**Submit (foreground)** with `--no-wait`: ```bash -# Write request to temp file -cat > /tmp/lh-speech-request.json << 'ENDJSON' -{ - "scripts": [ - {"content": "...", "speakerId": "..."}, - {"content": "...", "speakerId": "..."} - ] -} -ENDJSON - -# Submit -curl -sS -X POST "https://api.marswave.ai/openapi/v1/speech" \ - -H "Authorization: Bearer $LISTENHUB_API_KEY" \ - -H "Content-Type: application/json" \ - -H "X-Source: skills" \ - -d @/tmp/lh-speech-request.json - -rm /tmp/lh-speech-request.json +RESULT=$(listenhub tts create --text "{formatted script with speaker markers}" --mode smart --speaker "{name1}" --speaker "{name2}" --lang {lang} --no-wait --json) +ID=$(echo "$RESULT" | jq -r '.id') +echo "Submitted: $ID" +``` + +For long scripts, write to a temp file first: +```bash +cat > /tmp/lh-content.txt << 'ENDCONTENT' +SpeakerA: First line of dialogue +SpeakerB: Second line of dialogue +... 
+ENDCONTENT + +RESULT=$(listenhub tts create --text "$(cat /tmp/lh-content.txt)" --mode smart --speaker "{name1}" --speaker "{name2}" --lang {lang} --no-wait --json) +ID=$(echo "$RESULT" | jq -r '.id') + +rm -f /tmp/lh-content.txt +``` + +**Poll (background)** with `run_in_background: true` and `timeout: 600000`: +```bash +ID="" +for i in $(seq 1 60); do + RESULT=$(listenhub creation get "$ID" --json 2>/dev/null) + STATUS=$(echo "$RESULT" | jq -r '.status // "processing"') + + case "$STATUS" in + completed) echo "$RESULT"; exit 0 ;; + failed) echo "FAILED: $RESULT" >&2; exit 1 ;; + *) sleep 10 ;; + esac +done +echo "TIMEOUT" >&2; exit 2 ``` **Step 6: Present result** +When the background task completes, parse the result: +```bash +AUDIO_URL=$(echo "$RESULT" | jq -r '.audioUrl') +SUBTITLES_URL=$(echo "$RESULT" | jq -r '.subtitlesUrl // empty') +DURATION=$(echo "$RESULT" | jq -r '.audioDuration // empty') +CREDITS=$(echo "$RESULT" | jq -r '.credits // empty') +``` + Read `OUTPUT_MODE` from config. Follow `shared/output-mode.md` for behavior. **`inline` or `both`**: Display the `audioUrl` and `subtitlesUrl` as clickable links. @@ -304,7 +334,7 @@ NAME="${SLUG}.mp3" # Dedup: if file exists, append -2, -3, etc. 
BASE="${NAME%.*}"; EXT="${NAME##*.}"; i=2 while [ -e "$NAME" ]; do NAME="${BASE}-${i}.${EXT}"; i=$((i+1)); done -curl -sS -o "$NAME" "{audioUrl}" +curl -sS -o "$NAME" "$AUDIO_URL" ``` Present: ``` @@ -324,15 +354,16 @@ When saving preferences, merge into `.listenhub/tts/config.json` — do not over ## API Reference -- TTS & Speech endpoints: `shared/api-tts.md` -- Speaker list: `shared/api-speakers.md` +- CLI execution patterns: `shared/cli-patterns.md` +- CLI authentication: `shared/cli-authentication.md` +- Speaker list: `shared/cli-speakers.md` - Speaker selection guide: `shared/speaker-selection.md` -- Error handling: `shared/common-patterns.md` § Error Handling -- Long text input: `shared/common-patterns.md` § Long Text Input +- Config pattern: `shared/config-pattern.md` +- Output mode: `shared/output-mode.md` ## Composability -- **Invokes**: speakers API (for speaker selection) +- **Invokes**: speakers CLI (for speaker selection) - **Invoked by**: explainer (for voiceover) ## Examples @@ -342,20 +373,30 @@ When saving preferences, merge into `.listenhub/tts/config.json` — do not over > "TTS this: The server will be down for maintenance at midnight." 1. Detect: Quick mode (plain text, "TTS this") -2. Read config: `quickVoice` is `null` -3. Fetch speakers, user picks "Yuanye" -4. Ask to save → yes → update config -5. `POST /v1/tts` with `input` + `voice` -6. Present: `/tmp/tts-output.mp3` +2. Read config: `defaultSpeakers.en` is empty +3. Use built-in default: Mars (`cozy-man-english`) +4. Confirm → user approves +5. Generate: + ```bash + RESULT=$(listenhub tts create --text "The server will be down for maintenance at midnight." --mode direct --speaker "Mars" --lang en --json) + AUDIO_URL=$(echo "$RESULT" | jq -r '.audioUrl') + ``` +6. Present: display `audioUrl` as link (inline mode) **Script mode:** > "帮我做一段双人对话配音,A说:欢迎大家,B说:谢谢邀请" 1. Detect: Script mode ("双人对话") -2. Parse segments: A → "欢迎大家", B → "谢谢邀请" -3. Read config: `scriptVoices` empty -4. 
Fetch `zh` speakers, assign A and B voices -5. Ask to save → yes → update config -6. `POST /v1/speech` with scripts array -7. Present: `audioUrl`, `subtitlesUrl`, duration +2. Parse segments: A -> "欢迎大家", B -> "谢谢邀请" +3. Read config: `defaultSpeakers.zh` empty +4. Use built-in defaults: 原野 (Primary) + 高晴 (Secondary) +5. Confirm → user approves +6. Generate: + ```bash + RESULT=$(listenhub tts create --text "A: 欢迎大家 + B: 谢谢邀请" --mode smart --speaker "原野" --speaker "高晴" --lang zh --no-wait --json) + ID=$(echo "$RESULT" | jq -r '.id') + ``` +7. Poll in background until complete +8. Present: `audioUrl`, `subtitlesUrl`, duration From a3b57e534d8bca7680a266f16b6416a9781e5a5c Mon Sep 17 00:00:00 2001 From: 0XFANGO Date: Wed, 8 Apr 2026 11:42:31 +0800 Subject: [PATCH 09/14] feat: add listenhub-cli router skill, sync listenhub skill - listenhub-cli/SKILL.md: new umbrella router with full routing table - listenhub/SKILL.md: identical content (alias), no longer deprecated - Delete DEPRECATED.md --- listenhub-cli/SKILL.md | 65 +++++++++++++++++++++++++++++++++++ listenhub/DEPRECATED.md | 14 -------- listenhub/SKILL.md | 76 +++++++++++++++++++++++++++++------------ 3 files changed, 119 insertions(+), 36 deletions(-) create mode 100644 listenhub-cli/SKILL.md delete mode 100644 listenhub/DEPRECATED.md diff --git a/listenhub-cli/SKILL.md b/listenhub-cli/SKILL.md new file mode 100644 index 0000000..3365367 --- /dev/null +++ b/listenhub-cli/SKILL.md @@ -0,0 +1,65 @@ +--- +name: listenhub-cli +description: | + ListenHub CLI skills router. Routes to the correct skill based on user intent. + Triggers on: "make a podcast", "explainer video", "read aloud", "TTS", + "generate image", "做播客", "解说视频", "朗读", "生成图片", "幻灯片", + "slides", "音乐", "music", "generate music", "翻唱", "cover song", + "parse URL", "解析链接", "提取内容". +metadata: + openclaw: + emoji: "🎧" + requires: + bin: ["listenhub"] + primaryBin: "listenhub" +--- + +## Purpose + +This is a router skill. 
When users trigger a general ListenHub action, this skill identifies the intent and delegates to the appropriate specialized skill. + +## Routing Table + +| User intent | Keywords | Route to | +|-------------|----------|----------| +| Podcast | "podcast", "播客", "debate", "dialogue" | `/podcast` | +| Explainer video | "explainer", "解说视频", "tutorial video" | `/explainer` | +| Slides / PPT | "slides", "幻灯片", "PPT", "presentation" | `/slides` | +| TTS / Read aloud | "TTS", "read aloud", "朗读", "配音", "语音合成" | `/tts` | +| Image generation | "generate image", "画一张", "生成图片", "AI图" | `/image-gen` | +| Music | "music", "音乐", "生成音乐", "翻唱", "cover" | `/music` | +| Content extraction | "parse URL", "extract content", "解析链接" | `/content-parser` | +| Audio transcription | "transcribe", "ASR", "语音转文字" | `/asr` | +| Creator workflow | "创作", "写公众号", "小红书", "口播" | `/creator` | + +## How to Route + +1. Read the user's message and identify which category it falls into +2. Tell the user which skill you're routing to +3. Follow that skill's SKILL.md completely + +If the intent is ambiguous, ask the user to clarify: + +``` +Question: "What would you like to create?" +Options: + - "Podcast" — Audio discussion on a topic + - "Explainer Video" — Narrated video with AI visuals + - "Slides" — Slide deck / presentation + - "Music" — AI-generated music or cover +``` + +## Prerequisites + +Most skills require the ListenHub CLI. Check: + +```bash +listenhub auth status --json +``` + +If not installed or not logged in, guide the user: + +1. Install: `npm install -g @marswave/listenhub-cli` +2. Login: `listenhub auth login` + +Exception: `/asr` runs locally and needs no CLI or API key. 
diff --git a/listenhub/DEPRECATED.md b/listenhub/DEPRECATED.md deleted file mode 100644 index db826e6..0000000 --- a/listenhub/DEPRECATED.md +++ /dev/null @@ -1,14 +0,0 @@ -# ListenHub Skill — DEPRECATED - -This monolithic skill has been decomposed into individual skills: - -- `/podcast` — Podcast generation (solo, dialogue, debate) -- `/explainer` — Explainer videos with narration and AI visuals -- `/tts` — Text-to-speech and voice narration -- `/image-gen` — AI image generation -- `/content-parser` — URL content extraction - -Shared infrastructure (API reference, authentication, common patterns) is in `shared/`. - -**Migration date**: 2026-03-04 -**Issue**: [MARS-3517](https://linear.app/marswave/issue/MARS-3517) diff --git a/listenhub/SKILL.md b/listenhub/SKILL.md index 2bc42e1..238bc82 100644 --- a/listenhub/SKILL.md +++ b/listenhub/SKILL.md @@ -1,33 +1,65 @@ --- name: listenhub description: | - DEPRECATED — replaced by individual skills. Use when the user triggers any - ListenHub action: "make a podcast", "explainer video", "read aloud", "TTS", - "generate image", "解说视频", "播客", "朗读", "生成图片". + ListenHub CLI skills router. Routes to the correct skill based on user intent. + Triggers on: "make a podcast", "explainer video", "read aloud", "TTS", + "generate image", "做播客", "解说视频", "朗读", "生成图片", "幻灯片", + "slides", "音乐", "music", "generate music", "翻唱", "cover song", + "parse URL", "解析链接", "提取内容". +metadata: + openclaw: + emoji: "🎧" + requires: + bin: ["listenhub"] + primaryBin: "listenhub" --- -## ⚠️ This skill has been replaced +## Purpose -The `listenhub` skill has been split into focused individual skills. +This is a router skill. When users trigger a general ListenHub action, this skill identifies the intent and delegates to the appropriate specialized skill. -**Tell the user:** +## Routing Table -> This skill is outdated. Please upgrade with: -> -> ```bash -> npx skills add marswaveai/skills -> ``` -> -> Then restart your agent. 
+| User intent | Keywords | Route to | +|-------------|----------|----------| +| Podcast | "podcast", "播客", "debate", "dialogue" | `/podcast` | +| Explainer video | "explainer", "解说视频", "tutorial video" | `/explainer` | +| Slides / PPT | "slides", "幻灯片", "PPT", "presentation" | `/slides` | +| TTS / Read aloud | "TTS", "read aloud", "朗读", "配音", "语音合成" | `/tts` | +| Image generation | "generate image", "画一张", "生成图片", "AI图" | `/image-gen` | +| Music | "music", "音乐", "生成音乐", "翻唱", "cover" | `/music` | +| Content extraction | "parse URL", "extract content", "解析链接" | `/content-parser` | +| Audio transcription | "transcribe", "ASR", "语音转文字" | `/asr` | +| Creator workflow | "创作", "写公众号", "小红书", "口播" | `/creator` | -After upgrading, the new skills work the same way — just say what you want: +## How to Route -| What you wanted | New skill | -|----------------|-----------| -| Make a podcast | `/podcast` | -| Explainer video | `/explainer` | -| Read aloud / TTS | `/tts` | -| Generate image | `/image-gen` | -| Extract URL content | `/content-parser` | +1. Read the user's message and identify which category it falls into +2. Tell the user which skill you're routing to +3. Follow that skill's SKILL.md completely -**Do not attempt the original task until the user has upgraded.** +If the intent is ambiguous, ask the user to clarify: + +``` +Question: "What would you like to create?" +Options: + - "Podcast" — Audio discussion on a topic + - "Explainer Video" — Narrated video with AI visuals + - "Slides" — Slide deck / presentation + - "Music" — AI-generated music or cover +``` + +## Prerequisites + +Most skills require the ListenHub CLI. Check: + +```bash +listenhub auth status --json +``` + +If not installed or not logged in, guide the user: + +1. Install: `npm install -g @marswave/listenhub-cli` +2. Login: `listenhub auth login` + +Exception: `/asr` runs locally and needs no CLI or API key. 
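As a rough illustration of the routing table above, keyword matching could be sketched in shell as follows. This is only an approximation: the skill asks the agent to read the message and judge intent, and `route_intent` is a hypothetical helper name, not a ListenHub CLI command. Substring matching is crude (for example, `cover` also matches `discover`), so anything unmatched falls through to the clarifying question.

```shell
# Hypothetical helper: map a user message to a skill route by keyword.
# Keywords are copied from the routing table; earlier rows win on overlap.
route_intent() {
  # Lowercase ASCII letters so English keywords match case-insensitively.
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *podcast*|*播客*|*debate*|*dialogue*)            echo "/podcast" ;;
    *explainer*|*解说视频*|*"tutorial video"*)        echo "/explainer" ;;
    *slides*|*幻灯片*|*ppt*|*presentation*)          echo "/slides" ;;
    *tts*|*"read aloud"*|*朗读*|*配音*|*语音合成*)     echo "/tts" ;;
    *"generate image"*|*画一张*|*生成图片*|*ai图*)     echo "/image-gen" ;;
    *music*|*音乐*|*翻唱*|*cover*)                   echo "/music" ;;
    *"parse url"*|*"extract content"*|*解析链接*)     echo "/content-parser" ;;
    *transcribe*|*asr*|*语音转文字*)                  echo "/asr" ;;
    *创作*|*写公众号*|*小红书*|*口播*)                 echo "/creator" ;;
    # No keyword matched: the caller should ask the user to clarify.
    *) echo "ambiguous"; return 1 ;;
  esac
}
```

A caller would branch on the exit status: on success, follow the printed skill's SKILL.md; on failure, ask the "What would you like to create?" question.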
From d256fec7be625e3388628d12757f85ebc8aa3ab3 Mon Sep 17 00:00:00 2001 From: 0XFANGO Date: Wed, 8 Apr 2026 11:42:31 +0800 Subject: [PATCH 10/14] chore: remove old shared/ API docs (replaced by CLI + inlined) Delete 8 files: api-podcast, api-tts, api-image, api-storybook, api-content-extract, api-speakers, authentication, common-patterns --- shared/api-content-extract.md | 191 ---------------------------------- shared/api-image.md | 154 --------------------------- shared/api-podcast.md | 134 ------------------------ shared/api-speakers.md | 53 ---------- shared/api-storybook.md | 163 ----------------------------- shared/api-tts.md | 95 ----------------- shared/authentication.md | 59 ----------- shared/common-patterns.md | 181 -------------------------------- 8 files changed, 1030 deletions(-) delete mode 100644 shared/api-content-extract.md delete mode 100644 shared/api-image.md delete mode 100644 shared/api-podcast.md delete mode 100644 shared/api-speakers.md delete mode 100644 shared/api-storybook.md delete mode 100644 shared/api-tts.md delete mode 100644 shared/authentication.md delete mode 100644 shared/common-patterns.md diff --git a/shared/api-content-extract.md b/shared/api-content-extract.md deleted file mode 100644 index 0b11c96..0000000 --- a/shared/api-content-extract.md +++ /dev/null @@ -1,191 +0,0 @@ -# ListenHub API — Content Extract - -**Authentication**: See [authentication.md](./authentication.md) - -### POST /v1/content/extract - -Create a content extraction task for a URL. Returns a `taskId` for polling. 
- -**Request body:** - -| Field | Required | Type | Description | -|-------|----------|------|-------------| -| source | **Yes** | object | Source to extract from | -| source.type | **Yes** | string | Must be `"url"` | -| source.uri | **Yes** | string | Valid HTTP(S) URL to extract content from | -| options | No | object | Extraction options | -| options.summarize | No | boolean | Whether to generate a summary | -| options.maxLength | No | integer | Maximum content length | -| options.twitter | No | object | Twitter/X specific options | -| options.twitter.count | No | integer | Number of tweets to fetch (1-100, default 20) | - -**curl (basic):** - -```bash -curl -sS -X POST "https://api.marswave.ai/openapi/v1/content/extract" \ - -H "Authorization: Bearer $LISTENHUB_API_KEY" \ - -H "Content-Type: application/json" \ - -H "X-Source: skills" \ - -d '{ - "source": { - "type": "url", - "uri": "https://en.wikipedia.org/wiki/Topology" - } - }' -``` - -**curl (with options):** - -```bash -curl -sS -X POST "https://api.marswave.ai/openapi/v1/content/extract" \ - -H "Authorization: Bearer $LISTENHUB_API_KEY" \ - -H "Content-Type: application/json" \ - -H "X-Source: skills" \ - -d '{ - "source": { - "type": "url", - "uri": "https://x.com/elonmusk" - }, - "options": { - "summarize": true, - "maxLength": 5000, - "twitter": { - "count": 50 - } - } - }' -``` - -**Response:** - -```json -{ - "code": 0, - "message": "success", - "data": { - "taskId": "69a7dac700cf95938f86d9bb" - } -} -``` - -**Error codes:** - -| Code | Meaning | -|------|---------| -| 29003 | Validation error (`"source.uri" is required`, `"source.uri" must be a valid uri`) | -| 21007 | Invalid API key | - -### GET /v1/content/extract/{taskId} - -Get extraction task status and results. 
- -**Path params:** - -| Param | Type | Description | -|-------|------|-------------| -| taskId | string | 24-char hex task ID | - -**curl:** - -```bash -curl -sS "https://api.marswave.ai/openapi/v1/content/extract/69a7dac700cf95938f86d9bb" \ - -H "Authorization: Bearer $LISTENHUB_API_KEY" \ - -H "X-Source: skills" -``` - -**Response (processing):** - -```json -{ - "code": 0, - "message": "success", - "data": { - "taskId": "69a7dac700cf95938f86d9bb", - "status": "processing", - "createdAt": "2025-04-09T12:00:00Z", - "data": null, - "credits": 0, - "failCode": null, - "message": null - } -} -``` - -**Response (completed):** - -```json -{ - "code": 0, - "message": "success", - "data": { - "taskId": "69a7dac700cf95938f86d9bb", - "status": "completed", - "createdAt": "2025-04-09T12:00:00Z", - "data": { - "content": "Extracted text content...", - "metadata": { - "title": "Article Title", - "author": "Author Name", - "publishedAt": "2025-04-01T08:00:00Z" - }, - "references": [ - "https://example.com/related-article" - ] - }, - "credits": 5, - "failCode": null, - "message": null - } -} -``` - -**Response (failed):** - -```json -{ - "code": 0, - "message": "success", - "data": { - "taskId": "69a7dac700cf95938f86d9bb", - "status": "failed", - "createdAt": "2025-04-09T12:00:00Z", - "data": null, - "credits": 0, - "failCode": "EXTRACT_FAILED", - "message": "Unable to extract content from the provided URL" - } -} -``` - -**Key fields:** - -| Field | Type | Description | -|-------|------|-------------| -| status | string | `processing`, `completed`, or `failed` | -| data.data.content | string | Extracted text content | -| data.data.metadata | object | Page metadata (title, author, publishedAt) | -| data.data.references | array | Referenced URLs (array of strings) | -| credits | integer | Credits consumed | -| failCode | string | Error code (null on success) | -| message | string | Error message (null on success) | - -**Error codes:** - -| Code | Meaning | -|------|---------| -| 
29003 | Invalid taskId format | -| 25002 | Task not found | - -**Supported URL types:** - -| Category | Platforms | -|----------|----------| -| Video | YouTube, Bilibili | -| Social | Twitter/X (profiles and single tweets), WeChat articles | -| Documents | PDF, DOCX (direct URLs) | -| Images | JPEG, PNG, etc. (direct URLs) | -| Web | Any general web page (Wikipedia, arXiv, GitHub, etc.) | - -**Twitter/X notes:** -- For profile URLs (e.g. `https://x.com/username`), use `options.twitter.count` to control tweet count (1-100, default 20) -- This option is ignored for non-Twitter URLs diff --git a/shared/api-image.md b/shared/api-image.md deleted file mode 100644 index 2d980dd..0000000 --- a/shared/api-image.md +++ /dev/null @@ -1,154 +0,0 @@ -# ListenHub API — Image Generation - -**Base URL**: `https://api.marswave.ai/openapi/v1` -**Authentication**: Bearer `$LISTENHUB_API_KEY` (same key, different host) - -## POST /images/generation - -Generate an AI image from a text prompt. Synchronous — returns base64-encoded image data directly (no polling needed). - -**Request body:** - -| Field | Required | Type | Description | -|-------|----------|------|-------------| -| provider | Yes | string | Model provider. Use `"google"` | -| prompt | Yes | string | Image description (English recommended) | -| model | No | string | `"gemini-3-pro-image-preview"` (default) or `"gemini-3.1-flash-image-preview"` | -| imageConfig | No | object | Size and aspect ratio config | -| imageConfig.imageSize | No | string | `"1K"`, `"2K"` (default), or `"4K"` | -| imageConfig.aspectRatio | No | string | `"1:1"` (default). See aspect ratio table below. 
| -| referenceImages | No | array | Up to 14 reference images for style guidance (see format below) | - -**Aspect ratios:** - -| Ratio | Description | Models | -|-------|-------------|--------| -| 1:1 | Square | All | -| 2:3 | Portrait photo | All | -| 3:2 | Landscape photo | All | -| 3:4 | Poster portrait | All | -| 4:3 | Traditional landscape | All | -| 9:16 | Portrait / phone | All | -| 16:9 | Landscape / widescreen | All | -| 21:9 | Ultrawide | All | -| 1:4 | Narrow portrait | gemini-3.1-flash-image-preview only | -| 4:1 | Wide landscape | gemini-3.1-flash-image-preview only | -| 1:8 | Extreme narrow portrait | gemini-3.1-flash-image-preview only | -| 8:1 | Panoramic | gemini-3.1-flash-image-preview only | - -**referenceImages format:** - -Each item must have either `fileData` (URL) or `inlineData` (base64), not both. You can mix URL and base64 items in the same array. - -*URL-based reference:* - -```json -{ - "fileData": { - "fileUri": "https://example.com/photo.png", - "mimeType": "image/png" - } -} -``` - -Infer `mimeType` from URL suffix: `.jpg`/`.jpeg` → `image/jpeg`, `.png` → `image/png`, `.webp` → `image/webp`, `.gif` → `image/gif` - -*Base64 reference (inline):* - -```json -{ - "inlineData": { - "data": "", - "mimeType": "image/png" - } -} -``` - -Supported mimeTypes: `image/png`, `image/jpeg`, `image/webp`, `image/heic`, `image/heif` - -To encode a local file as base64: - -```bash -# macOS -BASE64_REF=$(base64 -i /path/to/image.png) - -# Linux -BASE64_REF=$(base64 -w 0 /path/to/image.png) -``` - -**Constraints:** -- Use `--max-time 600` (generation can take up to 10 minutes) -- On 429 (rate limit): wait 15s and retry. Max 3 retries. 
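A hedged sketch of two rules from this section: the suffix-to-`mimeType` mapping for `fileData` references, and the 429 retry constraint (wait 15s, at most 3 retries). The names `infer_mime_type`, `retry_on_429`, and `RETRY_WAIT_SECONDS` are hypothetical and not part of the ListenHub API. Ignoring query strings and matching the suffix case-insensitively are also assumptions. `retry_on_429` expects to be given a command that prints only the HTTP status code.

```bash
#!/usr/bin/env bash

# Hypothetical helper: map a reference-image URL to a mimeType value,
# following the suffix rule stated in this file.
infer_mime_type() {
  local url="$1" lower
  # Assumption: drop any query string or fragment before checking the suffix.
  url="${url%%[?#]*}"
  # Assumption: treat the suffix case-insensitively.
  lower=$(printf '%s' "$url" | tr '[:upper:]' '[:lower:]')
  case "$lower" in
    *.jpg|*.jpeg) echo "image/jpeg" ;;
    *.png)        echo "image/png" ;;
    *.webp)       echo "image/webp" ;;
    *.gif)        echo "image/gif" ;;
    *)            return 1 ;;  # unknown suffix: leave the decision to the caller
  esac
}

# Hypothetical wrapper: run "$@" (which must print an HTTP status code),
# and on 429 wait and retry, up to 3 retries after the first attempt.
retry_on_429() {
  local max_retries=3
  local wait_seconds="${RETRY_WAIT_SECONDS:-15}"  # assumed override, mainly for testing
  local attempt=0
  local status
  while :; do
    status=$("$@")
    if [ "$status" != "429" ]; then
      echo "$status"
      return 0
    fi
    if [ "$attempt" -ge "$max_retries" ]; then
      echo "$status"  # still rate limited after all retries
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$wait_seconds"
  done
}
```

With curl, the status code can be captured by writing the body to a file and printing only the code, for example `retry_on_429 curl -sS -o /tmp/lh-image.json -w '%{http_code}' --max-time 600 ...`, followed by the same headers and request body as the curl examples in this file.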
- -**curl (text-only):** - -```bash -RESPONSE=$(curl -sS -X POST "https://api.marswave.ai/openapi/v1/images/generation" \ - -H "Authorization: Bearer $LISTENHUB_API_KEY" \ - -H "Content-Type: application/json" \ - -H "X-Source: skills" \ - --max-time 600 \ - -d '{ - "provider": "google", - "model": "gemini-3-pro-image-preview", - "prompt": "cyberpunk city at night, neon lights, highly detailed", - "imageConfig": {"imageSize": "2K", "aspectRatio": "16:9"} - }') -``` - -**curl (with base64 reference image):** - -```bash -BASE64_REF=$(base64 -i /path/to/reference.png) - -RESPONSE=$(curl -sS -X POST "https://api.marswave.ai/openapi/v1/images/generation" \ - -H "Authorization: Bearer $LISTENHUB_API_KEY" \ - -H "Content-Type: application/json" \ - --max-time 600 \ - -d "{ - \"provider\": \"google\", - \"model\": \"gemini-3-pro-image-preview\", - \"prompt\": \"cyberpunk city at night\", - \"imageConfig\": {\"imageSize\": \"2K\", \"aspectRatio\": \"16:9\"}, - \"referenceImages\": [{\"inlineData\": {\"data\": \"$BASE64_REF\", \"mimeType\": \"image/png\"}}] - }") -``` - -**Response:** - -```json -{ - "candidates": [ - { - "content": { - "parts": [ - { - "inlineData": { - "data": "", - "mimeType": "image/jpeg" - } - } - ] - } - } - ] -} -``` - -**Extract base64 data:** - -```bash -BASE64_DATA=$(echo "$RESPONSE" | jq -r '.candidates[0].content.parts[0].inlineData.data // .data') -``` - -**Save to file (macOS):** - -```bash -echo "$BASE64_DATA" | base64 -D > ~/Downloads/listenhub-$(date +%Y%m%d-%H%M%S)-0001.jpg -``` - -**Save to file (Linux):** - -```bash -echo "$BASE64_DATA" | base64 -d > ~/Downloads/listenhub-$(date +%Y%m%d-%H%M%S)-0001.jpg -``` diff --git a/shared/api-podcast.md b/shared/api-podcast.md deleted file mode 100644 index bd93def..0000000 --- a/shared/api-podcast.md +++ /dev/null @@ -1,134 +0,0 @@ -# ListenHub API — Podcast - -**Base URL**: `https://api.marswave.ai/openapi/v1` -**Authentication**: See [authentication.md](./authentication.md) - -## Podcast - -### 
POST /podcast/episodes - -Create a podcast episode. - -**Request body:** - -| Field | Required | Type | Description | -|-------|----------|------|-------------| -| speakers | **Yes** | array | 1-2 speaker objects `[{speakerId: "..."}]` | -| query | No | string | Topic or prompt text | -| sources | No | array | Content sources (see Sources format below) | -| language | No | string | `en` or `zh` | -| mode | No | string | `deep` or `quick` | - -**Sources format:** - -```json -[ - {"type": "url", "content": "https://example.com/article"}, - {"type": "text", "content": "Topic description or reference text..."} -] -``` - -**Constraints:** -- Max 2 speakers - -**curl:** - -```bash -curl -sS -X POST "https://api.marswave.ai/openapi/v1/podcast/episodes" \ - -H "Authorization: Bearer $LISTENHUB_API_KEY" \ - -H "Content-Type: application/json" \ - -H "X-Source: skills" \ - -d '{ - "query": "The future of AI development", - "sources": [{"type": "text", "content": "Reference material about AI trends"}], - "speakers": [{"speakerId": "cozy-man-english"}], - "language": "en", - "mode": "deep" - }' -``` - -**Response:** - -```json -{ - "code": 0, - "message": "", - "data": { - "episodeId": "688c9a27348f001e707ba331" - } -} -``` - -### GET /podcast/episodes/{episodeId} - -Get podcast episode details and status. 
- -**Path params:** - -| Param | Type | Description | -|-------|------|-------------| -| episodeId | string | 24-char hex episode ID | - -**curl:** - -```bash -curl -sS "https://api.marswave.ai/openapi/v1/podcast/episodes/688c9a27348f001e707ba331" \ - -H "Authorization: Bearer $LISTENHUB_API_KEY" \ - -H "X-Source: skills" -``` - -**Response:** - -```json -{ - "code": 0, - "message": "", - "data": { - "episodeId": "688c9a27348f001e707ba331", - "createdAt": 1718230400, - "credits": 10, - "message": "success", - "failCode": 0, - "processStatus": "success", - "completedTime": 1718230400, - "sourceProcessResult": { - "content": "User-provided source text", - "references": [ - { - "type": "url", - "urlCitation": { - "title": "Reference Title", - "url": "https://example.com/reference", - "favicon": "https://example.com/favicon.ico" - } - } - ] - }, - "title": "My Podcast Title", - "outline": "This is the podcast outline.", - "cover": "https://example.com/cover.jpg", - "audioUrl": "https://gcs.example.com/audio.mp3", - "audioStreamUrl": "https://gcs.example.com/audio_stream.m3u8", - "scripts": [ - { - "speakerId": "speaker-1", - "speakerName": "Host A", - "content": "This is the first segment" - } - ] - } -} -``` - -**Key fields:** - -| Field | Type | Description | -|-------|------|-------------| -| processStatus | string | `pending`, `success`, or `failed` | -| audioUrl | string | Direct audio download URL | -| audioStreamUrl | string | M3U8 streaming URL | -| scripts | array | Script segments with speaker info and text | -| title | string | Generated episode title | -| outline | string | Generated outline | -| cover | string | Cover image URL | -| credits | integer | Credits consumed | diff --git a/shared/api-speakers.md b/shared/api-speakers.md deleted file mode 100644 index 560d0b6..0000000 --- a/shared/api-speakers.md +++ /dev/null @@ -1,53 +0,0 @@ -# ListenHub API — Speakers - -**Base URL**: `https://api.marswave.ai/openapi/v1` -**Authentication**: See 
[authentication.md](./authentication.md) - -## GET /speakers/list - -Get available voice speakers, optionally filtered by language. - -**Parameters (query string):** - -| Param | Required | Type | Description | -|-------|----------|------|-------------| -| language | No | string | Filter by language: `zh` or `en` | -| status | No | integer | Speaker status: `1` (active, default) or `2` | - -**curl:** - -```bash -curl -sS "https://api.marswave.ai/openapi/v1/speakers/list?language=en" \ - -H "Authorization: Bearer $LISTENHUB_API_KEY" \ - -H "X-Source: skills" -``` - -**Response:** - -```json -{ - "code": 0, - "message": "", - "data": { - "items": [ - { - "name": "Yuanye", - "speakerId": "cozy-man-english", - "demoAudioUrl": "https://example.com/demo.mp3", - "gender": "male", - "language": "en" - } - ] - } -} -``` - -**Fields:** - -| Field | Type | Description | -|-------|------|-------------| -| name | string | Display name | -| speakerId | string | ID to pass to creation endpoints | -| demoAudioUrl | string | Preview audio URL | -| gender | string | `male` or `female` | -| language | string | `zh` or `en` | diff --git a/shared/api-storybook.md b/shared/api-storybook.md deleted file mode 100644 index fc24404..0000000 --- a/shared/api-storybook.md +++ /dev/null @@ -1,163 +0,0 @@ -# ListenHub API — Storybook - -**Base URL**: `https://api.marswave.ai/openapi/v1` -**Authentication**: See [authentication.md](./authentication.md) - -Used by: -- `/explainer` skill — mode=`info` (factual/informational) or mode=`story` (narrative) -- `/slides` skill — mode=`slides` (PPT-style presentation) - ---- - -## POST /v1/storybook/episodes - -Create a storybook episode. Returns an `episodeId` immediately; generation runs asynchronously. 
- -**Request body:** - -| Field | Required | Type | Description | -|-------|----------|------|-------------| -| sources | **Yes** | array | Exactly 1 source object | -| sources[].type | **Yes** | string | `"text"` or `"url"` | -| sources[].content | **Yes** | string | Topic text or URL | -| speakers | **Yes** | array | Exactly 1 speaker: `[{"speakerId": "..."}]` | -| language | No | string | `"en"` or `"zh"` | -| mode | No | string | `"info"` (explainer), `"story"`, or `"slides"` (default: `"info"`) | -| style | No | string | Visual style hint (optional, free text) | - -**Constraints:** -- Exactly 1 source -- Max 1 speaker - -**curl:** - -```bash -curl -sS -X POST "https://api.marswave.ai/openapi/v1/storybook/episodes" \ - -H "Authorization: Bearer $LISTENHUB_API_KEY" \ - -H "Content-Type: application/json" \ - -H "X-Source: skills" \ - -d '{ - "sources": [{"type": "text", "content": "The history of the Roman Empire"}], - "speakers": [{"speakerId": "cozy-man-english"}], - "language": "en", - "mode": "slides" - }' -``` - -**Response:** - -```json -{ - "code": 0, - "message": "", - "data": { - "episodeId": "688c9a27348f001e707ba331" - } -} -``` - ---- - -## GET /v1/storybook/episodes/{episodeId} - -Get storybook episode status and result. 
- -**Path params:** - -| Param | Type | Description | -|-------|------|-------------| -| episodeId | string | 24-char hex episode ID | - -**curl:** - -```bash -curl -sS "https://api.marswave.ai/openapi/v1/storybook/episodes/688c9a27348f001e707ba331" \ - -H "Authorization: Bearer $LISTENHUB_API_KEY" \ - -H "X-Source: skills" -``` - -**Response:** - -```json -{ - "code": 0, - "message": "", - "data": { - "episodeId": "688c9a27348f001e707ba331", - "createdAt": 1718230400, - "mode": "slides", - "processStatus": "success", - "completedTime": 1718230450, - "credits": 10, - "message": "success", - "failCode": 0, - "title": "The Roman Empire", - "cover": "https://example.com/cover.jpg", - "audioUrl": "https://gcs.example.com/audio.mp3", - "audioDuration": 120, - "videoUrl": null, - "videoStatus": "not_generated", - "pages": [ - { - "text": "The Roman Empire began in 27 BC...", - "pageNumber": 1, - "imageUrl": "https://example.com/page1.jpg", - "audioTimestamp": 0 - } - ], - "sourceProcessResult": { - "query": "The history of the Roman Empire", - "content": "Processed source text...", - "imageSources": [] - } - } -} -``` - -**Key fields:** - -| Field | Type | Description | -|-------|------|-------------| -| processStatus | string | `"pending"`, `"success"`, or `"failed"` | -| mode | string | `"info"`, `"story"`, or `"slides"` | -| pages | array | Slide pages — each has `text`, `pageNumber`, `imageUrl`, `audioTimestamp` | -| audioUrl | string | Narration audio URL | -| audioDuration | number | Audio length in seconds | -| videoUrl | string | Video URL (null until generated via video endpoint) | -| videoStatus | string | `"not_generated"`, `"pending"`, `"success"`, `"failed"` | -| credits | integer | Credits consumed | -| failCode | number | Non-zero on failure | - ---- - -## POST /v1/storybook/episodes/{episodeId}/video - -Trigger video generation for a completed storybook episode. Video combines the page images with narration audio. 
- -**Path params:** - -| Param | Type | Description | -|-------|------|-------------| -| episodeId | string | 24-char hex episode ID (must be `processStatus=success`) | - -**curl:** - -```bash -curl -sS -X POST "https://api.marswave.ai/openapi/v1/storybook/episodes/688c9a27348f001e707ba331/video" \ - -H "Authorization: Bearer $LISTENHUB_API_KEY" \ - -H "X-Source: skills" -``` - -**Response:** - -```json -{ - "code": 0, - "message": "", - "data": { - "success": true - } -} -``` - -After calling this endpoint, poll `GET /v1/storybook/episodes/{episodeId}` and wait for `videoStatus=success`. Then `videoUrl` will contain the video URL. diff --git a/shared/api-tts.md b/shared/api-tts.md deleted file mode 100644 index 2ce7466..0000000 --- a/shared/api-tts.md +++ /dev/null @@ -1,95 +0,0 @@ -# ListenHub API — TTS - -**Base URL**: `https://api.marswave.ai/openapi/v1` -**Authentication**: See [authentication.md](./authentication.md) - ---- - -## POST /v1/tts - -Low-latency single-voice TTS. Returns a **streaming binary MP3** — not JSON. - -**Request body:** - -| Field | Required | Type | Description | -|-------|----------|------|-------------| -| input | Yes | string | Text to convert | -| voice | Yes | string | Speaker ID (`speakerId` from speakers API) | -| model | No | string | Model name, defaults to `flowtts` | - -**curl:** - -```bash -curl -sS -X POST "https://api.marswave.ai/openapi/v1/tts" \ - -H "Authorization: Bearer $LISTENHUB_API_KEY" \ - -H "Content-Type: application/json" \ - -H "X-Source: skills" \ - -d '{ - "input": "Hello, welcome to ListenHub.", - "voice": "EN-Man-General-01" - }' \ - --output /tmp/tts-output.mp3 -``` - -**Response:** Binary MP3 audio stream. On error, falls back to a JSON error object (check HTTP status code first). - -**Key constraints:** -- Max ~10,000 characters for `input` -- `voice` must be a valid `speakerId` from `GET /speakers/list` - ---- - -## POST /v1/speech - -Multi-speaker script-to-audio. 
Each script segment uses a different voice. Returns audio URL **synchronously**. - -**Request body:** - -| Field | Required | Type | Description | -|-------|----------|------|-------------| -| scripts | Yes | array | Ordered array of script segments | -| scripts[].content | Yes | string | Text for this segment | -| scripts[].speakerId | Yes | string | Speaker ID for this segment | -| title | No | string | Custom title (auto-generated if omitted) | - -**curl:** - -```bash -curl -sS -X POST "https://api.marswave.ai/openapi/v1/speech" \ - -H "Authorization: Bearer $LISTENHUB_API_KEY" \ - -H "Content-Type: application/json" \ - -H "X-Source: skills" \ - -d '{ - "scripts": [ - {"content": "Welcome everyone.", "speakerId": "EN-Man-General-01"}, - {"content": "Today we discuss an interesting topic.", "speakerId": "EN-Woman-General-01"}, - {"content": "Let us begin.", "speakerId": "EN-Man-General-01"} - ] - }' -``` - -**Response:** - -```json -{ - "code": 0, - "message": "", - "data": { - "audioUrl": "https://assets.listenhub.ai/listenhub-public-prod/podcast/example.mp3", - "audioDuration": 12500, - "subtitlesUrl": "https://assets.listenhub.ai/listenhub-public-prod/podcast/example.srt", - "taskId": "1eed39d387a046c0a1213e6b8f139d77", - "credits": 12 - } -} -``` - -**Response fields:** - -| Field | Type | Description | -|-------|------|-------------| -| audioUrl | string | MP3 audio file URL | -| audioDuration | integer | Duration in milliseconds | -| subtitlesUrl | string | SRT subtitle file URL | -| taskId | string | Task identifier | -| credits | integer | Credits consumed | diff --git a/shared/authentication.md b/shared/authentication.md deleted file mode 100644 index 144571b..0000000 --- a/shared/authentication.md +++ /dev/null @@ -1,59 +0,0 @@ -# Authentication - -## API Key - -All ListenHub API calls require a valid API key. 
- -**Environment variable**: `LISTENHUB_API_KEY` - -Store in `~/.zshrc` (macOS) or `~/.bashrc` (Linux): - -```bash -export LISTENHUB_API_KEY="lh_sk_..." -``` - -Reload after adding: - -```bash -source ~/.zshrc -``` - -**How to obtain**: Visit https://listenhub.ai/settings/api-keys (Pro plan required). - -## Base URLs - -| Service | Base URL | -|---------|----------| -| ListenHub API | `https://api.marswave.ai/openapi/v1` | -| Image Generation | `https://api.marswave.ai/openapi/v1` | -| Staging (ListenHub) | `https://staging-api.marswave.ai/openapi/v1` | - -## Required Headers - -Every request must include: - -``` -Authorization: Bearer $LISTENHUB_API_KEY -Content-Type: application/json -X-Source: skills -``` - -The `X-Source: skills` header identifies requests as coming from Claude Code skills (CLI tool), distinguishing them from `openapi` (web) or other sources on the server side. - -## curl Template - -```bash -curl -sS -X POST "https://api.marswave.ai/openapi/v1/{endpoint}" \ - -H "Authorization: Bearer $LISTENHUB_API_KEY" \ - -H "Content-Type: application/json" \ - -H "X-Source: skills" \ - -d '{ ... }' -``` - -For GET requests, omit `-d` and change `-X POST` to `-X GET`. - -## Security Notes - -- Never log or display full API keys in output -- API keys are transmitted via HTTPS only -- Do not pass sensitive or confidential information as content input — it is sent to external APIs for processing diff --git a/shared/common-patterns.md b/shared/common-patterns.md deleted file mode 100644 index faca63e..0000000 --- a/shared/common-patterns.md +++ /dev/null @@ -1,181 +0,0 @@ -# Common Patterns - -Reusable patterns for all skills that call ListenHub APIs. - - -**Language Adaptation**: Always respond in the user's language. Chinese input → Chinese output. English input → English output. Mixed → follow dominant language. This applies to all UI text, questions, confirmations, and error messages. 
- - -## Async Polling - -Most generation endpoints are asynchronous: submit a task, get an ID, then poll until completion. - -### Execution Model - -All polling MUST run in the background using Bash `run_in_background: true`. This keeps the terminal responsive while the task processes. - -**Two-step pattern:** - -1. **Submit (foreground)**: POST the creation request, extract the task/episode ID from the response. This is fast and runs in the foreground. -2. **Poll (background)**: Run the polling loop with `run_in_background: true`. You will be notified automatically when it completes — do NOT sleep or poll manually. - -### Step 1: Submit (foreground) - -```bash -RESPONSE=$(curl -sS -X POST "https://api.marswave.ai/openapi/v1/podcast/episodes" \ - -H "Authorization: Bearer $LISTENHUB_API_KEY" \ - -H "Content-Type: application/json" \ - -H "X-Source: skills" \ - -d '{ ... }') - -EPISODE_ID=$(echo "$RESPONSE" | jq -r '.data.episodeId') -echo "Submitted: $EPISODE_ID" -``` - -After this returns, tell the user the task is submitted and polling will run in the background. 
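Before handing the ID to the background poller, it may be worth checking that Step 1 actually produced one, since `jq -r` prints `null` when the field is missing. This is a minimal sketch, assuming IDs are the 24-character hex strings described in the API reference files. `is_valid_episode_id` is a hypothetical helper name, and accepting uppercase hex digits is an assumption.

```bash
#!/usr/bin/env bash

# Hypothetical guard: succeed only if the argument looks like a 24-char hex ID.
is_valid_episode_id() {
  case "$1" in
    *[!0-9a-fA-F]*) return 1 ;;  # contains a non-hex character (e.g. "null")
  esac
  # Length check also rejects the empty string.
  [ "${#1}" -eq 24 ]
}
```

A possible use right after the submit step: `is_valid_episode_id "$EPISODE_ID" || { echo "Submit failed: $RESPONSE" >&2; exit 1; }`.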
- -### Step 2: Poll (background) - -Run this as a **separate Bash call** with `run_in_background: true`: - -```bash -# Poll until complete — runs in background -EPISODE_ID="" -for i in $(seq 1 30); do - RESULT=$(curl -sS "https://api.marswave.ai/openapi/v1/podcast/episodes/$EPISODE_ID" \ - -H "Authorization: Bearer $LISTENHUB_API_KEY" \ - -H "X-Source: skills" 2>/dev/null) - - STATUS=$(echo "$RESULT" | tr -d '\000-\037\177' | jq -r '.data.processStatus // "pending"') - - case "$STATUS" in - success|completed) echo "$RESULT"; exit 0 ;; - failed|error) echo "FAILED: $RESULT" >&2; exit 1 ;; - *) sleep 10 ;; - esac -done -echo "TIMEOUT" >&2; exit 2 -``` - -### Polling Parameters - -| Parameter | Default | Notes | -|-----------|---------|-------| -| Interval | 10s | Use 5s for content-parser only | -| Max polls | 30 | = 300s timeout at 10s interval | -| Timeout (Bash) | 600000 | Set `timeout: 600000` on the Bash tool call | - -### After Completion - -When the background task finishes, you will be notified with the output. Parse the result and present it to the user. If the task failed or timed out, report the error. - -## Standard Response Structure - -All API responses follow this format: - -```json -{ - "code": 0, - "message": "", - "data": { ... 
} -} -``` - -- `code: 0` = success -- Non-zero `code` = error (see Error Handling below) - -## Error Handling - -### HTTP Status Codes - -| Code | Meaning | Action | -|------|---------|--------| -| 200 | Success | Parse response body | -| 400 | Bad request | Check parameters | -| 401 | Invalid API key | Re-check `LISTENHUB_API_KEY` | -| 402 | Insufficient credits | Inform user to recharge | -| 403 | Forbidden | No permission for this resource | -| 429 | Rate limited | Exponential backoff, retry after delay | -| 500/502/503/504 | Server error | Retry up to 3 times | - -### Retry Strategy - -- **429 rate limit**: Wait 15 seconds, then retry (exponential backoff) -- **5xx server errors**: Retry up to 3 times with 5-second intervals -- **Network errors**: Retry up to 3 times - -### Application Error Codes - -| Code | Meaning | -|------|---------| -| 21007 | Invalid user API key | -| 25429 | Rate limited (application-level) | - -## Input Validation - -| Constraint | Rule | -|-----------|------| -| URL format | Must be valid HTTP(S) URL | -| Text content length | Max 10,000 characters for TTS | -| Supported languages | `zh` (Chinese), `en` (English) | -| ID format | Alphanumeric + hyphen + underscore only | -| Episode ID format | 24-character hex string (MongoDB ObjectId) | - -## Long Text Input - -When `sources` content is long (e.g., a full article), passing it inline in `-d '{...}'` may hit shell argument length limits. 
Use `@file` to read the request body from a file: - -```bash -# Write request JSON to a temp file -cat > /tmp/lh-request.json << 'ENDJSON' -{ - "sources": [{"type": "text", "content": "Very long text content goes here..."}], - "speakers": [{"speakerId": "cozy-man-english"}], - "language": "en" -} -ENDJSON - -# Reference the file with @ -curl -sS -X POST "https://api.marswave.ai/openapi/v1/podcast/episodes" \ - -H "Authorization: Bearer $LISTENHUB_API_KEY" \ - -H "Content-Type: application/json" \ - -H "X-Source: skills" \ - -d @/tmp/lh-request.json -``` - -**When to use `@file`**: Always use this approach when text content exceeds a few KB. The `@` prefix tells curl to read the body from the file, bypassing shell argument limits entirely. - -**Cleanup**: Remove the temp file after use: `rm /tmp/lh-request.json` - -## Interactive Parameter Collection - -Skills must use the **AskUserQuestion tool** for all enumerable parameters, following a **conversational, step-by-step** approach. This renders an interactive picker in the terminal that users can navigate with arrow keys. - -### Conversation Behavior (mandatory) - -1. **One question at a time.** Ask a single question, then STOP and wait for the user's answer before proceeding to the next step. Do not batch multiple steps into one message unless the parameters are explicitly independent (e.g., resolution + aspect ratio). -2. **Wait for the answer.** Never assume a default and skip ahead. If the user hasn't answered, do not proceed. -3. **Confirm before executing.** After all parameters are collected, summarize the choices and ask the user to confirm before calling any API. This is the final gate. -4. **Be ready to go back.** If the user changes their mind or says something doesn't look right, revise and re-ask instead of pushing forward. - -### How to Ask - -**Always use the AskUserQuestion tool** — do NOT print questions as plain text. 
Each step's `Question` and `Options` map directly to AskUserQuestion parameters: - -``` -Step definition in SKILL.md: → AskUserQuestion tool call: - -Question: "What language?" → question: "What language?" - - "Chinese (zh)" — Mandarin → options: [{label: "Chinese (zh)", description: "Mandarin"} - - "English (en)" — English → {label: "English (en)", description: "English"}] -``` - -For **free text** steps (topic, URL, prompt), just ask the question in a normal text message and wait for the user to type their answer. - -### Parameter Types - -- **Multiple-choice → AskUserQuestion**: language, mode, speaker count, generation style, resolution, aspect ratio -- **Free text → normal message**: topic, content body, URL, image prompt -- **Sequential when dependent**: e.g., speaker list depends on language choice — ask language first, then fetch speakers and present list -- **Batch when independent**: e.g., resolution + aspect ratio can be asked together in one AskUserQuestion call (multiple questions) -- **Options include descriptions**: not just labels — explain what each choice means From 9ffbc38d9dd5563e668de916f9a2458e2b0dfedc Mon Sep 17 00:00:00 2001 From: 0XFANGO Date: Wed, 8 Apr 2026 11:42:31 +0800 Subject: [PATCH 11/14] refactor: update creator/ and asr to use CLI commands and new shared/ docs --- asr/SKILL.md | 2 +- creator/SKILL.md | 52 +++++++---------------- creator/templates/narration/template.md | 16 ++----- creator/templates/wechat/template.md | 2 +- creator/templates/xiaohongshu/template.md | 2 +- 5 files changed, 22 insertions(+), 52 deletions(-) diff --git a/asr/SKILL.md b/asr/SKILL.md index 2b2da67..643b6d3 100644 --- a/asr/SKILL.md +++ b/asr/SKILL.md @@ -33,7 +33,7 @@ Run `coli asr --help` for current CLI options and supported flags. - No shell scripts. Use direct commands only. 
- Always read config following `shared/config-pattern.md` before any interaction -- Follow `shared/common-patterns.md` for interaction patterns +- Follow `shared/cli-patterns.md` for interaction patterns - Never ask more than one question at a time diff --git a/creator/SKILL.md b/creator/SKILL.md index 78c6e90..ef81994 100644 --- a/creator/SKILL.md +++ b/creator/SKILL.md @@ -31,9 +31,9 @@ Generate platform-specific content packages by orchestrating existing skills. In ## Hard Constraints -- No shell scripts. Construct curl commands from the API reference files in `shared/` +- Use `listenhub` CLI commands for image-gen and TTS. Use curl for content-parser (see `content-parser/SKILL.md` § API Reference). - Always read config following `shared/config-pattern.md` before any interaction -- Follow `shared/common-patterns.md` for polling, errors, and interaction patterns +- Follow `shared/cli-patterns.md` for polling, errors, and interaction patterns - Never save files to `~/Downloads/` or `.listenhub/` — save content packages to the current working directory - JSON parsing: use `jq` only (no python3, awk) @@ -46,7 +46,7 @@ Use AskUserQuestion for every multiple-choice step. One question at a time. Wait -API Key Check at Confirmation Gate: If the pipeline includes any remote API call (image-gen, content-parser, tts), check `LISTENHUB_API_KEY` before proceeding. If missing, run interactive setup from `shared/authentication.md`. Pure text-only pipelines (e.g., topic → narration script without TTS) can proceed without an API key. +API Key Check at Confirmation Gate: If the pipeline includes any remote API call (image-gen, content-parser, tts), check authentication before proceeding. For CLI-based calls (image-gen, TTS), run `listenhub auth login` if not authenticated. For content-parser calls, configure `LISTENHUB_API_KEY` (see `content-parser/SKILL.md` § Authentication). 
Pure text-only pipelines (e.g., topic → narration script without TTS) can proceed without authentication. ## Step -1: API Key Check @@ -208,7 +208,7 @@ Otherwise: - Narration without TTS → no API key needed - Web/article URL input → needs content-parser → requires API key (audio/video URLs use local `coli asr`, no API key needed) -If API key required and missing: run `shared/authentication.md` interactive setup. +If API key required and missing: for CLI-based calls, run `listenhub auth login`. For content-parser calls, configure `LISTENHUB_API_KEY` (see `content-parser/SKILL.md` § Authentication). **Show confirmation summary:** @@ -250,7 +250,7 @@ RESPONSE=$(curl -sS -X POST "https://api.marswave.ai/openapi/v1/content/extract" TASK_ID=$(echo "$RESPONSE" | jq -r '.data.taskId') ``` -Then poll in background. Run this as a **separate Bash call** with `run_in_background: true` and `timeout: 600000` (per `shared/common-patterns.md`). The polling loop itself runs up to 300s (60 polls × 5s); `timeout: 600000` is set higher at the tool level to give the Bash process headroom beyond the poll budget: +Then poll in background. Run this as a **separate Bash call** with `run_in_background: true` and `timeout: 600000` (per `shared/cli-patterns.md`). 
The polling loop itself runs up to 300s (60 polls × 5s); `timeout: 600000` is set higher at the tool level to give the Bash process headroom beyond the poll budget: ```bash # Run with: run_in_background: true, timeout: 600000 @@ -283,17 +283,10 @@ If extraction fails: tell user "URL 解析失败,你可以直接粘贴文字 **For image generation** (called by wechat and xiaohongshu templates): ```bash -RESPONSE=$(curl -sS -X POST "https://api.marswave.ai/openapi/v1/images/generation" \ - -H "Authorization: Bearer $LISTENHUB_API_KEY" \ - -H "Content-Type: application/json" \ - -H "X-Source: skills" \ - --max-time 600 \ - -d '{ - "provider": "google", - "model": "gemini-3-pro-image-preview", - "prompt": "", - "imageConfig": {"imageSize": "2K", "aspectRatio": ""} - }') +RESPONSE=$(listenhub image create \ + --prompt "" \ + --aspect-ratio "" \ + --json) BASE64_DATA=$(echo "$RESPONSE" | jq -r '.candidates[0].content.parts[0].inlineData.data // .data') # macOS uses -D, Linux uses -d (detect platform) @@ -310,22 +303,9 @@ Generate images **sequentially** (not parallel) to respect rate limits. 
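The platform check mentioned in the comment above could be sketched as follows. `decode_base64` is a hypothetical helper, and testing `uname -s` for `Darwin` is an assumption about how to tell macOS from Linux. The `-D` and `-d` flags are the ones this document already uses.

```bash
#!/usr/bin/env bash

# Hypothetical helper: decode base64 from stdin with the flag the local
# base64 binary is documented here to expect (-D on macOS, -d on Linux).
decode_base64() {
  if [ "$(uname -s)" = "Darwin" ]; then
    base64 -D
  else
    base64 -d
  fi
}
```

For example, `echo "$BASE64_DATA" | decode_base64 > cover.jpg` would write the decoded image without hard-coding either flag.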
**For TTS** (called by narration template when user wants audio): -Use `@file` pattern per `shared/common-patterns.md` to handle special chars in script text: - ```bash -# Write TTS request to temp file (handles quotes, newlines safely) -cat > /tmp/creator-tts-request.json << ENDJSON -{"input": $(echo "$SCRIPT_TEXT" | jq -Rs .), "voice": "$SPEAKER_ID"} -ENDJSON - -curl -sS -X POST "https://api.marswave.ai/openapi/v1/tts" \ - -H "Authorization: Bearer $LISTENHUB_API_KEY" \ - -H "Content-Type: application/json" \ - -H "X-Source: skills" \ - -d @/tmp/creator-tts-request.json \ - --output "{slug}-narration/audio.mp3" - -rm /tmp/creator-tts-request.json +listenhub tts create --text "$(cat /tmp/lh-content.txt)" --speaker "$SPEAKER_ID" --json \ + | jq -r '.data' | base64 -D > "{slug}-narration/audio.mp3" ``` ### Step 6: Assemble Output @@ -401,13 +381,13 @@ If the user says "重置风格偏好" or "reset style": ## API Reference -- Authentication & headers: `shared/authentication.md` -- Image generation: `shared/api-image.md` -- Content extraction: `shared/api-content-extract.md` -- TTS (text-to-speech): `shared/api-tts.md` +- Authentication: `shared/cli-authentication.md` +- Image generation: CLI: `listenhub image create` (see `shared/cli-patterns.md`) +- Content extraction: `content-parser/SKILL.md` § API Reference (Inlined) +- TTS (text-to-speech): CLI: `listenhub tts create` (see `shared/cli-patterns.md`) - Speaker selection: `shared/speaker-selection.md` - Config pattern: `shared/config-pattern.md` -- Common patterns (polling, errors): `shared/common-patterns.md` +- Common patterns (polling, errors): `shared/cli-patterns.md` - Output mode: `shared/output-mode.md` ## Composability diff --git a/creator/templates/narration/template.md b/creator/templates/narration/template.md index af6be10..1bc33e3 100644 --- a/creator/templates/narration/template.md +++ b/creator/templates/narration/template.md @@ -32,20 +32,10 @@ If generating audio: - English: "Mars" (`cozy-man-english`) - On 
first TTS use, ask the user via AskUserQuestion if they want to choose a different speaker. Save their choice to `preferences.narration.defaultSpeaker` for future runs. -2. Call TTS API (use `@file` pattern for safe text handling per `shared/common-patterns.md`): +2. Call TTS API: ```bash -cat > /tmp/creator-tts-request.json << ENDJSON -{"input": $(echo "$SCRIPT_TEXT" | jq -Rs .), "voice": "$SPEAKER_ID"} -ENDJSON - -curl -sS -X POST "https://api.marswave.ai/openapi/v1/tts" \ - -H "Authorization: Bearer $LISTENHUB_API_KEY" \ - -H "Content-Type: application/json" \ - -H "X-Source: skills" \ - -d @/tmp/creator-tts-request.json \ - --output "{output}/audio.mp3" - -rm /tmp/creator-tts-request.json +listenhub tts create --text "$(cat /tmp/lh-content.txt)" --speaker "$SPEAKER_ID" --json \ + | jq -r '.data' | base64 -D > "{output}/audio.mp3" ``` Note: TTS max input is ~10,000 characters. For longer scripts, this is still well within limits for narration (typically 300-2000 chars). diff --git a/creator/templates/wechat/template.md b/creator/templates/wechat/template.md index 2931889..aa55e20 100644 --- a/creator/templates/wechat/template.md +++ b/creator/templates/wechat/template.md @@ -51,7 +51,7 @@ For each planned illustration, call the image generation API: - **Model**: `gemini-3-pro-image-preview` - **Cover**: aspect ratio `3:2`, size `2K` - **Body images**: aspect ratio `3:2` or `16:9`, size `2K` -- **Timeout**: `--max-time 600` on curl (per `shared/api-image.md`) +- **Timeout**: `--timeout 600` (use `listenhub image create --json`) Save images to `{output}/images/cover.jpg`, `{output}/images/section-1.jpg`, etc. 
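Review note: the TTS snippets in this patch decode the CLI output with `base64 -D`, which is the BSD/macOS spelling of the flag. GNU coreutils on Linux expects `-d`, as the image-generation snippet's own comment ("macOS uses -D, Linux uses -d") already points out. A minimal sketch of a platform-aware helper follows, assuming the `.data` field really carries base64-encoded audio. The name `b64_decode` is hypothetical and not part of the patch.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the patch: decode base64 from stdin to stdout.

b64_decode() {
  # BSD base64 (macOS) documents -D for decoding; GNU base64 (Linux) uses -d.
  if [ "$(uname -s)" = "Darwin" ]; then
    base64 -D
  else
    base64 -d
  fi
}
```

With a helper like this, the TTS pipeline would end in `| jq -r '.data' | b64_decode > "{output}/audio.mp3"` instead of calling `base64 -D` directly.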
diff --git a/creator/templates/xiaohongshu/template.md b/creator/templates/xiaohongshu/template.md index 39ce5bc..3cf781b 100644 --- a/creator/templates/xiaohongshu/template.md +++ b/creator/templates/xiaohongshu/template.md @@ -114,7 +114,7 @@ For each prompt in `prompts.json`: - **Model**: `gemini-3-pro-image-preview` - **Aspect ratio**: `3:4` (portrait, standard Xiaohongshu card) - **Size**: `2K` -- **Timeout**: `--max-time 600` on curl (per `shared/api-image.md`) +- **Timeout**: `--timeout 600` (use `listenhub image create --json`) Save to `{output}/cards/01-cover.jpg`, `{output}/cards/02-page.jpg`, etc. From 618ec73ebcbd9a58f2824601a0d19facc57b9fd4 Mon Sep 17 00:00:00 2001 From: 0XFANGO Date: Wed, 8 Apr 2026 11:42:31 +0800 Subject: [PATCH 12/14] docs: update READMEs with slides, music, CLI auth --- README.md | 18 +++++++++++++++--- README.zh.md | 18 +++++++++++++++--- 2 files changed, 30 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index ed37d75..9e24104 100644 --- a/README.md +++ b/README.md @@ -51,6 +51,8 @@ Restart your agent (Claude Code, Cursor, etc.) after updating. | `/content-parser` | "parse this URL", "解析链接" | Extract content from URLs | | `/asr` | "transcribe", "语音转文字", "ASR" | Transcribe audio files to text | | `/creator` | "创作", "写公众号", "小红书", "口播" | Creator workflow — platform-ready content packages | +| `/slides` | "slides", "幻灯片" | Create slide decks with AI visuals | +| `/music` | "music", "音乐" | AI music generation and covers | ## Supported Inputs @@ -60,12 +62,19 @@ Restart your agent (Claude Code, Cursor, etc.) after updating. - Plain text - Image prompts - Audio files +- Music prompts +- Reference audio files ## Setup -**ListenHub API Key** — [Get yours](https://listenhub.ai/settings/api-keys) (Pro plan required) +**ListenHub CLI** — Install and login: -Keys auto-configure on first use. 
+```bash +npm install -g @marswave/listenhub-cli +listenhub auth login +``` + +**Note:** `/content-parser` and `/creator` still require a [ListenHub API Key](https://listenhub.ai/settings/api-keys) for content extraction. ## Directory Structure @@ -78,7 +87,10 @@ Keys auto-configure on first use. ├── content-parser/ # URL content extraction ├── asr/ # Audio transcription ├── creator/ # Creator workflow (WeChat, Xiaohongshu, narration) -└── listenhub/ # Deprecated (see DEPRECATED.md) +├── slides/ # Slide deck generation +├── music/ # AI music generation and covers +├── listenhub-cli/ # CLI authentication and setup +└── listenhub/ # Router skill (alias for listenhub-cli) ``` ## Supported Clients diff --git a/README.zh.md b/README.zh.md index 5d239ec..eeb3f5a 100644 --- a/README.zh.md +++ b/README.zh.md @@ -51,6 +51,8 @@ git pull origin main | `/content-parser` | "解析链接"、"提取内容" | URL 内容提取 | | `/asr` | "转录"、"语音转文字"、"ASR" | 音频文件转文字 | | `/creator` | "创作"、"写公众号"、"小红书"、"口播" | 创作者工作流 — 一键生成平台内容包 | +| `/slides` | "幻灯片"、"slides" | 幻灯片生成 | +| `/music` | "音乐"、"music" | AI 音乐生成、翻唱 | ## 支持的输入 @@ -60,12 +62,19 @@ git pull origin main - 纯文本 - 图片描述 - 音频文件 +- 音乐描述 +- 参考音频文件 ## 配置 -**ListenHub API Key** — [获取](https://listenhub.ai/zh/settings/api-keys)(Pro 订阅) +**ListenHub CLI** — 安装并登录: -首次使用时自动配置。 +```bash +npm install -g @marswave/listenhub-cli +listenhub auth login +``` + +**注意:** `/content-parser` 和 `/creator` 仍需要 [ListenHub API Key](https://listenhub.ai/zh/settings/api-keys) 用于内容提取。 ## 目录结构 @@ -78,7 +87,10 @@ git pull origin main ├── content-parser/ # URL 内容提取 ├── asr/ # 音频转文字 ├── creator/ # 创作者工作流 -└── listenhub/ # 已弃用(见 DEPRECATED.md) +├── slides/ # 幻灯片生成 +├── music/ # AI 音乐生成、翻唱 +├── listenhub-cli/ # CLI 认证与配置 +└── listenhub/ # 路由 skill(listenhub-cli 的别名) ``` ## 支持的客户端 From 5a14ef425607cdb516325403b04b56ce8d946498 Mon Sep 17 00:00:00 2001 From: 0XFANGO Date: Wed, 8 Apr 2026 11:43:00 +0800 Subject: [PATCH 13/14] refactor: simplify explainer, image-gen, music, slides skill docs 
--- explainer/SKILL.md | 83 ++++--------- image-gen/SKILL.md | 4 +- music/SKILL.md | 294 +++++++++++++++++---------------------------- slides/SKILL.md | 225 +++++++++++++++------------------- 4 files changed, 231 insertions(+), 375 deletions(-) diff --git a/explainer/SKILL.md b/explainer/SKILL.md index 0caad7b..0c9e70d 100644 --- a/explainer/SKILL.md +++ b/explainer/SKILL.md @@ -177,54 +177,28 @@ Wait for explicit confirmation before running any CLI command. ## Workflow -1. **Submit (foreground)**: Run with `--no-wait` to get the creation ID immediately: +Run the CLI command with `run_in_background: true` and `timeout: 660000`. The CLI blocks until generation completes and returns the final result as JSON: - ```bash - RESULT=$(listenhub explainer create \ - --query "{topic}" \ - --mode {info|story} \ - --lang {en|zh|ja} \ - --speaker "{name}" \ - --speaker-id "{id}" \ - --no-wait \ - --json) - - if [ $? -ne 0 ]; then - echo "Error: $RESULT" >&2 - exit 1 - fi - - ID=$(echo "$RESULT" | jq -r '.id') - echo "Submitted: $ID" - ``` - - **Optional flags** (add when applicable): - - `--source-url "{url}"` — if the user provided a reference URL - - `--skip-audio` — if text-only output (no video) - - `--image-size {2K|4K}` — image resolution (default: 2K) - - `--aspect-ratio {16:9|9:16|1:1}` — video aspect ratio (default: 16:9) - - `--style "{style}"` — visual style for AI-generated images - -2. Tell the user the task is submitted. +```bash +listenhub explainer create \ + --query "{topic}" \ + --mode {info|story} \ + --lang {en|zh|ja} \ + --speaker "{name}" \ + --speaker-id "{id}" \ + --json +``` -3. **Poll (background)**: Run the following with `run_in_background: true` and `timeout: 660000`: +If the command fails (non-zero exit), check stderr for error details. See `shared/cli-patterns.md` § Error Handling for exit codes and common errors. 
- ```bash - ID="" - for i in $(seq 1 60); do - RESULT=$(listenhub creation get "$ID" --json 2>/dev/null) - STATUS=$(echo "$RESULT" | jq -r '.status // "processing"') - - case "$STATUS" in - completed) echo "$RESULT"; exit 0 ;; - failed) echo "FAILED: $RESULT" >&2; exit 1 ;; - *) sleep 10 ;; - esac - done - echo "TIMEOUT" >&2; exit 2 - ``` +**Optional flags** (add when applicable): +- `--source-url "{url}"` — if the user provided a reference URL +- `--skip-audio` — if text-only output (no video) +- `--image-size {2K|4K}` — image resolution (default: 2K) +- `--aspect-ratio {16:9|9:16|1:1}` — video aspect ratio (default: 16:9) +- `--style "{style}"` — visual style for AI-generated images -4. When notified, **parse and present result**: +Tell the user the task is submitted. When notified of completion, **parse and present result**: Parse the CLI JSON output for key fields: ```bash @@ -271,7 +245,7 @@ Wait for explicit confirmation before running any CLI command. - Write `script.md` inside - Download audio: ```bash - curl -sS -o "{slug}-explainer/audio.mp3" "{audioUrl}" + listenhub download "{audioUrl}" -o "{slug}-explainer/audio.mp3" ``` - Present: ``` @@ -324,27 +298,14 @@ echo "$NEW_CONFIG" > "$CONFIG_PATH" 5. 
Ask output → "Text + Video" ```bash -# Submit -RESULT=$(listenhub explainer create \ +# Run with run_in_background: true, timeout: 660000 +listenhub explainer create \ --query "Introduce Claude Code: what it is, key features, and how to get started" \ --mode info \ --lang en \ --speaker "Mars" \ --speaker-id "cozy-man-english" \ - --no-wait \ - --json) -ID=$(echo "$RESULT" | jq -r '.id') - -# Poll (run_in_background: true, timeout: 660000) -for i in $(seq 1 60); do - RESULT=$(listenhub creation get "$ID" --json 2>/dev/null) - STATUS=$(echo "$RESULT" | jq -r '.status // "processing"') - case "$STATUS" in - completed) echo "$RESULT"; exit 0 ;; - failed) echo "FAILED: $RESULT" >&2; exit 1 ;; - *) sleep 10 ;; - esac -done + --json ``` Parse result for `episodeId`, `audioUrl`, `videoUrl`, `credits`, and present to user. diff --git a/image-gen/SKILL.md b/image-gen/SKILL.md index 9d60e0f..97a58db 100644 --- a/image-gen/SKILL.md +++ b/image-gen/SKILL.md @@ -207,7 +207,7 @@ Wait for explicit confirmation before running the CLI command. ```bash JOB_ID=$(date +%s) - curl -sS -o /tmp/image-gen-${JOB_ID}.jpg "$IMAGE_URL" + listenhub download "$IMAGE_URL" -o /tmp/image-gen-${JOB_ID}.jpg ``` Then use the Read tool on `/tmp/image-gen-{jobId}.jpg`. The image displays inline in the conversation. @@ -223,7 +223,7 @@ Wait for explicit confirmation before running the CLI command. DATE=$(date +%Y-%m-%d) JOB_DIR=".listenhub/image-gen/${DATE}-${JOB_ID}" mkdir -p "$JOB_DIR" - curl -sS -o "${JOB_DIR}/${JOB_ID}.jpg" "$IMAGE_URL" + listenhub download "$IMAGE_URL" -o "${JOB_DIR}/${JOB_ID}.jpg" ``` Present: diff --git a/music/SKILL.md b/music/SKILL.md index 129adb9..f1ea233 100644 --- a/music/SKILL.md +++ b/music/SKILL.md @@ -1,90 +1,55 @@ --- name: music +description: | + Generate AI music or create covers from reference audio. Triggers on: "音乐", + "music", "生成音乐", "generate music", "翻唱", "cover", "作曲", "compose", + "create a song", "做一首歌". 
metadata: openclaw: emoji: "🎵" requires: bin: ["listenhub"] primaryBin: "listenhub" -description: | - AI music generation and covers via CLI. Triggers on: "音乐", "music", - "生成音乐", "generate music", "翻唱", "cover", "作曲", "compose", - "create a song", "做一首歌". --- ## When to Use -- User wants to generate an original song from a text prompt -- User wants to create a cover version from reference audio -- User says "音乐", "music", "生成音乐", "generate music", "翻唱", "cover", "作曲", "compose", "create a song", "做一首歌" +- User wants to generate original AI music from a prompt +- User wants to create a cover from reference audio +- User says "音乐", "music", "生成音乐", "generate music", "翻唱", "cover", "作曲", "compose", "create a song", or "做一首歌" ## When NOT to Use -- User wants text-to-speech reading (use `/tts`) +- User wants text-to-speech reading (use `/speech`) - User wants a podcast discussion (use `/podcast`) - User wants an explainer video with narration (use `/explainer`) - User wants to transcribe audio to text (use `/asr`) ## Purpose -Generate original AI music or create cover versions from reference audio using the ListenHub CLI. Two modes: +Generate original AI music from text prompts, or create cover versions from reference audio. Two modes: 1. **Generate** (original): Create a new song from a text prompt, with optional style, title, and instrumental-only options. 2. **Cover**: Transform a reference audio file into a new version, with optional style modifications. 
## Hard Constraints -- Always check CLI authentication via `shared/cli-authentication.md` before any operation +- Always read config following `shared/config-pattern.md` before any interaction - Follow `shared/cli-patterns.md` for execution modes, error handling, and interaction patterns -- Follow `shared/config-pattern.md` for config lookup, creation, and update -- No speakers involved — this is music generation, not speech -- Audio file constraints for cover mode: mp3, wav, flac, m4a, ogg, aac; max 20 MB -- Long timeout: 600s default. Use `run_in_background: true` with `timeout: 660000` +- Always follow `shared/cli-authentication.md` for auth checks - Never save files to `~/Downloads/` or `.listenhub/` — save artifacts to the current working directory with friendly topic-based names (see `shared/config-pattern.md` § Artifact Naming) +- No speakers involved — music generation does not use speaker selection +- Audio file constraints for cover mode: mp3, wav, flac, m4a, ogg, aac; max 20MB +- Long timeout: 600s default. Use `run_in_background: true` with `timeout: 660000` -Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After all parameters are collected, summarize the choices and ask the user to confirm. Do NOT call any generation CLI command until the user has explicitly confirmed. +Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After all parameters are collected, summarize the choices and ask the user to confirm. Do NOT call any CLI command until the user has explicitly confirmed. -## CLI Commands - -### Generate (original) - -```bash -listenhub music generate --prompt "..." 
[--style "..."] [--title "..."] [--instrumental] --json -``` - -### Cover (from reference audio) - -```bash -listenhub music cover --audio "{path-or-url}" [--prompt "..."] [--style "..."] [--title "..."] [--instrumental] --json -``` - -### List tasks - -```bash -listenhub music list --page 1 --page-size 20 [--status pending|generating|uploading|success|failed] --json -``` - -### Get task status - -```bash -listenhub music get --json -``` - ## Step -1: CLI Auth Check -Follow `shared/cli-authentication.md`: - -```bash -AUTH=$(listenhub auth status --json 2>/dev/null) -AUTHED=$(echo "$AUTH" | jq -r '.authenticated // false') -``` - -- If `listenhub` command is not found: tell the user to install it (`npm install -g @marswave/listenhub-cli`). Stop here. -- If `.authenticated` is `false`: tell the user to run `listenhub auth login`. Wait for completion, then re-check. -- If `.authenticated` is `true`: proceed silently. +Follow `shared/cli-authentication.md`. If the CLI is not installed or the user is not logged in, stop immediately and guide them through setup. ## Step 0: Config Setup @@ -122,7 +87,7 @@ Then ask: 2. **Language** (optional): "默认语言?" - "中文 (zh)" - "English (en)" - - "每次手动选择" -> keep `null` + - "每次手动选择" → keep `null` After collecting answers, save immediately: ```bash @@ -177,7 +142,7 @@ Accept a local file path or URL. This maps to `--audio`. If validation fails, inform the user and re-ask. -Optionally, the user may also provide a prompt to guide the cover style. If not provided in this step, it will be asked in Step 3. +Optionally, the user may also provide a prompt to guide the cover style. ### Step 3: Style (optional) @@ -215,7 +180,7 @@ Summarize all choices: 准备生成音乐: 模式:原创 (Generate) - 描述:{prompt, first 80 chars}... 
+ 描述:{prompt} 风格:{style / 自动} 标题:{title / 自动} 人声:{带人声 / 纯音乐} @@ -228,7 +193,7 @@ Summarize all choices: 准备生成音乐: 模式:翻唱 (Cover) - 参考音频:{audio path or URL} + 参考音频:{path-or-url} 描述:{prompt / 无} 风格:{style / 自动} 标题:{title / 自动} @@ -237,140 +202,86 @@ Summarize all choices: 确认? ``` -Wait for explicit confirmation before proceeding. +Wait for explicit confirmation before running any CLI command. ## Workflow -### Generate Mode - -1. **Submit (foreground)** with `--no-wait` to get the task ID: +1. **Submit (background)**: Run the CLI command with `run_in_background: true` and `timeout: 660000`: + **Generate mode:** ```bash - RESULT=$(listenhub music generate \ + listenhub music generate \ --prompt "{prompt}" \ - ${STYLE:+--style "$STYLE"} \ - ${TITLE:+--title "$TITLE"} \ - ${INSTRUMENTAL:+--instrumental} \ - --no-wait --json 2>/tmp/lh-music-err) - EXIT_CODE=$? - - if [ $EXIT_CODE -ne 0 ]; then - ERROR=$(cat /tmp/lh-music-err) - echo "Error: $ERROR" - rm -f /tmp/lh-music-err - exit $EXIT_CODE - fi - rm -f /tmp/lh-music-err - - TASK_ID=$(echo "$RESULT" | jq -r '.id') - echo "Submitted: $TASK_ID" + --style "{style}" \ + --title "{title}" \ + --instrumental \ + --json ``` -2. Tell the user the task is submitted. - -3. **Poll (background)**: Run with `run_in_background: true` and `timeout: 660000`: - + **Cover mode:** ```bash - TASK_ID="" - for i in $(seq 1 60); do - RESULT=$(listenhub music get "$TASK_ID" --json 2>/dev/null) - STATUS=$(echo "$RESULT" | jq -r '.status // "pending"') - - case "$STATUS" in - success|completed) echo "$RESULT"; exit 0 ;; - failed) echo "FAILED: $RESULT" >&2; exit 1 ;; - *) sleep 10 ;; - esac - done - echo "TIMEOUT" >&2; exit 2 - ``` - -4. When notified of completion, **present the result** (see Result Presentation below). - -### Cover Mode - -1. 
**Submit (foreground)** with `--no-wait`: - - ```bash - RESULT=$(listenhub music cover \ + listenhub music cover \ --audio "{path-or-url}" \ - ${PROMPT:+--prompt "$PROMPT"} \ - ${STYLE:+--style "$STYLE"} \ - ${TITLE:+--title "$TITLE"} \ - ${INSTRUMENTAL:+--instrumental} \ - --no-wait --json 2>/tmp/lh-music-err) - EXIT_CODE=$? - - if [ $EXIT_CODE -ne 0 ]; then - ERROR=$(cat /tmp/lh-music-err) - echo "Error: $ERROR" - rm -f /tmp/lh-music-err - exit $EXIT_CODE - fi - rm -f /tmp/lh-music-err - - TASK_ID=$(echo "$RESULT" | jq -r '.id') - echo "Submitted: $TASK_ID" + --prompt "{prompt}" \ + --style "{style}" \ + --title "{title}" \ + --instrumental \ + --json ``` -2. Tell the user the task is submitted. + Flag notes: + - `--prompt` — text description of the music (required for generate, optional for cover) + - `--audio` — reference audio file path or URL (cover mode only, required) + - `--style` — optional style/genre hint; omit if not provided + - `--title` — optional track title; omit if not provided + - `--instrumental` — add this flag for instrumental-only (no vocals); omit if not selected + - Omit `--prompt` in cover mode if not provided -3. **Poll (background)**: Same polling loop as Generate mode, with `run_in_background: true` and `timeout: 660000`. + The CLI handles polling internally. Music generation takes up to 10 minutes. -4. When notified of completion, **present the result**. +2. Tell the user the task is submitted and that they will be notified when it finishes. -## Result Presentation +3. 
When notified of completion, **present the result**: -Read `OUTPUT_MODE` from config: - -```bash -OUTPUT_MODE=$(echo "$CONFIG" | jq -r '.outputMode // "download"') -``` - -Parse the completed result: - -```bash -AUDIO_URL=$(echo "$RESULT" | jq -r '.audioUrl') -TITLE=$(echo "$RESULT" | jq -r '.title // "Untitled"') -DURATION=$(echo "$RESULT" | jq -r '.duration // 0') -CREDITS=$(echo "$RESULT" | jq -r '.credits // 0') -``` - -### `inline` or `both` - -Display the audio URL as a clickable link: - -``` -音乐已生成! + Parse the CLI JSON output for key fields: + ```bash + AUDIO_URL=$(echo "$RESULT" | jq -r '.audioUrl') + TITLE=$(echo "$RESULT" | jq -r '.title // "Untitled"') + DURATION=$(echo "$RESULT" | jq -r '.duration // empty') + CREDITS=$(echo "$RESULT" | jq -r '.credits // empty') + ``` -标题:{title} -在线收听:{audioUrl} -时长:{duration}s -消耗积分:{credits} -``` + Read `OUTPUT_MODE` from config. Follow `shared/output-mode.md` for behavior. -### `download` or `both` + **`inline` or `both`**: Display audio URL as a clickable link. -Generate a slug from the title following `shared/config-pattern.md` § Artifact Naming. + ``` + 音乐已生成! -```bash -SLUG="{title-slug}" # e.g. "summer-breeze", "夜空中最亮的星" -NAME="${SLUG}.mp3" -# Dedup: if file exists, append -2, -3, etc. -BASE="${NAME%.*}"; EXT="${NAME##*.}"; i=2 -while [ -e "$NAME" ]; do NAME="${BASE}-${i}.${EXT}"; i=$((i+1)); done -curl -sS -o "$NAME" "{audioUrl}" -``` + 标题:{title} + 在线收听:{audioUrl} + 时长:{duration}s + 消耗积分:{credits} + ``` -Present: -``` -已保存到当前目录: - {NAME} -``` + **`download` or `both`**: Also download the file. Generate a slug from the title following `shared/config-pattern.md` § Artifact Naming. + ```bash + SLUG="{slug}" # e.g. "summer-breeze" + NAME="${SLUG}.mp3" + # Dedup: if file exists, append -2, -3, etc. 
+ BASE="${NAME%.*}"; EXT="${NAME##*.}"; i=2 + while [ -e "$NAME" ]; do NAME="${BASE}-${i}.${EXT}"; i=$((i+1)); done + curl -sS -o "$NAME" "{audioUrl}" + ``` + Present: + ``` + 已保存到当前目录: + {NAME} + ``` -## Updating Config +### After Successful Generation -After successful generation, merge the language used this session into config if the user explicitly specified one: +Update config with the language used this session if the user explicitly specified one: ```bash if [ -n "$LANGUAGE" ]; then @@ -379,7 +290,10 @@ if [ -n "$LANGUAGE" ]; then fi ``` -## API Reference +**Estimated times**: +- Music generation: 5-10 minutes + +## Resources - CLI authentication: `shared/cli-authentication.md` - CLI patterns: `shared/cli-patterns.md` @@ -389,7 +303,7 @@ fi ## Composability - **Invokes**: nothing -- **Invoked by**: nothing (standalone) +- **Invoked by**: content-planner (Phase 3) ## Examples @@ -401,10 +315,15 @@ fi 2. Read config (first run: create defaults with `outputMode: "download"`) 3. Infer: mode = generate, prompt = "夏天海边的歌" 4. Ask: style? title? instrumental? -5. Confirm summary -> user confirms -6. Submit `listenhub music generate --prompt "关于夏天海边的歌" --no-wait --json` -7. Poll in background -8. On completion: download `夏天海边.mp3` to cwd +5. Confirm summary → user confirms + +```bash +listenhub music generate \ + --prompt "关于夏天海边的歌" \ + --json +``` + +Wait for CLI to return result, then download `{slug}.mp3` to cwd. **Cover from file:** @@ -414,19 +333,32 @@ fi 2. Validate: `demo.mp3` exists, is a supported format, under 20 MB 3. Infer: style = "jazz" from user input 4. Ask: title? instrumental? -5. Confirm summary -> user confirms -6. Submit `listenhub music cover --audio "demo.mp3" --style "jazz" --no-wait --json` -7. Poll in background -8. On completion: download `demo-cover.mp3` to cwd +5. 
Confirm summary → user confirms + +```bash +listenhub music cover \ + --audio "demo.mp3" \ + --style "jazz" \ + --json +``` + +Wait for CLI to return result, then download `{slug}.mp3` to cwd. **Generate instrumental:** > "Create an instrumental electronic track for a game intro" 1. Detect: generate mode ("Create ... track") -2. Infer: style = "electronic", instrumental = yes, from user input +2. Infer: style = "electronic", instrumental = yes 3. Ask: title? -4. Confirm summary -> user confirms -5. Submit with `--style "electronic" --instrumental` -6. Poll in background -7. On completion: download `game-intro.mp3` to cwd +4. Confirm summary → user confirms + +```bash +listenhub music generate \ + --prompt "instrumental electronic track for a game intro" \ + --style "electronic" \ + --instrumental \ + --json +``` + +Wait for CLI to return result, then download `{slug}.mp3` to cwd. diff --git a/slides/SKILL.md b/slides/SKILL.md index d353216..4f278db 100644 --- a/slides/SKILL.md +++ b/slides/SKILL.md @@ -1,9 +1,8 @@ --- name: slides description: | - Create slide decks with AI-generated visuals and optional narration. Triggers on: - "幻灯片", "PPT", "slides", "slide deck", "做幻灯片", "create slides", - "presentation". + Create slide decks from topics, URLs, or text. Triggers on: "幻灯片", "PPT", + "slides", "slide deck", "做幻灯片", "create slides", "presentation". metadata: openclaw: emoji: "📊" @@ -15,8 +14,8 @@ metadata: ## When to Use - User wants to create a slide deck or presentation -- User asks to make "slides", "幻灯片", or "PPT" -- User wants a visual presentation with optional narration +- User asks for "slides", "幻灯片", "PPT", or "presentation" +- User wants visual content organized into slides from a topic or URL ## When NOT to Use @@ -27,27 +26,28 @@ metadata: ## Purpose -Generate slide decks that combine structured visual pages with optional voice narration. Ideal for business presentations, educational content, and topic overviews. 
By default, slides are generated without audio — narration can be enabled on request. +Generate slide decks with AI-generated visuals from topics, URLs, or text. By default, slides are generated without audio narration. Narration can be optionally enabled. Ideal for presentations, summaries, and visual storytelling. ## Hard Constraints -- Always check CLI authentication following `shared/cli-authentication.md` -- Follow `shared/cli-patterns.md` for command structure, execution, errors, and interaction patterns - Always read config following `shared/config-pattern.md` before any interaction +- Follow `shared/cli-patterns.md` for execution modes, error handling, and interaction patterns +- Always follow `shared/cli-authentication.md` for auth checks - Follow `shared/speaker-selection.md` when narration is enabled +- Never hardcode speaker IDs — always fetch from the speakers CLI when the user wants to change voice - Never save files to `~/Downloads/` or `.listenhub/` — save artifacts to the current working directory with friendly topic-based names (see `shared/config-pattern.md` § Artifact Naming) - Mode is always `slides` — never `info` or `story` (those are for `/explainer`) - Only 1 speaker supported (when narration is enabled) -- Default behavior: skip audio (no narration). User must opt in with `--no-skip-audio` +- Default behavior: skip audio (no narration). Only add narration when the user explicitly requests it via `--no-skip-audio` -Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After all parameters are collected, summarize the choices and ask the user to confirm. Do NOT call any generation CLI command until the user has explicitly confirmed. +Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. 
Wait for the user's answer before proceeding to the next step. After all parameters are collected, summarize the choices and ask the user to confirm. Do NOT call any CLI command until the user has explicitly confirmed. ## Step -1: CLI Auth Check -Follow `shared/cli-authentication.md`. If the CLI is not installed or the user is not logged in, stop and guide them through setup. +Follow `shared/cli-authentication.md`. If the CLI is not installed or the user is not logged in, stop immediately and guide them through setup. ## Step 0: Config Setup @@ -101,19 +101,11 @@ CONFIG=$(cat "$CONFIG_PATH") Free text input. Ask the user: -> What would you like to create a slide deck about? +> What would you like to create slides about? -Accept: topic description, text content, URLs as source material. +Accept: topic description, text content, URL(s), or any combination. -### Step 2: Source URLs (optional) - -If the user provided URLs in Step 1, collect them. Otherwise ask: - -> Do you have any reference URLs to include as source material? (optional — type "skip" to proceed without) - -Each URL will be passed as a `--source-url` flag (repeatable). - -### Step 3: Language +### Step 2: Language If `config.language` is set, pre-fill and show in summary — skip this question. Otherwise ask: @@ -123,40 +115,35 @@ Question: "What language?" Options: - "Chinese (zh)" — Content in Mandarin Chinese - "English (en)" — Content in English + - "Japanese (ja)" — Content in Japanese ``` -### Step 4: Narration +### Step 3: Narration -Ask the user: +Ask: ``` Question: "需要语音旁白吗?(默认否)" Options: - "不需要" — Slides only, no narration - - "需要旁白" — Add voice narration to slides + - "需要" — Add voice narration to slides ``` -Default is no narration. +Default is no narration. If the user says yes, proceed to Step 4. Otherwise skip to Step 5. 
-### Step 5: Speaker Selection (only if narration enabled) +### Step 4: Speaker Selection (only if narration enabled) **Skip this step entirely if narration is not enabled.** Follow `shared/speaker-selection.md`: - If `config.defaultSpeakers.{language}` is set → use saved speaker silently - If not set → use **built-in default** from `shared/speaker-selection.md` for the language -- Show the speaker in the confirmation summary (Step 7) — user can change from there if desired +- Show the speaker in the confirmation summary (Step 5) — user can change from there if desired - Only show the full speaker list if the user explicitly asks to change voice -Only 1 speaker is supported. - -### Step 6: Style (optional) +Only 1 speaker is supported for slides narration. -If the user mentioned a specific visual style, capture it. Otherwise skip — do not ask. - -Style is passed as `--style "{style}"` when specified. - -### Step 7: Confirm & Generate +### Step 5: Confirm & Generate Summarize all choices: @@ -166,8 +153,7 @@ Ready to generate slides: Topic: {topic} Language: {language} - Narration: None - Sources: {urls or "none"} + Narration: No Proceed? ``` @@ -180,7 +166,6 @@ Ready to generate slides: Language: {language} Narration: Yes Speaker: {speaker name} - Sources: {urls or "none"} Proceed? ``` @@ -189,107 +174,89 @@ Wait for explicit confirmation before running any CLI command. ## Workflow -1. **Submit (foreground)** with `--no-wait` to get the creation ID: +1. 
**Submit (background)**: Run the CLI command with `run_in_background: true` and `timeout: 360000`: - **Base command:** + **Without narration (default):** ```bash - RESULT=$(listenhub slides create \ + listenhub slides create \ --query "{topic}" \ - --lang {language} \ + --lang {en|zh|ja} \ --image-size 2K \ --aspect-ratio 16:9 \ - --no-wait \ - --json) - ID=$(echo "$RESULT" | jq -r '.id') + --json ``` - **If narration enabled**, add: - ``` - --no-skip-audio --speaker "{speakerName}" + **With narration:** + ```bash + listenhub slides create \ + --query "{topic}" \ + --lang {en|zh|ja} \ + --image-size 2K \ + --aspect-ratio 16:9 \ + --no-skip-audio \ + --speaker "{name}" \ + --json ``` - **If style specified**, add: - ``` - --style "{style}" - ``` + If the user provided a source URL, add `--source-url "{url}"`. - **If source URLs provided**, add for each URL: - ``` - --source-url "{url}" - ``` + The CLI handles polling internally and returns the final result when generation completes. -2. Tell the user the task is submitted. +2. Tell the user the task is submitted and that they will be notified when it finishes. -3. **Poll (background)**: Run the following with `run_in_background: true` and `timeout: 360000`: +3. When notified of completion, **parse and present the result**: + Parse the CLI JSON output for key fields: ```bash - ID="" - for i in $(seq 1 60); do - RESULT=$(listenhub creation get "$ID" --json 2>/dev/null) - STATUS=$(echo "$RESULT" | jq -r '.status // "processing"') - - case "$STATUS" in - completed) echo "$RESULT"; exit 0 ;; - failed) echo "FAILED: $RESULT" >&2; exit 1 ;; - *) sleep 10 ;; - esac - done - echo "TIMEOUT" >&2; exit 2 + EPISODE_ID=$(echo "$RESULT" | jq -r '.episodeId') + AUDIO_URL=$(echo "$RESULT" | jq -r '.audioUrl // empty') + CREDITS=$(echo "$RESULT" | jq -r '.credits // empty') ``` -4. When notified, **parse and present the result**: - Read `OUTPUT_MODE` from config. Follow `shared/output-mode.md` for behavior. 
- Extract from the completed result: - - `episodeId` — for the online link - - `pageCount` — number of slides generated - - `credits` — credits consumed + **Without narration:** - **`inline` or `both`**: Present the result inline. + **`inline` or `both`**: Present the online link. ``` 幻灯片已生成! - 「{title}」 - 在线查看:https://listenhub.ai/app/slides/{episodeId} - 页数:{pageCount} 消耗积分:{credits} ``` - **If narration was enabled**, also show: - ``` - 音频链接:{audioUrl} - ``` + **`download` or `both`**: Also save the script file. Generate a topic slug following `shared/config-pattern.md` § Artifact Naming. + - Save as `{slug}-slides.md` in cwd (dedup if exists) + - Present the save path in addition to the above summary. - **`download` or `both`**: Also save files locally. Generate a topic slug following `shared/config-pattern.md` § Artifact Naming. + **With narration:** - Create `{slug}-slides/` folder (dedup if exists): - - Write `script.md` inside (the slide script/outline) - - If narration was enabled: download `audio.mp3` inside + **`inline` or `both`**: Display audio URL as a clickable link. - ```bash - DIR="{slug}-slides" - i=2; while [ -d "$DIR" ]; do DIR="{slug}-slides-${i}"; i=$((i+1)); done - mkdir -p "$DIR" - - # Save script - echo "$RESULT" | jq -r '.script // .content // ""' > "$DIR/script.md" - - # If narration enabled, download audio - AUDIO_URL=$(echo "$RESULT" | jq -r '.audioUrl // empty') - [ -n "$AUDIO_URL" ] && curl -sS -o "$DIR/audio.mp3" "$AUDIO_URL" ``` + 幻灯片已生成! - Present: - ``` - 已保存到当前目录: - {slug}-slides/ - script.md - audio.mp3 (if narration enabled) + 在线查看:https://listenhub.ai/app/slides/{episodeId} + 音频链接:{audioUrl} + 消耗积分:{credits} ``` + **`download` or `both`**: Also save files. Generate a topic slug following `shared/config-pattern.md` § Artifact Naming. 
+   - Create `{slug}-slides/` folder (dedup if exists)
+   - Write `script.md` inside
+   - Download audio:
+     ```bash
+     curl -sS -o "{slug}-slides/audio.mp3" "{audioUrl}"
+     ```
+   - Present:
+     ```
+     已保存到当前目录:
+     {slug}-slides/
+       script.md
+       audio.mp3
+     ```
+
 ### After Successful Generation
 Update config with the choices made this session:
@@ -301,22 +268,24 @@ NEW_CONFIG=$(echo "$CONFIG" | jq \
 echo "$NEW_CONFIG" > "$CONFIG_PATH"
 ```
-If narration was enabled and a speaker was used:
+If narration was used, also save the speaker:
 ```bash
 NEW_CONFIG=$(echo "$CONFIG" | jq \
   --arg lang "{language}" \
-  --argjson ids '["speakerId"]' \
-  '.defaultSpeakers[$lang] = $ids')
+  --arg speakerId "{speakerId}" \
+  '. + {"language": $lang, "defaultSpeakers": (.defaultSpeakers + {($lang): [$speakerId]})}')
 echo "$NEW_CONFIG" > "$CONFIG_PATH"
 ```
-**Estimated time**: 3-6 minutes
+**Estimated times**:
+- Slides without narration: 2-4 minutes
+- Slides with narration: 4-8 minutes
-## API Reference
+## Resources
 - CLI authentication: `shared/cli-authentication.md`
 - CLI patterns: `shared/cli-patterns.md`
-- Speaker list (CLI): `shared/cli-speakers.md`
+- Speaker query: `shared/cli-speakers.md`
 - Speaker selection guide: `shared/speaker-selection.md`
 - Config pattern: `shared/config-pattern.md`
 - Output mode: `shared/output-mode.md`
@@ -332,45 +301,39 @@ echo "$NEW_CONFIG" > "$CONFIG_PATH"
 **Agent workflow**:
 1. Topic: "量子计算"
-2. Source URLs: skip (none provided)
-3. Language: pre-filled from config or ask → "zh"
-4. Narration: ask → "不需要"
-5. Confirm and generate
+2. Language: pre-filled from config or ask → "zh"
+3. Narration: ask → "不需要"
+4. Confirm and generate
 ```bash
-RESULT=$(listenhub slides create \
+listenhub slides create \
   --query "量子计算" \
   --lang zh \
   --image-size 2K \
   --aspect-ratio 16:9 \
-  --no-wait \
-  --json)
-ID=$(echo "$RESULT" | jq -r '.id')
+  --json
 ```
-Poll until completed, then present the online link and page count.
+Wait for CLI to return result, then present the online link.
 **User**: "Create slides about React hooks with narration"
 **Agent workflow**:
 1. Topic: "React hooks"
-2. Source URLs: skip
-3. Language: ask → "en"
-4. Narration: ask → "需要旁白"
-5. Speaker: use built-in default "Mars" (cozy-man-english)
-6. Confirm and generate
+2. Language: ask → "en"
+3. Narration: ask → "需要"
+4. Speaker: use built-in default for English
+5. Confirm and generate
 ```bash
-RESULT=$(listenhub slides create \
+listenhub slides create \
   --query "React hooks" \
   --lang en \
   --image-size 2K \
   --aspect-ratio 16:9 \
   --no-skip-audio \
   --speaker "Mars" \
-  --no-wait \
-  --json)
-ID=$(echo "$RESULT" | jq -r '.id')
+  --json
 ```
-Poll until completed, then present the online link, page count, and audio link.
+Wait for CLI to return result, then present the online link and audio link.
From 878ba3cc40b5b21276bb11491842f3850f02ab3a Mon Sep 17 00:00:00 2001
From: 0XFANGO
Date: Wed, 8 Apr 2026 12:20:21 +0800
Subject: [PATCH 14/14] fix: auto-install CLI and auto-login instead of asking user to run commands

All skills that depend on listenhub CLI now auto-install via `npm install -g @marswave/listenhub-cli` and auto-login via `listenhub auth login` when dependencies are missing.
Updated:
- shared/cli-authentication.md (core auth check logic)
- shared/cli-patterns.md (error handling table)
- shared/config-pattern.md
- All 8 skill files that reference CLI auth
---
 explainer/SKILL.md           |  2 +-
 image-gen/SKILL.md           |  2 +-
 listenhub-cli/SKILL.md       |  6 ++--
 listenhub/SKILL.md           |  6 ++--
 music/SKILL.md               |  2 +-
 podcast/SKILL.md             |  2 +-
 shared/cli-authentication.md | 53 ++++++++++++++++++++----------------
 shared/cli-patterns.md       |  4 +--
 shared/config-pattern.md     |  2 +-
 slides/SKILL.md              |  2 +-
 tts/SKILL.md                 |  2 +-
 11 files changed, 44 insertions(+), 39 deletions(-)
diff --git a/explainer/SKILL.md b/explainer/SKILL.md
index 0c9e70d..39ecab9 100644
--- a/explainer/SKILL.md
+++ b/explainer/SKILL.md
@@ -47,7 +47,7 @@ Use the AskUserQuestion tool for every multiple-choice step — do NOT print opt
 ## Step -1: CLI Auth Check
-Follow `shared/config-pattern.md` § CLI Auth Check. If the CLI is not installed or the user is not logged in, stop immediately and guide them through setup.
+Follow `shared/config-pattern.md` § CLI Auth Check. If the CLI is not installed or the user is not logged in, auto-install and auto-login per `shared/cli-authentication.md` — never ask the user to run commands manually.
 ## Step 0: Config Setup
diff --git a/image-gen/SKILL.md b/image-gen/SKILL.md
index 97a58db..4c1556b 100644
--- a/image-gen/SKILL.md
+++ b/image-gen/SKILL.md
@@ -43,7 +43,7 @@ Use the AskUserQuestion tool for every multiple-choice step — do NOT print opt
 ## Step -1: CLI Auth Check
-Follow `shared/cli-authentication.md` § Auth Check. If CLI is not installed or not logged in, guide the user through setup.
+Follow `shared/cli-authentication.md` § Auth Check. If CLI is not installed or not logged in, auto-install and auto-login — never ask the user to run commands manually.
 ## Step 0: Config Setup
diff --git a/listenhub-cli/SKILL.md b/listenhub-cli/SKILL.md
index 3365367..50fa37e 100644
--- a/listenhub-cli/SKILL.md
+++ b/listenhub-cli/SKILL.md
@@ -57,9 +57,9 @@ Most skills require the ListenHub CLI. Check:
 listenhub auth status --json
 ```
-If not installed or not logged in, guide the user:
+If not installed or not logged in, handle automatically — never ask the user to run install commands:
-1. Install: `npm install -g @marswave/listenhub-cli`
-2. Login: `listenhub auth login`
+1. Auto-install: run `npm install -g @marswave/listenhub-cli` silently
+2. Auto-login: run `listenhub auth login` (opens browser)
 Exception: `/asr` runs locally and needs no CLI or API key.
diff --git a/listenhub/SKILL.md b/listenhub/SKILL.md
index 238bc82..c31e1cf 100644
--- a/listenhub/SKILL.md
+++ b/listenhub/SKILL.md
@@ -57,9 +57,9 @@ Most skills require the ListenHub CLI. Check:
 listenhub auth status --json
 ```
-If not installed or not logged in, guide the user:
+If not installed or not logged in, handle automatically — never ask the user to run install commands:
-1. Install: `npm install -g @marswave/listenhub-cli`
-2. Login: `listenhub auth login`
+1. Auto-install: run `npm install -g @marswave/listenhub-cli` silently
+2. Auto-login: run `listenhub auth login` (opens browser)
 Exception: `/asr` runs locally and needs no CLI or API key.
diff --git a/music/SKILL.md b/music/SKILL.md
index f1ea233..0cd7869 100644
--- a/music/SKILL.md
+++ b/music/SKILL.md
@@ -49,7 +49,7 @@ Use the AskUserQuestion tool for every multiple-choice step — do NOT print opt
 ## Step -1: CLI Auth Check
-Follow `shared/cli-authentication.md`. If the CLI is not installed or the user is not logged in, stop immediately and guide them through setup.
+Follow `shared/cli-authentication.md`. If the CLI is not installed or the user is not logged in, auto-install and auto-login — never ask the user to run commands manually.
 ## Step 0: Config Setup
diff --git a/podcast/SKILL.md b/podcast/SKILL.md
index a9758cf..b7f9284 100644
--- a/podcast/SKILL.md
+++ b/podcast/SKILL.md
@@ -47,7 +47,7 @@ Use the AskUserQuestion tool for every multiple-choice step — do NOT print opt
 ## Step -1: CLI Auth Check
-Follow `shared/cli-authentication.md` § Auth Check. If the CLI is not installed or the user is not logged in, stop immediately and guide them.
+Follow `shared/cli-authentication.md` § Auth Check. If the CLI is not installed or the user is not logged in, auto-install and auto-login — never ask the user to run commands manually.
 ## Step 0: Config Setup
diff --git a/shared/cli-authentication.md b/shared/cli-authentication.md
index a492046..fa9e324 100644
--- a/shared/cli-authentication.md
+++ b/shared/cli-authentication.md
@@ -3,47 +3,52 @@
 ## Prerequisites
 - **Node.js >= 20**
-- **ListenHub CLI**: `npm install -g @marswave/listenhub-cli`
+- **ListenHub CLI** (auto-installed if missing)
 ## Auth Check
-Run this before any CLI operation:
-
-```bash
-listenhub auth status --json
-```
-
-Parse the `.authenticated` field:
+Run this before any CLI operation. The check handles both installation and login automatically — never ask the user to run install commands manually.
 ```bash
+# 1. Auto-install if missing
+if ! command -v listenhub &>/dev/null; then
+  npm install -g @marswave/listenhub-cli
+fi
+
+# 2. Verify install succeeded
+if ! command -v listenhub &>/dev/null; then
+  echo "INSTALL_FAILED"
+  # Stop here — tell user their Node.js/npm setup needs attention
+fi
+
+# 3. Check auth
 AUTH=$(listenhub auth status --json 2>/dev/null)
 AUTHED=$(echo "$AUTH" | jq -r '.authenticated // false')
 ```
-### If CLI not installed
+### If install fails
-If `listenhub` command is not found, tell the user:
+If `npm install -g` fails (e.g., permission issues, Node.js not available), tell the user:
-> ListenHub CLI is not installed. Please install it:
-> ```
-> npm install -g @marswave/listenhub-cli
-> ```
-> Requires Node.js 20 or later.
+> ListenHub CLI auto-install failed. Please check your Node.js (>= 20) and npm setup, then retry.
-### If not logged in
+Do **not** ask them to run `npm install -g @marswave/listenhub-cli` manually — diagnose the issue first (permissions, PATH, Node version).
-If `.authenticated` is `false`, tell the user:
+### If not logged in
-> You're not logged in. Please run:
-> ```
-> listenhub auth login
-> ```
-> This will open your browser for OAuth authentication.
+If `.authenticated` is `false`, run `listenhub auth login` directly — this opens the browser for OAuth. Wait for completion, then re-check auth status.
-Then wait for the user to complete login and re-check.
+```bash
+if [ "$AUTHED" != "true" ]; then
+  listenhub auth login
+  # Re-verify after login
+  AUTH=$(listenhub auth status --json 2>/dev/null)
+  AUTHED=$(echo "$AUTH" | jq -r '.authenticated // false')
+fi
+```
 ## Security
 - Credentials are stored at `~/.config/listenhub/credentials.json` (file mode `0600`)
-- Tokens refresh automatically -- no manual rotation needed
+- Tokens refresh automatically — no manual rotation needed
 - Never log or display tokens in output
diff --git a/shared/cli-patterns.md b/shared/cli-patterns.md
index 27381a8..186f575 100644
--- a/shared/cli-patterns.md
+++ b/shared/cli-patterns.md
@@ -108,8 +108,8 @@ STATUS=$(echo "$RESULT" | jq -r '.status')
 | Error | Cause | Fix |
 |-------|-------|-----|
-| `command not found: listenhub` | CLI not installed | `npm install -g @marswave/listenhub-cli` |
-| `Not authenticated` | Not logged in | `listenhub auth login` |
+| `command not found: listenhub` | CLI not installed | Auto-install: run `npm install -g @marswave/listenhub-cli` silently, then retry |
+| `Not authenticated` | Not logged in | Auto-login: run `listenhub auth login` directly |
 | `Insufficient credits` | Account has no credits | Tell user to recharge at listenhub.ai |
 | `Rate limited` | Too many requests | Wait and retry |
 | `Invalid speaker` | Speaker ID not found | Re-query speakers list |
diff --git a/shared/config-pattern.md b/shared/config-pattern.md
index 21a9e14..7b2669b 100644
--- a/shared/config-pattern.md
+++ b/shared/config-pattern.md
@@ -8,7 +8,7 @@ Run this **before Step 0** in every skill that uses the ListenHub CLI.
 Follow `shared/cli-authentication.md` § Auth Check.
-If CLI is not installed or not logged in, guide the user through setup as described in `shared/cli-authentication.md`.
+If CLI is not installed or not logged in, auto-install and auto-login as described in `shared/cli-authentication.md` — never ask the user to run commands manually.
 ## Config Location
diff --git a/slides/SKILL.md b/slides/SKILL.md
index 4f278db..646cff7 100644
--- a/slides/SKILL.md
+++ b/slides/SKILL.md
@@ -47,7 +47,7 @@ Use the AskUserQuestion tool for every multiple-choice step — do NOT print opt
 ## Step -1: CLI Auth Check
-Follow `shared/cli-authentication.md`. If the CLI is not installed or the user is not logged in, stop immediately and guide them through setup.
+Follow `shared/cli-authentication.md`. If the CLI is not installed or the user is not logged in, auto-install and auto-login — never ask the user to run commands manually.
 ## Step 0: Config Setup
diff --git a/tts/SKILL.md b/tts/SKILL.md
index 13c787e..10e874c 100644
--- a/tts/SKILL.md
+++ b/tts/SKILL.md
@@ -63,7 +63,7 @@ Determine the mode from the user's input **automatically** before asking any que
 ### Step -1: CLI Auth Check
-Follow `shared/cli-authentication.md`. If the CLI is not installed or the user is not logged in, stop and guide them through setup.
+Follow `shared/cli-authentication.md`. If the CLI is not installed or the user is not logged in, auto-install and auto-login — never ask the user to run commands manually.
 ### Step 0: Config Setup
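
Review note on the slides/SKILL.md changes in this series: the download steps say to create `{slug}-slides/` with "dedup if exists", but the revised text no longer shows the loop that did this. A minimal sketch of that step follows, assuming the same `-2`, `-3`, ... suffix scheme as the removed loop. The helper name `next_free_dir` is hypothetical and not part of the patch, and it checks `-e` rather than `-d` so that a plain file with the same name is skipped too.

```bash
# Hypothetical helper (not in the patch): print the first unused path,
# trying "<base>", then "<base>-2", "<base>-3", and so on.
next_free_dir() {
  base="$1"
  candidate="$base"
  i=2
  # Keep advancing the numeric suffix until nothing exists at that path.
  while [ -e "$candidate" ]; do
    candidate="${base}-${i}"
    i=$((i + 1))
  done
  printf '%s\n' "$candidate"
}
```

A skill could then run `DIR=$(next_free_dir "{slug}-slides") && mkdir -p "$DIR"` before writing `script.md` and `audio.mp3`, which keeps an earlier run's output from being overwritten.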