feat: voice-chat skill — Discord voice conversation with Claude#8
Discord voice bot skill that enables real-time voice conversation with Claude through a coli ASR + ListenHub TTS pipeline.
- Add Discord bot intents/permissions requirements
- Clarify single-user audio handling in v1
- Detail Opus packet → PCM → Float32Array pipeline
- Add guild ID to collection flow
- Fix TTS playback pipeline (MP3 → PCM → Opus)
- Add ffmpeg/prism-media dependencies
- Clarify half-duplex state transitions in protocol
- Increase TTS timeout to 5s; macOS-only for v1
- Handle empty ASR results (discard silently)
- Add conversation context section
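The PCM → Float32Array step can be sketched as below. This assumes the Opus decoder emits 48 kHz interleaved stereo, signed 16-bit little-endian PCM (the usual Discord configuration); the function name is hypothetical, and any resampling coli streamAsr may require is not shown.

```javascript
// Convert interleaved stereo s16le PCM into a mono Float32Array in [-1, 1).
// Each frame is 4 bytes: left Int16 followed by right Int16.
function pcmS16leStereoToMonoFloat32(buf) {
  const frames = Math.floor(buf.length / 4); // drop any trailing partial frame
  const out = new Float32Array(frames);
  for (let i = 0; i < frames; i++) {
    const left = buf.readInt16LE(i * 4);
    const right = buf.readInt16LE(i * 4 + 2);
    // Average the channels, then scale so -32768 maps to exactly -1.
    out[i] = (left + right) / 2 / 32768;
  }
  return out;
}
```

Downmixing by averaging keeps the result in range without clipping; a single-speaker ASR model gains nothing from the second channel.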
- Step 0: check coli version, auto-update if outdated
- Step 3: detect if user is in voice channel before joining
- Add 'waiting' protocol event for empty channel state
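The version check in Step 0 boils down to comparing two dotted version strings. A minimal sketch follows; how the installed coli version is read and how the update is triggered are not described here, so isOutdated, parseVersion, and any minimum version you pass in are hypothetical.

```javascript
// Parse "1.4.2" or "v0.10.0" into numeric parts; any pre-release tag
// after "-" is ignored in this simplified check.
function parseVersion(v) {
  return String(v)
    .trim()
    .replace(/^v/, '')
    .split('-')[0]
    .split('.')
    .map((n) => Number.parseInt(n, 10) || 0);
}

// True when `installed` is strictly older than `minimum`.
// Missing parts count as 0, so "1.2" equals "1.2.0".
function isOutdated(installed, minimum) {
  const a = parseVersion(installed);
  const b = parseVersion(minimum);
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    if (x !== y) return x < y;
  }
  return false;
}
```

Comparing numerically matters: a plain string comparison would rank "0.9.0" above "0.10.0".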
- Fix env var: LISTENHUB_API_KEY → COLI_LISTENHUB_API_KEY
- Use coli runCloudTts/listSpeakers instead of direct ListenHub API
- Simplify: coli handles both ASR and TTS as single dependency
9 tasks across 3 chunks:
- Chunk 1: discord-bot.js (package.json, startup, ASR pipeline, TTS handler)
- Chunk 2: SKILL.md with full interaction flow
- Chunk 3: README updates and integration test
- Discord connection with voice channel join/wait logic
- Opus → PCM → Float32Array → coli streamAsr (VAD + SenseVoice)
- Cloud TTS with 5s timeout, falling back to local macOS say
- stdin/stdout JSON line protocol for Claude Code communication
- Half-duplex: pause ASR during TTS playback
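A minimal sketch of the JSON-line protocol with half-duplex gating. The partial, final, error, and reply message types come from this PR; the text field on transcripts and replies is an assumption, and createBridge, speak, and write are hypothetical names (speak stands in for the TTS playback path, write for process.stdout.write).

```javascript
// Hypothetical JSON-line bridge between Claude Code (stdin/stdout) and the
// voice pipeline. ASR output is dropped while any reply is playing.
function createBridge({ speak, write }) {
  let activeReplies = 0; // a counter, so overlapping replies keep ASR paused
  const emit = (event) => write(JSON.stringify(event) + '\n');

  return {
    emit,
    // Forward an ASR result ('partial' or 'final'); returns false if dropped.
    onTranscript(kind, text) {
      if (activeReplies > 0) return false; // half-duplex: ignore audio during TTS
      if (!text || !text.trim()) return false; // discard empty ASR results
      emit({ type: kind, text });
      return true;
    },
    // Handle one stdin line; bad JSON and playback failures become
    // 'error' events instead of being swallowed.
    async onLine(line) {
      let msg;
      try {
        msg = JSON.parse(line);
      } catch {
        emit({ type: 'error', message: `Invalid JSON line: ${line}` });
        return;
      }
      if (msg.type !== 'reply') return;
      activeReplies++;
      try {
        await speak(msg.text);
      } catch (err) {
        emit({ type: 'error', message: `Reply failed: ${err.message}` });
      } finally {
        activeReplies--;
      }
    },
  };
}
```

Wiring it up might look like createBridge({ write: (s) => process.stdout.write(s), speak }) with readline's 'line' event calling bridge.onLine. Separating JSON parse errors from playback errors also avoids the silent catch {} problem raised in review below.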
Cursor Bugbot has reviewed your changes and found 4 potential issues.
    player.once(AudioPlayerStatus.Idle, resolve);
    player.once('error', reject);
  });
}
Listener leak on shared AudioPlayer per playback call
Medium Severity
playAudio registers player.once(AudioPlayerStatus.Idle) and player.once('error') on the module-level player singleton each time it's called. When one event fires (e.g., Idle on success), the other once listener (error) remains registered and is never removed. Since AudioPlayer extends EventEmitter, after roughly 10 successful replies, Node.js emits a MaxListenersExceededWarning. Over a longer conversation the stale listeners keep accumulating.
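One way to avoid the leak, sketched under the assumption that playAudio only needs to wait for the next Idle or error event: Node's events.once() resolves on the named event, rejects if 'error' fires first, and removes both of its listeners either way. waitForIdle is a hypothetical helper name.

```javascript
import { once } from 'node:events';

// Resolve when the player emits its idle event; reject if it emits 'error'
// first. events.once() removes both temporary listeners whichever event
// fires, so nothing accumulates across replies.
// ('idle' is the string value of AudioPlayerStatus.Idle in @discordjs/voice.)
async function waitForIdle(player, idleEvent = 'idle') {
  await once(player, idleEvent);
}

// Inside playAudio (sketch):
//   player.play(createAudioResource(path));
//   await waitForIdle(player, AudioPlayerStatus.Idle);
```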
    if (usedCloud) {
      await playAudio(tmpPath);
      await unlink(tmpPath).catch(() => {});
    }
Cloud TTS playback errors silently swallowed by catch
Medium Severity
When cloud TTS succeeds but playAudio(tmpPath) throws (e.g., corrupt file, I/O error), the error propagates out of handleReply's outer try (which has a finally but no catch). It's then caught by the readline handler's catch {} block, which was only intended for JSON parse errors. The error is silently ignored — no error message is emitted to the parent process, and tmpPath is never cleaned up since unlink is skipped.
Additional Locations (1)
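A possible shape for the fix, assuming playAudio and emit are the bot's existing functions (passed in as parameters so the sketch is self-contained; playAndCleanup is a hypothetical name): catch playback errors locally, report them over the protocol, and unlink in finally so the temp file is removed on success and failure alike.

```javascript
import { unlink } from 'node:fs/promises';

// Play a synthesized file, report failures as protocol errors instead of
// throwing, and always delete the file afterwards.
async function playAndCleanup(tmpPath, playAudio, emit) {
  try {
    await playAudio(tmpPath);
  } catch (err) {
    emit({ type: 'error', message: `Playback failed: ${err.message}` });
  } finally {
    // Ignore ENOENT and similar: cleanup must never mask the real outcome.
    await unlink(tmpPath).catch(() => {});
  }
}
```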
        }
      }
    });
  });
Async ready handler errors cause unhandled promise rejections
High Severity
The client.once('ready', async () => { ... }) callback contains several await calls that can throw (e.g., entersState timing out at line 294, joinVoiceChannel failing), but nothing catches errors from this async callback. The EventEmitter does not await the returned Promise, so any rejection becomes an unhandled promise rejection — crashing the process in Node.js ≥ 15 without emitting a JSON error line via the protocol. The main().catch() at line 345 only catches errors from main() itself (like client.login), not from the separately-invoked ready handler. A try/catch wrapping the body of the ready callback is needed to emit a proper error and exit cleanly.
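A small wrapper is one way to apply that try/catch uniformly. EventEmitter ignores the promise a listener returns, so the wrapper attaches the catch itself. guardAsync is a hypothetical name; the error handler in the usage comment reuses the bot's emit function.

```javascript
// Wrap an async event handler so a rejection is routed to `onError`
// instead of becoming an unhandled promise rejection.
function guardAsync(handler, onError) {
  return (...args) => {
    // Promise.resolve().then() also catches synchronous throws from
    // handlers that are not declared async.
    Promise.resolve()
      .then(() => handler(...args))
      .catch(onError);
  };
}

// Usage (sketch):
//   client.once('ready', guardAsync(async () => { /* join channel... */ }, (err) => {
//     emit({ type: 'error', message: `Startup failed: ${err.message}` });
//     process.exit(1);
//   }));
```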
      emit({ type: 'error', message: `Local TTS also failed: ${localErr.message}` });
      return;
    }
  }
Cloud TTS temp file leaked on timeout fallback
Low Severity
When cloud TTS times out and the code falls back to local TTS (the catch block at line 219), tmpPath — the file that runCloudTts was writing to — is never cleaned up. The localPath file is properly unlinked after use, but tmpPath is abandoned. Additionally, the runCloudTts call continues executing in the background after Promise.race rejects, potentially finishing its write to tmpPath. Since the timeout fallback is a designed feature, this leak occurs during normal operation whenever the network is slow, accumulating orphaned temp files across the session.
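A sketch of one fix, assuming cloud and local TTS are passed in as functions (cloudTts writes to the path it is given, localTts returns a path of its own). synthesize, withTimeout, cloudTts, and localTts are hypothetical names, not coli's API. The timer is cleared on every path, and on fallback tmpPath is deleted immediately and again once the abandoned cloud call settles, which covers the late background write.

```javascript
import { unlink } from 'node:fs/promises';

// Race `promise` against a timer; the timer is cleared either way.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

async function synthesize({ cloudTts, localTts, tmpPath, timeoutMs = 5000 }) {
  const cloud = cloudTts(tmpPath);
  try {
    await withTimeout(cloud, timeoutMs);
    return { path: tmpPath, usedCloud: true };
  } catch {
    const cleanup = () => unlink(tmpPath).catch(() => {});
    await cleanup(); // remove whatever was written so far
    cloud.then(cleanup, cleanup); // and again if the cloud call finishes late
    return { path: await localTts(), usedCloud: false };
  }
}
```

Errors from localTts still propagate, so the existing "Local TTS also failed" handling can stay in the caller.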
v0.18.0 doesn't support Discord's new Voice Gateway v8 and DAVE encryption, causing the connection to cycle signalling → connecting indefinitely. Also add debug logging for voice connection state changes.
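Debug logging for voice state changes can hook the 'stateChange' event that @discordjs/voice connections emit with (oldState, newState). The sketch writes to stderr on the assumption that stdout is reserved for the JSON-line protocol; logVoiceStates is a hypothetical name.

```javascript
// Log each voice-connection status transition to stderr so stdout stays
// clean for protocol lines. Same-status updates are skipped to cut noise.
function logVoiceStates(connection, log = (msg) => process.stderr.write(msg + '\n')) {
  connection.on('stateChange', (oldState, newState) => {
    if (oldState.status !== newState.status) {
      log(`[voice] ${oldState.status} -> ${newState.status}`);
    }
  });
}

// Usage (sketch): logVoiceStates(joinVoiceChannel({ ... }));
```

A signalling → connecting loop then shows up as a repeating pair of lines instead of a silent hang.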
v0.9.0 causes SIGSEGV on Node.js v25 when decoding Opus packets.
…s dir
Instead of copying to ~/.listenhub/voice-chat/, run npm install in voice-chat/scripts/ and launch the bot from there. Gitignore node_modules.
- Discord bot token → env var (DISCORD_BOT_TOKEN), persisted to shell rc
- API key → env var (COLI_LISTENHUB_API_KEY), persisted to shell rc
- Guild ID, channel ID, TTS prefs → config.json (non-sensitive)
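Loading this split configuration might look like the sketch below: secrets are only read from the environment, and config.json holds the non-sensitive settings. The key names guildId and channelId inside config.json are assumptions, as is the loadConfig name; the API key is only checked for presence, on the assumption that coli reads COLI_LISTENHUB_API_KEY itself.

```javascript
import { readFile } from 'node:fs/promises';

// Hypothetical loader: fail fast on missing secrets or settings.
async function loadConfig(configPath, env = process.env) {
  const missing = ['DISCORD_BOT_TOKEN', 'COLI_LISTENHUB_API_KEY'].filter((k) => !env[k]);
  if (missing.length) throw new Error(`Missing env vars: ${missing.join(', ')}`);

  const file = JSON.parse(await readFile(configPath, 'utf8'));
  for (const key of ['guildId', 'channelId']) {
    if (!file[key]) throw new Error(`config.json is missing "${key}"`);
  }
  // The token comes last so a stray "token" key in config.json cannot override it.
  return { ...file, token: env.DISCORD_BOT_TOKEN };
}
```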


Summary
- /voice-chat skill: launch a Discord voice bot for real-time voice conversation with Claude
- discord-bot.js handles the audio pipeline: Discord Opus → PCM → coli streamAsr (VAD + SenseVoice) → Claude reply → coli cloud TTS → Discord playback
- say fallback (local macOS TTS)
How it works
- Run /voice-chat in Claude Code
- Claude Code starts discord-bot.js as a background process
Dependencies
- discord.js + @discordjs/voice: Discord connection
- @marswave/coli: ASR (SenseVoice + Silero VAD) and cloud TTS
- @discordjs/opus, sodium-native, prism-media: audio codec support
- COLI_LISTENHUB_API_KEY, ffmpeg, Node.js ≥ 18
Test plan
- SKILL.md passes skill-creator validation (quick_validate.py) ✅
- discord-bot.js passes ESM syntax check ✅
Note
Medium Risk
Introduces a new Discord-connected Node.js voice bot that handles live audio, ASR, and TTS (including API key usage and process I/O), which increases operational and dependency risk despite being additive.
Overview
Adds a new /voice-chat skill that enables real-time voice conversation with Claude via a Discord voice channel.
Includes a standalone Node.js bot (voice-chat/scripts/discord-bot.js) that joins a configured channel, streams user audio through coli ASR (SenseVoice + VAD) emitting JSON-line partial/final transcripts, and accepts JSON-line reply commands to synthesize and play TTS with cloud→local fallback while pausing ASR during playback.
Adds voice-chat/SKILL.md for the setup/config/launch flow (Discord bot token/channel config, COLI_LISTENHUB_API_KEY, dependency install) plus voice-chat/scripts/package.json and a voice-chat/shared link to shared docs, and documents the new skill in README.md and README.zh.md alongside a design spec and implementation plan.
Written by Cursor Bugbot for commit 1d3364e.