feat: voice-chat skill — Discord voice conversation with Claude #8

Open
0xFANGO wants to merge 19 commits into main from feat/voice-chat

0xFANGO (Collaborator) commented Mar 14, 2026

Summary

  • Add /voice-chat skill: launch a Discord voice bot for real-time voice conversation with Claude
  • discord-bot.js handles the audio pipeline: Discord Opus → PCM → coli streamAsr (VAD + SenseVoice) → Claude reply → coli cloud TTS → Discord playback
  • Half-duplex in v1: ASR pauses during TTS playback; the architecture leaves room for full-duplex
  • TTS graceful degradation: ListenHub cloud TTS (5s timeout) → local macOS say fallback
  • stdin/stdout JSON line protocol between bot process and Claude Code
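
The JSON line protocol could look roughly like the sketch below. The message `type` values (`partial`, `final`, `reply`, `waiting`, `error`) are inferred from this PR's text; the `text` field and the helper names `encodeLine` and `parseLine` are assumptions for illustration, not the bot's actual code.

```javascript
// Hypothetical helpers for a newline-delimited JSON protocol.
// One JSON object per line; malformed or untyped lines are ignored.

// Serialize one message as a single line for stdout.
function encodeLine(msg) {
  return JSON.stringify(msg) + '\n';
}

// Parse one line from stdin; returns null for blank, invalid, or untyped input.
function parseLine(line) {
  const trimmed = line.trim();
  if (!trimmed) return null;
  try {
    const msg = JSON.parse(trimmed);
    return msg && typeof msg.type === 'string' ? msg : null;
  } catch {
    return null; // not valid JSON
  }
}

// Example: bot -> Claude Code (field names assumed)
process.stdout.write(encodeLine({ type: 'final', text: 'hello there' }));
```

Returning `null` instead of throwing keeps parse failures separate from handler failures, which matters for the error-swallowing issue raised in the review below.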

How it works

  1. User invokes /voice-chat in Claude Code
  2. Skill guides through Discord bot setup (token, server, channel) and TTS config (language, voice)
  3. Launches discord-bot.js as background process
  4. Bot joins Discord voice channel, listens via coli ASR, sends transcriptions to Claude Code
  5. Claude generates replies, bot synthesizes and plays them back
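
Step 4 depends on turning decoded audio into float samples (the commit notes mention an Opus packet → PCM → Float32Array pipeline). A minimal sketch of the last conversion, assuming 16-bit signed little-endian mono PCM and a consumer that expects samples in [-1, 1); the function name `pcm16ToFloat32` is hypothetical, and any resampling or downmixing the bot performs is not shown:

```javascript
// Convert a Buffer of 16-bit signed little-endian PCM into a Float32Array.
// Assumption: mono input; each sample is scaled into the range [-1, 1).
function pcm16ToFloat32(buf) {
  const sampleCount = Math.floor(buf.length / 2); // 2 bytes per sample
  const out = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    out[i] = buf.readInt16LE(i * 2) / 32768;
  }
  return out;
}
```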

Dependencies

  • discord.js + @discordjs/voice — Discord connection
  • @marswave/coli — ASR (SenseVoice + Silero VAD) and cloud TTS
  • @discordjs/opus, sodium-native, prism-media — audio codec support
  • Requires: COLI_LISTENHUB_API_KEY, ffmpeg, Node.js ≥ 18

Test plan

  • Verify SKILL.md passes skill-creator validation (quick_validate.py ✅)
  • Verify discord-bot.js passes ESM syntax check ✅
  • End-to-end test with real Discord bot token and voice channel (manual)
  • Test TTS fallback by setting a low timeout
  • Test voice channel empty → waiting → user joins flow

Note

Medium Risk
Introduces a new Discord-connected Node.js voice bot that handles live audio, ASR, and TTS (including API key usage and process I/O), which increases operational and dependency risk despite being additive.

Overview
Adds a new /voice-chat skill that enables real-time voice conversation with Claude via a Discord voice channel.

Includes a standalone Node.js bot (voice-chat/scripts/discord-bot.js) that joins a configured channel, streams user audio through coli ASR (SenseVoice + VAD) emitting JSON-line partial/final transcripts, and accepts JSON-line reply commands to synthesize/play TTS with cloud→local fallback while pausing ASR during playback.

Adds voice-chat/SKILL.md covering the setup, config, and launch flow (Discord bot token/channel config, COLI_LISTENHUB_API_KEY, dependency install), plus voice-chat/scripts/package.json and a voice-chat/shared link to shared docs. Also documents the new skill in README.md and README.zh.md, alongside a design spec and implementation plan.

Written by Cursor Bugbot for commit 1d3364e.

0xFANGO added 11 commits March 14, 2026 17:47
Discord voice bot skill that enables real-time voice conversation
with Claude through coli ASR + ListenHub TTS pipeline.
- Add Discord bot intents/permissions requirements
- Clarify single-user audio handling in v1
- Detail Opus packet → PCM → Float32Array pipeline
- Add guild ID to collection flow
- Fix TTS playback pipeline (MP3 → PCM → Opus)
- Add ffmpeg/prism-media dependencies
- Clarify half-duplex state transitions in protocol
- Increase TTS timeout to 5s, macOS-only for v1
- Handle empty ASR results (discard silently)
- Add conversation context section
- Step 0: check coli version, auto-update if outdated
- Step 3: detect if user is in voice channel before joining
- Add 'waiting' protocol event for empty channel state
- Fix env var: LISTENHUB_API_KEY → COLI_LISTENHUB_API_KEY
- Use coli runCloudTts/listSpeakers instead of direct ListenHub API
- Simplify: coli handles both ASR and TTS as single dependency
9 tasks across 3 chunks:
- Chunk 1: discord-bot.js (package.json, startup, ASR pipeline, TTS handler)
- Chunk 2: SKILL.md with full interaction flow
- Chunk 3: README updates and integration test
- Discord connection with voice channel join/wait logic
- Opus → PCM → Float32Array → coli streamAsr (VAD + SenseVoice)
- Cloud TTS with 5s timeout fallback to local macOS say
- stdin/stdout JSON line protocol for Claude Code communication
- Half-duplex: pause ASR during TTS playback
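
One way to picture the half-duplex rule is a small gate that drops incoming audio while a reply is playing. This is only a sketch: the class name `HalfDuplexGate` and the drop-frames strategy are assumptions, and the PR may pause ASR differently.

```javascript
// Hypothetical half-duplex gate: audio frames reach ASR only while the bot is not speaking.
class HalfDuplexGate {
  constructor(onAudio) {
    this.onAudio = onAudio; // callback that feeds a frame to ASR
    this.speaking = false;
  }

  // Forward a frame to ASR unless playback is in progress. Returns whether it was forwarded.
  push(frame) {
    if (this.speaking) return false;
    this.onAudio(frame);
    return true;
  }

  // Run a playback function with ASR paused; always resume afterwards, even on error.
  async speak(play) {
    this.speaking = true;
    try {
      await play();
    } finally {
      this.speaking = false;
    }
  }
}
```

Resetting the flag in `finally` means a failed playback cannot leave the bot permanently deaf.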

cursor (bot) left a comment

Cursor Bugbot has reviewed your changes and found 4 potential issues.

player.once(AudioPlayerStatus.Idle, resolve);
player.once('error', reject);
});
}

Listener leak on shared AudioPlayer per playback call

Medium Severity

playAudio registers player.once(AudioPlayerStatus.Idle) and player.once('error') on the module-level player singleton each time it's called. When one event fires (e.g., Idle on success), the other once listener (error) remains registered and is never removed. Since AudioPlayer extends EventEmitter, after roughly 10 successful replies, Node.js emits a MaxListenersExceededWarning. Over a longer conversation the stale listeners keep accumulating.
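
A possible fix, sketched with a plain `EventEmitter` standing in for the shared `AudioPlayer`: whichever listener fires first removes its sibling. The helper name `waitForPlaybackEnd` is hypothetical; in the bot, the idle event name would be `AudioPlayerStatus.Idle` rather than the default used here.

```javascript
// CommonJS require is used so the snippet runs standalone; the PR's script is ESM.
const { EventEmitter } = require('node:events');

// Resolves on the idle event, rejects on 'error', and detaches BOTH listeners either way,
// so repeated calls never accumulate listeners on a long-lived player.
function waitForPlaybackEnd(player, idleEvent = 'idle') {
  return new Promise((resolve, reject) => {
    const onIdle = () => { cleanup(); resolve(); };
    const onError = (err) => { cleanup(); reject(err); };
    const cleanup = () => {
      player.off(idleEvent, onIdle);
      player.off('error', onError);
    };
    player.on(idleEvent, onIdle);
    player.on('error', onError);
  });
}

// Stand-in usage: after the idle event, no stale 'error' listener remains.
const demoPlayer = new EventEmitter();
waitForPlaybackEnd(demoPlayer);
demoPlayer.emit('idle');
console.log(demoPlayer.listenerCount('error'));
```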


if (usedCloud) {
await playAudio(tmpPath);
await unlink(tmpPath).catch(() => {});
}

Cloud TTS playback errors silently swallowed by catch

Medium Severity

When cloud TTS succeeds but playAudio(tmpPath) throws (e.g., corrupt file, I/O error), the error propagates out of handleReply's outer try (which has a finally but no catch). It's then caught by the readline handler's catch {} block, which was only intended for JSON parse errors. The error is silently ignored — no error message is emitted to the parent process, and tmpPath is never cleaned up since unlink is skipped.
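
A sketch of one way to address this, assuming `playAudio` and `emit` behave as described in the comment (they are passed in here so the snippet is self-contained; `playAndCleanup` is a hypothetical name): report the playback failure over the protocol and delete the temp file in `finally`.

```javascript
// CommonJS require is used so the snippet runs standalone; the PR's script is ESM.
const { unlink } = require('node:fs/promises');

// Play a synthesized file, surface any playback failure as a protocol error,
// and always remove the temp file regardless of the outcome.
async function playAndCleanup(tmpPath, { playAudio, emit }) {
  try {
    await playAudio(tmpPath);
  } catch (err) {
    emit({ type: 'error', message: `Playback failed: ${err.message}` });
  } finally {
    await unlink(tmpPath).catch(() => {}); // ignore "file already gone"
  }
}
```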


}
}
});
});

Async ready handler errors cause unhandled promise rejections

High Severity

The client.once('ready', async () => { ... }) callback contains several await calls that can throw (e.g., entersState timing out at line 294, joinVoiceChannel failing), but nothing catches errors from this async callback. The EventEmitter does not await the returned Promise, so any rejection becomes an unhandled promise rejection — crashing the process in Node.js ≥ 15 without emitting a JSON error line via the protocol. The main().catch() at line 345 only catches errors from main() itself (like client.login), not from the separately-invoked ready handler. A try/catch wrapping the body of the ready callback is needed to emit a proper error and exit cleanly.
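
The suggested try/catch can also be expressed as a small wrapper, sketched below. `guardAsync` is a hypothetical helper, not part of discord.js; it routes any rejection from an async event handler to an explicit error callback.

```javascript
// Wrap an async event handler so a rejection reaches onError instead of
// becoming an unhandled promise rejection.
function guardAsync(handler, onError) {
  return (...args) => {
    Promise.resolve()
      .then(() => handler(...args))
      .catch(onError);
  };
}

// Intended usage in the bot (shown as comments because client and emit live elsewhere):
// client.once('ready', guardAsync(async () => {
//   /* join the channel, await entersState(...), start ASR */
// }, (err) => {
//   emit({ type: 'error', message: err.message });
//   process.exit(1);
// }));
```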


emit({ type: 'error', message: `Local TTS also failed: ${localErr.message}` });
return;
}
}

Cloud TTS temp file leaked on timeout fallback

Low Severity

When cloud TTS times out and the code falls back to local TTS (the catch block at line 219), tmpPath — the file that runCloudTts was writing to — is never cleaned up. The localPath file is properly unlinked after use, but tmpPath is abandoned. Additionally, the runCloudTts call continues executing in the background after Promise.race rejects, potentially finishing its write to tmpPath. Since the timeout fallback is a designed feature, this leak occurs during normal operation whenever the network is slow, accumulating orphaned temp files across the session.
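
A sketch of a timeout that also cleans up after the losing cloud call. The `synthesize` callback stands in for the `runCloudTts` call, whose real signature is not shown in this PR; `withTimeout` and `cloudTtsOrCleanup` are hypothetical names.

```javascript
// CommonJS require is used so the snippet runs standalone; the PR's script is ESM.
const { unlink } = require('node:fs/promises');

// Reject if `promise` does not settle within `ms`; always clear the timer.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Returns true if cloud TTS finished in time. On timeout or failure, returns false
// and deletes tmpPath once the cloud call settles, since it may still be writing.
async function cloudTtsOrCleanup(synthesize, tmpPath, ms) {
  const pending = synthesize(tmpPath);
  try {
    await withTimeout(pending, ms);
    return true;
  } catch {
    pending
      .catch(() => {}) // a late cloud failure is irrelevant at this point
      .then(() => unlink(tmpPath).catch(() => {}));
    return false;
  }
}
```

Waiting for the cloud call to settle before unlinking avoids deleting the file while it is still being written and then having it reappear.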


0xFANGO added 8 commits March 14, 2026 19:36
v0.18.0 doesn't support Discord's new Voice Gateway v8 and DAVE
encryption, causing connection to loop signalling → connecting forever.
Also add debug logging for voice connection state changes.
v0.9.0 causes SIGSEGV on Node.js v25 when decoding Opus packets.
…s dir

Instead of copying to ~/.listenhub/voice-chat/, just npm install
in voice-chat/scripts/ and run from there. Gitignore node_modules.
- Discord bot token → env var (DISCORD_BOT_TOKEN), persisted to shell rc
- API key → env var (COLI_LISTENHUB_API_KEY), persisted to shell rc
- Guild ID, channel ID, TTS prefs → config.json (non-sensitive)
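
The split between secrets and non-sensitive settings might be loaded along these lines. The two environment variable names come from the commit notes above; the `config.json` field names (`guildId`, `channelId`, `tts`) and the function `loadSettings` are assumptions for illustration.

```javascript
// Hypothetical loader: secrets come from the environment, everything else from config.json.
function loadSettings(env, configJson) {
  const required = ['DISCORD_BOT_TOKEN', 'COLI_LISTENHUB_API_KEY'];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  const cfg = JSON.parse(configJson); // assumed field names below
  return {
    token: env.DISCORD_BOT_TOKEN,
    apiKey: env.COLI_LISTENHUB_API_KEY,
    guildId: cfg.guildId,
    channelId: cfg.channelId,
    tts: cfg.tts ?? {},
  };
}
```

In the bot this would be called with `process.env` and the contents of `config.json`, so the tokens never need to be written to a file inside the repository.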