feat: voice-chat skill — Discord voice conversation with Claude by 0xFANGO · Pull Request #8 · marswaveai/skills

0xFANGO · 2026-03-14T11:06:00Z

Summary

Add /voice-chat skill: launch a Discord voice bot for real-time voice conversation with Claude
discord-bot.js handles the audio pipeline: Discord Opus → PCM → coli streamAsr (VAD + SenseVoice) → Claude reply → coli cloud TTS → Discord playback
Half-duplex v1: ASR pauses during TTS playback; the architecture leaves room for full-duplex
TTS graceful degradation: ListenHub cloud TTS (5s timeout) → local macOS say fallback
stdin/stdout JSON line protocol between bot process and Claude Code
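
To illustrate the JSON line protocol in the last bullet, here is a minimal sketch of line framing, assuming one JSON object per newline-terminated line. The message fields (`type`, `text`) and the helper names `encodeLine` and `createLineDecoder` are illustrative assumptions, not the schema or functions used in discord-bot.js.

```javascript
// Serialize one message as a single newline-terminated JSON line.
function encodeLine(message) {
  return JSON.stringify(message) + '\n';
}

// Incremental decoder: feed it arbitrary chunks, get back complete messages.
// Data after the last newline is buffered until the rest of the line arrives.
function createLineDecoder() {
  let buffered = '';
  return function decode(chunk) {
    buffered += chunk;
    const lines = buffered.split('\n');
    buffered = lines.pop(); // last element is the (possibly empty) partial line
    const messages = [];
    for (const line of lines) {
      if (line.trim() === '') continue;
      try {
        messages.push(JSON.parse(line));
      } catch {
        // Assumption: malformed lines are surfaced as error messages, not dropped.
        messages.push({ type: 'error', message: 'invalid JSON line' });
      }
    }
    return messages;
  };
}
```

Buffering the partial line matters because stdin chunks are not guaranteed to end on a line boundary.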

How it works

User invokes /voice-chat in Claude Code
Skill guides through Discord bot setup (token, server, channel) and TTS config (language, voice)
Launches discord-bot.js as background process
Bot joins Discord voice channel, listens via coli ASR, sends transcriptions to Claude Code
Claude generates replies, bot synthesizes and plays them back
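
The half-duplex behaviour in this flow (ASR paused while a reply is playing) can be sketched as a small gate. This is an illustration under assumptions: `createHalfDuplexGate`, `forwardToAsr`, and `playFn` are hypothetical names, and the real bot may pause ASR differently.

```javascript
// Hypothetical half-duplex gate: audio frames reach ASR only while the bot is not speaking.
function createHalfDuplexGate(forwardToAsr) {
  let speaking = false;
  return {
    // Called for every decoded audio frame; returns false when the frame is dropped.
    onAudioFrame(frame) {
      if (speaking) return false;
      forwardToAsr(frame);
      return true;
    },
    // Wraps playback so the gate always reopens, even if playback throws.
    async speak(playFn) {
      speaking = true;
      try {
        await playFn();
      } finally {
        speaking = false;
      }
    },
    get isSpeaking() {
      return speaking;
    },
  };
}
```

Resetting the flag in `finally` keeps a failed playback from leaving the bot permanently deaf.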

Dependencies

discord.js + @discordjs/voice — Discord connection
@marswave/coli — ASR (SenseVoice + Silero VAD) and cloud TTS
@discordjs/opus, sodium-native, prism-media — audio codec support
Requires: COLI_LISTENHUB_API_KEY, ffmpeg, Node.js ≥ 18

Test plan

Verify SKILL.md passes skill-creator validation (quick_validate.py ✅)
Verify discord-bot.js passes ESM syntax check ✅
End-to-end test with real Discord bot token and voice channel (manual)
Test TTS fallback by setting a low timeout
Test voice channel empty → waiting → user joins flow

Note

Medium Risk
Introduces a new Discord-connected Node.js voice bot that handles live audio, ASR, and TTS (including API key usage and process I/O), which increases operational and dependency risk despite being additive.

Overview
Adds a new /voice-chat skill that enables real-time voice conversation with Claude via a Discord voice channel.

Includes a standalone Node.js bot (voice-chat/scripts/discord-bot.js) that joins a configured channel, streams user audio through coli ASR (SenseVoice + VAD) emitting JSON-line partial/final transcripts, and accepts JSON-line reply commands to synthesize/play TTS with cloud→local fallback while pausing ASR during playback.

Adds voice-chat/SKILL.md for setup/config/launch flow (Discord bot token/channel config, COLI_LISTENHUB_API_KEY, dependency install) plus voice-chat/scripts/package.json and a voice-chat/shared link to shared docs, and documents the new skill in README.md and README.zh.md alongside a design spec and implementation plan.

Written by Cursor Bugbot for commit 1d3364e.

- Add Discord bot intents/permissions requirements
- Clarify single-user audio handling in v1
- Detail Opus packet → PCM → Float32Array pipeline
- Add guild ID to collection flow
- Fix TTS playback pipeline (MP3 → PCM → Opus)
- Add ffmpeg/prism-media dependencies
- Clarify half-duplex state transitions in protocol
- Increase TTS timeout to 5s, macOS-only for v1
- Handle empty ASR results (discard silently)
- Add conversation context section

- Discord connection with voice channel join/wait logic
- Opus → PCM → Float32Array → coli streamAsr (VAD + SenseVoice)
- Cloud TTS with 5s timeout fallback to local macOS say
- stdin/stdout JSON line protocol for Claude Code communication
- Half-duplex: pause ASR during TTS playback
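
The PCM → Float32Array step listed above can be sketched as follows, assuming the decoded audio is 16-bit little-endian PCM with interleaved stereo channels. The function names are hypothetical, and any resampling the ASR may need is out of scope here.

```javascript
// Convert 16-bit little-endian PCM bytes to floats in [-1, 1).
function pcm16ToFloat32(bytes) {
  // DataView avoids alignment problems when the byte offset is odd.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const sampleCount = Math.floor(bytes.byteLength / 2);
  const out = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    out[i] = view.getInt16(i * 2, true) / 32768;
  }
  return out;
}

// Average interleaved left/right samples into one mono channel.
function stereoToMono(interleaved) {
  const frames = Math.floor(interleaved.length / 2);
  const mono = new Float32Array(frames);
  for (let i = 0; i < frames; i++) {
    mono[i] = (interleaved[2 * i] + interleaved[2 * i + 1]) / 2;
  }
  return mono;
}
```

Both functions accept any `Uint8Array`, so a Node.js `Buffer` from the Opus decoder can be passed directly.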

cursor

Cursor Bugbot has reviewed your changes and found 4 potential issues.

cursor · 2026-03-14T11:14:08Z

+    player.once(AudioPlayerStatus.Idle, resolve);
+    player.once('error', reject);
+  });
+}


Listener leak on shared AudioPlayer per playback call

Medium Severity

playAudio registers player.once(AudioPlayerStatus.Idle) and player.once('error') on the module-level player singleton each time it's called. When one event fires (e.g., Idle on success), the other once listener (error) remains registered and is never removed. Since AudioPlayer extends EventEmitter, after roughly 10 successful replies, Node.js emits a MaxListenersExceededWarning. Over a longer conversation the stale listeners keep accumulating.
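
One possible fix, sketched under assumptions: register both listeners with `on` and detach both as soon as either fires. `waitForPlaybackEnd` is a hypothetical helper, not code from this PR; it relies only on the standard `on`/`off` methods of Node.js EventEmitter.

```javascript
// Resolves on `doneEvent`, rejects on 'error', and removes BOTH listeners
// when either one fires, so nothing accumulates on a shared player.
function waitForPlaybackEnd(player, doneEvent) {
  return new Promise((resolve, reject) => {
    const onDone = () => {
      cleanup();
      resolve();
    };
    const onError = (err) => {
      cleanup();
      reject(err);
    };
    function cleanup() {
      player.off(doneEvent, onDone);
      player.off('error', onError);
    }
    player.on(doneEvent, onDone);
    player.on('error', onError);
  });
}
```

In playAudio this would replace the two `player.once(...)` calls, with `AudioPlayerStatus.Idle` passed as `doneEvent`.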

cursor · 2026-03-14T11:14:08Z

+    if (usedCloud) {
+      await playAudio(tmpPath);
+      await unlink(tmpPath).catch(() => {});
+    }


Cloud TTS playback errors silently swallowed by catch

Medium Severity

When cloud TTS succeeds but playAudio(tmpPath) throws (e.g., corrupt file, I/O error), the error propagates out of handleReply's outer try (which has a finally but no catch). It's then caught by the readline handler's catch {} block, which was only intended for JSON parse errors. The error is silently ignored — no error message is emitted to the parent process, and tmpPath is never cleaned up since unlink is skipped.
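
A sketch of one way to address both points, assuming the handler can be split as shown. `handleLine`, `playAndCleanUp`, and the injected `emit`, `playAudio`, and `unlink` parameters are hypothetical stand-ins for the bot's own functions.

```javascript
// Parse errors and reply-handling errors are reported separately,
// so a playback failure is no longer mistaken for bad JSON.
async function handleLine(line, handleReply, emit) {
  let command;
  try {
    command = JSON.parse(line);
  } catch {
    emit({ type: 'error', message: 'Ignoring malformed JSON line' });
    return;
  }
  try {
    await handleReply(command);
  } catch (err) {
    emit({ type: 'error', message: `Reply failed: ${err.message}` });
  }
}

// The temp file is removed whether or not playback succeeds.
async function playAndCleanUp(tmpPath, playAudio, unlink) {
  try {
    await playAudio(tmpPath);
  } finally {
    await unlink(tmpPath).catch(() => {});
  }
}
```

Emitting an error for malformed JSON is a design choice in this sketch; the original handler ignored such lines silently.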

Additional Locations (1)

voice-chat/scripts/discord-bot.js#L331-L334

cursor · 2026-03-14T11:14:08Z

+        }
+      }
+    });
+  });


Async ready handler errors cause unhandled promise rejections

High Severity

The client.once('ready', async () => { ... }) callback contains several await calls that can throw (e.g., entersState timing out at line 294, joinVoiceChannel failing), but nothing catches errors from this async callback. The EventEmitter does not await the returned Promise, so any rejection becomes an unhandled promise rejection — crashing the process in Node.js ≥ 15 without emitting a JSON error line via the protocol. The main().catch() at line 345 only catches errors from main() itself (like client.login), not from the separately-invoked ready handler. A try/catch wrapping the body of the ready callback is needed to emit a proper error and exit cleanly.
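
The suggested try/catch can also be factored into a small wrapper. This is a sketch; `guardAsync` is a hypothetical helper name, and the usage shown in the comment assumes the bot's existing `emit` function.

```javascript
// Wraps an async event handler so a rejection is reported through
// `onFailure` instead of becoming an unhandled promise rejection.
function guardAsync(handler, onFailure) {
  return async (...args) => {
    try {
      await handler(...args);
    } catch (err) {
      onFailure(err);
    }
  };
}

// Possible usage (not executed here):
// client.once('ready', guardAsync(async () => { /* join channel */ }, (err) => {
//   emit({ type: 'error', message: err.message });
//   process.exit(1);
// }));
```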

cursor · 2026-03-14T11:14:08Z

+        emit({ type: 'error', message: `Local TTS also failed: ${localErr.message}` });
+        return;
+      }
+    }


Cloud TTS temp file leaked on timeout fallback

Low Severity

When cloud TTS times out and the code falls back to local TTS (the catch block at line 219), tmpPath — the file that runCloudTts was writing to — is never cleaned up. The localPath file is properly unlinked after use, but tmpPath is abandoned. Additionally, the runCloudTts call continues executing in the background after Promise.race rejects, potentially finishing its write to tmpPath. Since the timeout fallback is a designed feature, this leak occurs during normal operation whenever the network is slow, accumulating orphaned temp files across the session.
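
A sketch of a timeout race that also cleans up the abandoned file, under assumptions: `cloudTts(path)` stands in for the call that writes audio to `path`, and `removeFile(path)` for `unlink`. Both are injected, hypothetical parameters.

```javascript
// Rejects with a timeout error if `promise` does not settle within `ms`.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Returns true if cloud TTS finished in time (caller plays, then removes tmpPath).
// Returns false on timeout or failure (caller falls back to local TTS).
async function cloudTtsOrCleanUp(cloudTts, tmpPath, removeFile, ms) {
  const attempt = cloudTts(tmpPath);
  try {
    await withTimeout(attempt, ms);
    return true;
  } catch {
    // The losing call may still be writing, so delete only after it settles.
    attempt
      .catch(() => {})
      .then(() => removeFile(tmpPath))
      .catch(() => {});
    return false;
  }
}
```

Deferring the delete until the background call settles avoids removing the file and then having the late write recreate it.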

0xFANGO added 11 commits March 14, 2026 17:47

docs: add voice-chat skill design spec

50afb01

Discord voice bot skill that enables real-time voice conversation with Claude through coli ASR + ListenHub TTS pipeline.

docs: add coli auto-update and voice channel presence check

16d84f2

- Step 0: check coli version, auto-update if outdated
- Step 3: detect if user is in voice channel before joining
- Add 'waiting' protocol event for empty channel state

docs: use COLI_LISTENHUB_API_KEY and coli's built-in TTS APIs

d4c2f32

- Fix env var: LISTENHUB_API_KEY → COLI_LISTENHUB_API_KEY
- Use coli runCloudTts/listSpeakers instead of direct ListenHub API
- Simplify: coli handles both ASR and TTS as single dependency

docs: add ListenHub API key acquisition URL to spec

c959f72

docs: add voice-chat implementation plan

ed28223

9 tasks across 3 chunks:
- Chunk 1: discord-bot.js (package.json, startup, ASR pipeline, TTS handler)
- Chunk 2: SKILL.md with full interaction flow
- Chunk 3: README updates and integration test

feat(voice-chat): add package.json for bot dependencies

ab5c408

feat(voice-chat): add SKILL.md with complete interaction flow

ee35cae

feat(voice-chat): add shared docs symlink

c0cc620

docs: add voice-chat skill to README

1d3364e

cursor bot reviewed Mar 14, 2026

0xFANGO added 8 commits March 14, 2026 19:36

chore: add .env to gitignore

7e6e0b4

fix(voice-chat): fix @discordjs/opus CJS import for ESM compat

b5738cb

fix(voice-chat): upgrade @discordjs/voice to 0.19.1 for DAVE protocol

029a0c7

v0.18.0 doesn't support Discord's new Voice Gateway v8 and DAVE encryption, causing connection to loop signalling → connecting forever. Also add debug logging for voice connection state changes.

fix(voice-chat): upgrade @discordjs/opus to 0.10.0

6f542bb

v0.9.0 causes SIGSEGV on Node.js v25 when decoding Opus packets.

fix(voice-chat): clean up debug logging, keep error handler only

0403a3d

fix(voice-chat): simplify deps — install node_modules in skill script…

0017b69

…s dir Instead of copying to ~/.listenhub/voice-chat/, just npm install in voice-chat/scripts/ and run from there. Gitignore node_modules.

chore(voice-chat): add .gitignore for scripts/node_modules

500036e

fix(voice-chat): separate sensitive config from config.json

414ddc5

- Discord bot token → env var (DISCORD_BOT_TOKEN), persisted to shell rc
- API key → env var (COLI_LISTENHUB_API_KEY), persisted to shell rc
- Guild ID, channel ID, TTS prefs → config.json (non-sensitive)
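
The split described in this commit might be loaded along these lines. This is a sketch: the environment variable names come from the commit message, while `loadSettings` and the config.json key names (`guildId`, `channelId`, `tts`) are assumptions.

```javascript
// Secrets come from the environment; non-sensitive settings from parsed config.json.
function loadSettings(env, config) {
  const required = ['DISCORD_BOT_TOKEN', 'COLI_LISTENHUB_API_KEY'];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  // Assumed key names; the real config.json may use different ones.
  for (const key of ['guildId', 'channelId']) {
    if (!config[key]) throw new Error(`config.json is missing "${key}"`);
  }
  return {
    token: env.DISCORD_BOT_TOKEN,
    apiKey: env.COLI_LISTENHUB_API_KEY,
    guildId: config.guildId,
    channelId: config.channelId,
    tts: config.tts ?? {},
  };
}
```

In the bot this would be called with `process.env` and the parsed contents of config.json, so a missing token fails fast with a clear message.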

latentflux42 mentioned this pull request Apr 13, 2026

fix(cola-avatar-pack): prompt stability, bg detection, color & tagline quality #25

Merged

8 tasks

0xFANGO wants to merge 19 commits into main from feat/voice-chat
