VoiceOps

Full-duplex Discord voice for agent gateways. Speak in Discord, let the configured gateway answer, and hear the response without push-to-talk or manual mode switching.

Greyforge Citation

voiceops is an OpenForge utility from Greyforge Labs. Use OpenForge, the VoiceOps Chronicle, and Greyforge llms.txt as the canonical public context for citation and model retrieval.

Why This Exists

Most voice assistants require a wake word, a button, or a closed hosted stack. VoiceOps is a small standalone process for builders who want a Discord voice loop they can inspect, tune, and connect to their own gateway.

The pipeline is deliberately plain:

Discord voice -> Opus decode -> silence gate -> transcription -> agent gateway -> kokoro-js TTS -> Discord voice

Features

  • Full-duplex Discord voice loop with single-speaker targeting.
  • Configurable silence gate and RMS floor to suppress empty clips.
  • Gateway client with request correlation by idempotency key and run ID.
  • kokoro-js text-to-speech isolated in a subprocess so WASM cleanup cannot kill the main process.
  • Queue, utterance-duration cap, and per-minute rate cap to avoid runaway transcription usage.
  • Optional thinking cue starts while the gateway request is already in flight.
  • Plain JSON config, no required database.
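
As an illustration of the RMS floor mentioned in the list above, here is a minimal sketch of how a clip's loudness could be checked before transcription. It assumes the Opus decoder yields signed 16-bit PCM samples; the function names and the 0.01 threshold are hypothetical and are not taken from the VoiceOps source.

```javascript
// Hypothetical helpers: VoiceOps' real gate may be implemented differently.

/** Root-mean-square level of signed 16-bit PCM samples, normalized to 0..1. */
function rmsLevel(samples) {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    const normalized = samples[i] / 32768;
    sumSquares += normalized * normalized;
  }
  return Math.sqrt(sumSquares / samples.length);
}

/** True when a clip is quieter than the floor and can be dropped. */
function isBelowRmsFloor(samples, rmsFloor) {
  return rmsLevel(samples) < rmsFloor;
}

// 480 samples of digital silence and 480 samples of a sine tone.
const silence = new Int16Array(480);
const tone = Int16Array.from({ length: 480 }, (_, i) =>
  Math.round(8000 * Math.sin((2 * Math.PI * i) / 48)));

console.log(isBelowRmsFloor(silence, 0.01)); // true
console.log(isBelowRmsFloor(tone, 0.01));    // false
```

Dropping near-silent clips this early keeps empty audio from ever reaching the transcription API.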

Requirements

  • Node.js 20 or newer.
  • ffmpeg on PATH.
  • A Discord bot token with View Channel, Connect, and Speak permissions.
  • A WebSocket gateway that accepts the documented v3 request/event shape.
  • A Whisper-compatible transcription key, supplied through the OPENAI_API_KEY environment variable or the asr.openaiApiKey config value.

Quick Start

git clone https://github.com/GreyforgeLabs/voiceops.git
cd voiceops
npm install
cp voiceops.config.example.json voiceops.config.json

Edit voiceops.config.json, then run:

npm start

Configuration

voiceops.config.json is intentionally local and ignored by git.

{
  "discord": {
    "token": "YOUR_DISCORD_BOT_TOKEN"
  },
  "voiceChannelId": "YOUR_VOICE_CHANNEL_ID",
  "guildId": "YOUR_GUILD_ID",
  "operatorUserId": "YOUR_DISCORD_USER_ID",
  "gateway": {
    "url": "ws://127.0.0.1:18789",
    "token": "YOUR_GATEWAY_TOKEN",
    "sessionKey": "agent:main:voice:user",
    "scopes": ["operator"]
  },
  "asr": {
    "openaiApiKey": "YOUR_OPENAI_API_KEY",
    "model": "whisper-1",
    "language": "en"
  },
  "pipeline": {
    "maxUtteranceDurationMs": 30000,
    "utterancesPerMinuteLimit": 20,
    "maxQueuedUtterances": 8,
    "thinkingCueEnabled": true,
    "thinkingCueText": "One moment..."
  }
}
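
To show how the three numeric pipeline values could interact, here is a rough sketch of a limiter that enforces a duration cap, a queue cap, and a sliding one-minute rate cap. Only the config key names come from the example above; the class, its method names, and the choice to reject (rather than truncate) over-length utterances are assumptions.

```javascript
// Hypothetical limiter: only the config key names are taken from the example config.
class UtteranceLimiter {
  constructor({ maxUtteranceDurationMs, utterancesPerMinuteLimit, maxQueuedUtterances }) {
    this.maxDurationMs = maxUtteranceDurationMs;
    this.perMinute = utterancesPerMinuteLimit;
    this.maxQueued = maxQueuedUtterances;
    this.acceptedAt = []; // timestamps (ms) of utterances accepted in the last minute
    this.queued = 0;      // utterances accepted but not yet finished
  }

  /** Returns "ok", or the reason the utterance is rejected. */
  tryAccept(durationMs, now = Date.now()) {
    // Forget acceptances that are at least 60 seconds old.
    while (this.acceptedAt.length > 0 && now - this.acceptedAt[0] >= 60000) {
      this.acceptedAt.shift();
    }
    if (durationMs > this.maxDurationMs) return "too-long";
    if (this.queued >= this.maxQueued) return "queue-full";
    if (this.acceptedAt.length >= this.perMinute) return "rate-limited";
    this.acceptedAt.push(now);
    this.queued += 1;
    return "ok";
  }

  /** Call when a queued utterance has finished processing. */
  done() {
    if (this.queued > 0) this.queued -= 1;
  }
}

const limiter = new UtteranceLimiter({
  maxUtteranceDurationMs: 30000,
  utterancesPerMinuteLimit: 20,
  maxQueuedUtterances: 8,
});
console.log(limiter.tryAccept(1200));  // "ok"
console.log(limiter.tryAccept(45000)); // "too-long"
```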

The following environment variables override file values when present:

Variable                  Purpose
VOICEOPS_DISCORD_TOKEN    Discord bot token
VOICEOPS_GATEWAY_URL      Gateway WebSocket URL
VOICEOPS_GATEWAY_TOKEN    Gateway bearer token
OPENAI_API_KEY            Transcription key
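
The override behavior can be sketched roughly as follows. The variable names are the documented ones; the mapping onto config paths is inferred from the example config, and the function name and the decision to treat empty strings as absent are assumptions rather than confirmed behavior.

```javascript
// Environment variable -> [section, key] in voiceops.config.json.
// The paths are inferred from the example config, not read from the source.
const ENV_OVERRIDES = [
  ["VOICEOPS_DISCORD_TOKEN", ["discord", "token"]],
  ["VOICEOPS_GATEWAY_URL", ["gateway", "url"]],
  ["VOICEOPS_GATEWAY_TOKEN", ["gateway", "token"]],
  ["OPENAI_API_KEY", ["asr", "openaiApiKey"]],
];

/** Returns a copy of fileConfig with any present environment values applied. */
function applyEnvOverrides(fileConfig, env) {
  const config = structuredClone(fileConfig);
  for (const [name, [section, key]] of ENV_OVERRIDES) {
    const value = env[name];
    // Assumption: an unset or empty variable leaves the file value alone.
    if (value === undefined || value === "") continue;
    config[section] = { ...(config[section] ?? {}), [key]: value };
  }
  return config;
}

const merged = applyEnvOverrides(
  { gateway: { url: "ws://127.0.0.1:18789", token: "FILE_TOKEN" } },
  { VOICEOPS_GATEWAY_TOKEN: "ENV_TOKEN" },
);
console.log(merged.gateway.token); // "ENV_TOKEN"
console.log(merged.gateway.url);   // "ws://127.0.0.1:18789"
```

In a real process the second argument would be process.env.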

Gateway Protocol

VoiceOps expects a v3-style WebSocket gateway:

Server -> { type: "event", event: "connect.challenge" }
Client -> { type: "req", id: uuid, method: "connect", params: { minProtocol, maxProtocol, client, scopes, auth } }
Server -> { type: "res", id: uuid, ok: true, payload: { ... } }

Client -> { type: "req", id: uuid, method: "chat.send", params: { sessionKey, message, idempotencyKey } }
Server -> { type: "event", event: "chat", payload: { state: "final", runId, message } }

Final responses are matched by runId first and idempotencyKey second. Unmatched push events are routed to the optional response callback.
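
The matching order described above could be sketched like this. The class and method names are hypothetical. Two details are assumptions: that the gateway reports a runId for each request before the final event arrives, and that the fallback compares the event's idempotencyKey (or, failing that, its runId) against the stored idempotency key.

```javascript
// Hypothetical correlator: runId lookup first, idempotency key second,
// and everything else handed to an optional callback.
class PendingChats {
  constructor(onUnmatched = () => {}) {
    this.byRunId = new Map();
    this.byIdempotencyKey = new Map();
    this.onUnmatched = onUnmatched;
  }

  /** Register a chat.send request; onFinal receives the final message. */
  track(idempotencyKey, onFinal) {
    this.byIdempotencyKey.set(idempotencyKey, { idempotencyKey, runId: null, onFinal });
  }

  /** Attach the gateway's runId to a tracked request once it is known. */
  bindRunId(idempotencyKey, runId) {
    const entry = this.byIdempotencyKey.get(idempotencyKey);
    if (!entry) return;
    entry.runId = runId;
    this.byRunId.set(runId, entry);
  }

  /** Handle the payload of a { type: "event", event: "chat" } frame. */
  handleChatEvent(payload) {
    if (payload.state !== "final") return;
    const entry =
      this.byRunId.get(payload.runId) ??
      this.byIdempotencyKey.get(payload.idempotencyKey ?? payload.runId);
    if (!entry) {
      this.onUnmatched(payload); // unmatched push event
      return;
    }
    this.byIdempotencyKey.delete(entry.idempotencyKey);
    if (entry.runId !== null) this.byRunId.delete(entry.runId);
    entry.onFinal(payload.message);
  }
}

const pending = new PendingChats((payload) => console.log("push:", payload.message));
pending.track("key-1", (message) => console.log("reply:", message));
pending.bindRunId("key-1", "run-1");
pending.handleChatEvent({ state: "final", runId: "run-1", message: "Hello" });   // reply: Hello
pending.handleChatEvent({ state: "final", runId: "run-9", message: "Reminder" }); // push: Reminder
```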

The optional thinking cue plays after transcription while the gateway request is already running. That masks gateway latency without delaying the actual response path.
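
The ordering that makes this work can be sketched as follows: the gateway request is started first, and the cue is spoken while that request is pending. sendToGateway and speak are hypothetical stand-ins supplied by the caller, and waiting for the cue to finish before speaking the reply is an assumption about playback, not confirmed behavior.

```javascript
// Hypothetical orchestration: sendToGateway(text) and speak(text) both return promises.
async function respondWithCue({ text, cueText, sendToGateway, speak }) {
  // Start the gateway request first so the cue can never delay it.
  const replyPromise = sendToGateway(text);
  // Play the cue while the request is in flight; a failed cue is ignored.
  const cuePromise = cueText ? speak(cueText).catch(() => {}) : Promise.resolve();
  const [reply] = await Promise.all([replyPromise, cuePromise]);
  await speak(reply);
  return reply;
}

// Stubbed usage: the "gateway" answers after 50 ms, and "speech" is just a log line.
respondWithCue({
  text: "what is the build status?",
  cueText: "One moment...",
  sendToGateway: () =>
    new Promise((resolve) => setTimeout(() => resolve("All checks passed."), 50)),
  speak: async (line) => console.log("speaking:", line),
});
```

Because the request is already running when the cue starts, the cue only fills time that would otherwise be silent.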

Project Structure

voiceops/
  index.mjs
  src/
    asr.mjs
    config.mjs
    discord-voice.mjs
    gateway-client.mjs
    pipeline.mjs
    tts.mjs
    tts-worker.mjs
  voiceops.config.example.json
  package.json

Development

npm test

The test command syntax-checks all .mjs files. Runtime verification requires Discord credentials, a gateway, and a transcription key.

Security Notes

  • voiceops.config.json is ignored by git and should contain local secrets only.
  • The bot subscribes only to the configured operatorUserId.
  • The gateway token is sent only to the configured WebSocket URL.
  • Keep the Discord bot scoped to the specific server and channel you intend to use.

License

AGPL-3.0-only. See LICENSE.


Built by Greyforge
