feat: support voice message STT (Speech-to-Text) for Discord by chaodu-agent · Pull Request #225 · openabdev/openab

chaodu-agent · 2026-04-11T21:21:02Z

Summary

Add optional Speech-to-Text support that transcribes Discord voice message attachments via any OpenAI-compatible /audio/transcriptions endpoint and injects the transcript into the ACP prompt as text.

Closes #224

Changes

File	Change	Description
`src/stt.rs`	NEW	~50 lines — HTTP POST to `/audio/transcriptions` via reqwest multipart
`src/config.rs`	MOD	Add `SttConfig` struct (enabled, api_key, model, base_url)
`src/discord.rs`	MOD	Detect `audio/*` attachments → download → call STT → inject transcript
`src/main.rs`	MOD	Wire `mod stt` + pass config to Handler
`Cargo.toml`	MOD	Add `multipart` and `json` features to reqwest
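As a rough picture of what `src/stt.rs` sends, the sketch below assembles the multipart/form-data body for `POST /audio/transcriptions` by hand, using only the standard library. The field names `file`, `model`, and `response_format` follow the OpenAI-compatible transcription API. The helper names, the fixed boundary string, and the field order are illustrative assumptions; the PR itself delegates all of this to reqwest's multipart support.

```rust
/// Hypothetical helper: builds a multipart/form-data body for an
/// OpenAI-compatible POST {base_url}/audio/transcriptions request.
/// The real stt.rs uses reqwest's multipart support instead.
pub fn build_transcription_form(
    boundary: &str,
    audio: &[u8],
    filename: &str,
    mime: &str,
    model: &str,
) -> Vec<u8> {
    let mut body = Vec::new();

    // Plain text fields: the model name and the requested response format.
    for (name, value) in [("model", model), ("response_format", "json")] {
        body.extend_from_slice(format!("--{boundary}\r\n").as_bytes());
        body.extend_from_slice(
            format!("Content-Disposition: form-data; name=\"{name}\"\r\n\r\n").as_bytes(),
        );
        body.extend_from_slice(value.as_bytes());
        body.extend_from_slice(b"\r\n");
    }

    // File field: raw audio bytes tagged with the attachment's MIME type.
    body.extend_from_slice(format!("--{boundary}\r\n").as_bytes());
    body.extend_from_slice(
        format!(
            "Content-Disposition: form-data; name=\"file\"; filename=\"{filename}\"\r\n\
             Content-Type: {mime}\r\n\r\n"
        )
        .as_bytes(),
    );
    body.extend_from_slice(audio);
    body.extend_from_slice(b"\r\n");

    // Closing delimiter ends the form.
    body.extend_from_slice(format!("--{boundary}--\r\n").as_bytes());
    body
}

/// Content-Type header value that must accompany the body.
pub fn form_content_type(boundary: &str) -> String {
    format!("multipart/form-data; boundary={boundary}")
}
```

In a real client the boundary must not occur inside the audio bytes; reqwest generates a random one automatically.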

How it works

Discord voice msg (.ogg)
       │
       ▼
  download audio
       │
       ▼
  POST /audio/transcriptions ──► Groq / OpenAI / local whisper server
       │
       ▼
  inject "[Voice message transcript]: ..." as ContentBlock::Text
       │
       ▼
  ACP agent receives plain text ✅
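The injection step can be sketched as a function that turns a transcript into a text block. `ContentBlock` here is a hypothetical stand-in for the ACP type, and skipping empty transcripts is an assumption rather than documented behavior; only the `[Voice message transcript]: ` prefix comes from the PR.

```rust
/// Hypothetical stand-in for the ACP content block type; only the
/// Text variant matters for the transcript.
#[derive(Debug, Clone, PartialEq)]
pub enum ContentBlock {
    Text { text: String },
}

/// Wraps a transcript in the "[Voice message transcript]: ..." label and
/// returns it as a text block, so the agent only ever sees plain text.
/// Returns None for blank transcripts (an assumption, not from the PR).
pub fn transcript_block(transcript: &str) -> Option<ContentBlock> {
    let trimmed = transcript.trim();
    if trimmed.is_empty() {
        return None;
    }
    Some(ContentBlock::Text {
        text: format!("[Voice message transcript]: {trimmed}"),
    })
}
```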

Configuration

The feature is opt-in: it is disabled by default and has no effect when the [stt] section is omitted.

Cloud (Groq free tier — default)

[stt]
enabled = true
api_key = "${GROQ_API_KEY}"
model = "whisper-large-v3-turbo"
# base_url = "https://api.groq.com/openai/v1"  # default
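A minimal sketch of the `[stt]` section as a Rust struct, assuming the four fields named in the Changes table and defaults inferred from the docs (disabled, Groq base URL, `whisper-large-v3-turbo`). Modeling `api_key` as `Option<String>` is an assumption, and serde deserialization is left out to keep the example dependency-free.

```rust
/// Sketch of the [stt] section; field names match the PR description
/// (enabled, api_key, model, base_url). Defaults are assumptions taken
/// from the documented behavior.
#[derive(Debug, Clone, PartialEq)]
pub struct SttConfig {
    pub enabled: bool,
    pub api_key: Option<String>,
    pub model: String,
    pub base_url: String,
}

impl Default for SttConfig {
    fn default() -> Self {
        SttConfig {
            // Opt-in: without an explicit `enabled = true`, STT stays off.
            enabled: false,
            api_key: None,
            model: "whisper-large-v3-turbo".to_string(),
            base_url: "https://api.groq.com/openai/v1".to_string(),
        }
    }
}
```

Under this sketch, a config without an `[stt]` section simply falls back to `SttConfig::default()`, which leaves voice transcription switched off.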

Local whisper server (Mac Mini / home lab)

For users already running a local whisper server (faster-whisper-server, whisper.cpp, LocalAI, etc.), just point base_url to it:

[stt]
enabled = true
api_key = "not-needed"
model = "large-v3-turbo"
base_url = "http://localhost:8080/v1"

No code changes needed — same OpenAI-compatible /audio/transcriptions endpoint, different URL.
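Switching providers only changes the prefix of the request URL. A hypothetical helper that joins `base_url` with the endpoint path might look like this; trimming a trailing slash is an assumption so that both `.../v1` and `.../v1/` produce the same URL.

```rust
/// Joins a configured base_url with the transcription path (hypothetical
/// helper). Trailing slashes are trimmed so "http://localhost:8080/v1"
/// and "http://localhost:8080/v1/" resolve to the same endpoint.
pub fn transcription_url(base_url: &str) -> String {
    format!("{}/audio/transcriptions", base_url.trim_end_matches('/'))
}
```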

All supported deployment options

Option	`base_url`	Cost
Groq Cloud (default)	`https://api.groq.com/openai/v1`	Free tier
OpenAI	`https://api.openai.com/v1`	~$0.006/min
Local whisper server	`http://localhost:8080/v1`	Free
LAN / sidecar	`http://192.168.x.x:8080/v1`	Free


fix: address PR review feedback

- Reuse the shared HTTP_CLIENT in stt.rs instead of creating a per-call client
- Pass the actual MIME type from the attachment (not a hardcoded audio/ogg)
- Fix attachment routing: check audio first to avoid a wasted image download
- Add api_key validation at startup (fail fast on an empty key)
- Add response_format=json to the multipart form (fixes local servers)
- Update docs: clarify the api_key requirement, add a Technical Notes section
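The routing fix (decide from the content type before downloading, check audio first, and pass the real MIME type on to STT) could be sketched as follows. `AttachmentKind` and `route_attachment` are hypothetical names, and taking the content type as an `Option<&str>` assumes Discord may omit it for some attachments.

```rust
/// Hypothetical routing decision for a Discord attachment.
#[derive(Debug, PartialEq)]
pub enum AttachmentKind<'a> {
    /// Send to STT, passing the attachment's real MIME type through.
    Audio { mime: &'a str },
    Image,
    Other,
}

/// Classifies an attachment by content type alone, so nothing is
/// downloaded until the route is known. Audio is matched first.
pub fn route_attachment(content_type: Option<&str>) -> AttachmentKind<'_> {
    match content_type {
        Some(ct) if ct.starts_with("audio/") => AttachmentKind::Audio { mime: ct },
        Some(ct) if ct.starts_with("image/") => AttachmentKind::Image,
        // Missing or unrecognized content types are left alone.
        _ => AttachmentKind::Other,
    }
}
```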


thepagent

LGTM — all review feedback addressed. Clean opt-in STT feature, reuses shared HTTP client, proper MIME passthrough, startup validation, and correct initialization order.


chaodu-agent requested a review from thepagent as a code owner April 11, 2026 21:21

openab-bot added 7 commits April 11, 2026 21:23

fix: add json feature to reqwest for resp.json() in stt module

030fc27

docs: add STT configuration and deployment guide

6b6107f

feat: auto-detect GROQ_API_KEY from env when stt.enabled=true

6da4356

If stt.enabled = true and api_key is not set in config, openab automatically checks for GROQ_API_KEY in the environment. This allows a minimal config:

[stt]
enabled = true

No api_key line is needed if the env var exists.

fix: only auto-detect GROQ_API_KEY when base_url points to Groq

dad8a66

Prevents leaking Groq API key to unrelated endpoints when user sets a custom base_url without explicitly setting api_key.
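A sketch of the resulting key-resolution rule, under stated assumptions: an explicit non-empty `api_key` always wins, and the `GROQ_API_KEY` fallback applies only when the host of `base_url` is exactly `api.groq.com`. The functions and their signatures are hypothetical. The environment value is passed in (in practice something like `std::env::var("GROQ_API_KEY").ok()`) so the logic stays testable.

```rust
/// Returns true when base_url's host is exactly api.groq.com (hypothetical).
fn points_to_groq(base_url: &str) -> bool {
    // Drop the scheme ("https://") if present, then cut at the first '/' or ':'.
    let rest = base_url.split("://").nth(1).unwrap_or(base_url);
    let host = rest.split(|c: char| c == '/' || c == ':').next().unwrap_or("");
    host.eq_ignore_ascii_case("api.groq.com")
}

/// Hypothetical key resolution for the [stt] section.
pub fn resolve_api_key(
    configured: Option<&str>,
    base_url: &str,
    env_groq_key: Option<String>,
) -> Option<String> {
    match configured {
        // An explicit, non-empty key from config always wins.
        Some(key) if !key.trim().is_empty() => Some(key.to_string()),
        // Fall back to GROQ_API_KEY only for Groq, so it never leaks elsewhere.
        _ if points_to_groq(base_url) => env_groq_key.filter(|k| !k.trim().is_empty()),
        _ => None,
    }
}
```

Comparing the parsed host rather than searching the whole URL for "groq" keeps look-alike hosts such as `api.groq.com.example.net` from receiving the key.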

docs: clarify GROQ_API_KEY auto-detect scope in stt.md

b8e9298

fix: move STT auto-detect before handler construction

bd8d0c2

The handler clones stt_config at construction time. Auto-detect was running after the clone, so the handler never received the detected api_key. Now auto-detect runs first.
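The ordering bug comes from clone semantics: once the handler holds its own copy of the config, filling in the key afterwards changes only the original. The sketch below, with hypothetical `Handler` and `prepare_stt` names and a trimmed-down config, detects and validates first (failing fast when STT is enabled without a key) and only then constructs the handler.

```rust
/// Trimmed-down hypothetical config, just enough to show the ordering.
#[derive(Debug, Clone)]
pub struct SttConfig {
    pub enabled: bool,
    pub api_key: Option<String>,
}

pub struct Handler {
    // The handler keeps its own copy; later edits to the original are invisible.
    pub stt: SttConfig,
}

impl Handler {
    pub fn new(stt: &SttConfig) -> Self {
        Handler { stt: stt.clone() }
    }
}

/// Fills in a detected key, then fails fast if STT is enabled without one.
pub fn prepare_stt(cfg: &mut SttConfig, detected: Option<String>) -> Result<(), String> {
    if cfg.api_key.is_none() {
        cfg.api_key = detected;
    }
    let missing = cfg.api_key.as_deref().map_or(true, |k| k.trim().is_empty());
    if cfg.enabled && missing {
        return Err("stt.enabled = true but no api_key is configured".to_string());
    }
    Ok(())
}
```

Constructing the handler before calling `prepare_stt` reproduces the bug: the handler's snapshot keeps `api_key: None`.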

thepagent approved these changes Apr 11, 2026


thepagent merged commit 45f8b9d into openabdev:main Apr 11, 2026
1 check passed

This was referenced Apr 12, 2026

fix: voice messages silently dropped when STT is not configured #251

Closed

fix(discord): react 🎤 on voice messages when STT is disabled #252

Merged

ruan330 mentioned this pull request Apr 13, 2026

feat: inline text/plain attachments into prompt (long-message support) #287

Closed

JARVIS-coding-Agent mentioned this pull request Apr 13, 2026

feat(discord): inline text-based file attachments into prompt #291

Open

thepagent merged 8 commits into openabdev:main from chaodu-agent:feat/stt-voice-message


2 participants
