feat: support voice message STT (Speech-to-Text) for Discord #225
Merged
thepagent merged 8 commits into openabdev:main on Apr 11, 2026
Add optional STT support that transcribes Discord voice message attachments (audio/ogg) via any OpenAI-compatible /audio/transcriptions endpoint and injects the transcript into the ACP prompt as text.

- New src/stt.rs: ~50-line module calling POST /audio/transcriptions
- New SttConfig in config.rs: enabled, api_key, model, base_url
- discord.rs: detect audio/* attachments, download, transcribe, inject
- Defaults to Groq free tier (whisper-large-v3-turbo)
- Supports any OpenAI-compatible endpoint via base_url (Groq, OpenAI, local whisper server, etc.)
- Feature is opt-in: disabled by default, zero impact when unconfigured

Closes openabdev#224
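The configuration described in this commit message could look roughly like the sketch below. It is an illustration only, not the code from src/config.rs: the field types, the `Default` impl, and the `transcriptions_url` helper are assumptions, and the default `base_url` is assumed to be the Groq URL listed under the deployment options.

```rust
/// Hypothetical mirror of the four fields named in the commit message.
/// The real `SttConfig` in config.rs may use different types or serde attributes.
#[derive(Debug, Clone, PartialEq)]
pub struct SttConfig {
    pub enabled: bool,
    pub api_key: String,
    pub model: String,
    pub base_url: String,
}

impl Default for SttConfig {
    fn default() -> Self {
        Self {
            // Opt-in: the feature stays off unless the user enables it.
            enabled: false,
            api_key: String::new(),
            // Default model named in the commit message.
            model: "whisper-large-v3-turbo".to_string(),
            // Assumed default: the Groq OpenAI-compatible base URL.
            base_url: "https://api.groq.com/openai/v1".to_string(),
        }
    }
}

/// Hypothetical helper: joins `base_url` with the transcription path,
/// tolerating a trailing slash in the configured value.
pub fn transcriptions_url(cfg: &SttConfig) -> String {
    format!("{}/audio/transcriptions", cfg.base_url.trim_end_matches('/'))
}
```

With this shape, switching providers only means overriding `base_url` (and `api_key` or `model` where needed); the request path stays the same.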
added 7 commits
April 11, 2026 21:23
- Reuse shared HTTP_CLIENT in stt.rs instead of creating per-call client
- Pass actual MIME type from attachment (not hardcoded audio/ogg)
- Fix attachment routing: check audio first, avoid wasted image download
- Add api_key validation at startup (fail fast on empty key)
- Add response_format=json to multipart form (fixes local servers)
- Update docs: clarify api_key requirement, add Technical Notes section
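The routing and MIME passthrough fixes can be pictured with a small sketch. It assumes the attachment exposes an optional content-type string; the `Route` enum and `route_attachment` function are hypothetical names, not taken from discord.rs.

```rust
/// Hypothetical routing decision for a single attachment.
#[derive(Debug, PartialEq)]
pub enum Route<'a> {
    /// Send to STT, carrying the attachment's own MIME type
    /// instead of a hardcoded "audio/ogg".
    Transcribe { mime: &'a str },
    /// Handle on the image path.
    Image,
    /// Not handled by either path.
    Ignore,
}

/// Checks audio first, so an audio attachment never reaches the image
/// branch and no image download is attempted for it.
pub fn route_attachment(content_type: Option<&str>, stt_enabled: bool) -> Route<'_> {
    match content_type {
        Some(ct) if stt_enabled && ct.starts_with("audio/") => Route::Transcribe { mime: ct },
        Some(ct) if ct.starts_with("image/") => Route::Image,
        _ => Route::Ignore,
    }
}
```

Whether a disabled STT config should ignore audio attachments, as it does here, is also an assumption of the sketch.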
If stt.enabled = true and api_key is not set in config, openab automatically checks for GROQ_API_KEY in the environment. This allows minimal config:

    [stt]
    enabled = true

No api_key line needed if the env var exists.
Prevents leaking Groq API key to unrelated endpoints when user sets a custom base_url without explicitly setting api_key.
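A minimal sketch of that scoping rule follows, assuming the key is resolved by one function that receives the configured key, the `base_url`, and an environment lookup. The function name, the injected `env_lookup` parameter, and the prefix check used to recognize Groq are all hypothetical; the actual check in openab may differ.

```rust
/// Resolves the API key to use for STT (hypothetical helper).
/// - An explicitly configured, non-empty key always wins.
/// - GROQ_API_KEY is consulted only when `base_url` points at Groq,
///   so the Groq key is never sent to an unrelated endpoint.
pub fn resolve_api_key(
    configured: Option<&str>,
    base_url: &str,
    env_lookup: impl Fn(&str) -> Option<String>,
) -> Option<String> {
    if let Some(key) = configured.filter(|k| !k.is_empty()) {
        return Some(key.to_string());
    }
    // Assumption: a simple prefix match is enough to identify Groq.
    if base_url.starts_with("https://api.groq.com/") {
        return env_lookup("GROQ_API_KEY").filter(|k| !k.is_empty());
    }
    None
}
```

In real use the lookup would be something like `|name| std::env::var(name).ok()`; injecting it keeps the rule testable without touching the process environment. A `None` result while STT is enabled is the natural place for the fail-fast startup validation.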
The handler clones stt_config at construction time. Auto-detect was running after the clone, so the handler never received the detected api_key. Now auto-detect runs first.
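The ordering bug can be reproduced in a few lines. This sketch uses simplified stand-ins for the config and handler types (hypothetical shapes, not the ones in main.rs) to show why a clone taken before auto-detect never sees the detected key.

```rust
/// Simplified stand-in for the STT config (hypothetical shape).
#[derive(Clone, Debug, Default)]
pub struct SttConfig {
    pub enabled: bool,
    pub api_key: String,
}

/// Simplified stand-in for the handler, which keeps its own copy of the config.
pub struct Handler {
    pub stt_config: SttConfig,
}

impl Handler {
    /// Clones the config at construction time, as the commit message describes.
    pub fn new(cfg: &SttConfig) -> Self {
        Self { stt_config: cfg.clone() }
    }
}

/// Fills in the key only when STT is enabled and no key is configured.
pub fn auto_detect(cfg: &mut SttConfig, detected: &str) {
    if cfg.enabled && cfg.api_key.is_empty() {
        cfg.api_key = detected.to_string();
    }
}
```

Constructing the handler and then calling `auto_detect` leaves the handler's copy with an empty key; calling `auto_detect` first, which is the order this commit switches to, gives the handler the detected key.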
thepagent (Collaborator) approved these changes on Apr 11, 2026 and left a comment:
LGTM — all review feedback addressed. Clean opt-in STT feature, reuses shared HTTP client, proper MIME passthrough, startup validation, and correct initialization order.
This was referenced Apr 12, 2026
Reese-max pushed a commit to Reese-max/openab that referenced this pull request on Apr 12, 2026:

…ev#225)

* feat: support voice message STT (Speech-to-Text) for Discord

  Add optional STT support that transcribes Discord voice message attachments (audio/ogg) via any OpenAI-compatible /audio/transcriptions endpoint and injects the transcript into the ACP prompt as text.

  - New src/stt.rs: ~50-line module calling POST /audio/transcriptions
  - New SttConfig in config.rs: enabled, api_key, model, base_url
  - discord.rs: detect audio/* attachments, download, transcribe, inject
  - Defaults to Groq free tier (whisper-large-v3-turbo)
  - Supports any OpenAI-compatible endpoint via base_url (Groq, OpenAI, local whisper server, etc.)
  - Feature is opt-in: disabled by default, zero impact when unconfigured

  Closes openabdev#224

* fix: add json feature to reqwest for resp.json() in stt module

* docs: add STT configuration and deployment guide

* fix: address PR review feedback

  - Reuse shared HTTP_CLIENT in stt.rs instead of creating per-call client
  - Pass actual MIME type from attachment (not hardcoded audio/ogg)
  - Fix attachment routing: check audio first, avoid wasted image download
  - Add api_key validation at startup (fail fast on empty key)
  - Add response_format=json to multipart form (fixes local servers)
  - Update docs: clarify api_key requirement, add Technical Notes section

* feat: auto-detect GROQ_API_KEY from env when stt.enabled=true

  If stt.enabled = true and api_key is not set in config, openab automatically checks for GROQ_API_KEY in the environment. This allows minimal config:

      [stt]
      enabled = true

  No api_key line needed if the env var exists.

* fix: only auto-detect GROQ_API_KEY when base_url points to Groq

  Prevents leaking Groq API key to unrelated endpoints when user sets a custom base_url without explicitly setting api_key.

* docs: clarify GROQ_API_KEY auto-detect scope in stt.md

* fix: move STT auto-detect before handler construction

  The handler clones stt_config at construction time. Auto-detect was running after the clone, so the handler never received the detected api_key. Now auto-detect runs first.

---------

Co-authored-by: openab-bot <openab-bot@users.noreply.github.com>
Summary
Add optional Speech-to-Text support that transcribes Discord voice message attachments via any OpenAI-compatible `/audio/transcriptions` endpoint and injects the transcript into the ACP prompt as text.

Closes #224
Changes
| File | Change |
| --- | --- |
| `src/stt.rs` | Calls `/audio/transcriptions` via reqwest multipart |
| `src/config.rs` | `SttConfig` struct (enabled, api_key, model, base_url) |
| `src/discord.rs` | `audio/*` attachments → download → call STT → inject transcript |
| `src/main.rs` | `mod stt` + pass config to Handler |
| `Cargo.toml` | `multipart` and `json` features for reqwest |

How it works
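The last step of the `src/discord.rs` flow, injecting the transcript into the prompt as text, might look something like the sketch below. The `inject_transcript` function and the `[Voice message transcript]` label are hypothetical; the PR does not show the exact wording openab uses.

```rust
/// Hypothetical helper: merges an STT transcript into the prompt text.
/// - No transcript (or a blank one): the user's text is returned unchanged.
/// - Voice-only message: the prompt is just the labeled transcript.
/// - Text plus voice: the transcript is appended after the user's text.
pub fn inject_transcript(user_text: &str, transcript: Option<&str>) -> String {
    // Assumed label; the actual prefix used by openab is not shown in the PR.
    const LABEL: &str = "[Voice message transcript]";
    match transcript.map(str::trim).filter(|t| !t.is_empty()) {
        Some(t) if user_text.trim().is_empty() => format!("{}: {}", LABEL, t),
        Some(t) => format!("{}\n\n{}: {}", user_text, LABEL, t),
        None => user_text.to_string(),
    }
}
```

Treating a blank transcript as absent keeps an empty STT response from adding a bare label to the prompt.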
Configuration
Feature is opt-in: disabled by default, zero impact when the `[stt]` section is omitted.

Cloud (Groq free tier, default)
Local whisper server (Mac Mini / home lab)
For users already running a local whisper server (faster-whisper-server, whisper.cpp, LocalAI, etc.), just point `base_url` to it.

No code changes needed: same OpenAI-compatible `/audio/transcriptions` endpoint, different URL.

All supported deployment options
| `base_url` |
| --- |
| `https://api.groq.com/openai/v1` |
| `https://api.openai.com/v1` |
| `http://localhost:8080/v1` |
| `http://192.168.x.x:8080/v1` |