stt.api is a minimal, backend-agnostic R client for OpenAI-compatible speech-to-text (STT) APIs, with optional local fallbacks.
It lets you transcribe audio in R without caring which backend actually performs the transcription.
It provides:

- A thin R wrapper around OpenAI-style STT endpoints
- A way to switch easily between:
  - OpenAI `/v1/audio/transcriptions`
  - Local OpenAI-compatible servers (LM Studio, OpenWebUI, AnythingLLM, Whisper containers)
  - Local `{audio.whisper}`, if available
- Designed for scripting, Shiny apps, containers, and reproducible pipelines
By design, stt.api is:

- Not a Whisper reimplementation
- Not a model manager
- Not a GPU / CUDA helper
- Not an audio preprocessing toolkit
- Not a replacement for `{audio.whisper}`
```r
# From CRAN (once available)
install.packages("stt.api")

# Development version
remotes::install_github("cornball-ai/stt.api")
```

Required dependencies are minimal:

- `curl`
- `jsonlite`

Optional backends:

- `{audio.whisper}` (local transcription)
- `{processx}` (Docker helpers)
```r
library(stt.api)

set_stt_base("http://localhost:4123")

# Optional, for hosted services like OpenAI
set_stt_key(Sys.getenv("OPENAI_API_KEY"))

res <- stt("speech.wav")
res$text
```

This works with:
- OpenAI
- Chatterbox / Whisper containers
- LM Studio
- OpenWebUI
- AnythingLLM
- Any server implementing `/v1/audio/transcriptions`
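Switching among these is just a base-URL change. The URLs below are illustrative, not guaranteed defaults; confirm the port your server actually uses:

```r
set_stt_base("http://localhost:1234")    # LM Studio's default local server port
set_stt_base("https://api.openai.com")   # hosted OpenAI; also requires set_stt_key()
```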
To force local transcription with `{audio.whisper}`:

```r
res <- stt("speech.wav", backend = "audio.whisper")
res$text
```

If `{audio.whisper}` is not installed and you request it explicitly, stt.api errors with clear instructions.
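One defensive pattern, sketched under the assumption that you prefer local transcription when possible and have an API base configured as a fallback:

```r
# Prefer {audio.whisper} when installed, otherwise fall back to the API
# backend (assumes set_stt_base() has been called).
backend <- if (requireNamespace("audio.whisper", quietly = TRUE)) {
  "audio.whisper"
} else {
  "api"
}
res <- stt("speech.wav", backend = backend)
```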
With no backend specified, stt.api picks one automatically:

```r
res <- stt("speech.wav")
```

Backend priority:

1. OpenAI-compatible API (if `stt.api.api_base` is set)
2. `{audio.whisper}` (if installed)
3. Error with guidance
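For example, assuming `{audio.whisper}` is installed but no API base is configured, auto selection should fall through to the local backend:

```r
options(stt.api.api_base = NULL)   # no API configured
res <- stt("speech.wav")           # next in priority: {audio.whisper}
res$backend                        # expected: "audio.whisper"
```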
Regardless of backend, `stt()` always returns the same structure:
```r
list(
  text = "Transcribed text",
  segments = NULL | data.frame(...),
  language = "en",
  backend = "api" | "audio.whisper",
  raw = <raw backend response>
)
```

This makes it easy to switch backends without changing downstream code.
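A sketch of downstream code that relies only on this contract:

```r
res <- stt("speech.wav")
cat(res$text, "\n")

# Segments are optional; guard before using them.
if (!is.null(res$segments)) {
  head(res$segments)
}
```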
To check that a usable backend is reachable:

```r
stt_health()
```

Returns:

```r
list(
  ok = TRUE,
  backend = "api",
  message = "OK"
)
```

Useful for Shiny apps and deployment checks.
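For instance, a deployment script might fail fast when no backend responds (a sketch using only the fields shown above):

```r
h <- stt_health()
if (!isTRUE(h$ok)) {
  stop("STT backend unavailable: ", h$message)
}
message("STT ready via backend: ", h$backend)
```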
Explicit backend choice:
stt("speech.wav", backend = "api")
stt("speech.wav", backend = "audio.whisper")Automatic selection (default):
stt("speech.wav")stt.api targets the OpenAI-compatible STT spec:
```
POST /v1/audio/transcriptions
```
This is a deliberate choice, because the spec is:
- Widely adopted
- Simple
- Supported by many local and hosted services
- Easy to proxy and containerize
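For reference, this is roughly the request `stt()` wraps. The sketch below builds it by hand with `{curl}` and `{jsonlite}`; the `model` field and the exact internals are assumptions for illustration, not stt.api's actual code:

```r
# Hand-rolled multipart POST to an OpenAI-compatible endpoint.
h <- curl::new_handle()
curl::handle_setform(h,
  file  = curl::form_file("speech.wav"),
  model = "whisper-1"   # model name varies by server
)
curl::handle_setheaders(h,
  # Only needed for hosted services; local servers often ignore it.
  Authorization = paste("Bearer", Sys.getenv("OPENAI_API_KEY"))
)
resp <- curl::curl_fetch_memory(
  "http://localhost:4123/v1/audio/transcriptions",
  handle = h
)
jsonlite::fromJSON(rawToChar(resp$content))$text
```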
If you run Whisper or another OpenAI-compatible STT service in Docker, stt.api can optionally integrate via `{processx}`.
Example use cases:
- Starting a local Whisper container
- Checking container health
- Inspecting logs
Docker helpers are explicit and opt-in.
stt.api never starts containers automatically.
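If you manage the container yourself, `{processx}` keeps that explicit too. A sketch, where the image name, port, and flags are placeholders rather than stt.api defaults:

```r
# Start a local OpenAI-compatible STT container by hand.
processx::run(
  "docker",
  c("run", "--rm", "-d", "-p", "4123:4123", "my-whisper-image"),
  echo = TRUE
)
set_stt_base("http://localhost:4123")
```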
Configuration is handled through options:

```r
options(
  stt.api.api_base = NULL,
  stt.api.api_key = NULL,
  stt.api.timeout = 60,
  stt.api.backend = "auto"
)
```

Setters:

- `set_stt_base()`
- `set_stt_key()`

Error handling principles:

- No silent failures
- Clear messages when a backend is unavailable
- Actionable instructions when configuration is missing
Example:

```
Error in stt():
  No transcription backend available.
  Set stt.api.api_base or install audio.whisper.
```
stt.api is designed to pair cleanly with tts.api:
| Task | Package |
|---|---|
| Speech → Text | stt.api |
| Text → Speech | tts.api |
Both share:
- Minimal dependencies
- OpenAI-compatible API focus
- Backend-agnostic design
- Optional Docker support
Installing and maintaining local Whisper backends can be difficult:
- CUDA / cuBLAS issues
- Compiler toolchains
- Platform differences
stt.api lets you decouple your R code from those concerns.
Your transcription code stays the same whether the backend is:
- Local
- Containerized
- Cloud-hosted
- GPU-accelerated
- CPU-only
License: MIT