Smallest AI offers an end-to-end Voice AI suite for developers building real-time voice agents. You can use our Speech-to-Text APIs through Pulse STT for high-accuracy transcription, our Text-to-Speech APIs through Lightning TTS for natural-sounding speech synthesis, or use the Atoms Client to build and operate enterprise-ready Voice Agents with features like tool calling, knowledge bases, and campaign management.
This cookbook contains practical examples and tutorials for building with Smallest AI's APIs. Each example is self-contained and demonstrates a real-world use case — from basic transcription to fully autonomous voice agents.
Documentation: Waves (STT & TTS) · Atoms (Voice Agents) · Python SDK
```bash
curl -X POST https://api.smallest.ai/waves/v1/lightning-v3.1/get_speech \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello from Smallest AI!", "voice_id": "sophia", "sample_rate": 24000, "output_format": "wav"}' \
  --output hello.wav
```

Replace `YOUR_API_KEY` with your key from app.smallest.ai. That's it — you'll have audio in 2 seconds.
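The same call can be sketched in Python. The endpoint, headers, and payload fields mirror the curl example above; `requests` (which the repo's Python examples use) is imported lazily so the payload helper works even without it installed.

```python
import os

API_URL = "https://api.smallest.ai/waves/v1/lightning-v3.1/get_speech"

def build_payload(text, voice_id="sophia", sample_rate=24000, output_format="wav"):
    """Assemble the JSON body shown in the curl example above."""
    return {
        "text": text,
        "voice_id": voice_id,
        "sample_rate": sample_rate,
        "output_format": output_format,
    }

def synthesize(text, out_path="hello.wav"):
    """POST to Lightning TTS and write the returned audio bytes to disk."""
    import requests  # imported here so build_payload stays importable without it

    resp = requests.post(
        API_URL,
        headers={
            "Authorization": f"Bearer {os.environ['SMALLEST_API_KEY']}",
            "Content-Type": "application/json",
        },
        json=build_payload(text),
        timeout=30,
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)  # raw audio in the requested format
    return out_path
```

With `SMALLEST_API_KEY` set, `synthesize("Hello from Smallest AI!")` writes `hello.wav` next to the script.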
- uv (Python package manager)
- Python >= 3.10 (install via `uv python install 3.13` if needed)
- A Smallest AI API key — get one at app.smallest.ai
Clone the repo, set up a virtual environment, and install the shared dependencies:
```bash
git clone https://github.com/smallest-inc/cookbook.git
cd cookbook
uv venv && uv pip install -r requirements.txt
```

Each example reads keys from the environment. The easiest way is to copy the `.env.sample` included in every example directory:
```bash
cd speech-to-text/getting-started
cp .env.sample .env
# Add your keys to .env
```

Or export directly in your shell:

```bash
export SMALLEST_API_KEY="your-api-key-here"
```

Then run an example:

```bash
uv run speech-to-text/getting-started/python/transcribe.py recording.wav
```

Some examples need additional dependencies beyond the root `requirements.txt`. Each one has its own `requirements.txt` — install it before running:
```bash
uv pip install -r speech-to-text/websocket/jarvis/requirements.txt
uv run speech-to-text/websocket/jarvis/jarvis.py
```

For voice agent examples:

```bash
uv pip install -r voice-agents/bank_csr/requirements.txt
uv run voice-agents/bank_csr/app.py
```

| Variable | Where to get it | Required by |
|---|---|---|
| `SMALLEST_API_KEY` | app.smallest.ai | All examples |
| `OPENAI_API_KEY` | platform.openai.com | Podcast Summarizer, Meeting Notes, Voice Agents |
| `GROQ_API_KEY` | console.groq.com | YouTube Summarizer, Jarvis |
| `RECALL_API_KEY` | recall.ai | Meeting Notes |
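A small helper for validating that the right keys are set can save debugging time. The variable names below come from the table above; the check itself is just a convenience sketch, not something the examples ship with (if you use python-dotenv, call `load_dotenv()` first to pull in your `.env`).

```python
import os

REQUIRED = ["SMALLEST_API_KEY"]  # needed by every example
OPTIONAL = ["OPENAI_API_KEY", "GROQ_API_KEY", "RECALL_API_KEY"]

def check_env(env=os.environ):
    """Return the keys that are set; raise if a required one is missing."""
    missing = [k for k in REQUIRED if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing required env vars: {', '.join(missing)}")
    return {k: env[k] for k in REQUIRED + OPTIONAL if env.get(k)}
```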
Convert audio and video to text with industry-leading accuracy. Supports 30+ languages with features like speaker diarization, word timestamps, and emotion detection. Powered by Pulse STT.
- Getting Started — Basic transcription, the simplest way to start
- Jarvis Voice Assistant — Always-on assistant with wake word detection, LLM reasoning, and TTS
- Online Meeting Notetaker — Join Google Meet / Zoom / Teams via Recall.ai, auto-identify speakers by name, generate structured notes
- Podcast Summarizer — Transcribe and summarize podcasts with key takeaways using GPT
- Emotion Analyzer — Visualize speaker emotions across a conversation with interactive charts
See all Speech-to-Text examples →
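A transcription call might be sketched as below. Note the endpoint path and parameter names here are assumptions for illustration only — check the Waves (Pulse STT) documentation for the real route and option names; only the features themselves (language selection, diarization, word timestamps) come from the description above.

```python
import os

# Hypothetical route — confirm the actual Pulse STT endpoint in the Waves docs.
STT_URL = "https://api.smallest.ai/waves/v1/pulse/transcribe"

def build_params(language="en", diarize=True, word_timestamps=True):
    """Options mirroring the features listed above; names are illustrative."""
    return {
        "language": language,
        "diarize": diarize,
        "word_timestamps": word_timestamps,
    }

def transcribe(audio_path):
    """Upload an audio file and return the parsed JSON transcript."""
    import requests  # imported lazily so build_params stays importable without it

    with open(audio_path, "rb") as f:
        resp = requests.post(
            STT_URL,
            headers={"Authorization": f"Bearer {os.environ['SMALLEST_API_KEY']}"},
            files={"file": f},
            data=build_params(),
            timeout=120,
        )
    resp.raise_for_status()
    return resp.json()
```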
Generate natural-sounding speech from text with real-time latency. 80+ voices across 4 languages (en, hi, es, ta) with 44.1 kHz quality and ~200ms latency. Powered by Lightning TTS v3.1.
- Quickstart — Generate speech in 5 lines of code, under 2 minutes
- Getting Started — Configurable synthesis with voice, speed, language, output format
- Voices — List and preview 80+ voices, filter by language, gender, and accent
- Streaming — Real-time audio streaming via SSE and WebSocket
- Pronunciation Dicts — Custom pronunciation for names, acronyms, and domain terms
- Multilingual Translator — Hear text spoken in English, Hindi, Spanish, and Tamil side by side
- Podcast Generator — AI podcast from a topic — LLM writes the script, TTS voices the hosts
- Audiobook Generator — Convert any text file into a narrated, chaptered audiobook
- Voice Gallery App — Web app to browse & preview all voices — deploy to Vercel
- Expressive TTS — Control emotion, pitch, volume, accent (v3.2) + auto-detect with LLM
- Chinese Whispers — Same sentence, 5 characters, wildly different emotions — viral demo
- Language Translation App — Translate text between 40+ languages with TTS and STT — type or speak, hear results
See all Text-to-Speech examples →
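The Voices example filters the catalogue by language, gender, and accent. The filtering logic is simple enough to sketch; the records below are made-up placeholders, not the real voice catalogue, and the listing endpoint is not shown here.

```python
# Illustrative sample data — the real catalogue comes from the voices API.
SAMPLE_VOICES = [
    {"id": "sophia", "language": "en", "gender": "female", "accent": "american"},
    {"id": "arjun",  "language": "hi", "gender": "male",   "accent": "indian"},
    {"id": "lucia",  "language": "es", "gender": "female", "accent": "castilian"},
]

def filter_voices(voices, **criteria):
    """Keep voices matching every given field, e.g. language='en', gender='female'."""
    return [
        v for v in voices
        if all(v.get(field) == value for field, value in criteria.items())
    ]
```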
Build AI voice agents that can talk to anyone over voice or text, in any language, in any voice. The Atoms SDK provides abstractions like KnowledgeBase, Campaigns, and graph-based Workflows to let you build the smartest voice agent for your use case.
- Getting Started — Create your first agent with `OutputAgentNode`, `generate_response()`, and `AtomsApp`
- Agent with Tools — Add tool calling with `@function_tool` and `ToolRegistry`
- Call Control — Cold/warm transfers and ending a call with `SDKAgentTransferConversationEvent`
- Background Agent — `BackgroundAgentNode` for parallel processing, cross-node state sharing
- Observability — Langfuse integration via `BackgroundAgentNode` — live traces, tool spans, transcript events
- Language Switching — Multi-node agents with dynamic language detection and switching
- Inbound IVR — Intent routing, department transfers, mute/unmute control
- Interrupt Control — Mute/unmute events, blocking user interruptions during critical speech
- Knowledge Base RAG — Attach a knowledge base with PDF upload and URL scraping for grounded responses
- Campaigns — Provision bulk outbound calling with audiences and campaign management
- Analytics — Call logs, transcript exports, post-call metrics
- Bank CSR — Full banking agent — SQL queries, multi-round tool chaining, identity verification, FD management, audit logging
- Calendar Receptionist — Google Calendar, webhooks, agent duplication, React client
- Multi-Agent Voice AI Dashboard — Real-time dashboard with specialized gaming and utility agents, powered by the Atoms SDK
See all Voice Agents examples →
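The tool-calling pattern behind `@function_tool` and `ToolRegistry` can be illustrated generically. This is a simplified sketch of the registry idea, not the real Atoms SDK API — the actual decorator and class signatures are in the Agent with Tools example.

```python
class ToolRegistry:
    """Minimal registry: maps tool names to plain Python callables."""

    def __init__(self):
        self._tools = {}

    def register(self, fn):
        """Decorator that exposes a function to the agent by name."""
        self._tools[fn.__name__] = fn
        return fn

    def call(self, name, **kwargs):
        """Dispatch a tool call the way an agent would after the LLM picks one."""
        return self._tools[name](**kwargs)

registry = ToolRegistry()

@registry.register
def check_balance(account_id: str) -> dict:
    # In a real agent this would query a database or banking API.
    return {"account_id": account_id, "balance_inr": 1000}
```

An agent runtime resolves the LLM's chosen tool name through `registry.call(...)` and feeds the return value back into the conversation.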
Use Smallest AI with popular frameworks and libraries.
Build voice AI applications using LangChain for chains, agents, memory, and prompt orchestration with Smallest AI for STT and TTS.
- STT as LangChain Tool — Wrap Pulse STT as a LangChain Tool
- TTS as LangChain Tool — Wrap Lightning TTS as a LangChain Tool
- Voice-Optimized Prompts — Prompt templates tuned for spoken output
- Conversation Memory for Voice — Memory strategies for voice conversations
- Voice AI Agent — End-to-end example: audio → STT → LangChain agent → TTS → audio
See all LangChain integrations →
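The "TTS as a LangChain Tool" idea boils down to wrapping a synthesis call in a function with a name and description the agent's LLM can see. The sketch below keeps the wrapper as a plain function (decorate it with `@tool` from `langchain_core.tools` to register it); the spec dict is a hypothetical illustration of what LangChain surfaces to the model.

```python
import os

def tts_tool_spec():
    """Illustrative name/description a LangChain agent would see for this tool."""
    return {
        "name": "lightning_tts",
        "description": "Convert text to speech with Smallest AI Lightning TTS. "
                       "Input: the text to speak. Output: path to a WAV file.",
    }

def lightning_tts(text: str, voice_id: str = "sophia") -> str:
    """Tool body: call Lightning TTS and save the audio; returns the file path."""
    import requests  # imported lazily so the spec helper works without it

    resp = requests.post(
        "https://api.smallest.ai/waves/v1/lightning-v3.1/get_speech",
        headers={"Authorization": f"Bearer {os.environ['SMALLEST_API_KEY']}"},
        json={"text": text, "voice_id": voice_id,
              "sample_rate": 24000, "output_format": "wav"},
        timeout=30,
    )
    resp.raise_for_status()
    out_path = "tts_output.wav"
    with open(out_path, "wb") as f:
        f.write(resp.content)
    return out_path
```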
Each example includes implementations in:
- Python — Uses `requests`, `websockets`, and standard libraries
- JavaScript — Uses `node-fetch`, `ws`, and Node.js built-ins
See CONTRIBUTING.md for guidelines. In short:
- Create a folder with a descriptive name
- Add implementations in `python/` and/or `javascript/` subdirectories
- Include a `README.md` and `.env.sample`
- If the example needs deps beyond the root `requirements.txt`, add a local `requirements.txt`
- Update this root README with your new example
