Smallest AI offers an end-to-end Voice AI suite for developers building real-time voice agents. You can use our Speech-to-Text APIs through Pulse STT for high-accuracy transcription, our Text-to-Speech APIs through Lightning TTS for natural-sounding speech synthesis, or use the Atoms Client to build and operate enterprise-ready Voice Agents with features like tool calling, knowledge bases, and campaign management.
This cookbook contains practical examples and tutorials for building with Smallest AI's APIs. Each example is self-contained and demonstrates a real-world use case — from basic transcription to fully autonomous voice agents.
Documentation: Waves (STT & TTS) · Atoms (Voice Agents) · Python SDK
```bash
curl -X POST https://api.smallest.ai/waves/v1/lightning-v3.1/get_speech \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello from Smallest AI!", "voice_id": "sophia", "sample_rate": 24000, "output_format": "wav"}' \
  --output hello.wav
```

Replace `YOUR_API_KEY` with your key from app.smallest.ai. That's it — you'll have audio in 2 seconds.
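The same call can be sketched in Python. The endpoint, headers, and payload fields mirror the curl example above; `requests` (which the repo's Python examples use) is imported lazily so the payload helper works even without it installed.

```python
import os

API_URL = "https://api.smallest.ai/waves/v1/lightning-v3.1/get_speech"

def build_payload(text, voice_id="sophia", sample_rate=24000, output_format="wav"):
    """Assemble the JSON body shown in the curl example above."""
    return {
        "text": text,
        "voice_id": voice_id,
        "sample_rate": sample_rate,
        "output_format": output_format,
    }

def synthesize(text, out_path="hello.wav"):
    """POST to Lightning TTS and write the returned audio bytes to disk."""
    import requests  # imported here so build_payload stays importable without it

    resp = requests.post(
        API_URL,
        headers={
            "Authorization": f"Bearer {os.environ['SMALLEST_API_KEY']}",
            "Content-Type": "application/json",
        },
        json=build_payload(text),
        timeout=30,
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)  # raw audio in the requested format
    return out_path
```

With `SMALLEST_API_KEY` set, `synthesize("Hello from Smallest AI!")` writes `hello.wav` next to the script.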
- uv (Python package manager)
- Python >= 3.10 (install via `uv python install 3.13` if needed)
- A Smallest AI API key — get one at app.smallest.ai
Clone the repo, set up a virtual environment, and install the shared dependencies:
```bash
git clone https://github.com/smallest-inc/cookbook.git
cd cookbook
uv venv && uv pip install -r requirements.txt
```

Each example reads keys from the environment. The easiest way is to copy the `.env.sample` included in every example directory:
```bash
cd speech-to-text/getting-started
cp .env.sample .env
# Add your keys to .env
```

Or export directly in your shell:

```bash
export SMALLEST_API_KEY="your-api-key-here"
```

Then run an example:

```bash
uv run speech-to-text/getting-started/python/transcribe.py recording.wav
```

Some examples need additional dependencies beyond the root `requirements.txt`. Each one has its own `requirements.txt` — install it before running:
```bash
uv pip install -r speech-to-text/websocket/jarvis/requirements.txt
uv run speech-to-text/websocket/jarvis/jarvis.py
```

For voice agent examples:

```bash
uv pip install -r voice-agents/bank_csr/requirements.txt
uv run voice-agents/bank_csr/app.py
```

| Variable | Where to get it | Required by |
|---|---|---|
| `SMALLEST_API_KEY` | app.smallest.ai | All examples |
| `OPENAI_API_KEY` | platform.openai.com | Podcast Summarizer, Meeting Notes, Voice Agents |
| `GROQ_API_KEY` | console.groq.com | YouTube Summarizer, Jarvis |
| `RECALL_API_KEY` | recall.ai | Meeting Notes |
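A small helper for validating that the right keys are set can save debugging time. The variable names below come from the table above; the check itself is just a convenience sketch, not something the examples ship with (if you use python-dotenv, call `load_dotenv()` first to pull in your `.env`).

```python
import os

REQUIRED = ["SMALLEST_API_KEY"]  # needed by every example
OPTIONAL = ["OPENAI_API_KEY", "GROQ_API_KEY", "RECALL_API_KEY"]

def check_env(env=os.environ):
    """Return the keys that are set; raise if a required one is missing."""
    missing = [k for k in REQUIRED if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing required env vars: {', '.join(missing)}")
    return {k: env[k] for k in REQUIRED + OPTIONAL if env.get(k)}
```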
Convert audio and video to text with industry-leading accuracy. Supports 30+ languages with features like speaker diarization, word timestamps, and emotion detection. Powered by Pulse STT.
- Getting Started — Basic transcription, the simplest way to start
- Jarvis Voice Assistant — Always-on assistant with wake word detection, LLM reasoning, and TTS
- Online Meeting Notetaker — Join Google Meet / Zoom / Teams via Recall.ai, auto-identify speakers by name, generate structured notes
- Podcast Summarizer — Transcribe and summarize podcasts with key takeaways using GPT
- Emotion Analyzer — Visualize speaker emotions across a conversation with interactive charts
See all Speech-to-Text examples →
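A transcription call might be sketched as below. Note the endpoint path and parameter names here are assumptions for illustration only — check the Waves (Pulse STT) documentation for the real route and option names; only the features themselves (language selection, diarization, word timestamps) come from the description above.

```python
import os

# Hypothetical route — confirm the actual Pulse STT endpoint in the Waves docs.
STT_URL = "https://api.smallest.ai/waves/v1/pulse/transcribe"

def build_params(language="en", diarize=True, word_timestamps=True):
    """Options mirroring the features listed above; names are illustrative."""
    return {
        "language": language,
        "diarize": diarize,
        "word_timestamps": word_timestamps,
    }

def transcribe(audio_path):
    """Upload an audio file and return the parsed JSON transcript."""
    import requests  # imported lazily so build_params stays importable without it

    with open(audio_path, "rb") as f:
        resp = requests.post(
            STT_URL,
            headers={"Authorization": f"Bearer {os.environ['SMALLEST_API_KEY']}"},
            files={"file": f},
            data=build_params(),
            timeout=120,
        )
    resp.raise_for_status()
    return resp.json()
```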
Generate natural-sounding speech from text with real-time latency. 80+ voices across 4 languages (en, hi, es, ta) with 44.1 kHz quality and ~200ms latency. Powered by Lightning TTS v3.1.
- Quickstart — Generate speech in 5 lines of code, under 2 minutes
- Getting Started — Configurable synthesis with voice, speed, language, output format
- Voices — List and preview 80+ voices, filter by language, gender, and accent
- Streaming — Real-time audio streaming via SSE and WebSocket
- Pronunciation Dicts — Custom pronunciation for names, acronyms, and domain terms
- Multilingual Translator — Hear text spoken in English, Hindi, Spanish, and Tamil side by side
- Podcast Generator — AI podcast from a topic — LLM writes the script, TTS voices the hosts
- Audiobook Generator — Convert any text file into a narrated, chaptered audiobook
- Voice Gallery App — Web app to browse & preview all voices — deploy to Vercel
- Expressive TTS — Control emotion, pitch, volume, accent (v3.2) + auto-detect with LLM
- Chinese Whispers — Same sentence, 5 characters, wildly different emotions — viral demo
- Language Translation App — Translate text between 40+ languages with TTS and STT — type or speak, hear results
See all Text-to-Speech examples →
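The Voices example filters the catalogue by language, gender, and accent. The filtering logic is simple enough to sketch; the records below are made-up placeholders, not the real voice catalogue, and the listing endpoint is not shown here.

```python
# Illustrative sample data — the real catalogue comes from the voices API.
SAMPLE_VOICES = [
    {"id": "sophia", "language": "en", "gender": "female", "accent": "american"},
    {"id": "arjun",  "language": "hi", "gender": "male",   "accent": "indian"},
    {"id": "lucia",  "language": "es", "gender": "female", "accent": "castilian"},
]

def filter_voices(voices, **criteria):
    """Keep voices matching every given field, e.g. language='en', gender='female'."""
    return [
        v for v in voices
        if all(v.get(field) == value for field, value in criteria.items())
    ]
```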
Build AI voice agents that can talk to anyone over voice or text, in any language, in any voice. The Atoms SDK provides abstractions like KnowledgeBase, Campaigns, and graph-based Workflows to let you build the smartest voice agent for your use case.
- Getting Started — Create your first agent with `OutputAgentNode`, `generate_response()`, and `AtomsApp`
- Agent with Tools — Add tool calling with `@function_tool` and `ToolRegistry`
- Call Control — Cold/warm transfers and ending a call with `SDKAgentTransferConversationEvent`
- Background Agent — `BackgroundAgentNode` for parallel processing, cross-node state sharing
- Observability — Langfuse integration via `BackgroundAgentNode` — live traces, tool spans, transcript events
- Language Switching — Multi-node agents with dynamic language detection and switching
- Inbound IVR — Intent routing, department transfers, mute/unmute control
- Interrupt Control — Mute/unmute events, blocking user interruptions during critical speech
- Knowledge Base RAG — Attach a knowledge base with PDF upload and URL scraping for grounded responses
- Campaigns — Provision bulk outbound calling with audiences and campaign management
- Analytics — Call logs, transcript exports, post-call metrics
- Bank CSR — Full banking agent — SQL queries, multi-round tool chaining, identity verification, FD management, audit logging
- Calendar Receptionist — Google Calendar, webhooks, agent duplication, React client
- Multi-Agent Voice AI Dashboard — Real-time dashboard with specialized gaming and utility agents, powered by the Atoms SDK
See all Voice Agents examples →
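The tool-calling pattern behind `@function_tool` and `ToolRegistry` can be illustrated generically. This is a simplified sketch of the registry idea, not the real Atoms SDK API — the actual decorator and class signatures are in the Agent with Tools example.

```python
class ToolRegistry:
    """Minimal registry: maps tool names to plain Python callables."""

    def __init__(self):
        self._tools = {}

    def register(self, fn):
        """Decorator that exposes a function to the agent by name."""
        self._tools[fn.__name__] = fn
        return fn

    def call(self, name, **kwargs):
        """Dispatch a tool call the way an agent would after the LLM picks one."""
        return self._tools[name](**kwargs)

registry = ToolRegistry()

@registry.register
def check_balance(account_id: str) -> dict:
    # In a real agent this would query a database or banking API.
    return {"account_id": account_id, "balance_inr": 1000}
```

An agent runtime resolves the LLM's chosen tool name through `registry.call(...)` and feeds the return value back into the conversation.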
Use Smallest AI with popular frameworks and libraries.
Build voice AI applications using LangChain for chains, agents, memory, and prompt orchestration with Smallest AI for STT and TTS.
- STT as LangChain Tool — Wrap Pulse STT as a LangChain Tool
- TTS as LangChain Tool — Wrap Lightning TTS as a LangChain Tool
- Voice-Optimized Prompts — Prompt templates tuned for spoken output
- Conversation Memory for Voice — Memory strategies for voice conversations
- Voice AI Agent — End-to-end example: audio → STT → LangChain agent → TTS → audio
See all LangChain integrations →
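The "TTS as a LangChain Tool" idea boils down to wrapping a synthesis call in a function with a name and description the agent's LLM can see. The sketch below keeps the wrapper as a plain function (decorate it with `@tool` from `langchain_core.tools` to register it); the spec dict is a hypothetical illustration of what LangChain surfaces to the model.

```python
import os

def tts_tool_spec():
    """Illustrative name/description a LangChain agent would see for this tool."""
    return {
        "name": "lightning_tts",
        "description": "Convert text to speech with Smallest AI Lightning TTS. "
                       "Input: the text to speak. Output: path to a WAV file.",
    }

def lightning_tts(text: str, voice_id: str = "sophia") -> str:
    """Tool body: call Lightning TTS and save the audio; returns the file path."""
    import requests  # imported lazily so the spec helper works without it

    resp = requests.post(
        "https://api.smallest.ai/waves/v1/lightning-v3.1/get_speech",
        headers={"Authorization": f"Bearer {os.environ['SMALLEST_API_KEY']}"},
        json={"text": text, "voice_id": voice_id,
              "sample_rate": 24000, "output_format": "wav"},
        timeout=30,
    )
    resp.raise_for_status()
    out_path = "tts_output.wav"
    with open(out_path, "wb") as f:
        f.write(resp.content)
    return out_path
```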
Each example includes implementations in:
- Python — Uses `requests`, `websockets`, and standard libraries
- JavaScript — Uses `node-fetch`, `ws`, and Node.js built-ins
See CONTRIBUTING.md for guidelines. In short:
- Create a folder with a descriptive name
- Add implementations in `python/` and/or `javascript/` subdirectories
- Include a `README.md` and `.env.sample`
- If the example needs deps beyond the root `requirements.txt`, add a local `requirements.txt`
- Update this root README with your new example
