Smallest AI Cookbook

Smallest AI offers an end-to-end Voice AI suite for developers building real-time voice agents. You can use our Speech-to-Text APIs through Pulse STT for high-accuracy transcription, our Text-to-Speech APIs through Lightning TTS for natural-sounding speech synthesis, or use the Atoms Client to build and operate enterprise-ready Voice Agents with features like tool calling, knowledge bases, and campaign management.

This cookbook contains practical examples and tutorials for building with Smallest AI's APIs. Each example is self-contained and demonstrates a real-world use case — from basic transcription to fully autonomous voice agents.

Documentation: Waves (STT & TTS) · Atoms (Voice Agents) · Python SDK


Try It Now (30 Seconds)

curl -X POST https://api.smallest.ai/waves/v1/lightning-v3.1/get_speech \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello from Smallest AI!", "voice_id": "sophia", "sample_rate": 24000, "output_format": "wav"}' \
  --output hello.wav

Replace YOUR_API_KEY with your key from app.smallest.ai. That's it — you'll have audio in 2 seconds.
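The same request from Python, mirroring the curl call above (a minimal sketch using `requests`; parameters are exactly those shown, error handling is kept to a single `raise_for_status`):

```python
import os

import requests

URL = "https://api.smallest.ai/waves/v1/lightning-v3.1/get_speech"

def synthesize(text: str, voice_id: str = "sophia", out_path: str = "hello.wav") -> str:
    """POST text to Lightning TTS and write the returned WAV bytes to disk."""
    resp = requests.post(
        URL,
        headers={
            "Authorization": f"Bearer {os.environ['SMALLEST_API_KEY']}",
            "Content-Type": "application/json",
        },
        json={
            "text": text,
            "voice_id": voice_id,
            "sample_rate": 24000,
            "output_format": "wav",
        },
        timeout=30,
    )
    resp.raise_for_status()  # surface auth/quota errors early
    with open(out_path, "wb") as f:
        f.write(resp.content)
    return out_path

if __name__ == "__main__":
    print(synthesize("Hello from Smallest AI!"))
```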


Usage

Prerequisites

  • uv (Python package manager)
  • Python >= 3.10 (install via uv python install 3.13 if needed)
  • A Smallest AI API key — get one at app.smallest.ai

Quick Start

Clone the repo, set up a virtual environment, and install the shared dependencies:

git clone https://github.com/smallest-inc/cookbook.git
cd cookbook
uv venv && uv pip install -r requirements.txt

Set up your API key

Each example reads keys from the environment. The easiest way is to copy the .env.sample included in every example directory:

cd speech-to-text/getting-started
cp .env.sample .env
# Add your keys to .env

Or export directly in your shell:

export SMALLEST_API_KEY="your-api-key-here"

Run an example

uv run speech-to-text/getting-started/python/transcribe.py recording.wav

Some examples need additional dependencies beyond the root requirements.txt. Each one has its own requirements.txt — install before running:

uv pip install -r speech-to-text/websocket/jarvis/requirements.txt
uv run speech-to-text/websocket/jarvis/jarvis.py

For voice agent examples:

uv pip install -r voice-agents/bank_csr/requirements.txt
uv run voice-agents/bank_csr/app.py

API Keys

Every example needs SMALLEST_API_KEY; some also require third-party keys (for example, an LLM provider key for the summarization examples). Check each example's README and .env.sample for the full list.


Speech-to-Text Examples

Convert audio and video to text with industry-leading accuracy. Supports 30+ languages with features like speaker diarization, word timestamps, and emotion detection. Powered by Pulse STT.

  • Getting Started — Basic transcription, the simplest way to start
  • Jarvis Voice Assistant — Always-on assistant with wake word detection, LLM reasoning, and TTS
  • Online Meeting Notetaker — Join Google Meet / Zoom / Teams via Recall.ai, auto-identify speakers by name, generate structured notes
  • Podcast Summarizer — Transcribe and summarize podcasts with key takeaways using GPT
  • Emotion Analyzer — Visualize speaker emotions across a conversation with interactive charts

See all Speech-to-Text examples →


Text-to-Speech Examples

Generate natural-sounding speech from text with real-time latency. 80+ voices across 4 languages (en, hi, es, ta) with 44.1 kHz quality and ~200ms latency. Powered by Lightning TTS v3.1.

  • Quickstart — Generate speech in 5 lines of code, under 2 minutes
  • Getting Started — Configurable synthesis with voice, speed, language, output format
  • Voices — List and preview 80+ voices, filter by language, gender, and accent
  • Streaming — Real-time audio streaming via SSE and WebSocket
  • Pronunciation Dicts — Custom pronunciation for names, acronyms, and domain terms
  • Multilingual Translator — Hear text spoken in English, Hindi, Spanish, and Tamil side by side
  • Podcast Generator — AI podcast from a topic — LLM writes the script, TTS voices the hosts
  • Audiobook Generator — Convert any text file into a narrated, chaptered audiobook
  • Voice Gallery App — Web app to browse & preview all voices — deploy to Vercel
  • Expressive TTS — Control emotion, pitch, volume, accent (v3.2) + auto-detect with LLM
  • Chinese Whispers — Same sentence, 5 characters, wildly different emotions — viral demo
  • Language Translation App — Translate text between 40+ languages with TTS and STT — type or speak, hear results

See all Text-to-Speech examples →


Voice Agents Examples

Build AI voice agents that can talk to anyone on voice or text, in any language, in any voice. The Atoms SDK provides abstractions like KnowledgeBase, Campaigns, and graph-based Workflows to let you build the smartest voice agent for your use case. Powered by the Atoms SDK.

Basics

  • Getting Started — Create your first agent with OutputAgentNode, generate_response(), and AtomsApp
  • Agent with Tools — Add tool calling with @function_tool and ToolRegistry
  • Call Control — Cold/warm transfers and ending a call with SDKAgentTransferConversationEvent

Multi-Node Patterns

  • Background Agent — BackgroundAgentNode for parallel processing, cross-node state sharing
  • Observability — Langfuse integration via BackgroundAgentNode — live traces, tool spans, transcript events
  • Language Switching — Multi-node agents with dynamic language detection and switching

Call Handling

  • Inbound IVR — Intent routing, department transfers, mute/unmute control
  • Interrupt Control — Mute/unmute events, blocking user interruptions during critical speech

Platform Features

  • Knowledge Base RAG — Attach a knowledge base with PDF upload and URL scraping for grounded responses
  • Campaigns — Provision bulk outbound calling with audiences and campaign management
  • Analytics — Call logs, transcript exports, post-call metrics

Advanced

  • Bank CSR — Full banking agent — SQL queries, multi-round tool chaining, identity verification, FD management, audit logging
  • Calendar Receptionist — Google Calendar, webhooks, agent duplication, React client
  • Multi-Agent Voice AI Dashboard — Real-time dashboard with specialized gaming and utility agents, powered by the Atoms SDK

See all Voice Agents examples →


Integrations

Use Smallest AI with popular frameworks and libraries.

LangChain

Build voice AI applications using LangChain for chains, agents, memory, and prompt orchestration with Smallest AI for STT and TTS.
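As a rough sketch of how the pieces fit together (the function names and LangChain wiring here are illustrative, not the integration examples' actual code), a Lightning TTS call can be wrapped as a plain function and then registered with LangChain's `@tool` decorator so an agent can speak its replies:

```python
import os

import requests

TTS_URL = "https://api.smallest.ai/waves/v1/lightning-v3.1/get_speech"

def build_payload(text: str, voice_id: str = "sophia") -> dict:
    """Request body for Lightning TTS, matching the quick-start call above."""
    return {
        "text": text,
        "voice_id": voice_id,
        "sample_rate": 24000,
        "output_format": "wav",
    }

def speak(text: str, out_path: str = "reply.wav") -> str:
    """Synthesize `text` and return the path of the written audio file.

    Wrap this with LangChain's @tool (from langchain_core.tools) to expose
    speech output as a tool call in a chain or agent.
    """
    resp = requests.post(
        TTS_URL,
        headers={"Authorization": f"Bearer {os.environ['SMALLEST_API_KEY']}"},
        json=build_payload(text),
        timeout=30,
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)
    return out_path
```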

See all LangChain integrations →


Language Support

Each example includes implementations in:

  • Python — Uses requests, websockets, and standard libraries
  • JavaScript — Uses node-fetch, ws, and Node.js built-ins

Contributing

See CONTRIBUTING.md for guidelines. In short:

  1. Create a folder with a descriptive name
  2. Add implementations in python/ and/or javascript/ subdirectories
  3. Include a README.md and .env.sample
  4. If the example needs deps beyond the root requirements.txt, add a local requirements.txt
  5. Update this root README with your new example
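Following the steps above, a new example's layout looks like this (illustrative; the local requirements.txt is only present when the example needs extra deps):

```
my-example/
├── README.md
├── .env.sample
├── requirements.txt    # only if deps beyond the root requirements.txt
├── python/
│   └── main.py
└── javascript/
    └── main.js
```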
