WaNova

WhatsApp-first AI agent — multimodal LangGraph workflows on Meta Cloud API

You already have WhatsApp open — message and get answers back, no new app to install.

Full setup guide · Quick start · Demo · Issues · Contributing

WaNova: WhatsApp-first AI Agent

WaNova is a WhatsApp-first AI agent.

In India, many people are still reluctant to install “yet another app” just to ask a question. WaNova avoids that friction: you already have WhatsApp open, so you just message and get answers back.

Under the hood, the agent routes your input to different LLM-powered capabilities (chat, speech-to-text, vision, optional TTS, optional image generation) and sends the result back through the same WhatsApp chat.

It supports multi-modal conversations: text chat, voice notes (speech-to-text and optional text-to-speech), and images (vision analysis and optional image generation).

Excited? Let's get started!


Who it's for

Designed for the WhatsApp-first UX: if users won't install another app, WaNova meets them where they already are. For builders/operators, the repo is a production-ready reference that wires Meta WhatsApp webhooks to a LangGraph multi-modal agent.


What you can do on WhatsApp

  • Chat with WaNova in WhatsApp (text responses by default)
  • Send voice notes; WaNova transcribes with Groq Whisper and responds
  • Request voice responses; WaNova can send audio via ElevenLabs TTS
  • Send images; WaNova analyzes them with Groq vision
  • Request generated images; WaNova produces images using Together (FLUX.1-schnell-Free)
  • Keep context using memory (short-term state + Qdrant long-term memory)

Demo

WaNova demo video

This video demonstrates WaNova's capabilities. In the real WhatsApp-native flow, users interact directly inside their existing chat, with no extra app to switch to.


How to ask (works best on WhatsApp)

Use explicit phrases when you want media responses:

  • Text Q&A: Explain X in simple terms
  • Image analysis: What is happening in this picture?
  • Generated image (explicit): Generate an image of a futuristic street market in Mumbai at night
  • Voice reply (explicit): Answer this as a voice note

Tip: the router decides the mode based on your request, so being explicit about voice note / image / generate image improves reliability.


Getting started

As a user: you just need the WhatsApp number where WaNova is running.

As an operator/developer: follow docs/GETTING_STARTED.md to configure env vars and run/deploy the services.

The WhatsApp webhook is implemented in src/ai_companion/interfaces/whatsapp/webhook_endpoint.py on route /whatsapp_response.
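For orientation, Meta's webhook verification handshake (the GET request the Cloud API sends to /whatsapp_response) can be sketched in a few lines of plain Python. This is an illustrative stand-in, not the repo's actual handler; VERIFY_TOKEN is a placeholder for the WHATSAPP_VERIFY_TOKEN env var:

```python
# Meta sends hub.mode, hub.verify_token, and hub.challenge as query params.
# The endpoint must echo hub.challenge back only when the token matches.
VERIFY_TOKEN = "my-secret-verify-token"  # placeholder for WHATSAPP_VERIFY_TOKEN

def verify_webhook(params: dict) -> tuple[int, str]:
    """Return (status_code, body) for Meta's verification GET request."""
    if (
        params.get("hub.mode") == "subscribe"
        and params.get("hub.verify_token") == VERIFY_TOKEN
    ):
        return 200, params.get("hub.challenge", "")
    return 403, "Verification failed"
```

Meta considers the webhook verified as soon as it receives the challenge string back with a 200 status.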

Quick start (local)

  1. Copy the env file:
  • cp .env.example .env
  2. Fill required keys in .env:
  • GROQ_API_KEY, ELEVENLABS_API_KEY, ELEVENLABS_VOICE_ID, TOGETHER_API_KEY
  • QDRANT_URL, QDRANT_API_KEY
  • WHATSAPP_PHONE_NUMBER_ID, WHATSAPP_TOKEN, WHATSAPP_VERIFY_TOKEN
  3. Start the services:
  • docker compose up --build -d
  4. Verify the local endpoints:
  • Chainlit UI: http://localhost:8000
  • WhatsApp webhook: http://localhost:8080/whatsapp_response
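A quick sanity check that every required key is present before starting the services can save a failed boot. This helper is a hypothetical convenience, not part of the repo:

```python
# Required env vars from the quick-start list above.
REQUIRED_KEYS = [
    "GROQ_API_KEY", "ELEVENLABS_API_KEY", "ELEVENLABS_VOICE_ID",
    "TOGETHER_API_KEY", "QDRANT_URL", "QDRANT_API_KEY",
    "WHATSAPP_PHONE_NUMBER_ID", "WHATSAPP_TOKEN", "WHATSAPP_VERIFY_TOKEN",
]

def missing_keys(env: dict) -> list[str]:
    """Return the required keys that are unset or empty in the given env mapping."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]
```

Run it against `os.environ` (or a parsed `.env`) and refuse to start if the returned list is non-empty.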

Supported WhatsApp payload types

  • text
  • audio
  • image

Voice notes (audio) are transcribed with Whisper. If STT_LANGUAGE is set in .env, it forces the transcription language; otherwise Whisper auto-detects it.
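The STT_LANGUAGE behavior can be sketched as a small helper that builds the parameters for the transcription call. The function and model name below are illustrative assumptions, not the repo's exact code:

```python
def whisper_params(audio_path: str, env: dict) -> dict:
    """Build kwargs for a Whisper transcription request (illustrative sketch).

    If STT_LANGUAGE is set, pass it through to force the language;
    otherwise omit it so Whisper auto-detects.
    """
    params = {"model": "whisper-large-v3", "file": audio_path}  # model name assumed
    if env.get("STT_LANGUAGE"):
        params["language"] = env["STT_LANGUAGE"]
    return params
```

Leaving the `language` key out entirely (rather than passing an empty string) is what triggers auto-detection.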

Any other incoming message type receives a friendly fallback response asking the user to send text, audio, or image.
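The dispatch over supported payload types amounts to a simple check on the message's `type` field. A minimal sketch, with illustrative names:

```python
SUPPORTED_TYPES = ("text", "audio", "image")

def classify_incoming(message: dict) -> str:
    """Map a WhatsApp Cloud API message object to a handler name.

    Unsupported types (stickers, locations, contacts, ...) fall through
    to a friendly fallback reply asking for text, audio, or image.
    """
    msg_type = message.get("type")
    if msg_type in SUPPORTED_TYPES:
        return msg_type
    return "fallback"
```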

How it works

  1. Meta's WhatsApp Cloud API calls the webhook (/whatsapp_response).
  2. WaNova downloads any media, then:
  • audio -> Whisper STT (Groq)
  • image -> vision analysis (Groq)
  3. The LangGraph router decides the workflow: conversation, image, or audio.
  4. The selected node calls the right provider:
  • conversation -> Groq chat model (TEXT_MODEL_NAME)
  • image -> Together images (FLUX.1-schnell-Free)
  • audio -> ElevenLabs TTS
  5. WaNova sends the response back to the same WhatsApp user via the WhatsApp Cloud API.
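The routing step can be approximated with a toy keyword-based stand-in for the LangGraph router node. The real router is LLM-driven, so this is only a sketch of the decision it makes, not the repo's logic:

```python
def route_workflow(user_text: str) -> str:
    """Toy stand-in for the router: pick a workflow from the request's wording."""
    lowered = user_text.lower()
    if "generate an image" in lowered or "draw" in lowered:
        return "image"         # -> Together (FLUX.1-schnell-Free)
    if "voice note" in lowered or "voice reply" in lowered:
        return "audio"         # -> ElevenLabs TTS
    return "conversation"      # -> Groq chat model (TEXT_MODEL_NAME)
```

This is also why explicit phrasing ("generate an image of ...", "answer this as a voice note") improves reliability: it gives the router an unambiguous signal.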


How much does it cost?

The awesome thing about this project is you can run it on your own computer for free!

The free tiers from Groq, ElevenLabs, Qdrant Cloud, and Together AI are more than enough to get you going.

If you want to try it on Google Cloud Run, a new Google Cloud account comes with $300 in free credits. Even if you've already used them up, Cloud Run is cheap enough that light experiments cost only a dollar or two.


The tech stack

  • Groq: Llama 3.3 (chat), Llama 3.2 Vision (image analysis), and Whisper (speech-to-text), all with fast inference.
  • Qdrant: long-term memory store, enabling the agent to recall details you shared months ago.
  • GCP: container deployment on Google Cloud Platform (Cloud Run).
  • LangGraph: orchestrates the multi-modal agent workflows.
  • ElevenLabs: text-to-speech models for voice replies.
  • together.ai: image generation (FLUX.1-schnell-Free).

Contributing

See CONTRIBUTING.md for setup, tests, and pull request expectations.

Code of Conduct

This project follows the Contributor Covenant. Replace the enforcement contact placeholder in that file with your email or GitHub handle before promoting the repo widely.

Security

See SECURITY.md for how to report vulnerabilities responsibly.


License

This project is licensed under the MIT License - see the LICENSE file for details.