TTS2 Voice Agent

Multi-character AI voice agent with Knowledge Graph memory.

Self-contained project with all models and dependencies bundled. Designed for Hybrid environments: Run the heavy AI core (IndexTTS2) in WSL for Linux performance, while keeping your tools (LM Studio, Microphone) on Windows.

Key Features

Multi-Character: Isolated memory graphs and voices (Hermione, Lisbeth, Assistant).
Graph Memory: SQLite-based Knowledge Graph with GraphRAG community detection.
Agent Skills: MCP tools, Everything search, web search, file access, and more.
Dual TTS: IndexTTS2 (voice cloning) or Kokoro (fast ONNX, ~80ms latency).
Emotion Detection: wav2vec2-based speech emotion recognition.
Hybrid Architecture: Seamlessly bridges WSL (AI) and Windows (Audio/Tools).
Flexible AI: Supports OpenRouter (Cloud) and LM Studio (Local).

Quick Start (Hybrid / Windows)

New User? Read the Installation Guide for detailed setup.

1. Run the Launcher

Right-click VoiceChat.bat → Run as administrator

Note: Administrator privileges are required for WSL2 installation and keyboard hooks (PTT).

2. Install

Select Option [5] Install Dependencies.

Sets up Python venv in WSL (for AI).
Sets up Audio tools in Windows.

3. Configure

Edit config.env:

OPENROUTER_API_KEY=sk-or-v1-your-key...
# LM_STUDIO_HOST=  <-- Leave commented for Auto-Detection!

4. Launch

Select Option [1] Start Voice Chat.

PC Migration

Moving to a new PC? This project is now fully self-contained!

Quick summary:

Copy C:\AI\tts2-voice-agent to external drive (~9GB)
On new PC: Copy folder and run python setup_new_pc.py

All models and dependencies are bundled in the models/ and lib/ directories.

Architecture

                        WINDOWS HOST
    ┌──────────────┐   ┌──────────────┐   ┌─────────────┐
    │  PTT / VAD   │   │  LM Studio   │   │ Microphone  │
    │   (Python)   │   │ (Local AI)   │   │  (Input)    │
    └──────┬───────┘   └──────▲───────┘   └──────┬──────┘
           │ Audio            │ HTTP             │ Audio
           ▼                  ▼                  ▼
    ═══════╪══════════════════╪══════════════════╪═══════════
           │                  │                  │
           │             WSL (UBUNTU)            │
    ┌──────▼───────┐   ┌──────┴───────┐   ┌──────▼──────┐
    │  App Server  │◄─►│  LLM Client  │   │   Whisper   │
    │ (Gradio UI)  │   │   (Logic)    │   │    STT      │
    └──────┬───────┘   └──────┬───────┘   └─────────────┘
           │                  │
           ▼                  ▼
    ┌──────────────┐   ┌──────────────┐   ┌─────────────┐
    │  Services    │   │ Graph Memory │   │  Emotion    │
    │   Layer ◆    │   │ + GraphRAG   │   │  Detection  │
    └──────┬───────┘   └──────────────┘   └─────────────┘
           │
           ▼
    ┌──────────────┐   ┌──────────────┐
    │  IndexTTS2   │   │   Kokoro     │
    │  (High Fid)  │   │   (Fast)     │
    └──────────────┘   └──────────────┘

◆ Phase 1 Complete: Service layer extracted for better maintainability

Mobile Access

Access the voice agent from your phone or any device on your network.

Quick Start

Right-click VoiceChat.bat → Run as administrator
Select Option [2] Mobile/Remote Access
A public HTTPS URL will be generated (e.g., https://xxxxx.gradio.live)
Open the URL on your phone

Features

Mobile PTT Button - Large touch-friendly "HOLD TO TALK" button
PWA Support - Install as a webapp on your phone (Add to Home Screen)
HTTPS Required - Microphone access requires secure connection (handled automatically)

Tips

The Gradio share URL is temporary (72 hours max)
For permanent access, consider Cloudflare Tunnel
PWA install: On mobile browser, tap Share > "Add to Home Screen"

Controls

Action	Key / Mode
Talk (PTT)	Hold Right Shift (Release to send)
Mobile PTT	Hold the "HOLD TO TALK" button
Hands-Free	Toggle "Hands-Free Mode" in UI
Stop Audio	Click "Stop" in UI

TTS Options

Backend	Speed	Quality	VRAM	Voice Clone
IndexTTS2	~800ms	Excellent	4-6GB	Yes (5s sample)
Kokoro	~80ms	Good	~500MB	Preset voices

Switch in Settings > Audio Settings > TTS Backend.

Tools

Tool	Description
`everything_search`	PC-wide file search (requires Everything)
`web_search`	DuckDuckGo search (no API key)
`wikipedia`	Wikipedia lookups
`read_file` / `write_file`	Sandboxed file access
`create_skill`	Agent creates new skills
MCP Tools	Via `mcp_config.json`

Documentation

Installation Guide - Step-by-step setup.
User Manual - Complete user guide.
Architecture Visuals - Diagrams of Hybrid System & Memory.
Technical Reference - Deep dive into Graph Memory & MCP.
Migration Guide - Moving to a new PC.
Disclaimer - Legal & Ethical guidelines.

Troubleshooting

Issue	Solution
LM Studio Orange/Red	Run as Admin, Port 1235. See Docs.
PTT Not Working	Use Right Shift. Ensure `VoiceChat.bat` is running.
Connection Refused	Check `config.env`. Comment out `LM_STUDIO_HOST`.
Emotion always "sad"	Likely audio issue - check mic levels

License

MIT License. See LICENSE for details.

Name		Name	Last commit message	Last commit date
Latest commit History 55 Commits
audio		audio
checkpoints		checkpoints
config		config
docker		docker
docs		docs
generated_audio		generated_audio
lib		lib
memory		memory
recordings		recordings
scripts		scripts
services		services
sessions		sessions
skills		skills
tests		tests
tools		tools
ui		ui
voice_reference		voice_reference
.dockerignore		.dockerignore
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
VoiceChat.bat		VoiceChat.bat
character_manager_ui.py		character_manager_ui.py
config.env.example		config.env.example
docker-compose.yml		docker-compose.yml
generate_bat.py		generate_bat.py
group_manager.py		group_manager.py
mcp_client.py		mcp_client.py
mcp_config.json		mcp_config.json
mcp_manager_ui.py		mcp_manager_ui.py
paths.py		paths.py
pytest.ini		pytest.ini
requirements.txt		requirements.txt
restructure.py		restructure.py
settings_ui.py		settings_ui.py
streaming.py		streaming.py
tts2_agent.py		tts2_agent.py
utils.py		utils.py
voicechat.sh		voicechat.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TTS2 Voice Agent

Key Features

Quick Start (Hybrid / Windows)

1. Run the Launcher

2. Install

3. Configure

4. Launch

PC Migration

Architecture

Mobile Access

Quick Start

Features

Tips

Controls

TTS Options

Tools

Documentation

Troubleshooting

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

TTS2 Voice Agent

Key Features

Quick Start (Hybrid / Windows)

1. Run the Launcher

2. Install

3. Configure

4. Launch

PC Migration

Architecture

Mobile Access

Quick Start

Features

Tips

Controls

TTS Options

Tools

Documentation

Troubleshooting

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages