PyVoiceAgent

A powerful, local-first interactive voice assistant built for offline capability and complete control. This project orchestrates state-of-the-art local AI models to provide a seamless voice-to-voice experience without relying on third-party cloud APIs.

Key Highlights

  • Full Voice Interaction: Talk to the agent and hear it speak back naturally.
  • Persistent Memory: Remembers your previous conversations across sessions via a local SQLite database.
  • Local Intelligence: Powered by DeepSeek R1 (via Ollama) for reasoning and Faster Whisper for transcription.
  • Intelligent Summarization: Automatically summarizes interactions to maintain concise context.

Pros & Cons

| Advantages | Trade-offs |
| --- | --- |
| Zero Cost: No recurring API fees; runs entirely on your hardware. | Hardware Dependent: Performance scales with your CPU/GPU power. |
| Offline: Works completely without an internet connection (after initial setup). | Setup: Requires installing and managing local models (Ollama, etc.). |
| Customizable: Full access to modify the graph, prompts, and memory logic. | Model Capability: Local models (e.g., 8B) are powerful but may lag behind massive cloud models (e.g., GPT-4) in complex reasoning. |
| Low Latency: Eliminates network latency constraints. | Resource Usage: Can be memory- and compute-intensive during inference. |

Architecture

The system uses LangGraph to manage the conversational flow (a sketch of the wiring follows the list):

  1. Transcribe: Faster Whisper converts your voice to text.
  2. Retrieve Context: Fetches conversation history and session context from SQLite.
  3. Process: DeepSeek R1 reasons through the request and generates a response.
  4. Synthesize: Chatterbox TTS converts the text response back to audio.
  5. Save & Summarize: The interaction is logged, and a summary is generated for future context.
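
As a rough illustration, here is how such a linear pipeline is typically wired with LangGraph. The node names and state fields below are assumptions for the sketch; the real definitions live in app/graph.py, app/nodes.py, and app/state.py.

    from typing import TypedDict
    from langgraph.graph import StateGraph, START, END

    class AgentState(TypedDict, total=False):
        audio_in: bytes   # raw user audio
        text_in: str      # transcription from Faster Whisper
        context: str      # history/summary pulled from SQLite
        text_out: str     # DeepSeek R1's response
        audio_out: bytes  # WAV produced by Chatterbox TTS

    # Stub nodes standing in for the real implementations in app/nodes.py.
    def transcribe(state: AgentState) -> dict:
        return {"text_in": "<transcribed speech>"}

    def retrieve_context(state: AgentState) -> dict:
        return {"context": "<history from conversation_memory.db>"}

    def process(state: AgentState) -> dict:
        return {"text_out": "<LLM response>"}

    def synthesize(state: AgentState) -> dict:
        return {"audio_out": b"<wav bytes>"}

    def save_and_summarize(state: AgentState) -> dict:
        return {}  # persist the turn and refresh the running summary

    graph = StateGraph(AgentState)
    for name, fn in [("transcribe", transcribe),
                     ("retrieve_context", retrieve_context),
                     ("process", process),
                     ("synthesize", synthesize),
                     ("save_and_summarize", save_and_summarize)]:
        graph.add_node(name, fn)

    # A strictly linear flow: each node hands off to the next.
    graph.add_edge(START, "transcribe")
    graph.add_edge("transcribe", "retrieve_context")
    graph.add_edge("retrieve_context", "process")
    graph.add_edge("process", "synthesize")
    graph.add_edge("synthesize", "save_and_summarize")
    graph.add_edge("save_and_summarize", END)

    agent = graph.compile()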

Quick Start

Prerequisites

  • Python 3.10+
  • Ollama running locally with the model pulled: ollama pull deepseek-r1:8b
  • FFmpeg (often required for audio processing)

Installation

  1. Clone & Setup:

    git clone https://github.com/abrarshahh/PyVoiceAgent.git
    cd PyVoiceAgent
    python -m venv .venv
    .venv\Scripts\activate  # Windows; macOS/Linux: source .venv/bin/activate
  2. Install Dependencies:

    pip install -r requirements.txt
  3. Run the Server:

    python -m fastapi run app/main.py
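
Once running, the server should listen on port 8000 (the fastapi run default), and FastAPI's auto-generated interactive docs should be available at http://localhost:8000/docs.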

API Usage

The API is a simple REST interface. Every endpoint accepts an optional session_id that ties the request to a specific conversation context.

1. Text Chat

POST /chat/text

{
  "text": "Hello, how are you?",
  "session_id": "optional-custom-session-id"
}

Returns: Audio file (.wav)
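
For illustration, a minimal client call using Python's requests library; the base URL assumes the default fastapi run port (8000) on your machine, and "demo-session" is an arbitrary example ID:

    import requests

    # Assumes the server started by `fastapi run` is listening on localhost:8000.
    resp = requests.post(
        "http://localhost:8000/chat/text",
        json={"text": "Hello, how are you?", "session_id": "demo-session"},
    )
    resp.raise_for_status()

    # The response body is the synthesized speech.
    with open("reply.wav", "wb") as f:
        f.write(resp.content)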

2. Voice Chat

POST /chat/voice

  • Form Data:
    • file: (Audio file, e.g., mp3/wav)
    • session_id: (Text, optional)

Returns: Audio file (.wav)
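
The voice endpoint takes multipart form data instead of JSON. A sketch of the same call with requests (question.mp3 is a hypothetical recording; the port assumption is as above):

    import requests

    with open("question.mp3", "rb") as audio:
        resp = requests.post(
            "http://localhost:8000/chat/voice",
            files={"file": ("question.mp3", audio, "audio/mpeg")},
            data={"session_id": "demo-session"},  # optional
        )
    resp.raise_for_status()

    # The agent's spoken reply comes back as a WAV file.
    with open("reply.wav", "wb") as f:
        f.write(resp.content)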

Project Structure

  • app/main.py: API entry point.
  • app/graph.py: LangGraph workflow definition.
  • app/nodes.py: Core logic for Transcription, LLM processing, and TTS.
  • app/database.py: SQLite storage and context management.
  • app/state.py: Data structures for the agent's state.
  • conversation_memory.db: Local SQLite database file (auto-created on first run; an illustrative schema sketch follows).
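
The actual schema is defined in app/database.py; purely as a sketch of what per-session storage in conversation_memory.db might look like (the table and column names here are assumptions, not the project's real schema):

    import sqlite3

    conn = sqlite3.connect("conversation_memory.db")

    # Hypothetical schema -- the real one lives in app/database.py.
    conn.execute("""
        CREATE TABLE IF NOT EXISTS messages (
            id         INTEGER PRIMARY KEY AUTOINCREMENT,
            session_id TEXT NOT NULL,
            role       TEXT NOT NULL,     -- 'user' or 'assistant'
            content    TEXT NOT NULL,
            created_at TEXT DEFAULT CURRENT_TIMESTAMP
        )
    """)

    # Fetch one session's history, oldest first, to rebuild context.
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
        ("demo-session",),
    ).fetchall()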

Built with FastAPI, LangGraph, and Ollama.
