This is a demo application showcasing Nova Sonic capabilities using a Next.js frontend with a FastAPI backend. The project demonstrates modular tool integration, real-time audio features, and dynamic rendering of tool outputs in the UI.
- Modular Tool System: Demonstrates extensible tool architecture
- Real-time Audio: Audio capture and playback capabilities
- UI Components: Supports dynamic rendering of tool outputs (text, cards, images, videos, PDFs)
- Frontend: Built with Next.js, TypeScript, and Tailwind CSS
- Async Backend: FastAPI-powered backend with async tool execution
- Barge-in Support: Users can interrupt the assistant mid-response for natural voice interactions
The interface features two main control buttons:
- Power Button: Establishes WebSocket connection with the backend
- Mic Button: Starts/stops the Nova Sonic session for voice interaction
Usage Flow:
- Click Power to connect to backend
- Click Mic to begin voice conversation with Nova Sonic
- Speak naturally - tool outputs appear in the display canvas
sample-nova-sonic-agentic-chatbot
├── backend/ # Python FastAPI backend
│ ├── api/ # API endpoints and apps
│ ├── tools/ # Modular tool system
│ │ ├── base/ # Base classes and registry
│ │ ├── categories/ # Tool categories
│ │ │ ├── utility/ # Utility tools
│ │ │ ├── media/ # Media processing tools
│ │ │ └── order/ # Order management tools
│ │ └── tool_manager.py # Tool registration and management
│ ├── main.py # FastAPI application entry point
│ └── requirements.txt # Python dependencies
├── frontend/ # Next.js frontend
│ ├── app/ # Next.js app directory
│ ├── components/ # React components
│ │ ├── ui/ # Base UI components
│ │ ├── tool-outputs/ # Tool result components
│ │ └── apps/ # Application components
│ ├── lib/ # Utility functions
│ ├── public/ # Static assets
│ └── package.json # Node.js dependencies
└── README.md # Project documentation
- Python 3.8+
- Node.js 18+
- npm or yarn
- Navigate to backend directory:
  cd sample-nova-sonic-agentic-chatbot/backend
- Create virtual environment:
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies:
  pip install -r requirements.txt
- Run the backend server:
  python main.py
The application includes an optional debug audio recording feature for development and troubleshooting purposes.
Default Behavior: Audio recording is disabled by default.
To enable debug audio recording:
export SAVE_DEBUG_AUDIO=true
python main.py

To explicitly disable debug audio recording:
export SAVE_DEBUG_AUDIO=false
python main.py

Audio files are saved to:
- Input audio: backend/debug_audio/input_YYYYMMDD_HHMMSS.wav (16 kHz, 16-bit)
- Output audio: backend/debug_audio/output_YYYYMMDD_HHMMSS.wav (24 kHz, 16-bit)
Note: Debug audio files are created per session and automatically timestamped. This feature is useful for debugging audio quality issues or analyzing conversation flows.
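A small sketch of how this toggle and naming scheme could be implemented. Only the SAVE_DEBUG_AUDIO variable, the file name pattern, and the sample formats come from this README; the helper names are assumptions:

```python
# Sketch of the debug-audio behavior described above; helper names are
# illustrative, not taken from the actual backend code.
import os
import wave
from datetime import datetime

def debug_audio_enabled() -> bool:
    # Disabled unless SAVE_DEBUG_AUDIO is explicitly set to "true".
    return os.environ.get("SAVE_DEBUG_AUDIO", "false").lower() == "true"

def debug_audio_path(direction: str, when: datetime) -> str:
    # Produces input_YYYYMMDD_HHMMSS.wav / output_YYYYMMDD_HHMMSS.wav.
    stamp = when.strftime("%Y%m%d_%H%M%S")
    return f"backend/debug_audio/{direction}_{stamp}.wav"

def write_debug_wav(path: str, pcm: bytes, sample_rate: int) -> None:
    # 16-bit mono PCM at 16 kHz (input) or 24 kHz (output).
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
```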
- Navigate to frontend directory:
  cd sample-nova-sonic-agentic-chatbot/frontend
- Install dependencies:
  npm install
- Run the development server:
  npm run dev
- Access the application: Open http://localhost:3000 in your browser
- FastAPI Framework: High-performance async web framework
- Tool Registry: Dynamic tool discovery and registration system
- Category-based Organization: Tools organized by functionality
- Async Execution: Non-blocking tool execution with proper error handling
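The registry pattern above can be sketched as follows; the class and method names are guesses based on this README, not the project's real code:

```python
# Minimal sketch of a tool registry with async dispatch; names assumed.
from typing import Any, Dict, Iterable

class ToolRegistry:
    def __init__(self) -> None:
        self._tools: Dict[str, Any] = {}

    def register_tools(self, tools: Iterable[Any]) -> None:
        # Each tool is keyed by the name declared in its config.
        for tool in tools:
            self._tools[tool.config["name"]] = tool

    async def execute(self, name: str, content: Dict[str, Any]) -> Dict[str, Any]:
        # Non-blocking dispatch with basic error handling, so one
        # failing tool cannot take down the whole session.
        tool = self._tools.get(name)
        if tool is None:
            return {"error": f"unknown tool: {name}"}
        try:
            return await tool.execute(content)
        except Exception as exc:
            return {"error": str(exc)}
```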
- Next.js 14: React framework with App Router
- TypeScript: Type-safe development
- Tailwind CSS: Utility-first CSS framework
- Component-based: Modular UI components for tool outputs
- Real-time Features: Audio capture and playback capabilities
The tool system demonstrates a modular architecture where each tool:
- Inherits from the BaseTool base class
- Defines its own configuration schema
- Implements async execution logic
- Returns dual results: model_result and ui_result
Dual Result Architecture:
- model_result: Sent back to Nova Sonic for context and conversation flow
- ui_result: Sent to the frontend with a type field determining how content is displayed (cards, images, text, etc.)
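A plausible shape of the BaseTool contract just described; the real class lives in backend/tools/base/, so the details here are assumptions:

```python
# Sketch of the BaseTool contract: config schema, async execute,
# and a dual-result envelope. Names are assumed, not the real code.
from abc import ABC, abstractmethod
from typing import Any, Dict

class BaseTool(ABC):
    # Subclasses set this to their name/description/schema config.
    config: Dict[str, Any] = {}

    @abstractmethod
    async def execute(self, content: Dict[str, Any]) -> Dict[str, Any]:
        """Run the tool and return the dual-result envelope."""

    def format_response(self, model_result: Dict[str, Any],
                        ui_result: Dict[str, Any]) -> Dict[str, Any]:
        # model_result feeds Nova Sonic; ui_result drives the canvas.
        return {"model_result": model_result, "ui_result": ui_result}
```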
- DateAndTimeTool: Current date and time information (Ask chatbot: What is the date today?)
- SampleImageTool: Image processing and display (Ask chatbot: Show me a sample image)
- SamplePdfTool: PDF document handling
- SampleVideoTool: Video content management
- TrackOrderTool: Order tracking and status updates (Ask chatbot: What is the status of order 2345?)
Create a new tool in the appropriate category folder:
from typing import Dict, Any
from ...base.tool import BaseTool

class MyNewTool(BaseTool):
    def __init__(self):
        super().__init__()
        self.config = {
            "name": "myNewTool",
            "description": "Tool description for Nova Sonic",
            "schema": {
                "type": "object",
                "properties": {
                    "param1": {
                        "type": "string",
                        "description": "Parameter description"
                    }
                },
                "required": ["param1"]
            }
        }

    async def execute(self, content: Dict[str, Any]) -> Dict[str, Any]:
        try:
            # Tool logic here
            result = {"data": "processed"}
            return self.format_response(
                model_result=result,  # Goes to Nova Sonic
                ui_result={           # Goes to frontend UI
                    "type": "card",   # UI component type
                    "content": {
                        "title": "Result",
                        "description": "Tool output"
                    }
                }
            )
        except Exception as e:
            return self.format_response(
                {"error": str(e)},
                {"type": "text", "content": {"title": "Error", "message": str(e)}}
            )

Add your tool to backend/tools/tool_manager.py:
from .categories.utility import MyNewTool

# In _initialize_registry method:
self.registry.register_tools([
    MyNewTool(),
    # ... other tools
])

Update the category's __init__.py:
from .my_new_tool import MyNewTool

__all__ = ['MyNewTool']

The system provides several pre-built UI component types that can be hooked up to tool responses (via ui_result):
- Text: Simple text output
- Card: Rich card with title, description, and details
- Image: Image display with metadata
- Video: Embedded video content
- PDF: PDF document viewer
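The component types above are selected by the type field of ui_result. The payloads below are illustrative: beyond "type" and "content", the field names follow the card example earlier in this README and are otherwise assumptions:

```python
# Illustrative ui_result payloads for the pre-built component types.
# Field names beyond "type" and "content" are assumed for this sketch.
SUPPORTED_TYPES = {"text", "card", "image", "video", "pdf"}

text_result = {"type": "text",
               "content": {"title": "Note", "message": "Simple text output"}}
card_result = {"type": "card",
               "content": {"title": "Order 2345",
                           "description": "Shipped",
                           "details": {"eta": "2 days"}}}

def is_renderable(ui_result: dict) -> bool:
    # The frontend picks a component based on the "type" field.
    return ui_result.get("type") in SUPPORTED_TYPES
```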
