AI-Powered Voice-Based Travel Planning System
This repository contains a production-ready voice travel planning agent that generates personalized, multi-day itineraries through natural, real-time voice conversations. The system integrates large language models, speech services, retrieval-augmented generation (RAG), and automated evaluation pipelines to ensure itinerary quality, feasibility, and factual grounding.
The project is designed as a modular experimentation and deployment framework, emphasizing reliability, explainability, and real-world usability in conversational AI systems.
The Voice Travel Agent enables users to plan trips entirely through voice interaction. It combines:
- Real-time speech-to-text and text-to-speech
- Tool-augmented LLM reasoning
- Retrieval-backed travel knowledge
- Automated quality evaluations
The system prioritizes natural interaction, practical itineraries, and verifiable information.
Voice Interaction:
- Real-time voice-to-voice conversations
- Live transcript display
- Human-like speech synthesis
- Interruptible, multi-turn dialogue
Itinerary Planning:
- Multi-day travel plans
- POI discovery using external search tools
- Weather-aware scheduling
- Preference-based customization
- Contextual travel tips via RAG (Wikivoyage)
Every generated or edited itinerary is validated automatically using three evaluation layers:
- Feasibility Evaluation
  - Time-bounded daily activity validation
  - Travel duration checks
  - Balanced daily pacing
- Edit Correctness Evaluation
  - Ensures only intended sections are modified
  - Detects unintended changes
  - Interprets natural-language edit instructions
- Grounding and Hallucination Evaluation
  - Verifies POIs against search results
  - Enforces source-backed travel tips
  - Flags uncertainty when information is incomplete
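To make the grounding layer more concrete, here is a minimal sketch of checking itinerary POIs against search results. The helper names (`_normalize`, `find_ungrounded_pois`), the name-normalization rule, and the sample data are assumptions made for illustration; the repository's actual evaluator may work differently.

```python
import re


def _normalize(name: str) -> str:
    """Lowercase and collapse punctuation/whitespace so near-identical names match."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def find_ungrounded_pois(itinerary_pois: list[str], search_results: list[str]) -> list[str]:
    """Return itinerary POIs that never appeared in the search results.

    Hypothetical helper: a non-empty result would be flagged as a possible hallucination.
    """
    known = {_normalize(name) for name in search_results}
    return [poi for poi in itinerary_pois if _normalize(poi) not in known]


# Illustrative data only.
search_results = ["Hawa Mahal", "Amber Fort", "City Palace"]
itinerary_pois = ["hawa mahal", "Amber  Fort", "Crystal Lake Resort"]

ungrounded = find_ungrounded_pois(itinerary_pois, search_results)
print(ungrounded)
```

Exact matching after normalization is deliberately strict. A real check would likely need fuzzy matching or identifier-based matching to handle alternate spellings.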
Evaluation results are persisted for traceability and debugging.
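As an illustration of the feasibility layer, the sketch below checks a single day against a time budget and a per-leg travel limit. The `Activity` fields, the 10-hour budget, and the 90-minute travel limit are assumed values chosen for the example, not settings taken from this project.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    name: str
    duration_min: int            # time spent at the activity
    travel_min_before: int = 0   # travel time from the previous stop


def check_day_feasibility(
    activities: list[Activity],
    day_budget_min: int = 10 * 60,   # assumed daily time budget
    max_travel_leg_min: int = 90,    # assumed limit for a single travel leg
) -> list[str]:
    """Return human-readable issues; an empty list means the day looks feasible."""
    issues = []
    total = sum(a.duration_min + a.travel_min_before for a in activities)
    if total > day_budget_min:
        issues.append(f"day needs {total} min but the budget is {day_budget_min} min")
    for a in activities:
        if a.travel_min_before > max_travel_leg_min:
            issues.append(
                f"travel to {a.name} takes {a.travel_min_before} min "
                f"(limit {max_travel_leg_min})"
            )
    return issues


day = [
    Activity("Amber Fort", 180),
    Activity("City Palace", 120, travel_min_before=30),
    Activity("Hawa Mahal", 60, travel_min_before=10),
]
overloaded = day + [Activity("Day trip", 240, travel_min_before=120)]

print(check_day_feasibility(day))
print(check_day_feasibility(overloaded))
```

Returning a list of issues rather than a boolean keeps the result easy to log and to read back to the user.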
The system is built around a LangGraph-based agent that orchestrates multiple external tools using the Model Context Protocol (MCP).
Core Layers:
- Client Layer: Web UI (WebSocket) and CLI
- Voice Layer: Speech-to-text and text-to-speech services
- API Layer: FastAPI server for session handling and email delivery
- Agent Layer: LangGraph state machine with memory and tool orchestration
- Tool Layer: Independent MCP servers for POI search, routing, and weather
- Knowledge Layer: RAG pipeline backed by Wikivoyage
- Evaluation Layer: Automated itinerary validation system
Keeping these layers separate is intended to let each one be scaled, tested, and extended independently of the others.
Design Principles:
- Voice-native user experience
- Tool-grounded LLM reasoning
- Evaluation-first reliability
- Modular and extensible services
- Lazy loading and parallel execution
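The last principle, lazy loading and parallel execution, can be sketched with the standard library alone. Everything here is a stand-in: `get_embedding_model`, `fetch_pois`, and `fetch_weather` are hypothetical placeholders that simulate an expensive resource and two independent tool calls, and do not reflect the project's real functions.

```python
import asyncio
from functools import lru_cache


@lru_cache(maxsize=1)
def get_embedding_model() -> dict:
    """Stand-in for an expensive resource: built on first use, then reused."""
    return {"name": "placeholder-model"}


async def fetch_pois(city: str) -> list[str]:
    await asyncio.sleep(0.01)  # simulates network latency
    return [f"{city} old town"]


async def fetch_weather(city: str) -> str:
    await asyncio.sleep(0.01)  # simulates network latency
    return "clear"


async def gather_context(city: str) -> dict:
    # Independent tool calls run concurrently instead of one after another.
    pois, weather = await asyncio.gather(fetch_pois(city), fetch_weather(city))
    return {"pois": pois, "weather": weather}


context = asyncio.run(gather_context("Jaipur"))
print(context)
```

With `asyncio.gather`, total latency is roughly that of the slowest call rather than the sum of all calls. That matters in a voice interface, where pauses are noticeable.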
Workflow:
- User speaks a travel request
- Speech is transcribed in real time
- Agent determines the conversation phase (clarifying, planning, reviewing)
- External tools are invoked (POI search, routing, weather, RAG)
- Itinerary is synthesized
- Automated evaluations validate the output
- Response is converted to speech
- Results are optionally emailed and logged
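The phase decision in the workflow above (clarifying, planning, reviewing) can be sketched as a small routing function. This is plain Python rather than LangGraph code, and the state keys (`destination`, `num_days`, `itinerary`) and the routing rule are assumptions for illustration only.

```python
from enum import Enum


class Phase(str, Enum):
    CLARIFYING = "clarifying"
    PLANNING = "planning"
    REVIEWING = "reviewing"


# Assumed minimum information needed before planning can start.
REQUIRED_SLOTS = ("destination", "num_days")


def determine_phase(state: dict) -> Phase:
    """Pick the conversation phase from what is already known (hypothetical rule)."""
    if state.get("itinerary"):
        return Phase.REVIEWING   # a plan exists, so the user is reviewing or editing it
    if all(state.get(slot) for slot in REQUIRED_SLOTS):
        return Phase.PLANNING    # enough details to call tools and build a plan
    return Phase.CLARIFYING      # still missing required details


print(determine_phase({"destination": "Jaipur"}))
print(determine_phase({"destination": "Jaipur", "num_days": 3}))
print(determine_phase({"destination": "Jaipur", "num_days": 3, "itinerary": ["Day 1"]}))
```

In a graph-based agent, a function like this would typically serve as the condition that selects the next node.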
Technology Stack:
- Language: Python
- Backend: FastAPI, WebSockets
- Agent Framework: LangGraph
- LLM Inference: Groq
- Voice Services: ElevenLabs
- RAG: Sentence Transformers, Pinecone
- Routing: OSRM
- Weather: Open-Meteo
- Email: Resend
- Deployment: Docker, Render
Use Cases:
- Voice-based travel assistants
- Conversational AI planning tools
- Agent evaluation research
- Tool-augmented LLM systems
- Production-grade AI agent architectures
This project is licensed under the MIT License.