🧠 AI-Powered Voice-Driven Multilingual RAG Assistant


πŸ” Overview

An intelligent AI assistant that enables users to upload PDF documents (manuals, legal documents, research papers) and interact with them through text or voice queries. The system provides contextually relevant answers in multiple languages through text responses, voice synthesis, or AI-generated avatar videos.

Built with open-source LLMs, advanced RAG (Retrieval-Augmented Generation) pipelines, and comprehensive multilingual support to ensure accessibility across diverse user bases and use cases.


🎯 Key Features

πŸ“„ Document Processing

  • PDF Upload & Parsing - Support for complex document structures
  • Intelligent Text Chunking - Optimized content segmentation
  • Vector Embeddings - Semantic search capabilities
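
For a rough picture of what the chunking step involves, a minimal overlap-based splitter is sketched below; the function name, chunk size, and overlap are illustrative placeholders rather than the settings used in text_splitter.py.

```python
# Minimal sketch of overlap-based chunking (illustrative only; the actual
# text_splitter.py may split on sentences, tokens, or headings instead).
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows for retrieval."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

print(len(chunk_text("some long document text " * 200)))
```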

πŸ€– AI & Language Processing

  • Open-Source LLM Integration - DeepSeek, Mistral models
  • Advanced RAG Pipeline - Context-aware response generation
  • Multilingual Support - Automatic language detection and translation

πŸ”Š Voice & Interaction

  • Speech-to-Text - Voice query input
  • Text-to-Speech - Audio response generation
  • AI Avatar Integration - Lip-synced video responses

🌐 User Interface

  • Streamlit Web App - Interactive browser interface
  • Real-time Audio Recording - Seamless voice interaction
  • Responsive Design - Cross-device compatibility

πŸ§ͺ Use Cases

| Domain | Application |
| --- | --- |
| Legal Services | Multilingual client assistance and document analysis |
| Technical Support | Voice-driven appliance manuals and troubleshooting |
| Education | Rural education tools in local languages |
| Healthcare | Elderly-friendly voice assistants for medical information |
| Business | Document Q&A for training materials and procedures |

πŸ”§ Technology Stack

Core AI & ML

| Component | Technology | Purpose |
| --- | --- | --- |
| Language Models | DeepSeek, Mistral | Text generation and reasoning |
| Embeddings | SentenceTransformers | Semantic text representation |
| Vector Store | FAISS | Fast similarity search |
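
The sketch below shows a typical way to combine these two components; the all-MiniLM-L6-v2 model name and the flat L2 index are assumptions for illustration, not necessarily what embedder.py actually uses.

```python
import faiss
from sentence_transformers import SentenceTransformer

# Assumed model; embedder.py may use a different SentenceTransformers checkpoint.
model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = ["The warranty covers two years.", "Hold the power button to reset."]
embeddings = model.encode(chunks, convert_to_numpy=True)   # (n_chunks, dim) float32

index = faiss.IndexFlatL2(int(embeddings.shape[1]))        # exact L2 similarity search
index.add(embeddings)

query = model.encode(["How do I reset the device?"], convert_to_numpy=True)
distances, ids = index.search(query, 1)                    # retrieve the closest chunk
print(chunks[ids[0][0]])
```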

Document Processing

| Component | Technology | Purpose |
| --- | --- | --- |
| PDF Parsing | PyPDF2, PyMuPDF | Extract text from documents |
| Text Processing | Custom splitters | Optimize content for retrieval |
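
As a point of reference, a bare-bones extraction pass with the PyPDF2 3.x API might look like this; the real pdf_parser.py presumably adds error handling and may fall back to PyMuPDF for difficult layouts.

```python
from PyPDF2 import PdfReader

def extract_text(pdf_path: str) -> str:
    """Concatenate the text of every page in a PDF."""
    reader = PdfReader(pdf_path)
    # extract_text() can return None for image-only pages, hence the "or ''".
    return "\n".join(page.extract_text() or "" for page in reader.pages)

print(extract_text("sample_data/sample_document.pdf")[:300])
```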

Speech & Language

| Component | Technology | Purpose |
| --- | --- | --- |
| Speech Recognition | SpeechRecognition, Whisper | Voice input processing |
| Text-to-Speech | gTTS | Audio response generation |
| Language Detection | langdetect | Automatic language identification |
| Translation | Google Translate API | Multilingual support |
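
A small sketch of the detection and speech-synthesis ends of this pipeline follows; the Hindi strings are placeholder examples, and the translation step in the middle (Google Translate API) is omitted.

```python
from langdetect import detect
from gtts import gTTS

query = "इस उपकरण को कैसे चालू करें?"        # Hindi: "How do I turn this device on?"
lang = detect(query)                          # -> "hi"

# In the full pipeline the answer comes from RAG plus translation back into the
# user's language; here it is a hard-coded placeholder in the detected language.
answer = "पावर बटन को तीन सेकंड तक दबाए रखें।"
gTTS(answer, lang=lang).save("response.mp3")  # spoken reply in the user's language
```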

Interface & Deployment

| Component | Technology | Purpose |
| --- | --- | --- |
| Web Interface | Streamlit | Interactive user interface |
| Audio Components | streamlit-audio-recorder | Voice recording integration |
| Avatar Generation | D-ID API | AI-powered video responses |

πŸš€ Development Phases

Our project is structured in five progressive phases, each building upon the previous to create a comprehensive AI assistant:

βœ… Phase 1: Core RAG Logic & Text Interaction

Lead Developer: Bharath | Deadline: July 23, 2025

🎯 Objectives

Establish the foundational RAG pipeline with basic text-based interaction capabilities.

πŸ“‹ Tasks

  • Initialize VS Code project and GitHub repository
  • Implement robust PDF parsing using PyPDF2
  • Develop intelligent text chunking algorithms
  • Generate semantic embeddings with SentenceTransformers
  • Integrate open-source LLM (DeepSeek 7B)
  • Build complete RAG pipeline (parse β†’ embed β†’ retrieve β†’ respond)
  • Create CLI interface via main.py

βœ… Deliverable: Context-aware text responses from PDF content using RAG methodology.
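
To make the parse β†’ embed β†’ retrieve β†’ respond flow concrete, here is a compressed sketch of how the stages could be wired together. The helper names are illustrative, the embedding model is an assumed default, and the final LLM call is stubbed out rather than loading an actual DeepSeek model.

```python
import faiss
from PyPDF2 import PdfReader
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")   # assumed embedding model

def build_index(pdf_path: str, chunk_size: int = 500):
    """Parse a PDF, chunk it, embed the chunks, and index them in FAISS."""
    text = "\n".join(p.extract_text() or "" for p in PdfReader(pdf_path).pages)
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    vectors = model.encode(chunks, convert_to_numpy=True)
    index = faiss.IndexFlatL2(int(vectors.shape[1]))
    index.add(vectors)
    return index, chunks

def query_llm(prompt: str) -> str:
    """Placeholder for the real model call in llm_query.py (e.g. DeepSeek 7B)."""
    return "(model response here)"

def answer(question: str, index, chunks, k: int = 3) -> str:
    """Retrieve the top-k chunks and hand them to the LLM as context."""
    q_vec = model.encode([question], convert_to_numpy=True)
    _, ids = index.search(q_vec, k)
    context = "\n\n".join(chunks[i] for i in ids[0])
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return query_llm(prompt)
```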


βœ… Phase 2: Speech I/O & Multilingual Support

Lead Developer: Suhas | Deadline: July 25, 2025

🎯 Objectives

Extend the system with voice interaction and comprehensive multilingual capabilities.

πŸ“‹ Tasks

  • Implement microphone input using SpeechRecognition
  • Add voice synthesis using gTTS
  • Deploy language detection with langdetect
  • Integrate query translation to English
  • Implement response translation to user's native language
  • Validate complete speech β†’ RAG β†’ speech workflow

βœ… Deliverable: Voice-driven multilingual interaction with spoken responses.
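
The loop below is a hedged sketch of the speech β†’ RAG β†’ speech workflow; the translation and RAG helpers are stubs standing in for the project's own modules, and recognize_google is used only because it needs no local model (the stack also lists Whisper).

```python
import speech_recognition as sr
from langdetect import detect
from gtts import gTTS

# Stand-ins for the real pipeline pieces (rag_pipeline.py / multilingual.py).
def translate_to_english(text: str) -> str: return text
def translate_from_english(text: str, lang: str) -> str: return text
def run_rag_query(question: str) -> str: return "Hold the power button for three seconds."

recognizer = sr.Recognizer()
with sr.Microphone() as source:              # requires PyAudio and a working microphone
    print("Speak your question...")
    audio = recognizer.listen(source)

query = recognizer.recognize_google(audio)   # speech -> text (online recognizer)
lang = detect(query)                         # remember the user's language

answer = translate_from_english(run_rag_query(translate_to_english(query)), lang)
gTTS(answer, lang=lang).save("answer.mp3")   # spoken reply back to the user
```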


βœ… Phase 3: AI Avatar Integration

Lead Developer: Sreeya | Deadline: July 27, 2025

🎯 Objectives

Enhance user experience with AI-generated avatar responses for visual communication.

πŸ“‹ Tasks

  • Integrate D-ID API for avatar generation
  • Convert audio/text to synchronized lip-synced video responses
  • Develop modular avatar components for Streamlit integration
  • Optimize video generation performance

βœ… Deliverable: AI avatar delivering visually synchronized spoken responses.
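
For orientation only, a text-driven avatar request could look roughly like the sketch below. The endpoint, authentication scheme, and payload shape are assumptions about D-ID's REST API and should be checked against the current D-ID documentation; the source image URL is a placeholder.

```python
import os
import requests

# Assumed D-ID REST usage; verify endpoint, auth, and payload against D-ID docs.
API_KEY = os.getenv("D_ID_API_KEY")
payload = {
    "source_url": "https://example.com/avatar_face.jpg",            # placeholder image
    "script": {"type": "text", "input": "Hello! Here is your answer."},
}

resp = requests.post(
    "https://api.d-id.com/talks",
    json=payload,
    headers={"Authorization": f"Basic {API_KEY}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())   # response should include an id to poll for the finished video
```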


βœ… Phase 4: Streamlit Interface - Text & File UI

Lead Developer: Kiran | Deadline: July 28, 2025

🎯 Objectives

Create an intuitive web-based interface for document upload and text-based interactions.

πŸ“‹ Tasks

  • Design PDF upload interface with validation
  • Implement chat-style text input/output
  • Display contextual text responses
  • Add language selection controls
  • Test complete text-based workflow in browser

βœ… Deliverable: Professional web application for PDF upload and text-based Q&A.
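
A stripped-down sketch of this phase's interface is shown below; the widget layout and language list are illustrative, and build_index/answer are stubs standing in for the Phase 1 pipeline rather than the real streamlit_ui.py.

```python
import streamlit as st

# Stand-ins for the Phase 1 pipeline (rag_pipeline.py); replace with real imports.
def build_index(pdf_file): return None, ["(document chunks would live here)"]
def answer(question, index, chunks): return f"(contextual answer to: {question})"

st.title("AI RAG Assistant")
uploaded = st.file_uploader("Upload a PDF", type=["pdf"])
language = st.selectbox("Response language", ["English", "Hindi", "Other"])
question = st.text_input("Ask a question about the document")

if uploaded and question:
    with st.spinner("Searching the document..."):
        index, chunks = build_index(uploaded)
        st.write(answer(question, index, chunks))   # contextual text response
```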


βœ… Phase 5: Complete Voice & Avatar UI + Deployment

Lead Developer: Vipul | Deadline: July 28, 2025

🎯 Objectives

Finalize the application with full voice and avatar capabilities, then deploy for production use.

πŸ“‹ Tasks

  • Integrate microphone components in Streamlit
  • Implement TTS audio playback functionality
  • Embed D-ID avatar video display
  • Deploy application using Streamlit Cloud/HuggingFace Spaces
  • Conduct comprehensive end-to-end testing

βœ… Deliverable: Production-ready web application with complete voice and avatar interaction capabilities.
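
As a final illustration, playing back a gTTS reply and an avatar clip inside Streamlit can be as simple as the sketch below; the microphone-recording widget itself is omitted because the streamlit-audio-recorder component's exact API varies between versions, and the video URL is a placeholder.

```python
import streamlit as st

st.subheader("Spoken answer")
with open("answer.mp3", "rb") as f:            # produced by the gTTS step
    st.audio(f.read(), format="audio/mp3")

st.subheader("Avatar response")
# result_url is a placeholder for the video URL returned by the D-ID job.
result_url = "https://example.com/avatar_response.mp4"
st.video(result_url)
```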


πŸ“ Project Structure

ai-rag-assistant/
β”œβ”€β”€ πŸ“„ main.py                    # Entry point and CLI interface
β”œβ”€β”€ πŸ“‹ requirements.txt           # Python dependencies
β”œβ”€β”€ πŸ“– README.md                  # Project documentation
β”œβ”€β”€ βš™οΈ config.py                  # Configuration settings
β”‚
β”œβ”€β”€ πŸ“‚ modules/                   # Core functionality modules
β”‚   β”œβ”€β”€ πŸ”§ __init__.py
β”‚   β”œβ”€β”€ πŸ“„ pdf_parser.py          # PDF document processing
β”‚   β”œβ”€β”€ βœ‚οΈ text_splitter.py       # Content chunking algorithms
β”‚   β”œβ”€β”€ 🎯 embedder.py            # Vector embedding generation
β”‚   β”œβ”€β”€ 🧠 rag_pipeline.py        # RAG orchestration logic
β”‚   β”œβ”€β”€ πŸ’¬ llm_query.py           # LLM interaction layer
β”‚   β”œβ”€β”€ πŸŽ™οΈ speech_module.py       # Voice I/O handling (Phase 2)
β”‚   β”œβ”€β”€ 🌍 multilingual.py        # Language processing (Phase 2)
β”‚   └── 🎭 avatar_generator.py    # AI avatar creation (Phase 3)
β”‚
β”œβ”€β”€ πŸ“‚ frontend/                  # User interface components
β”‚   β”œβ”€β”€ 🌐 streamlit_ui.py        # Main Streamlit application (Phase 4-5)
β”‚   β”œβ”€β”€ πŸ”Š audio_components.py    # Audio recording/playback
β”‚   └── 🎬 avatar_components.py   # Avatar display components
β”‚
β”œβ”€β”€ πŸ“‚ tests/                     # Test suites
β”‚   β”œβ”€β”€ πŸ§ͺ test_pdf_parser.py
β”‚   β”œβ”€β”€ πŸ§ͺ test_text_splitter.py
β”‚   └── πŸ§ͺ test_rag_pipeline.py
β”‚
β”œβ”€β”€ πŸ“‚ sample_data/               # Sample documents and outputs
β”‚   β”œβ”€β”€ πŸ“„ sample_document.pdf
β”‚   β”œβ”€β”€ πŸ“ sample_content.txt
β”‚   └── πŸ—‚οΈ chunked_output.txt
β”‚
└── πŸ“‚ assets/                    # Static resources
    β”œβ”€β”€ πŸ“„ sample.pdf
    β”œβ”€β”€ 🎡 test_audio.wav
    └── πŸ–ΌοΈ avatar_templates/

πŸ› οΈ Installation & Setup

Prerequisites

  • Python 3.8 or higher
  • Git
  • 4GB+ RAM recommended
  • Internet connection for model downloads

Quick Start

  1. Clone the repository

    git clone https://github.com/8harath/Vishvam.git
    cd Vishvam
  2. Create virtual environment

    python -m venv .venv
    
    # Windows
    .venv\Scripts\activate
    
    # macOS/Linux
    source .venv/bin/activate
  3. Install dependencies

    pip install -r requirements.txt
  4. Configure environment variables (Optional)

    # Create .env file for API keys
    echo "D_ID_API_KEY=your_d_id_api_key_here" > .env

πŸš€ Usage

Command Line Interface (Phase 1)

python main.py
# Follow prompts to upload PDF and ask questions

Web Interface (Phases 4-5)

streamlit run frontend/streamlit_ui.py

Navigate to http://localhost:8501 in your browser.

Key Commands

  • Upload PDF: Drag and drop or use file picker
  • Text Query: Type questions in the chat interface
  • Voice Query: Click microphone button and speak
  • Language Selection: Choose output language from dropdown
  • Avatar Mode: Toggle AI avatar responses

πŸ§ͺ Testing

Run the comprehensive test suite:

# Run all tests
python -m pytest tests/ -v

# Run specific test modules
python -m pytest tests/test_pdf_parser.py -v
python -m pytest tests/test_text_splitter.py -v

# Generate coverage report
python -m pytest tests/ --cov=modules --cov-report=html

πŸ‘₯ Development Team

| Phase | Lead Developer | Responsibility | Status |
| --- | --- | --- | --- |
| Phase 1 | πŸ‘¨β€πŸ’» Bharath | Core RAG Logic & Text Interaction | βœ… Complete |
| Phase 2 | πŸ‘¨β€πŸ’» Suhas | Speech I/O & Multilingual Support | βœ… Complete |
| Phase 3 | πŸ‘©β€πŸ’» Sreeya | AI Avatar Integration | βœ… Complete |
| Phase 4 | πŸ‘¨β€πŸ’» Kiran | Streamlit UI - Text Interface | βœ… Complete |
| Phase 5 | πŸ‘¨β€πŸ’» Vipul | Voice UI & Deployment | βœ… Complete |

πŸ“… Project Timeline

gantt
    title AI RAG Assistant Development Timeline
    dateFormat  YYYY-MM-DD
    section Phase 1
    Core RAG Logic          :done,    p1, 2025-07-19, 2025-07-23
    section Phase 2
    Speech & Multilingual   :done,    p2, 2025-07-24, 2025-07-25
    section Phase 3
    AI Avatar Integration   :done,    p3, 2025-07-26, 2025-07-27
    section Phase 4
    Streamlit Text UI       :done,    p4, 2025-07-27, 2025-07-28
    section Phase 5
    Voice UI & Deployment   :done,    p5, 2025-07-28, 2025-07-28

🀝 Contributing

We welcome contributions to improve the AI RAG Assistant! Please follow these guidelines:

Getting Started

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Follow our coding standards and add tests
  4. Commit your changes (git commit -m 'Add amazing feature')
  5. Push to the branch (git push origin feature/amazing-feature)
  6. Open a Pull Request

Code Standards

  • Follow PEP 8 for Python code
  • Add docstrings for all functions and classes
  • Include unit tests for new functionality
  • Update documentation as needed

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.


πŸ†˜ Support & Contact


πŸ™ Acknowledgments

  • OpenAI for Whisper speech recognition models
  • Hugging Face for transformer models and embeddings
  • D-ID for AI avatar generation API
  • Streamlit team for the excellent web framework
  • The open-source community for various libraries and tools

Built with ❀️ by the Vishvam Development Team
