An intelligent AI assistant that enables users to upload PDF documents (manuals, legal documents, research papers) and interact with them through text or voice queries. The system provides contextually relevant answers in multiple languages through text responses, voice synthesis, or AI-generated avatar videos.
Built with open-source LLMs, a Retrieval-Augmented Generation (RAG) pipeline, and multilingual speech and translation support, so users can query documents by text or voice in their own language.
- PDF Upload & Parsing - Support for complex document structures
- Intelligent Text Chunking - Optimized content segmentation
- Vector Embeddings - Semantic search capabilities
- Open-Source LLM Integration - DeepSeek, Mistral models
- Advanced RAG Pipeline - Context-aware response generation
- Multilingual Support - Automatic language detection and translation
- Speech-to-Text - Voice query input
- Text-to-Speech - Audio response generation
- AI Avatar Integration - Lip-synced video responses
- Streamlit Web App - Interactive browser interface
- Real-time Audio Recording - Seamless voice interaction
- Responsive Design - Cross-device compatibility
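The text chunking feature listed above can be illustrated with a fixed-size splitter whose adjacent chunks overlap, so that a sentence cut at one boundary still appears whole in a neighboring chunk. This is a minimal sketch, not the project's `text_splitter.py`: the function name `split_text` and the default sizes are assumptions chosen for illustration.

```python
from typing import List


def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters. Hypothetical helper,
    not the project's actual splitter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks
```

Character-based windows are the simplest option; splitting on sentence or paragraph boundaries usually gives cleaner chunks at the cost of more code.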
| Domain | Application |
|---|---|
| Legal Services | Multilingual client assistance and document analysis |
| Technical Support | Voice-driven appliance manuals and troubleshooting |
| Education | Rural education tools in local languages |
| Healthcare | Elderly-friendly voice assistants for medical information |
| Business | Document Q&A for training materials and procedures |
| Component | Technology | Purpose |
|---|---|---|
| Language Models | DeepSeek, Mistral | Text generation and reasoning |
| Embeddings | SentenceTransformers | Semantic text representation |
| Vector Store | FAISS | Fast similarity search |
| PDF Parsing | PyPDF2, PyMuPDF | Extract text from documents |
| Text Processing | Custom splitters | Optimize content for retrieval |
| Speech Recognition | SpeechRecognition, Whisper | Voice input processing |
| Text-to-Speech | gTTS | Audio response generation |
| Language Detection | langdetect | Automatic language identification |
| Translation | Google Translate API | Multilingual support |
| Web Interface | Streamlit | Interactive user interface |
| Audio Components | streamlit-audio-recorder | Voice recording integration |
| Avatar Generation | D-ID API | AI-powered video responses |
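To make the roles of the embedding model and the vector store in the technology stack above more concrete, the sketch below ranks text chunks by cosine similarity to a query. It is a toy under stated assumptions: a bag-of-words counter stands in for SentenceTransformers embeddings, a linear scan stands in for a FAISS index, and the names `embed`, `cosine`, and `top_k` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (stand-in for a real sentence encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


if __name__ == "__main__":
    chunks = [
        "reset the router by holding the power button",
        "the warranty lasts two years",
        "clean the filter every month",
    ]
    for score, chunk in top_k("how long is the warranty", chunks, k=1):
        print(f"{score:.2f}  {chunk}")
```

A real deployment would replace `embed` with dense vectors from a sentence encoder and the linear scan with an index lookup; the ranking idea stays the same.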
Our project is structured in five progressive phases, each building upon the previous to create a comprehensive AI assistant:
Phase 1: Core RAG Logic & Text Interaction | Lead Developer: Bharath | Deadline: July 23, 2025
Establish the foundational RAG pipeline with basic text-based interaction capabilities.
- Initialize VS Code project and GitHub repository
- Implement robust PDF parsing using `PyPDF2`
- Develop intelligent text chunking algorithms
- Generate semantic embeddings with `SentenceTransformers`
- Integrate open-source LLM (DeepSeek 7B)
- Build complete RAG pipeline (parse → embed → retrieve → respond)
- Create CLI interface via `main.py`

Deliverable: Context-aware text responses from PDF content using RAG methodology.
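The final two pipeline stages listed above, retrieve and respond, can be sketched as a small orchestration function. This is an illustration under assumptions rather than the project's `rag_pipeline.py`: the retriever and the LLM are passed in as plain callables, and the names `build_prompt` and `answer` as well as the prompt wording are hypothetical.

```python
from typing import Callable, List


def build_prompt(question: str, context_chunks: List[str]) -> str:
    """Assemble a grounded prompt: numbered context first, question last."""
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(
    question: str,
    retrieve: Callable[[str, int], List[str]],
    generate: Callable[[str], str],
    k: int = 3,
) -> str:
    """Retrieve k chunks for the question, then ask the LLM to respond."""
    chunks = retrieve(question, k)
    return generate(build_prompt(question, chunks))
```

Passing `retrieve` and `generate` as arguments keeps the orchestration independent of any particular vector store or model, which also makes it easy to test with stubs.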
Phase 2: Speech I/O & Multilingual Support | Lead Developer: Suhas | Deadline: July 25, 2025
Extend the system with voice interaction and comprehensive multilingual capabilities.
- Implement microphone input using `SpeechRecognition`
- Add voice synthesis using `gTTS`
- Deploy language detection with `langdetect`
- Integrate query translation to English
- Implement response translation to user's native language
- Validate complete speech → RAG → speech workflow

Deliverable: Voice-driven multilingual interaction with spoken responses.
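The translation steps above (detect the language, translate the query to English, translate the response back) can be sketched as a thin wrapper around the English-only RAG call. This is a sketch under assumptions, not the project's `multilingual.py`: the function name and the three callables are hypothetical. The `detect` function from `langdetect` has a compatible shape for the `detect` argument, while `translate` would need to wrap whichever translation service is used.

```python
from typing import Callable


def multilingual_answer(
    query: str,
    detect: Callable[[str], str],
    translate: Callable[[str, str, str], str],
    rag_answer: Callable[[str], str],
) -> str:
    """Detect the query language, answer in English, translate back.

    translate(text, source_lang, target_lang) is a hypothetical signature.
    """
    lang = detect(query)
    if lang == "en":
        return rag_answer(query)  # no translation round trip needed
    english_query = translate(query, lang, "en")
    english_answer = rag_answer(english_query)
    return translate(english_answer, "en", lang)
```

Skipping translation for English queries avoids two unnecessary service calls and the small quality loss each translation pass can introduce.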
Phase 3: AI Avatar Integration | Lead Developer: Sreeya | Deadline: July 27, 2025
Enhance user experience with AI-generated avatar responses for visual communication.
- Integrate D-ID API for avatar generation
- Convert audio/text to lip-synced video responses
- Develop modular avatar components for Streamlit integration
- Optimize video generation performance
Deliverable: AI avatar delivering visually synchronized spoken responses.
Phase 4: Streamlit UI - Text Interface | Lead Developer: Kiran | Deadline: July 28, 2025
Create an intuitive web-based interface for document upload and text-based interactions.
- Design PDF upload interface with validation
- Implement chat-style text input/output
- Display contextual text responses
- Add language selection controls
- Test complete text-based workflow in browser
Deliverable: Professional web application for PDF upload and text-based Q&A.
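The upload validation mentioned above could look roughly like the following check, run on the uploaded bytes before parsing. This is a sketch under assumptions: the function name `validate_pdf`, the error messages, and the 20 MB limit are illustrative and not taken from the project.

```python
from typing import Optional

MAX_PDF_BYTES = 20 * 1024 * 1024  # assumed size limit, not from the project


def validate_pdf(
    filename: str, data: bytes, max_bytes: int = MAX_PDF_BYTES
) -> Optional[str]:
    """Return an error message, or None if the upload looks like a PDF."""
    if not filename.lower().endswith(".pdf"):
        return "File name must end with .pdf"
    if len(data) == 0:
        return "File is empty"
    if len(data) > max_bytes:
        return f"File exceeds {max_bytes} bytes"
    if not data.startswith(b"%PDF-"):
        # Strict check: the PDF header is expected at the very start.
        return "File does not start with the PDF header"
    return None
```

In a Streamlit app, the object returned by `st.file_uploader` exposes `name` and `getvalue()`, which could supply the two arguments. Checking the header bytes as well as the extension catches renamed files that are not PDFs.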
Phase 5: Voice UI & Deployment | Lead Developer: Vipul | Deadline: July 28, 2025
Finalize the application with full voice and avatar capabilities, then deploy for production use.
- Integrate microphone components in Streamlit
- Implement TTS audio playback functionality
- Embed D-ID avatar video display
- Deploy application using Streamlit Cloud/HuggingFace Spaces
- Conduct comprehensive end-to-end testing
Deliverable: Production-ready web application with complete voice and avatar interaction capabilities.
ai-rag-assistant/
├── main.py                      # Entry point and CLI interface
├── requirements.txt             # Python dependencies
├── README.md                    # Project documentation
├── config.py                    # Configuration settings
│
├── modules/                     # Core functionality modules
│   ├── __init__.py
│   ├── pdf_parser.py            # PDF document processing
│   ├── text_splitter.py         # Content chunking algorithms
│   ├── embedder.py              # Vector embedding generation
│   ├── rag_pipeline.py          # RAG orchestration logic
│   ├── llm_query.py             # LLM interaction layer
│   ├── speech_module.py         # Voice I/O handling (Phase 2)
│   ├── multilingual.py          # Language processing (Phase 2)
│   └── avatar_generator.py      # AI avatar creation (Phase 3)
│
├── frontend/                    # User interface components
│   ├── streamlit_ui.py          # Main Streamlit application (Phase 4-5)
│   ├── audio_components.py      # Audio recording/playback
│   └── avatar_components.py     # Avatar display components
│
├── tests/                       # Test suites
│   ├── test_pdf_parser.py
│   ├── test_text_splitter.py
│   └── test_rag_pipeline.py
│
├── sample_data/                 # Sample documents and outputs
│   ├── sample_document.pdf
│   ├── sample_content.txt
│   └── chunked_output.txt
│
└── assets/                      # Static resources
    ├── sample.pdf
    ├── test_audio.wav
    └── avatar_templates/
- Python 3.8 or higher
- Git
- 4GB+ RAM recommended
- Internet connection for model downloads
- Clone the repository:
  `git clone https://github.com/8harath/Vishvam.git`
  `cd Vishvam`
- Create and activate a virtual environment:
  `python -m venv .venv`
  Windows: `.venv\Scripts\activate`
  macOS/Linux: `source .venv/bin/activate`
- Install dependencies:
  `pip install -r requirements.txt`
- Configure environment variables (optional) by creating a `.env` file for API keys:
  `echo "D_ID_API_KEY=your_d_id_api_key_here" > .env`
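Since the D-ID key is optional, application code needs to handle its absence gracefully. The sketch below shows one way to read it; the helper name `get_d_id_api_key` is hypothetical. A `.env` file is not loaded into the process environment automatically, so this assumes that something (for example a loader such as `python-dotenv`, if the project uses one) has already exported the variable.

```python
import os
from typing import Mapping, Optional


def get_d_id_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the D-ID API key, or None so callers can disable avatar mode."""
    key = env.get("D_ID_API_KEY", "").strip()
    return key or None


if __name__ == "__main__":
    if get_d_id_api_key() is None:
        print("D_ID_API_KEY not set; avatar responses are unavailable.")
    else:
        print("D_ID_API_KEY found.")
```

Accepting the environment as a parameter keeps the function easy to test without modifying real environment variables.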
Command-line interface: run `python main.py`, then follow the prompts to upload a PDF and ask questions.

Web interface: run `streamlit run frontend/streamlit_ui.py`, then navigate to http://localhost:8501 in your browser.
- Upload PDF: Drag and drop or use file picker
- Text Query: Type questions in the chat interface
- Voice Query: Click microphone button and speak
- Language Selection: Choose output language from dropdown
- Avatar Mode: Toggle AI avatar responses
Run the comprehensive test suite:
# Run all tests
python -m pytest tests/ -v
# Run specific test modules
python -m pytest tests/test_pdf_parser.py -v
python -m pytest tests/test_text_splitter.py -v
# Generate coverage report
python -m pytest tests/ --cov=modules --cov-report=html

| Phase | Lead Developer | Responsibility | Status |
|---|---|---|---|
| Phase 1 | Bharath | Core RAG Logic & Text Interaction | Complete |
| Phase 2 | Suhas | Speech I/O & Multilingual Support | Complete |
| Phase 3 | Sreeya | AI Avatar Integration | Complete |
| Phase 4 | Kiran | Streamlit UI - Text Interface | Complete |
| Phase 5 | Vipul | Voice UI & Deployment | Complete |
gantt
title AI RAG Assistant Development Timeline
dateFormat YYYY-MM-DD
section Phase 1
Core RAG Logic :done, p1, 2025-07-19, 2025-07-23
section Phase 2
Speech & Multilingual :done, p2, 2025-07-24, 2025-07-25
section Phase 3
AI Avatar Integration :done, p3, 2025-07-26, 2025-07-27
section Phase 4
Streamlit Text UI :done, p4, 2025-07-27, 2025-07-28
section Phase 5
Voice UI & Deployment :done, p5, 2025-07-28, 2025-07-28
We welcome contributions to improve the AI RAG Assistant! Please follow these guidelines:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Follow our coding standards and add tests
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Follow PEP 8 for Python code
- Add docstrings for all functions and classes
- Include unit tests for new functionality
- Update documentation as needed
This project is licensed under the MIT License - see the LICENSE file for details.
- GitHub Issues: Report bugs or request features
- Documentation: Project Wiki
- OpenAI for Whisper speech recognition models
- Hugging Face for transformer models and embeddings
- D-ID for AI avatar generation API
- Streamlit team for the excellent web framework
- The open-source community for various libraries and tools