A Retrieval-Augmented Generation (RAG) system designed as an AI-powered onboarding assistant for the "Learning Thoughts" company. The system helps new employees get answers to onboarding questions by retrieving relevant information from company documents.
Core Architecture:
- Indexer: Processes and indexes company documents (PDFs) into vector embeddings
- Retriever: Searches indexed documents for relevant context based on user queries
- Generator: Uses retrieved context with LLM to generate accurate, contextual responses
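The three stages above can be sketched without any external dependencies. This is a toy illustration only: in the real system the embedding is Vertex AI, the store is ChromaDB, and the generator is Gemini; here a bag-of-words vector, an in-memory list, and a string template stand in for them.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for Vertex AI embeddings)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class Indexer:
    """Stores (embedding, chunk) pairs; stand-in for the ChromaDB vector store."""
    def __init__(self):
        self.store = []

    def add(self, chunks):
        for c in chunks:
            self.store.append((embed(c), c))

class Retriever:
    """Returns the k chunks most similar to the query."""
    def __init__(self, indexer: Indexer):
        self.indexer = indexer

    def search(self, query: str, k: int = 2):
        scored = [(cosine(embed(query), e), c) for e, c in self.indexer.store]
        return [c for _, c in sorted(scored, key=lambda t: t[0], reverse=True)[:k]]

def generate(query: str, context: list[str]) -> str:
    """Stand-in for the LLM call: the real system prompts Gemini with this context."""
    return f"Q: {query}\nContext:\n" + "\n".join(f"- {c}" for c in context)

# Hypothetical handbook snippets, for illustration only.
indexer = Indexer()
indexer.add([
    "Leave policy: employees accrue 1.5 days of paid leave per month.",
    "IT setup: laptops are issued on day one by the IT helpdesk.",
])
retriever = Retriever(indexer)
answer = generate("How much paid leave do I get?", retriever.search("paid leave", k=1))
```

The design point the sketch preserves is the separation of concerns: the indexer writes, the retriever reads, and the generator only ever sees retrieved context, which is what keeps answers grounded in company documentation.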
Technology Stack:
- LangChain framework for RAG pipeline
- ChromaDB for vector storage
- Google Vertex AI for embeddings and LLM (Gemini 2.5 Flash Lite)
- PyPDF for document processing
- SQLite for record management
Data Sources:
- Employee handbook (PDF format) stored in the data/ directory
- Configurable to handle multiple document types and sources
Key Features:
- Incremental indexing with cleanup management
- Customizable chunking strategies (1000 chars, 100 overlap)
- Source citation in responses
- India-specific context awareness
- Similarity and MMR search options
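In the real pipeline the chunking is done by LangChain's text splitters; as a rough illustration of what the configured strategy (1000 characters, 100 overlap) means, a minimal character-level chunker might look like:

```python
def chunk(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
    """Fixed-size character chunking with overlap.

    Each chunk starts (size - overlap) characters after the previous one,
    so consecutive chunks share `overlap` characters of context across the
    boundary. This reduces the chance that a fact is cut in half at a
    chunk edge and lost to retrieval.
    """
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "x" * 2500
chunks = chunk(doc, size=1000, overlap=100)  # three chunks: 1000, 1000, 700 chars
```

On the search side, plain similarity search returns the k nearest chunks, while MMR (maximal marginal relevance) additionally penalizes candidates that are too similar to chunks already selected, trading a little relevance for diversity in the retrieved context.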
The system processes company documents, creates searchable embeddings, and provides an interactive Q&A interface where employees can ask onboarding-related questions and receive accurate, source-cited answers based solely on company documentation.
Setup:
- uv sync: install dependencies
- uv run onboarding: launch the interactive Q&A assistant