Skip to content

GenAIDevelopment/onboarding_rag_sample

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Onboarding RAG System

A Retrieval-Augmented Generation (RAG) system designed as an AI-powered onboarding assistant for "Learning Thoughts" company. The system helps new employees get answers to onboarding questions by retrieving relevant information from company documents.

Key Components

Core Architecture:

  • Indexer: Processes and indexes company documents (PDFs) into vector embeddings
  • Retriever: Searches indexed documents for relevant context based on user queries
  • Generator: Uses retrieved context with LLM to generate accurate, contextual responses

Technology Stack:

  • LangChain framework for RAG pipeline
  • ChromaDB for vector storage
  • Google Vertex AI for embeddings and LLM (Gemini 2.5 Flash Lite)
  • PyPDF for document processing
  • SQLite for record management

Data Sources:

  • Employee handbook (PDF format) stored in data/ directory
  • Configurable to handle multiple document types and sources

Key Features:

  • Incremental indexing with cleanup management
  • Customizable chunking strategies (1000 chars, 100 overlap)
  • Source citation in responses
  • India-specific context awareness
  • Similarity and MMR search options

Usage

The system processes company documents, creates searchable embeddings, and provides an interactive Q&A interface where employees can ask onboarding-related questions and receive accurate, source-cited answers based solely on company documentation.

Installation

uv sync

Running

uv run onboarding

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages