A modular, year-aware RAG system built with LlamaIndex and ChromaDB, implementing cumulative knowledge access for academic environments

ZaheerH-03/Modular-RAG

🎓 Scholarly-RAG: Multi-Tier Academic Assistant

Scholarly-RAG is a specialized Retrieval-Augmented Generation (RAG) system designed for college ecosystems. It implements a Modular Silo Architecture that segregates academic data by branch (DS, AI, CSE) and year, providing students with context-aware answers based on their academic level.

🚀 Key Features

  • Modular Silo Architecture: Stores each year and branch in its own ChromaDB collection, named in the <year>_yr_<branch> format (for example, 1st_yr_ds).
  • Cumulative Access Logic: Implements hierarchical retrieval in which 4th-year students can also query 1st-3rd year data, while students in lower years cannot access material above their own year.
  • Semantic Chunking: Uses SemanticSplitterNodeParser to split text at "topic shifts" rather than at arbitrary character limits, which helps keep mathematical and logical context intact within a chunk.
  • Local Embedding Support: Powered by the BAAI/BGE-M3 model loaded from local paths, with full support for CUDA/GPU acceleration.
  • Persistent Storage: Uses ChromaDB to save embeddings to disk, eliminating the need for re-processing documents during every session.
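The naming format and the cumulative access rule described above can be sketched in a few lines. This is a minimal illustration, not the repository's code: the helper names `collection_name` and `accessible_collections`, the `YEAR_LABELS` list, and the assumption that a student can always query their own year are all hypothetical.

```python
# Hypothetical helpers: names and year labels are assumptions for illustration.
YEAR_LABELS = ["1st", "2nd", "3rd", "4th"]


def collection_name(year: int, branch: str) -> str:
    """Build a silo name such as '1st_yr_ds' from a 1-based year and a branch code."""
    if not 1 <= year <= len(YEAR_LABELS):
        raise ValueError(f"year must be between 1 and {len(YEAR_LABELS)}, got {year}")
    return f"{YEAR_LABELS[year - 1]}_yr_{branch.lower()}"


def accessible_collections(year: int, branch: str) -> list[str]:
    """Return the silos a student may query: their own year and every earlier one."""
    return [collection_name(y, branch) for y in range(1, year + 1)]


if __name__ == "__main__":
    print(accessible_collections(4, "DS"))  # all four DS silos
    print(accessible_collections(1, "DS"))  # only the 1st-year DS silo
```

A retriever built on this idea would query each returned collection and merge the results, so access control reduces to choosing which collection names to open.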

🛠️ Tech Stack

  • Framework: LlamaIndex (v0.10+)
  • Vector Database: ChromaDB
  • Embedding Model: BAAI/BGE-M3 (HuggingFace)
  • Language: Python 3.10+
  • Acceleration: PyTorch with CUDA support
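One way these components might be wired together for a single silo is sketched below. It assumes the LlamaIndex 0.10+ package layout with the `llama-index-embeddings-huggingface` and `llama-index-vector-stores-chroma` integrations installed. `SiloConfig`, `build_silo_index`, the `models/bge-m3` path, and the splitter settings are illustrative assumptions rather than the repository's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiloConfig:
    """Settings for one silo; every default here is an illustrative assumption."""
    collection: str                    # e.g. "1st_yr_ds"
    data_dir: str                      # folder holding that silo's PDFs
    model_path: str = "models/bge-m3"  # hypothetical local snapshot path
    chroma_path: str = "chromadb"      # persistent vector store directory
    device: str = "cuda"               # set to "cpu" when no GPU is available


def build_silo_index(cfg: SiloConfig):
    """Chunk, embed, and persist one silo's documents into its own collection."""
    # Imported lazily so this module loads even without the heavy dependencies.
    import chromadb
    from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
    from llama_index.core.node_parser import SemanticSplitterNodeParser
    from llama_index.embeddings.huggingface import HuggingFaceEmbedding
    from llama_index.vector_stores.chroma import ChromaVectorStore

    # Load the embedding model from a local path and place it on the chosen device.
    embed_model = HuggingFaceEmbedding(model_name=cfg.model_path, device=cfg.device)
    documents = SimpleDirectoryReader(input_dir=cfg.data_dir, recursive=True).load_data()

    # Split where the embedding distance between adjacent sentences spikes.
    splitter = SemanticSplitterNodeParser(
        buffer_size=1, breakpoint_percentile_threshold=95, embed_model=embed_model
    )
    nodes = splitter.get_nodes_from_documents(documents)

    # A persistent client writes to disk, so embeddings survive across sessions.
    client = chromadb.PersistentClient(path=cfg.chroma_path)
    collection = client.get_or_create_collection(cfg.collection)
    storage_context = StorageContext.from_defaults(
        vector_store=ChromaVectorStore(chroma_collection=collection)
    )
    return VectorStoreIndex(nodes, storage_context=storage_context, embed_model=embed_model)
```

Calling `build_silo_index(SiloConfig(collection="1st_yr_ds", data_dir="data/1st_yr_ds"))` would populate one collection; repeating the call per silo keeps each year and branch isolated.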

📁 Project Structure

ModularRAG/
├── data/                 # Raw PDF files organized by year (Ignored in Git)
│   ├── 1st_yr_ds/
│   ├── 4th_yr_ds/
├── ingestion/            # Core data processing logic
│   ├── data_loader.py    # Recursive directory and cumulative path logic
│   └── build_index.py    # Splitting, Embedding, and Vector Storage
├── models/               # Local model weights/snapshots (Ignored in Git)
├── chromadb/             # Persistent Vector Store directory (Ignored in Git)
├── requirements.txt      # Python dependencies
└── README.md
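The "recursive directory and cumulative path logic" attributed to data_loader.py could look roughly like the following. This is a guess at the idea using only the standard library, not the file's real contents; the function name `cumulative_pdf_paths` and the `YEAR_DIRS` prefixes are assumptions based on the folder names shown in the tree.

```python
from pathlib import Path

# Assumed folder prefixes, matching names such as data/1st_yr_ds in the tree.
YEAR_DIRS = ["1st_yr", "2nd_yr", "3rd_yr", "4th_yr"]


def cumulative_pdf_paths(data_root: Path, year: int, branch: str) -> list[Path]:
    """Collect PDFs for `branch` from year 1 up to `year`, searching subfolders too."""
    pdfs: list[Path] = []
    for prefix in YEAR_DIRS[:year]:
        folder = data_root / f"{prefix}_{branch.lower()}"
        if folder.is_dir():  # skip silos that have not been populated yet
            pdfs.extend(sorted(folder.rglob("*.pdf")))
    return pdfs
```

For example, `cumulative_pdf_paths(Path("data"), 4, "ds")` would walk every DS year folder that exists, while passing `1` limits the walk to `data/1st_yr_ds`.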
