Skip to content

A context-aware RAG Research Assistant for deep analysis of dense technical documents. Optimized for academic workflows with structural awareness and persistent history.

Notifications You must be signed in to change notification settings

cyrexez/study-rag-assistant

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

📚 Study RAG Assistant

test.mov

Uploading Screen Recording 2026-01-08 at 16.01.53.mov…

Python Streamlit LangChain Gemini

A context-aware Research Assistant for deep technical document analysis. This tool is designed to aid the consumption of documents for students and researchers studying dense materials by providing structural awareness and persistent conversation memory.

🌟 Key Technical Features

  • Structural Context Buffer: Unlike standard RAG systems, this assistant extracts and caches the first 30 pages of documents to understand the "global structure" (Table of Contents, Introduction, and Chapters), allowing it to answer high-level structural questions.
  • Persistent Multi-Book Workspace: Switch between different research subjects in your library without losing your specific chat history for each book.
  • High-Speed Streaming: Optimized "typewriter-style" response delivery using Gemini 2.5 Flash for a seamless, ChatGPT-like user experience.
  • Security-First: Fully integrated with python-dotenv to ensure API keys remain private and are never leaked to version control.

🛠 Tech Stack

  • LLM: Google Gemini 2.5 Flash
  • Embeddings: Google text-embedding-004
  • Vector Store: FAISS (Facebook AI Similarity Search)
  • Orchestration: LangChain
  • UI Framework: Streamlit

🚀 Quick Start

1. Clone & Setup

git clone [https://github.com/cyrexez/study-rag-assistant.git](https://github.com/cyrexez/study-rag-assistant.git)
cd study-rag-assistant
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

2. Configure API Key

Create a .env file in the root directory: GOOGLE_API_KEY=your_gemini_api_key_here

3. Run the Assistant

python -m streamlit run app.py

📂 Project Structure

app.py: Streamlit frontend and session state management.

rag_backend.py: Core RAG logic, structural indexing, and retrieval chains.

data/: Local directory for your academic PDFs (excluded from Git).

vectorstore/: Local FAISS index storage (excluded from Git).


💡 Credits & Inspiration

This project was inspired by the original RAG-with-Langchain-and-FastAPI repository by Ana Rojo-Echeburúa.

I have modified and extended the original concept to better suit academic research needs by implementing:

  • Streamlit-based Interactive UI: Replaced the FastAPI backend with a dedicated researcher dashboard.
  • Google Gemini 2.5 Flash Integration: Transitioned from OpenAI to Gemini.
  • Structural Context Retrieval: Added specialized logic to index Table of Contents and document structure for better global awareness.
  • Persistent Chat History: Enabled per-book session management to keep multiple research threads organized.

About

A context-aware RAG Research Assistant for deep analysis of dense technical documents. Optimized for academic workflows with structural awareness and persistent history.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages