An AI-powered research assistant that autonomously searches, retrieves, and synthesizes information using agent workflows. Built with LangGraph state machines, RAG pipelines, and semantic search.
This research assistant demonstrates advanced AI agent capabilities by intelligently routing between a local knowledge base and web search. The agent maintains conversation context, learns from web searches by saving results to its knowledge base, and provides cited answers to user queries.
Key Innovation: The agent uses LangGraph's state machine architecture to make intelligent decisions about information retrieval, creating a self-improving system that gets smarter with each query.
- ✅ Semantic Search with RAG - Vector-based document retrieval using ChromaDB embeddings
- ✅ Intelligent Query Routing - LangGraph agent decides between local KB and web search
- ✅ Web Search Integration - Tavily API integration for current information
- ✅ Self-Learning Knowledge Base - Automatically saves web search results for future queries
- ✅ Conversation Memory - Multi-turn dialogue with context understanding
- ✅ Citation Tracking - All answers include source references
- ✅ Relevance Checking - LLM validates search results before answering
- ⏳ Streamlit web interface
- ⏳ Document upload functionality
- ⏳ Research report generation
- ⏳ Cloud deployment
```
User Query
    ↓
Check Knowledge Base (Vector Search)
    ↓
Relevance Check (LLM)
    ↓
    ├─→ Relevant? → Generate Answer → END
    └─→ Not Relevant? → Web Search → Generate Answer → Save to KB → END
```
The agent uses a finite state machine to orchestrate decision-making:
States:
- `check_kb` - Search vector database for relevant information
- `search_web` - Query Tavily API for current information
- `generate` - Synthesize answer from sources with LLM
- `save_results` - Store web results in knowledge base
Decision Logic:
- Conditional routing based on relevance scores
- Context-aware query understanding
- Automatic knowledge base expansion
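The routing above can be sketched in plain Python. This is a simplified stand-in for the project's LangGraph conditional edges: the state names match the ones listed, but the function bodies here are stubs for illustration only.

```python
# Plain-Python sketch of the agent's routing loop. The real project wires
# these same states through LangGraph's StateGraph and conditional edges.

def check_kb(state):
    # Stub for the vector search; pretend nothing relevant was found.
    state["kb_hits"] = []
    return state

def is_relevant(state):
    # Stub for the LLM relevance check.
    return bool(state["kb_hits"])

def search_web(state):
    # Stub for the Tavily API call.
    state["web_results"] = ["(web snippet)"]
    return state

def generate(state):
    source = "kb" if state.get("kb_hits") else "web"
    state["answer"] = f"answer from {source}"
    return state

def save_results(state):
    # Self-learning step: web results become future KB content.
    state.setdefault("kb", []).extend(state.get("web_results", []))
    return state

def run(query):
    state = {"query": query}
    state = check_kb(state)
    if is_relevant(state):
        return generate(state)     # relevant -> answer -> END
    state = search_web(state)
    state = generate(state)
    return save_results(state)     # not relevant -> web -> save -> END

print(run("What is quantum computing?")["answer"])  # answer from web
```

Because `save_results` feeds web results back into the knowledge base, repeating the same query would take the "relevant" branch the second time around.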
| Component | Technology |
|---|---|
| Agent Framework | LangGraph |
| LLM Integration | LangChain |
| Language Model | OpenAI GPT-3.5-turbo |
| Vector Database | ChromaDB |
| Embeddings | OpenAI text-embedding-3-small |
| Web Search | Tavily API |
| Language | Python 3.x |
langgraph-research-assistant/
├── agent.py # Main LangGraph agent with state machine
├── document_loader.py # Document ingestion and text chunking
├── vector_store.py # Vector database operations
├── persistent_store.py # Persistent ChromaDB management
├── rag_chain.py # RAG pipeline implementation
├── test_llm.py # LLM connection testing
├── documents/ # Source documents (gitignored)
├── chroma_db/ # Vector database storage (gitignored)
├── requirements.txt # Python dependencies
└── .env # API keys (gitignored)
- Python 3.8+
- OpenAI API key
- Tavily API key
- Clone the repository

```shell
git clone https://github.com/yourusername/langgraph-research-assistant.git
cd langgraph-research-assistant
```

- Create virtual environment

```shell
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies

```shell
pip install -r requirements.txt
```

- Configure API keys
Create a .env file in the project root:
OPENAI_API_KEY=your-openai-key-here
TAVILY_API_KEY=your-tavily-key-here
Get your API keys:
- OpenAI: https://platform.openai.com/api-keys
- Tavily: https://tavily.com
- Add documents (optional)
Place any .txt files in the documents/ folder. The agent will automatically index them on first run.
```shell
python agent.py
```

Example conversation:
💬 You: What is quantum computing?
🤖 Assistant: Quantum computing is a type of computing that utilizes quantum
physics to access new computational abilities...
💬 You: What are its main applications?
🤖 Assistant: The main applications include solving complex optimization
problems, simulating quantum systems, enhancing machine learning...
The agent understands context - "its" refers to quantum computing from the previous question.
```shell
# Test LLM connection
python test_llm.py

# Test document loading
python document_loader.py

# Test vector store
python vector_store.py

# Test RAG pipeline
python rag_chain.py
```

- Documents are loaded from the `documents/` folder
- Text is split into 500-character chunks with 50-character overlap
- Chunks are converted to embeddings and stored in ChromaDB
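The chunking step amounts to a sliding window over the document text. A minimal sketch with the same `chunk_size`/`chunk_overlap` parameters (a hypothetical helper; the project uses a LangChain text splitter in `document_loader.py`):

```python
def chunk_text(text, chunk_size=500, chunk_overlap=50):
    """Split text into fixed-size character chunks with overlap.

    Hypothetical stand-in for the splitter used in document_loader.py;
    overlap preserves context across chunk boundaries.
    """
    step = chunk_size - chunk_overlap  # advance 450 chars per chunk
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("a" * 1200)
print(len(chunks), len(chunks[0]))  # 3 500
```

Each chunk starts 450 characters after the previous one, so the last 50 characters of one chunk repeat at the start of the next.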
When you ask a question:
- Query is converted to an embedding vector
- Semantic search finds top 3 most similar chunks
- LLM evaluates if chunks can answer the question
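The retrieval step is a nearest-neighbour search over embedding vectors. A toy version with hand-rolled cosine similarity over tiny 2-D "embeddings" (in the real pipeline, ChromaDB performs this search over OpenAI embedding vectors):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, chunk_vecs, k=3):
    """Return indices of the k chunks most similar to the query."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Toy 2-D embeddings for illustration only.
chunks = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
print(top_k([1.0, 0.0], chunks))  # [0, 1, 3]
```

The three best-matching chunks are then handed to the LLM for the relevance check described above.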
If KB has relevant info:
- Generate answer from retrieved chunks
- Cite sources
If KB lacks info:
- Search web via Tavily API
- Generate answer from web results
- Save results to KB for future queries
- Last 4 messages (2 Q&A pairs) stored in state
- Context passed to relevance checks and answer generation
- Enables follow-up questions with pronouns
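A sliding window of the last 4 messages can be kept with a bounded deque (a hypothetical sketch; the actual agent carries messages inside its LangGraph state):

```python
from collections import deque

# Keep only the last 4 messages (2 question/answer pairs).
memory = deque(maxlen=4)

for msg in ["Q1", "A1", "Q2", "A2", "Q3", "A3"]:
    memory.append(msg)  # older messages fall off automatically

print(list(memory))  # ['Q2', 'A2', 'Q3', 'A3']
```

Passing this window alongside the new query is what lets the LLM resolve pronouns like "its" against the previous turn.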
First Query: "What is quantum computing?"
Check KB → Not found → Web search → Generate answer → Save to KB
Second Query (same topic): "What is quantum computing?"
Check KB → Found! → Generate answer (no web search needed)
Follow-up Query: "What are its applications?"
Check KB → Found (understands "its" = quantum computing) → Generate answer
This project demonstrates:
- LangGraph State Machines - Finite state machines for agent orchestration
- RAG (Retrieval Augmented Generation) - Combining retrieval with LLM generation
- Vector Databases - Semantic search with embeddings
- Agent Decision Making - Conditional logic based on state
- Persistent Storage - ChromaDB for cross-session memory
- Multi-turn Conversations - Context tracking across messages
Adjust in `document_loader.py`:

```python
chunk_size=500,   # Characters per chunk
chunk_overlap=50  # Overlap for context preservation
```

Modify in `agent.py`:

```python
k=3            # Number of chunks to retrieve
max_results=3  # Number of web search results
```

Change in `agent.py`:

```python
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
# Or use: "gpt-4", "gpt-4-turbo", etc.
```

- Streamlit web interface for easier interaction
- Document upload through UI
- Export conversation history
- Multi-document research reports
- Support for PDF, DOCX, and other file formats
- Advanced citation formatting
- Query history and analytics
- Multi-language support
This is a learning project, but suggestions are welcome! Feel free to:
- Open issues for bugs or feature requests
- Submit pull requests with improvements
- Share your own implementations
MIT License - feel free to use this project for learning or as a foundation for your own work.
- Built following LangChain and LangGraph best practices
- Inspired by modern RAG architectures
- Uses OpenAI's GPT models and embeddings
- Tavily for intelligent web search
Author: Samik Kundu
GitHub: @samik-k21
Project Link: github.com/yourusername/langgraph-research-assistant