🌐 LangGraph RAG Web Agent

LangGraph RAG Web Agent is an agentic web navigation and information extraction application. Built on a RAG (Retrieval-Augmented Generation) architecture and LangGraph, it lets users crawl, semantically search, and chat with entire websites.

Whether it's dynamically extracting contact information, summarizing key services, or pulling deeply nested pricing data, this tool transforms the static web into an interactive, queryable database.

✨ Key Features

🕸️ Intelligent Web Crawling & Parsing: Automatically traverses websites based on user-defined parameters (e.g., Target URL, Crawl Depth) and extracts text structure, capturing semantic boundaries (headers, paragraphs, lists) out of raw HTML.
🗺️ Interactive Sitemap Visualization: Generates a visually appealing, interactive graph of the crawled website architecture using pyvis, giving users immediate insight into the site's structure.
🧠 Advanced RAG Architecture: Uses a hierarchical embedding approach. Instead of chunking blindly, it embeds content contextually, preserving the hierarchy of web pages for high-accuracy retrieval.
🛠️ Agentic Capabilities (LangGraph): The WebNavigatorAgent determines when to read more pages, when to use the sitemap tool, and when to synthesize an answer. It acts autonomously to fulfill user prompts like "find contact info."
🔄 Multi-LLM Provider Support: Flexible architecture supporting API keys from Google (Gemini), OpenAI, and OpenRouter, along with Local (HuggingFace) embedding fallback to minimize costs.
💻 Glassmorphism UI: A sleek, premium, and responsive user interface built in Streamlit, featuring an interactive agent chat, a page explorer, and one-click quick action buttons.
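
To make the crawl-depth parameter concrete, here is a minimal sketch of a depth-limited, same-host, breadth-first crawl. It is an illustration under stated assumptions, not the code in crawler.py: the names `crawl` and `LinkExtractor` are hypothetical, and `fetch` is an injected callable (URL to HTML text, or None on failure) so the example runs without network access. It uses only the standard library, whereas the project itself lists requests and BeautifulSoup4.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url, max_depth, fetch):
    """Breadth-first crawl of same-host pages, at most max_depth links from start_url.

    `fetch` is an assumed callable: URL -> HTML string, or None on failure.
    Returns (pages, edges): pages maps URL -> HTML, edges lists (source, target).
    """
    host = urlparse(start_url).netloc
    pages = {}
    edges = []
    seen = {start_url}
    queue = deque([(start_url, 0)])

    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        if depth >= max_depth:
            continue  # page is stored, but its links are not followed

        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.hrefs:
            # Resolve relative links and drop "#fragment" parts
            target, _fragment = urldefrag(urljoin(url, href))
            if urlparse(target).netloc != host:
                continue  # skip external hosts and non-HTTP schemes such as mailto:
            edges.append((url, target))
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return pages, edges
```

The returned `edges` list is the kind of link graph a sitemap visualization could be drawn from; with a real HTTP client, `fetch` would wrap a request and return the response text.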

🏗️ Architecture & Tech Stack

The project is organized into small, single-purpose modules built on the following stack:

Frontend: Streamlit (with custom CSS for a premium UI)
Orchestration & Agent: LangChain and LangGraph
Embeddings & Vector Store: ChromaDB (via LangChain integrations)
LLM Providers: google-generativeai, openai, langchain-anthropic (via OpenRouter)
Web Scraping & Parsing: requests, BeautifulSoup4
Visualization: pyvis (Interactive HTML network graphs)

Core Modules

app.py: The Streamlit entry point, managing UI state, authentication, and layouts.
agent.py: Implements the LangGraph-based tool-calling agent that interprets queries and executes tools.
crawler.py: Handles network requests, honoring crawl depths, and parsing HTML with SectionParser.
embeddings.py & retriever.py: Manage vectorization of text chunks and handle hierarchical semantic search.
sitemap.py: Converts crawl graphs into visual network representations.
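
As an illustration of what heading-aware parsing and hierarchical chunking can look like, the sketch below splits HTML into sections keyed by their heading trail and then prefixes each chunk with that trail before it would be embedded. This is an assumption about the general approach, not the project's actual SectionParser: `HeadingSectionParser`, `parse_sections`, and `to_chunk` are hypothetical names, and the standard-library `html.parser` stands in for BeautifulSoup4.

```python
from html.parser import HTMLParser

HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}
TEXT_TAGS = {"p", "li"}


class HeadingSectionParser(HTMLParser):
    """Groups paragraph and list text under the nearest preceding heading."""

    def __init__(self):
        super().__init__()
        self.sections = []    # each item: {"path": [heading, ...], "text": str}
        self._path = []       # stack of (level, heading text)
        self._capture = None  # tag whose text is currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if self._capture is None and (tag in HEADINGS or tag in TEXT_TAGS):
            self._capture = tag
            self._buffer = []

    def handle_data(self, data):
        if self._capture is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._capture:
            return
        text = " ".join("".join(self._buffer).split())  # collapse whitespace
        self._capture = None
        if not text:
            return
        if tag in HEADINGS:
            level = HEADINGS[tag]
            # Pop headings at the same or a deeper level, then push this one
            while self._path and self._path[-1][0] >= level:
                self._path.pop()
            self._path.append((level, text))
            self.sections.append({"path": [t for _, t in self._path], "text": ""})
        else:
            if not self.sections:  # body text before any heading
                self.sections.append({"path": [], "text": ""})
            current = self.sections[-1]
            current["text"] = (current["text"] + " " + text).strip()


def parse_sections(html):
    parser = HeadingSectionParser()
    parser.feed(html)
    parser.close()
    return parser.sections


def to_chunk(page_title, section):
    """Prefix the body with its heading trail so the embedded text keeps its context."""
    trail = " > ".join([page_title] + section["path"])
    return f"{trail}\n{section['text']}"
```

For example, a paragraph under an `<h2>Pro</h2>` that sits below `<h1>Pricing</h1>` on a page titled "Acme" would be embedded with the prefix `Acme > Pricing > Pro`, which is one simple way to preserve page hierarchy at retrieval time.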

🚀 Getting Started

Prerequisites

Python 3.10+
API Keys for your preferred LLM provider (Gemini, OpenAI, or OpenRouter).

Installation

Clone the repository:

git clone https://github.com/yourusername/LangGraph-RAG-Web-Agent.git
cd LangGraph-RAG-Web-Agent

Create and activate a virtual environment (recommended):

python -m venv venv
source venv/bin/activate  # On Windows use: venv\Scripts\activate

Install dependencies:
```
pip install -r requirements.txt
```
Configuration (Optional): You can provide a .env file at the root of the project with default keys, but the application also supports secure API key entry directly through the sidebar UI at runtime.
```
GOOGLE_API_KEY="your-gemini-key"
```
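
One plausible way to combine the two key sources is to let a key typed into the sidebar take precedence over the .env default. The sketch below shows that precedence only; the README does not state how app.py actually orders them. `resolve_api_key` is a hypothetical helper, and it assumes the .env file has already been loaded into the process environment.

```python
import os


def resolve_api_key(sidebar_value, env_var="GOOGLE_API_KEY", environ=None):
    """Return the key entered in the UI if present, otherwise the environment default.

    Assumed precedence: sidebar input first, then the variable from .env.
    Returns None when neither source provides a non-empty key.
    """
    environ = os.environ if environ is None else environ
    typed = (sidebar_value or "").strip()
    if typed:
        return typed
    default = environ.get(env_var, "").strip()
    return default or None
```

Passing `environ` explicitly keeps the function easy to test; in the app it would default to `os.environ`.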

Running the App

Start the Streamlit server:

streamlit run app.py

The app will automatically open in your default browser at http://localhost:8501.

💡 Usage Guide

Authenticate: Open the sidebar and select your preferred LLM and Embedding provider. Enter the required API key(s).
Crawl a Website: Enter a target URL (e.g., https://example.com) and choose a crawl depth. Click "Start Crawling".
Explore:
- Interactive Sitemap: View the structure of the scanned web pages visually.
- Page Explorer: Browse through individually extracted sections and headings of pulled pages.
Chat & Action: Use the Agent Chat to ask natural language questions about the site. Use the Quick Action buttons to instantly process repetitive tasks (e.g., Extracting Pricing, Finding Contact Info).

👨‍💻 Note for Recruiters & Stakeholders

This application was built to showcase the capability of integrating modern Large Language Models within standard Software Engineering practices. It highlights:

Systematic Problem Solving: Breaking down web crawling into autonomous agent tools.
UX/UI Implementation: Building user-friendly interfaces around complex AI concepts, managing loading states, and handling streaming interactions gracefully in Python.
Adaptability: Allowing immediate swapping between open-source and proprietary foundation models.
Data Engineering: Processing raw, messy HTML into clean, semantically chunked markdown documents for Vector Storage.

Expect high maintainability, documented code, and an architecture ready to be extended with further tools (e.g., automated form-filling, scheduled monitoring).
