Binary file added .playwright-mcp/current-ui.png
Binary file added .playwright-mcp/final-result.png
Binary file added .playwright-mcp/updated-ui.png
234 changes: 234 additions & 0 deletions CLAUDE.md
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Running Commands

**IMPORTANT:** This project uses `uv` as the package manager. **Always use `uv` commands - never use `pip` directly.**

### Start the Application
```bash
./run.sh
```
This starts the FastAPI server on port 8000 with auto-reload enabled. The application will:
1. Load course documents from `docs/` folder
2. Process them into 800-char chunks with 100-char overlap
3. Create/load ChromaDB embeddings (the first run downloads the ~90 MB embedding model)
4. Serve the web interface at http://localhost:8000

The `run.sh` script uses `uv run` internally.

### Manual Start (Development)
```bash
cd backend
uv run uvicorn app:app --reload --port 8000
```

### Install Dependencies
```bash
uv sync
```

### Add New Dependencies
```bash
# Add a new package
uv add package-name

# Add a dev dependency
uv add --dev package-name
```

### Run Python Scripts
```bash
# Always use uv run to execute Python code
uv run python script.py

# NOT: python script.py
# NOT: pip install ...
```

### Environment Setup
Create `.env` file with:
```
ANTHROPIC_API_KEY=sk-ant-api03-...
```

## Architecture Overview

### RAG System Flow (Tool-Based Architecture)

This is a **tool-based RAG system** where Claude decides when to search, not a traditional "always search" RAG.

**Query Processing Flow:**
1. User query → FastAPI endpoint (`/api/query`)
2. RAG System orchestrates the flow
3. **First Claude API call**: Claude receives query + tool definition, decides if search is needed
4. If search needed: Tool execution → Vector search → Format results
5. **Second Claude API call**: Claude receives search results, synthesizes final answer
6. Response + sources returned to frontend

**Key Insight:** There are **two Claude API calls per query** - one for decision-making, one for synthesis.
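The two-call flow above can be sketched as follows. This is an illustrative sketch, not the actual `ai_generator.py` code: the stub classes stand in for the real Anthropic client, and names like `answer_query` and `search_course_content` are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class StubResponse:
    stop_reason: str          # "tool_use" or "end_turn"
    text: str
    tool_name: str = ""
    tool_input: dict = field(default_factory=dict)

class StubClient:
    """Stands in for the real Anthropic client so the flow runs offline."""
    def __init__(self):
        self.calls = 0

    def create(self, messages, tools):
        self.calls += 1
        if self.calls == 1:
            # Call 1: Claude decides the query needs a course search.
            return StubResponse("tool_use", "", "search_course_content",
                                {"query": "What is MCP?"})
        # Call 2: Claude synthesizes an answer from the tool results.
        return StubResponse("end_turn", "MCP lets models call external tools.")

def answer_query(client, query, tools, execute_tool):
    # First call: query + tool definitions; Claude decides whether to search.
    first = client.create([{"role": "user", "content": query}], tools)
    if first.stop_reason != "tool_use":
        return first.text                     # general knowledge: one call total
    # Tool execution: run the search Claude requested.
    results = execute_tool(first.tool_name, first.tool_input)
    # Second call: Claude sees the search results and writes the final answer.
    second = client.create(
        [{"role": "user", "content": query},
         {"role": "user", "content": f"Tool results: {results}"}],
        tools)
    return second.text
```

Note that when Claude answers from general knowledge, only one API call is made; the second call happens only on the search path.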

### Component Architecture

**Frontend** (`frontend/`)
- Vanilla JS (no framework)
- Uses `marked.js` for markdown rendering
- Session-based conversation tracking
- Displays collapsible source citations

**Backend** (`backend/`)
- **app.py**: FastAPI server, REST endpoints, startup document loading
- **rag_system.py**: Main orchestrator - coordinates all components
- **ai_generator.py**: Claude API wrapper with tool calling support
- System prompt defines search behavior (one search max, no meta-commentary)
- Handles two-phase tool execution (request → execute → synthesize)
- **vector_store.py**: ChromaDB interface with two collections
- `course_catalog`: For fuzzy course name matching (e.g., "MCP" → full title)
- `course_content`: Actual content chunks for semantic search
- **document_processor.py**: Parses structured course documents into chunks
- Sentence-based chunking (preserves semantic boundaries)
- Adds context prefixes: "Course X Lesson Y content: ..."
- **search_tools.py**: Tool abstraction layer
- `CourseSearchTool`: Implements search with course/lesson filtering
- `ToolManager`: Registers and routes tool calls from Claude
- **session_manager.py**: Conversation history (max 2 exchanges by default)
- **config.py**: Centralized configuration (see below)
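The sentence-based chunking done by `document_processor.py` can be sketched roughly like this. The function name and the exact overlap mechanics are illustrative assumptions; only the 800/100 character sizes come from the config.

```python
import re

def chunk_text(text, chunk_size=800, overlap=100):
    """Greedy sentence-based chunking sketch: pack whole sentences up to
    chunk_size characters, then start the next chunk with the trailing
    `overlap` characters of the previous one for continuity."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > chunk_size:
            chunks.append(current)
            # Carry the tail of the finished chunk into the next one.
            current = current[-overlap:] + " " + sentence
        else:
            current = (current + " " + sentence).strip()
    if current:
        chunks.append(current)
    return chunks
```

In the real pipeline each chunk would additionally get a context prefix such as "Course X Lesson Y content: ..." before embedding.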

### Data Models (`models.py`)

**Important:** `Course.title` is used as the unique identifier throughout the system.

- **Course**: Contains title (ID), instructor, link, and list of Lessons
- **Lesson**: Contains lesson_number, title, and link
- **CourseChunk**: Contains content, course_title (FK), lesson_number, chunk_index
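The three models can be pictured as plain dataclasses; the real `models.py` may use Pydantic instead, and optional fields here are assumptions beyond the field list above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Lesson:
    lesson_number: int
    title: str
    link: Optional[str] = None

@dataclass
class Course:
    title: str                        # unique identifier across the system
    instructor: str
    link: Optional[str] = None
    lessons: List[Lesson] = field(default_factory=list)

@dataclass
class CourseChunk:
    content: str
    course_title: str                 # foreign key back to Course.title
    lesson_number: Optional[int]
    chunk_index: int
```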

### Vector Store Design

**Two-Collection Architecture:**
1. **course_catalog** collection:
- Purpose: Fuzzy course name resolution
- Documents: "Course: {title} taught by {instructor}" + lesson entries
- Used when user says "MCP course" → resolves to full title

2. **course_content** collection:
- Purpose: Semantic search of actual content
- Documents: Text chunks with context prefixes
- Metadata: course_title, lesson_number, chunk_index, links
- Filtering: Can filter by exact course_title AND/OR lesson_number

**Search Flow:**
1. If course_name provided: Query `course_catalog` to resolve fuzzy name
2. Build ChromaDB filter: `{"$and": [{"course_title": "X"}, {"lesson_number": Y}]}`
3. Query `course_content` with semantic search + filters
4. Return top 5 chunks by cosine similarity
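The filter-building step can be sketched as a small helper; the helper name is hypothetical, but the output matches the ChromaDB `where`-clause shape shown above, including the cases where only one or neither filter is present.

```python
def build_content_filter(course_title=None, lesson_number=None):
    """Build the ChromaDB `where` clause for the course_content query."""
    clauses = []
    if course_title is not None:
        clauses.append({"course_title": course_title})
    if lesson_number is not None:
        clauses.append({"lesson_number": lesson_number})
    if not clauses:
        return None                   # unfiltered semantic search
    if len(clauses) == 1:
        return clauses[0]             # ChromaDB rejects $and with one clause
    return {"$and": clauses}
```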

### Document Format

Course documents in `docs/` must follow this structure:
```
Course Title: [title]
Course Link: [url]
Course Instructor: [name]

Lesson 0: [title]
Lesson Link: [url]
[content...]

Lesson 1: [title]
Lesson Link: [url]
[content...]
```

The parser (`document_processor.py`) extracts this metadata and creates chunks with context.

### Configuration (`backend/config.py`)

Key settings to be aware of:
- `ANTHROPIC_MODEL`: "claude-sonnet-4-20250514" (Claude Sonnet 4)
- `EMBEDDING_MODEL`: "all-MiniLM-L6-v2" (384-dim vectors)
- `CHUNK_SIZE`: 800 chars (with CHUNK_OVERLAP: 100 chars)
- `MAX_RESULTS`: 5 search results returned to Claude
- `MAX_HISTORY`: 2 conversation exchanges kept in context
- `CHROMA_PATH`: "./chroma_db" (persistent vector storage)
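Mirroring the settings above as a dataclass (the real `backend/config.py` may be structured differently):

```python
from dataclasses import dataclass

@dataclass
class Config:
    ANTHROPIC_MODEL: str = "claude-sonnet-4-20250514"
    EMBEDDING_MODEL: str = "all-MiniLM-L6-v2"   # 384-dim vectors
    CHUNK_SIZE: int = 800
    CHUNK_OVERLAP: int = 100
    MAX_RESULTS: int = 5
    MAX_HISTORY: int = 2
    CHROMA_PATH: str = "./chroma_db"
```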

### AI System Prompt Behavior

The system prompt in `ai_generator.py` defines critical behavior:
- **Use search tool ONLY for course-specific questions**
- **One search per query maximum** (prevents chained or repeated tool calls in a single turn)
- **No meta-commentary** (no "based on the search results" phrases)
- Responses must be: brief, educational, clear, example-supported

### Session Management

Sessions track conversation history:
- Session ID created on first query (e.g., "session_1")
- Stores last `MAX_HISTORY * 2` messages (user + assistant pairs)
- History formatted as: "User: ...\nAssistant: ...\n..." for context
- Appended to system prompt on subsequent queries in same session
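The trimming and formatting behavior can be sketched like this; class and method names are illustrative, not the actual `session_manager.py` API.

```python
class SessionManager:
    def __init__(self, max_history=2):
        self.max_history = max_history
        self.sessions = {}            # session_id -> list of (role, text)
        self._counter = 0

    def create_session(self):
        self._counter += 1
        sid = f"session_{self._counter}"
        self.sessions[sid] = []
        return sid

    def add_exchange(self, sid, user_msg, assistant_msg):
        history = self.sessions[sid]
        history.append(("User", user_msg))
        history.append(("Assistant", assistant_msg))
        # Keep only the last MAX_HISTORY exchanges (2 messages each).
        del history[:-self.max_history * 2]

    def format_history(self, sid):
        # String form that gets appended to the system prompt.
        return "\n".join(f"{role}: {text}" for role, text in self.sessions[sid])
```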

### API Endpoints

**POST /api/query**
- Request: `{ "query": "...", "session_id": "session_1" (optional) }`
- Response: `{ "answer": "...", "sources": ["..."], "session_id": "..." }`
- Creates session if not provided

**GET /api/courses**
- Response: `{ "total_courses": 4, "course_titles": ["..."] }`
- Used by frontend sidebar

### ChromaDB Persistence

- First run: Downloads embedding model, creates collections, processes documents (~30-60 seconds)
- Subsequent runs: Loads existing ChromaDB from `./chroma_db` (fast startup)
- Documents are reprocessed only if their course title is not already in the catalog
- To rebuild: Delete `./chroma_db` folder and restart

### Development Notes

**Adding New Documents:**
1. Place `.txt`, `.pdf`, or `.docx` files in `docs/` folder
2. Follow the document format structure above
3. Restart server - documents auto-loaded on startup
4. Check logs for: "Added new course: X (Y chunks)"

**Modifying Chunk Size:**
- Edit `config.py`: `CHUNK_SIZE` and `CHUNK_OVERLAP`
- Delete `./chroma_db` folder to force reprocessing
- Restart application

**Debugging Search:**
- Search tool tracks sources in `last_sources` attribute
- Sources shown in UI as collapsible section
- Check `vector_store.py` for filter logic

**Conversation Context:**
- Modify `MAX_HISTORY` in `config.py` to change context window
- History is string-formatted and prepended to system prompt
- Trade-off: More history = more context but higher token usage

### Tool-Based vs Traditional RAG

**This system is NOT a traditional RAG** where every query triggers a search. Instead:
- Claude analyzes each query and decides if search is warranted
- General knowledge questions answered without search
- Course-specific questions trigger tool use
- This reduces unnecessary vector searches and improves response quality

### Frontend-Backend Contract

**Frontend maintains:**
- Current session_id in memory
- Sends with each query for conversation continuity

**Backend returns:**
- answer: The synthesized response from Claude
- sources: List of "Course Title - Lesson N" strings for UI
- session_id: Same or newly created session ID

**Source Tracking:**
- Search tool stores sources during execution
- RAG system retrieves after AI generation completes
- Sources reset after each query to prevent leakage
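The track/collect/reset pattern can be sketched as follows. Class and function names and the stand-in search result are illustrative; the actual implementation lives in `search_tools.py` and `rag_system.py`.

```python
class CourseSearchTool:
    def __init__(self):
        self.last_sources = []        # populated during each execution

    def execute(self, query, course_title=None, lesson_number=None):
        # ... vector search would happen here; stand-in result below ...
        hits = [("Example Course", 1)]
        self.last_sources = [f"{title} - Lesson {n}" for title, n in hits]
        return hits

def collect_and_reset(tool):
    """RAG system reads sources after AI generation completes, then clears
    them so they cannot leak into the next query's response."""
    sources = tool.last_sources
    tool.last_sources = []
    return sources
```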
135 changes: 135 additions & 0 deletions DOCUMENTATION_STRUCTURE.md
# RAG Chatbot Documentation Structure

## Generated Files

### Standalone Mermaid Diagrams
1. **architecture-diagram.mermaid** - 4-layer system architecture (high-level)
2. **sequence-diagram.mermaid** - 21-step end-to-end user flow
3. **rag-deep-dive.mermaid** - RAG & Storage component architecture
4. **rag-mid-level-sequence.mermaid** - 17-step RAG processing flow with decision branching

### Interactive HTML Documentation
**architecture-diagram.html** - Complete 4-tab interactive documentation

## Tab Organization

### Tab 1: System Architecture (High-Level)
- **Purpose**: Overall system structure
- **Diagram**: 4-layer vertical architecture
- Layer 1: Frontend (Vanilla HTML/CSS/JS)
- Layer 2: API (FastAPI)
- Layer 3: RAG/AI (Claude + Tools)
- Layer 4: Database/Storage (ChromaDB)
- **Overview**: Component descriptions for each layer

### Tab 2: System User Flow (High-Level)
- **Purpose**: End-to-end user journey
- **Diagram**: 21-step sequence diagram
- Steps 1-3: User interaction
- Steps 4-6: Session & context
- Steps 7-16: RAG processing
- Steps 17-21: Response & display
- **Overview**: Flow breakdown by phase

### Tab 3: RAG Components (Deep Dive)
- **Purpose**: Internal RAG & Storage architecture
- **Diagram**: Component architecture
- RAG/AI Layer: 5 components (RAG System, AI Generator, Tool Manager, Search Tool, Session Manager)
- Storage Layer: 6 components (ChromaDB, 2 Collections, Document Processor, Chunking, Files)
- Shows data flows and cross-layer connections
- **Overview**: Component descriptions and internal data flows

### Tab 4: RAG Processing Flow (Deep Dive)
- **Purpose**: Detailed RAG internal processing
- **Diagram**: 17-step mid-level sequence with decision branching
- Steps 1-4: Request & context loading
- Steps 5-6: AI decision point (search vs. direct response)
- Steps 7-13: Conditional search path
- Steps 14-17: Response & session management
- **Overview**:
- Flow breakdown by phase
- Search mechanics
- Data structures
- Processing pipeline
- Configuration details

## Key Features

### Diagram Characteristics
- **Vertical stacking**: Forced top-to-bottom layout using explicit layer connections
- **Color coding**: Consistent across all diagrams
- Blue (#e3f2fd): Frontend
- Orange (#fff3e0): API
- Green (#e8f5e9): RAG/AI
- Purple (#f3e5f5): Database/Storage
- **Readable text**: Single-line labels, no overlapping
- **Emojis**: Consistent visual markers for each component type

### UX Design
- **Overview-first layout**: Legend/breakdown appears ABOVE diagrams in all tabs
- **Tabbed interface**: Smooth transitions between perspectives
- **Responsive design**: Mobile-friendly layout
- **Interactive navigation**: Easy switching between high-level and deep-dive views

## Abstraction Levels

### Level 1: System Overview (Tabs 1 & 2)
- **Audience**: Stakeholders, product managers, new team members
- **Focus**: What the system does and how users interact with it
- **Diagrams**: 4-layer architecture + 21-step user flow

### Level 2: RAG Deep Dive (Tabs 3 & 4)
- **Audience**: Developers, architects, AI engineers
- **Focus**: How RAG and storage layers work internally
- **Diagrams**: Component architecture + 17-step processing flow with decision logic

## Documentation Prompt Template

**prompts/system-documentation-prompt.md** - Reusable template for future projects

Contains:
- Master prompt
- Step-by-step execution guide
- Quality checklist
- Common pitfalls
- Usage examples
- Version history

## Usage

### View Documentation
```bash
# Open in browser
open architecture-diagram.html
```

### Test Diagrams
1. Visit https://mermaid.live/
2. Paste contents of any .mermaid file
3. Verify rendering

### Reuse for Other Projects
1. Read prompts/system-documentation-prompt.md
2. Adapt master prompt to your codebase
3. Follow 4-phase workflow:
- Phase 1: Codebase exploration
- Phase 2: Architecture diagram
- Phase 3: Sequence diagram
- Phase 4: HTML documentation

## Success Criteria Met

✅ All 4 layers visible in top-to-bottom stack
✅ No overlapping text in diagrams
✅ Component overview appears before diagrams
✅ Diagrams reflect actual codebase architecture
✅ Tabs work correctly with smooth transitions
✅ HTML renders properly in all modern browsers
✅ Mid-level abstraction provides balance between overview and details
✅ Decision points (AI search logic) clearly visible

---

**Generated**: 2025-11-09
**Tool**: Claude Code + Mermaid.js
**Pattern**: 4-layer vertical architecture with RAG
Binary file added RAG Chatbot System Architecture.pdf