E-Commerce Product Discovery RAG Agent

A learning project exploring how RAG (Retrieval-Augmented Generation) works with product data. Great for understanding LangChain basics.

What This Is

A hands-on tutorial showing how to:

Connect an LLM to your own data (not just its training data)
Use vector search to find relevant information
Build a basic question-answering system with LangChain

Example query it handles:

"I'm vegan, training for a marathon, need energy, low-carb"

The system retrieves relevant products and generates a response citing customer reviews.

What this is NOT:

⚠️ Not production-ready (no error handling, security, or compliance)
⚠️ Not a replacement for proper e-commerce search
⚠️ Not suitable for health/medical recommendations without legal review

Use this to: Learn RAG concepts, experiment with LangChain, prototype ideas.

📋 What's Implemented

LangChain Components

DocumentLoaders: Custom JSON loader converting product catalog to LangChain Documents
Embeddings: HuggingFace sentence-transformers/all-MiniLM-L6-v2 (90MB model)
VectorStore: FAISS for efficient similarity search with persistent storage
RetrievalQA Chain: Complete RAG pipeline with custom prompt template
LLM: Google Gemini 2.5 Flash (free tier: 1,500 requests/day)

Features Demonstrated

Multi-constraint retrieval: Handles queries with multiple requirements (e.g., vegan + low-carb + energy + marathon training)
Source citation: Shows which products were retrieved and used in the response
Custom prompting: System prompt guides LLM to act as "Product Expert" with safety disclaimers
Customer review integration: Cites relevant reviews from similar use cases
Persistent vector store: Saves FAISS index to disk for fast subsequent runs
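Source citation comes down to returning the retrieved documents alongside the generated answer. The sketch below shows one way to do that with the standard library only. `RetrievedDoc` and `format_answer_with_sources` are hypothetical names used for illustration and are not taken from `rag_agent.py`; the `name` metadata key is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievedDoc:
    """Hypothetical stand-in for a retrieved document (text plus metadata)."""
    text: str
    metadata: dict = field(default_factory=dict)


def format_answer_with_sources(answer: str, docs: list[RetrievedDoc]) -> str:
    """Append a de-duplicated list of product names to the generated answer."""
    seen = []
    for doc in docs:
        # "name" is an assumed metadata key, not confirmed by the project.
        name = doc.metadata.get("name", "unknown product")
        if name not in seen:  # several chunks may come from the same product
            seen.append(name)
    sources = "\n".join(f"- {name}" for name in seen)
    return f"{answer}\n\nSources:\n{sources}"
```

De-duplicating by product name keeps the source list short when the text splitter produces more than one chunk per product.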

🗂️ Project Structure

langchain-sample/
├── product_catalog.json    # Synthetic product data (8 products)
├── rag_agent.py           # Main RAG agent implementation
├── test_agent.py          # Full RAG demo (3 test queries)
├── test_gemini_simple.py  # Quick API test (verify setup works)
├── requirements.txt       # Python dependencies
├── .env.example          # Environment variable template
├── LICENSE               # MIT License
├── README.md             # This file
└── vector_store/         # FAISS index (created on first run)

⚡ Quick Start (5 minutes)

Option A: Jump Right In (Full Demo)

# 1. Clone and navigate
git clone <your-repo-url>
cd langchain-sample

# 2. Setup environment
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

# 3. Get FREE Gemini API key (30 seconds)
# Visit: https://aistudio.google.com/app/apikey
# Click "Create API Key" → Copy it

# 4. Set API key
export GOOGLE_API_KEY='your-key-here'

# 5. Run the full demo!
python test_agent.py

Expected runtime: ~2 minutes first run (downloads models), ~10 seconds after

Option B: Test First (Recommended for Learners)

Prefer to verify things work step-by-step? Start here:

# After steps 1-4 above, test your API key first:
python test_gemini_simple.py

What this does:

✅ Verifies your API key works
✅ Shows available Gemini models
✅ Tests a simple query (10 seconds)
✅ Builds confidence before the full RAG system

Then run the full demo:

python test_agent.py

Why two scripts?

test_gemini_simple.py - Minimal test, isolates API issues
test_agent.py - Full RAG demo with vector search

🚀 Detailed Setup Instructions

1. Install Dependencies

# Create a virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install required packages
pip install -r requirements.txt

Requirements:

Python 3.9+ (3.10+ recommended)
~2GB disk space (for models)
Internet connection (first run only)

2. Get Google Gemini API Key (FREE)

Visit https://aistudio.google.com/app/apikey
Sign in with Google account
Click "Create API Key"
Copy the key (starts with AIza...)

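Free Tier Limits (as listed at the time of writing; quotas vary by model and change over time, so confirm current values in Google AI Studio):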

✅ 1,500 requests/day
✅ 1 million tokens/day
✅ No credit card required

3. Set API Key

export GOOGLE_API_KEY='your-api-key-here'

Note: This sets the key for your current terminal session only. You'll need to re-export it if you open a new terminal.

Security tip: Never commit API keys to git. The export method keeps your key out of files.
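Because the key lives only in the shell session, a script can fail in confusing ways when it is missing. The sketch below checks for the variable up front. `require_api_key` is a hypothetical helper, not part of the project's scripts.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GOOGLE_API_KEY from the environment or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GOOGLE_API_KEY is not set; run: export GOOGLE_API_KEY='your-key-here'"
        )
    return key
```

Calling `require_api_key()` at the top of a script turns a missing key into one readable error instead of a failure deep inside an API call.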

4. Run the Test Script

python test_agent.py

What happens:

Downloads embedding model (~90MB, first run only)
Loads 8 products from catalog
Creates vector embeddings
Saves FAISS index to disk
Runs 3 test queries
Shows personalized recommendations

Performance:

First run: ~2 minutes (model download)
Subsequent runs: ~10 seconds
Cost: $0.00 (free tier)
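Part of the gap between the first and later runs comes from building the index once and reloading it afterwards. The sketch below illustrates that load-or-build pattern with the standard library only. A JSON file stands in for the FAISS index, and `load_or_build`, `build_index`, and the `index.json` file name are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable


def load_or_build(index_dir: Path, build_index: Callable[[], dict]) -> tuple[dict, bool]:
    """Return (index, was_loaded): build and save on first call, reload afterwards."""
    index_file = index_dir / "index.json"  # hypothetical file name
    if index_file.exists():
        return json.loads(index_file.read_text(encoding="utf-8")), True
    index = build_index()  # the expensive step: embed every document
    index_dir.mkdir(parents=True, exist_ok=True)
    index_file.write_text(json.dumps(index), encoding="utf-8")
    return index, False
```

A cached index goes stale when the catalog or the embedding model changes, so deleting the index directory forces a rebuild on the next run.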

📝 Example Test Query

Query:

"I'm a vegan athlete. What products would you recommend for overall performance and recovery?"

Expected Behavior:

Retrieves vegan products suitable for athletes
Highlights customer reviews from athletes
Provides professional, non-medical recommendations
Addresses all query constraints (vegan, athlete, performance, recovery)

Use Case: This demonstrates multi-constraint product search in e-commerce, where customers have specific dietary needs, fitness goals, and preferences.

🏗️ Architecture

Data Flow

Product Catalog (JSON)
    ↓
Document Loader (custom)
    ↓
Text Splitter (RecursiveCharacterTextSplitter)
    ↓
Embeddings (HuggingFace)
    ↓
Vector Store (FAISS)
    ↓
Retriever (similarity search, k=4)
    ↓
RetrievalQA Chain (with custom prompt)
    ↓
LLM (Google Gemini 2.5 Flash)
    ↓
Personalized Response
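The retriever step in this flow (similarity search, k=4) can be illustrated without any external libraries. The sketch below uses word counts as a toy embedding and cosine similarity for ranking. It is a simplified stand-in: the project itself uses a sentence-transformers model and FAISS, and the function names here are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real model returns dense vectors)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], k: int = 4) -> list[str]:
    """Return the k documents most similar to the query, best match first."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda doc: cosine(query_vec, embed(doc)), reverse=True)
    return ranked[:k]
```

Word counts only match exact terms, which is why a learned embedding model is used in practice: it can place "plant-based" near "vegan" even though the strings differ.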

Key Components

1. Product Catalog (`product_catalog.json`)

8 synthetic health & wellness products with realistic attributes
Includes: name, category, price, benefits, dietary attributes
Customer reviews with specific use cases (marathon training, vegan diet, etc.)
Q&A snippets for additional context
Note: This is sample data for demonstration purposes
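Before embedding, each catalog entry has to become a block of text. The sketch below shows one plausible way to flatten a record. The key names (`name`, `category`, `price`, `benefits`, `dietary`, `reviews`) are assumptions about the JSON schema and may differ from the actual `product_catalog.json`; `product_to_text` is a hypothetical helper.

```python
def product_to_text(product: dict) -> str:
    """Flatten one product record into a single text block for embedding."""
    # All key names below are assumed, not read from the real catalog.
    lines = [
        f"Product: {product['name']}",
        f"Category: {product.get('category', 'unknown')}",
        f"Price: ${product['price']:.2f}",
    ]
    if product.get("benefits"):
        lines.append("Benefits: " + ", ".join(product["benefits"]))
    if product.get("dietary"):
        lines.append("Dietary: " + ", ".join(product["dietary"]))
    for review in product.get("reviews", []):
        lines.append(f"Review: {review}")
    return "\n".join(lines)
```

Putting reviews in the same text as the product attributes is what lets a query such as "marathon training" match a product through its reviews.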

2. RAG Agent (`rag_agent.py`)

ProductDiscoveryAgent class encapsulating all RAG functionality
Methods:
- load_and_process_catalog(): Converts JSON to LangChain Documents
- create_vector_store(): Builds FAISS index with embeddings
- setup_qa_chain(): Configures RetrievalQA with custom prompt
- query(): Processes customer questions and returns recommendations

3. Custom Prompt Template

template = """You are an expert Product Advisor for a health & wellness e-commerce platform...

IMPORTANT GUIDELINES:
1. Only recommend products based on the provided context
2. Consider dietary restrictions, health goals, and use cases
3. Highlight relevant customer reviews
4. Be honest about product limitations
5. Do NOT provide medical advice
6. Include specific details: price, benefits, customer feedback
..."""
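At query time the template is filled with the retrieved text and the customer's question. The sketch below shows that substitution with plain string formatting. The shortened template and the `{context}` and `{question}` placeholder names are assumptions for illustration, since the full template is elided above.

```python
# Shortened, hypothetical template; the project's real prompt is longer.
PROMPT_TEMPLATE = """You are an expert Product Advisor.
Use only the context below. Do NOT provide medical advice.

Context:
{context}

Question: {question}
Answer:"""


def build_prompt(context_docs: list[str], question: str) -> str:
    """Join retrieved documents and substitute them into the template."""
    context = "\n\n".join(context_docs)
    return PROMPT_TEMPLATE.format(context=context, question=question)
```

Restricting the model to the supplied context is what grounds recommendations in catalog data rather than in the model's training data.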

🧪 Testing

The test_agent.py script includes:

Required Test: Marathon training + vegan + low-carb + energy query
Additional Tests:
- Joint health for runners
- Comprehensive vegan athletic support

Each test demonstrates:

Multi-constraint retrieval
Context-aware recommendations
Professional response quality

📊 Product Catalog Highlights

Product	Category	Key Features	Price
Vegan Protein Powder	Sports Nutrition	Low carb, energy, marathon reviews	$29.99
Vegan BCAA	Sports Nutrition	Zero carb, endurance, marathon reviews	$44.99
Electrolyte Mix	Sports Nutrition	Zero carb, vegan, marathon reviews	$44.99
Iron Plus	Supplements	Vegan, energy, runner reviews	$19.99
Omega-3 Fish Oil	Supplements	Joint support, athletic recovery	$39.99

🔧 Customization

Using Different Embeddings

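from langchain_openai import OpenAIEmbeddings  # pip install langchain-openai; needs OPENAI_API_KEY

agent = ProductDiscoveryAgent()
agent.embeddings = OpenAIEmbeddings()
# Delete vector_store/ and rebuild the index afterwards: vectors produced
# by different embedding models are not comparable.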

Using Different Vector Stores

from langchain_community.vectorstores import Chroma

# Modify create_vector_store() method to use Chroma instead of FAISS

Using Different LLMs

from langchain_anthropic import ChatAnthropic  # pip install langchain-anthropic; needs ANTHROPIC_API_KEY

# In setup_qa_chain(), replace the Gemini chat model with ChatAnthropic
llm = ChatAnthropic(model="claude-3-sonnet-20240229")

🎓 Learning Outcomes

This project demonstrates:

RAG Architecture: Complete implementation of retrieval-augmented generation for e-commerce
LangChain Mastery: Proper use of Documents, Embeddings, VectorStores, Chains
Prompt Engineering: Safety-focused, domain-specific prompts for product recommendations
Vector Databases: FAISS for efficient similarity search
Production Considerations: Persistent storage and modular design (error handling and security are out of scope; see "What this is NOT")
Multi-Constraint Search: Handling complex queries with several requirements (dietary needs, fitness goals, use case) through semantic retrieval
Customer Review Integration: Leveraging user-generated content for personalized recommendations

📚 Additional Resources

🤝 Extending This Project

Ideas for enhancement:

Expand the catalog: Add more products with diverse categories
Add filters: Implement price range, brand, rating filters
Conversation memory: Enable multi-turn conversations with context
Different domains: Adapt for electronics, fashion, or other e-commerce verticals
Hybrid search: Combine semantic search with traditional filters
A/B testing: Compare different embedding models or LLMs
User feedback loop: Incorporate user ratings to improve recommendations
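As a starting point for the filter and hybrid-search ideas, hard constraints can be applied to product metadata before any semantic ranking. The sketch below is one possible shape for that pre-filter. `prefilter` and the `price` and `dietary` keys are hypothetical and not part of the current code.

```python
from typing import Iterable, Optional


def prefilter(products: list[dict], max_price: Optional[float] = None,
              required_tags: Iterable[str] = ()) -> list[dict]:
    """Keep products that meet hard constraints before semantic ranking."""
    required = {tag.lower() for tag in required_tags}
    kept = []
    for product in products:
        if max_price is not None and product["price"] > max_price:
            continue
        tags = {tag.lower() for tag in product.get("dietary", [])}
        if not required <= tags:  # every required tag must be present
            continue
        kept.append(product)
    return kept
```

Filtering first guarantees that a constraint like "vegan" is never violated, which similarity scores alone cannot promise.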

📄 License

MIT License - See LICENSE file for details.

This is a demonstration project for educational purposes. The product data is synthetic and for learning only.

Built with LangChain 🦜🔗 | Powered by Google Gemini 2.5 Flash 🤖 | Vector Search by FAISS 🔍

💡 About This Project

This project was created as a learning exercise to explore:

How RAG systems work conceptually
Basic LangChain patterns and components
Vector search with embeddings
Prompt engineering techniques

Disclaimer: This is educational code. Production use would require:

Proper error handling and validation
Security review (especially for health/medical content)
Legal compliance (FDA, FTC, GDPR, etc.)
Performance optimization and monitoring
User testing and feedback loops

Feel free to fork and experiment!
