A powerful Retrieval-Augmented Generation (RAG) system for processing multiple PDFs and delivering accurate, context-aware responses to user queries. Built in Python, it features GPU-accelerated embeddings, a semantic search engine, and the Mistral model via Ollama, wrapped in an intuitive Streamlit interface.
- 📄 Multi-PDF Support – Upload and process several PDFs at once.
- ⚡ GPU-Accelerated Embeddings – Leverage NVIDIA GPUs for fast processing.
- 🔎 Semantic Search – Uses FAISS for efficient context retrieval with source tracking.
- 🌐 Streamlit Interface – Clean and interactive web UI for document upload, chat, and performance monitoring.
- 🧩 Highly Configurable – Adjust chunk size, overlap, batch size, and worker threads.
- 📊 Real-time Metrics – Monitor chunking speed, memory usage, and system stats.
- 💬 Chat History Export – Save and download all chat sessions in JSON format.
| Field | Details |
|---|---|
| Repo Name | Advanced-RAG-Chatbot |
| License | MIT |
| Language | Python |
| Model | Mistral via Ollama |
| Embeddings | intfloat/e5-small (HuggingFace) |
| Interface | Streamlit |
| Vector Store | FAISS |
| PDF Processor | PyMuPDF |
| Status | 🚧 Actively maintained |
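The embedding stack above (`intfloat/e5-small` on an NVIDIA GPU) can be sketched roughly as follows. This is an illustrative snippet, not the repo's exact code; the `passage:` prefix, mean pooling, and normalization are assumptions based on how e5-family models are typically used:

```python
# Illustrative sketch of GPU-accelerated e5-small embeddings
# (assumptions about pooling/prefixes, not the repo's actual code)
import torch
from transformers import AutoModel, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("intfloat/e5-small")
model = AutoModel.from_pretrained("intfloat/e5-small").to(device).eval()

def embed(texts):
    # e5 models expect a "passage: " / "query: " prefix on their inputs
    batch = tokenizer(
        [f"passage: {t}" for t in texts],
        padding=True, truncation=True, max_length=512, return_tensors="pt",
    ).to(device)
    with torch.no_grad():
        out = model(**batch)
    # Mean-pool token embeddings, ignoring padding, then L2-normalize
    mask = batch["attention_mask"].unsqueeze(-1)
    pooled = (out.last_hidden_state * mask).sum(1) / mask.sum(1)
    return torch.nn.functional.normalize(pooled, p=2, dim=1).cpu().numpy()
```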
| Type | Minimum | Recommended |
|---|---|---|
| CPU | 4-core | 8-core |
| RAM | 8 GB | 16 GB |
| Storage | 10 GB | 20 GB |
| GPU (Optional) | ❌ | ✅ NVIDIA (GTX 1060 or higher) |
- Python 3.8+
- Git
- CUDA Toolkit 11.2+ (for GPU acceleration)
- Ollama (≥ 0.1.0)
```bash
# Download Python from https://python.org
python --version  # ✅ Ensure it's ≥ 3.8
```

```bash
# Download from https://git-scm.com
git --version
```

```bash
git clone https://github.com/milind899/Advanced-RAG-Chatbot.git
cd Advanced-RAG-Chatbot
```

```bash
python -m venv venv

# Windows:
venv\Scripts\activate

# macOS/Linux:
source venv/bin/activate
```

```bash
pip install -r requirements.txt
```

Key dependencies:

- `streamlit` – Frontend UI
- `langchain` – RAG orchestration
- `torch` – Model + CUDA acceleration
- `transformers` – Embedding generation
- `faiss-cpu` – Vector similarity search
- `PyMuPDF` – PDF parser
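Since PyMuPDF handles PDF parsing, text extraction in the backend presumably looks something like this minimal sketch (the filename is a placeholder):

```python
# Minimal PyMuPDF text extraction sketch (illustrative; "sample.pdf" is a placeholder)
import fitz  # PyMuPDF's import name

with fitz.open("sample.pdf") as doc:
    pages = [page.get_text() for page in doc]
full_text = "\n".join(pages)
```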
```bash
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull mistral
ollama serve
```

Verify the Ollama server is reachable:

```bash
curl http://localhost:11434/api/tags
```

```bash
# Verify CUDA
nvidia-smi

# Enable GPU
unset OLLAMA_NO_CUDA

# Test with PyTorch
python -c "import torch; print(torch.cuda.is_available())"
```

```bash
streamlit run app.py
```

🔗 Open your browser at: http://localhost:8501
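Under the hood, once `ollama serve` is running, the backend can reach Mistral over Ollama's local HTTP API. A minimal sketch of such a call (the prompt text is just an example):

```python
# Sketch: one-shot generation against a local Ollama server
# (assumes `ollama serve` is running and `mistral` has been pulled)
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "mistral", "prompt": "Summarize the uploaded document.", "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])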
1. Open the app in your browser.
2. Drag-and-drop or browse to upload PDFs (see the uploader sketch after this list).
3. Adjust settings from the sidebar:
   - Chunk Size (default: 500)
   - Overlap (default: 50)
   - Workers (default: CPU count)
   - Batch Size (default: 100)
4. Click Process Documents.
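The upload step in `app.py` plausibly relies on Streamlit's multi-file uploader; a hedged sketch (widget labels and the processing step are assumptions):

```python
# Sketch of a multi-PDF upload flow in Streamlit (labels are illustrative)
import streamlit as st

uploaded = st.file_uploader("Upload PDFs", type=["pdf"], accept_multiple_files=True)
if uploaded and st.button("Process Documents"):
    for f in uploaded:
        data = f.read()  # raw bytes, usable via fitz.open(stream=data, filetype="pdf")
        st.write(f"Queued {f.name} ({len(data)} bytes)")
```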
- Go to the Chat with Your Documents section.
- Type a question and hit Send.
- View responses with source links.
- Review chat history at the bottom.
- Export your chats via the Export JSON button.
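The exact export schema isn't documented here, but a chat-history export like the one behind the Export JSON button could be as simple as the following (all field names are hypothetical):

```python
# Hypothetical chat-history export; the real schema may differ
import json

chat_history = [
    {"question": "What is chapter 2 about?",
     "answer": "…",
     "sources": ["report.pdf, p. 14"]},
]
with open("chat_history.json", "w", encoding="utf-8") as f:
    json.dump(chat_history, f, indent=2, ensure_ascii=False)
```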
- View chunks/sec and total processing time under Performance Metrics (a back-of-the-envelope version is sketched after this list).
- The sidebar shows GPU and system usage.
- Troubleshoot with real-time logs.
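Chunks/sec is straightforward to compute; a self-contained sketch using the default 500-character chunks with 50-character overlap (the corpus is a stand-in):

```python
# Illustrative chunks/sec measurement (stand-in corpus, not the app's pipeline)
import time

docs = ["lorem ipsum " * 200] * 50  # stand-in documents
start = time.perf_counter()
# 500-char chunks stepped by 450 chars → 50-char overlap
chunks = [d[i:i + 500] for d in docs for i in range(0, len(d), 450)]
elapsed = time.perf_counter() - start
print(f"{len(chunks)} chunks in {elapsed:.4f}s → {len(chunks) / elapsed:,.0f} chunks/sec")
```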
```
Advanced-RAG-Chatbot/
├── app.py              # Streamlit UI
├── rag_backend.py      # Core RAG logic
├── requirements.txt    # Dependencies
├── LICENSE             # MIT License
├── faiss_index/        # Generated vector store
└── embedding_cache/    # Cached embeddings
```
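The `faiss_index/` folder holds the generated vector store. Building, persisting, and querying such an index with raw FAISS looks roughly like this (paths and data are illustrative; `intfloat/e5-small` produces 384-dimensional vectors):

```python
# Sketch: build, persist, and query a FAISS index (paths/data illustrative)
import os
import faiss
import numpy as np

dim = 384  # intfloat/e5-small embedding size
vectors = np.random.rand(1000, dim).astype("float32")  # stand-in for real embeddings
faiss.normalize_L2(vectors)

index = faiss.IndexFlatIP(dim)  # inner product ≡ cosine on normalized vectors
index.add(vectors)
os.makedirs("faiss_index", exist_ok=True)
faiss.write_index(index, "faiss_index/index.faiss")

# Later: reload and fetch the 5 nearest chunks to a query vector
index = faiss.read_index("faiss_index/index.faiss")
scores, ids = index.search(vectors[:1], 5)
```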
| Parameter | Description | Default |
|---|---|---|
| `chunk_size` | Text chunk size | 500 |
| `chunk_overlap` | Overlap between chunks | 50 |
| `max_workers` | Concurrent threads | CPU count |
| `batch_size` | Documents per embedding batch | 100 |
| `embedding_model` | HuggingFace model | intfloat/e5-small |
| `cache_embeddings` | Use embedding cache | True |
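These knobs map naturally onto a LangChain-style splitter plus a batched embedding loop. A hedged sketch of how the defaults might be applied (how `rag_backend.py` actually wires them is an assumption; newer LangChain versions move the splitter into the `langchain_text_splitters` package):

```python
# Sketch: applying the configuration defaults with LangChain's text splitter
import os
from langchain.text_splitter import RecursiveCharacterTextSplitter

full_text = "example document text " * 500  # stand-in for extracted PDF text

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(full_text)

# Batch the chunks for embedding; workers default to the CPU count
batch_size = 100
max_workers = os.cpu_count()
batches = [chunks[i:i + batch_size] for i in range(0, len(chunks), batch_size)]
print(f"{len(chunks)} chunks → {len(batches)} batches across {max_workers} workers")
```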
| Problem | Fix |
|---|---|
| ❌ Ollama not responding | Ensure `ollama serve` is running and the Mistral model is pulled. |
| ❌ CUDA not available | Install the CUDA Toolkit, check `nvidia-smi`, and enable the GPU with `unset OLLAMA_NO_CUDA`. |
| 💥 Memory crash | Reduce `batch_size` or `chunk_size`, and clear the cache folders (see the snippet below the table). |
| 🐢 Slow performance | Enable the GPU, increase `max_workers`, or tune the chunking strategy. |
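For the memory-crash fix, the generated stores listed in the project structure can be cleared directly; they are regenerated the next time documents are processed:

```python
# Clear generated stores from the project layout (safe: regenerated on next run)
import shutil

shutil.rmtree("faiss_index", ignore_errors=True)
shutil.rmtree("embedding_cache", ignore_errors=True)
```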
- 🔧 Use the Help & Tips section in the app.
- 🐛 File issues or suggestions on GitHub Issues.
- 📄 Refer to `rag_backend.py` for logic and model details.
This project is licensed under the MIT License. See LICENSE for more information.