RAGEve

Local-first RAG platform: fast, private, no cloud required.


Get Started · Configuration · Develop · Community · Contributing



What is RAGEve?

RAGEve is a local-first RAG (Retrieval-Augmented Generation) platform built for developers and teams who want the power of RAG workflows without depending on external cloud services.

It combines Ollama for local LLM inference and embeddings, Qdrant as a high-performance vector database, and FastAPI + Next.js for a full-featured web interface. Everything runs on your own machine: no API keys, and no data leaves your network.

Backend Architecture:

  • FastAPI for high-performance async APIs
  • Peewee ORM with a 27-table schema for persistent storage
  • MySQL (or SQLite for single-node) via Peewee + connection pooling (900 connections)
  • SQLAlchemy (temporary) for legacy chat history storage during migration
  • Qdrant for vector search with hybrid retrieval (dense + sparse)
  • Ollama for embeddings and LLM inference

RAGEve is designed for two audiences:

Audience Experience
Non-technical users git clone && ./scripts/run.sh - everything starts automatically
Developers ./scripts/backend.sh or manual uvicorn / npm run dev for full control

Demo


Start RAGEve locally and open http://localhost:3000:

git clone https://github.com/bazzi24/RAGEve.git
cd RAGEve
./scripts/run.sh

Tip: On first run, install.sh automatically installs uv and Ollama, pulls the required models (~8 GB), and starts the Docker services. This takes about 5-10 minutes once; subsequent starts are instant.


Latest Updates

  • 2026-04-27 Peewee migration complete: 27-table schema, new /dialogs and /knowledgebases APIs, transitional support for legacy routes
  • 2026-04-22 Enhanced PDF parsing: column detection, structured table extraction, hierarchical chunking, reading order optimization
  • 2026-04-03 Evaluation matrix (16-cell benchmark) + Qdrant hybrid search fix
  • 2026-04-01 9 production fixes: structured 500 handler, health checks, rate limiter proxy safety, request timeouts, streaming 404 fix, file upload limits, paginated datasets API
  • 2026-04-01 Chat history with MySQL/SQLite, session panel, per-agent conversations
  • 2026-03-28 Background HF dataset ingest with live progress tracking
  • 2026-03-26 Real-time streaming upload with per-batch progress stages
  • 2026-03-26 Cross-encoder reranking (sentence-transformers)
  • 2026-03-25 E2E test suite, conversation persistence

Key Features

Deep Document Understanding

  • Ingest PDFs, Word docs, Excel, CSV, images, and more
  • Enhanced PDF parsing: column detection, structured table extraction (markdown), heading hierarchy, reading order optimization
  • Adaptive chunking with quality scoring per profile (clean text, OCR noisy, table-heavy, code)
  • Intelligent text column selection for multi-column datasets
  • Hierarchical chunking preserves section context for better semantic search
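The hierarchical chunking idea can be sketched in a few lines: each chunk carries the heading path of the section it came from, so semantic search sees section context. This is an illustrative simplification, not RAGEve's actual implementation; hierarchical_chunks and its field names are hypothetical:

```python
# Sketch of hierarchical chunking: each chunk keeps the heading path
# of its source section alongside the text. Illustrative only.

def hierarchical_chunks(sections, max_chars=500):
    """sections: list of (heading_path, text) tuples."""
    chunks = []
    for heading_path, text in sections:
        # Split long sections into fixed-size windows, preserving the path.
        for start in range(0, len(text), max_chars):
            chunks.append({
                "heading_path": heading_path,
                "text": text[start:start + max_chars],
            })
    return chunks

docs = [
    ("Intro", "a" * 600),               # splits into two chunks
    ("Intro > Background", "b" * 300),  # fits in one chunk
]
chunks = hierarchical_chunks(docs, max_chars=500)
```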

Grounded Answers with Citations

  • Exact chunk references from source documents
  • Quality scores exposed to the LLM via enriched context
  • Session history-aware chat with up to 6 prior turns in context

Multiple Retrieval Strategies

  • Dense vector search via Ollama embeddings
  • Sparse keyword search
  • Hybrid fusion combining both with configurable weights
  • Cross-encoder reranking for improved precision
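Reciprocal Rank Fusion (RRF) is the standard technique the hybrid step relies on: each result list contributes a score that decays with rank, and the fused ranking favors documents that appear high in both lists. A minimal sketch (illustrative only; the real fusion happens inside Qdrant):

```python
# Sketch of Reciprocal Rank Fusion (RRF) for merging dense and sparse
# result lists into one ranking. Illustrative only.

def rrf_fuse(ranked_lists, k=60):
    """Merge several ranked lists of doc ids into one fused ranking."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            # Lower rank (earlier in the list) contributes a larger score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d2"]   # dense (embedding) results
sparse = ["d1", "d4", "d3"]  # sparse (keyword) results
fused = rrf_fuse([dense, sparse])
```

Note how d1, ranked near the top of both lists, outranks d3, which leads only the dense list.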

Flexible LLM Support

  • Any Ollama model as the chat backend
  • Any Ollama embedding model
  • Configurable temperature, top-k, top-p, and context window size per dialog (agent)

HuggingFace Integration

  • Browse, preview, and search HuggingFace datasets directly from the UI
  • Download datasets to local storage
  • Background ingest with real-time progress
  • Multi-config and multi-split support

Persistent Conversations

  • Sessions stored in MySQL via Peewee ORM (or SQLite for single-node)
  • Full conversation history per dialog (agent)
  • Thumbs up/down feedback on individual messages
  • Conversation context automatically injected into subsequent turns
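The "up to 6 prior turns" behavior can be illustrated with a small sketch. build_context is a hypothetical helper, not RAGEve's actual API:

```python
# Sketch of injecting the last N conversation turns into the next
# prompt. Hypothetical helper; illustrative only.

def build_context(history, new_question, max_turns=6):
    """history: list of (question, answer) pairs, oldest first."""
    recent = history[-max_turns:]  # keep only the most recent turns
    lines = []
    for q, a in recent:
        lines.append(f"User: {q}")
        lines.append(f"Assistant: {a}")
    lines.append(f"User: {new_question}")
    return "\n".join(lines)

history = [(f"q{i}", f"a{i}") for i in range(10)]
prompt = build_context(history, "final question")
```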

Production-Ready Backend

  • Request ID tracing and structured error responses
  • CORS and API key authentication
  • Circuit breaker and retry logic for Ollama calls
  • Dependency health checks (/health pings Ollama and Qdrant)
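The retry logic for Ollama calls can be sketched as exponential backoff around a flaky call. This is a simplified illustration; the real client additionally wraps calls in a circuit breaker that stops retrying after repeated failures:

```python
import time

# Sketch of retry-with-exponential-backoff for a flaky dependency.
# Illustrative only; names do not match RAGEve's internal client.

def call_with_retry(fn, retries=3, base_delay=0.01):
    last_err = None
    for attempt in range(retries):
        try:
            return fn()
        except ConnectionError as err:
            last_err = err
            time.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, ...
    raise last_err

attempts = {"n": 0}

def flaky():
    """Fails twice, then succeeds -- simulates a briefly unreachable service."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("ollama unreachable")
    return "ok"

result = call_with_retry(flaky)
```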

Developer-Friendly

  • scripts/run.sh - everything in one command
  • scripts/backend.sh - backend only for technical users
  • Docker Compose for infrastructure (Qdrant + MySQL)
  • Full E2E and stress test suites

System Architecture

RAGEve follows a modern full-stack architecture with a local-first design:

┌──────────────────────────────────────────────────────────────────┐
│                   Next.js Frontend (port 3000)                   │
│  - Chat interface                                                │
│  - Knowledge base management                                     │
│  - HuggingFace integration                                       │
│  - Dialog (agent) configuration                                  │
└─────────────────────────────┬────────────────────────────────────┘
                              │ HTTPS/HTTP
┌─────────────────────────────▼────────────────────────────────────┐
│                   FastAPI Backend (port 8000)                    │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │ Routes (API routers)                                       │  │
│  │ • /dialogs         - Dialog (agent) CRUD      (NEW, Peewee)│  │
│  │ • /knowledgebases  - KB, document, file, task (NEW, Peewee)│  │
│  │ • /conversations   - Conversation + streaming (NEW, Peewee)│  │
│  │ • /chat            - Stateless RAG chat                    │  │
│  │ • /datasets        - Legacy (deprecated)                   │  │
│  │ • /agents          - Legacy (deprecated)                   │  │
│  │ • /chat_history    - Legacy (SQLAlchemy)                   │  │
│  └────────────────────────────────────────────────────────────┘  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │ Services (Store pattern)                                   │  │
│  │ • DialogStore        - Dialog CRUD                         │  │
│  │ • KnowledgeBaseStore - KB, Document, File, Task            │  │
│  │ • ConversationStore  - Conversation + messages             │  │
│  │ • TenantUserStore    - Multi-tenancy                       │  │
│  │ • LLMStore           - LLM factory management              │  │
│  │ • EvaluationStore    - RAG evaluation                      │  │
│  │ • ConnectorStore     - External connectors                 │  │
│  │ • CanvasStore        - Agent workflows                     │  │
│  │ • SystemStore        - System settings                     │  │
│  └────────────────────────────────────────────────────────────┘  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │ Persistence Layer                                          │  │
│  │ • Peewee ORM (27-table schema)  [PRIMARY]                  │  │
│  │   Tables: User, Tenant, Knowledgebase, Document,           │  │
│  │   File, Task, Dialog, Conversation, LLM, etc.              │  │
│  │ • SQLAlchemy (legacy chat history)  [TEMPORARY]            │  │
│  │   Tables: chat_sessions, chat_messages                     │  │
│  └────────────────────────────────────────────────────────────┘  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │ RAG Pipeline (rag/retrieval/rag_pipeline.py)               │  │
│  │ 1. Embed query (dense + sparse)                            │  │
│  │ 2. Qdrant hybrid search with RRF                           │  │
│  │ 3. Optional cross-encoder reranking                        │  │
│  │ 4. Build context → Ollama chat                             │  │
│  └────────────────────────────────────────────────────────────┘  │
└─────────────────────────────┬────────────────────────────────────┘
                              │
          ┌───────────────────┼──────────────────┐
          │                   │                  │
┌─────────▼──────┐  ┌─────────▼────────┐  ┌──────▼───────────┐
│     Qdrant     │  │      MySQL       │  │      Ollama      │
│  (vector DB)   │  │   (Peewee ORM)   │  │  (LLM + embed)   │
│   Port 6333    │  │ 27-table schema  │  │   Local daemon   │
└────────────────┘  └──────────────────┘  └──────────────────┘

Database Architecture

Peewee ORM (Primary) - 27-table schema stored in MySQL:

  • User & Tenancy: User, Tenant, UserTenant
  • Knowledge Base: Knowledgebase, Document, File, File2Document, Task
  • Dialogs: Dialog, Conversation
  • LLM Management: LLMFactories, LLM, TenantLLM
  • Connectors: Connector, Connector2Kb, SyncLogs
  • Canvas: UserCanvas, CanvasTemplate
  • Evaluation: EvaluationDataset, EvaluationCase, EvaluationRun, EvaluationResult
  • System: SystemSettings, APIToken, API4Conversation, MCP, Search, PipelineOperationLog

SQLAlchemy (Legacy, Temporary) - During the migration from SQLAlchemy to Peewee, chat history continues to use the old schema (chat_sessions, chat_messages, chat_feedback) in a separate database. It will be unified with the Peewee Conversation table in a future update.


Get Started

Prerequisites

Requirement Version Notes
Docker >= 24.0.0 Install Docker
Docker Compose >= v2.26.1 Usually bundled with Docker Desktop
macOS / Linux / WSL2 - Windows native not supported; use WSL2
Disk >= 50 GB For models (~8 GB) and data
RAM >= 16 GB Recommended; CPU fallback is slower

Windows: Enable WSL2 and run all commands from inside the WSL shell. Do not run scripts from PowerShell or CMD.

Quick Start

One command for everything - auto-installs if needed:

git clone https://github.com/bazzi24/RAGEve.git
cd RAGEve
./scripts/run.sh

The first run will:

  1. Install uv (Python package manager)
  2. Install Ollama and pull models (nomic-embed-text + llama3.2)
  3. Start Docker containers (Qdrant + MySQL)
  4. Start the FastAPI backend and Next.js frontend

Open http://localhost:3000 when you see:

[*] Starting FastAPI backend...
[*] Starting Next.js frontend...
[✓] RAGEve is running!

  Frontend  http://localhost:3000
  Backend   http://localhost:8000
  API docs  http://localhost:8000/docs

Press Ctrl+C to stop all services cleanly.

Configuration

RAGEve uses environment variables for configuration. Copy the example and customize:

# From the project root:
cp docker/.env.example .env  # Recommended for Docker deployments
# OR
cp .env.example .env        # If .env.example exists (legacy location)
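A minimal .env sketch for a default local setup; the values mirror the defaults documented in the tables below and are illustrative only:

```shell
# Illustrative .env for a local single-machine setup.
# All values match the documented defaults; adjust as needed.
MYSQL_HOST=localhost
MYSQL_PORT=3306
MYSQL_USER=root
MYSQL_PASSWORD=
MYSQL_DBNAME=rag_flow
OLLAMA_BASE_URL=http://localhost:11434
QDRANT_URL=http://localhost:6333
CORS_ORIGINS=http://localhost:3000
```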

Core Database Settings

Variable Default Description
MYSQL_HOST localhost MySQL server hostname (Peewee ORM, primary)
MYSQL_PORT 3306 MySQL server port
MYSQL_USER root MySQL username
MYSQL_PASSWORD (empty) MySQL password
MYSQL_DBNAME rag_flow MySQL database name (27-table schema)
DB_URL (SQLite) Legacy: SQLAlchemy DSN for chat history (e.g., mysql+aiomysql://... or sqlite:///data/chat.db)

Service URLs

Variable Default Description
OLLAMA_BASE_URL http://localhost:11434 Ollama server URL
QDRANT_URL http://localhost:6333 Qdrant server URL
QDRANT_API_KEY (none) Qdrant API key when auth is enabled

Application Settings

Variable Default Description
CORS_ORIGINS http://localhost:3000,http://localhost:3001,http://localhost:3002 Allowed CORS origins (comma-separated, no spaces)
TRUSTED_PROXY_COUNT 1 Number of reverse proxies for X-Forwarded-For (0 to disable)
API_KEY (none) When set, enables X-API-Key authentication on all endpoints
RATE_LIMIT_PER_MINUTE 120 Rate limit per IP (only active when API_KEY is set)
HF_TOKEN (none) HuggingFace token for private datasets
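To make the CORS_ORIGINS format concrete (comma-separated, no spaces), a parser might look like this; parse_origins is a hypothetical helper, not RAGEve's actual code:

```python
import os

# Sketch of parsing a comma-separated CORS_ORIGINS value.
# Spaces around commas would end up inside the origin strings,
# which is why the format requires "no spaces".

def parse_origins(raw):
    return [o for o in raw.split(",") if o]

os.environ["CORS_ORIGINS"] = "http://localhost:3000,http://localhost:3001"
origins = parse_origins(os.environ["CORS_ORIGINS"])
```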

Storage & Processing

Variable Default Description
DATA_ROOT data Base directory for uploads, chunks, vectors, logs
UPLOAD_DIR_NAME uploads Subdirectory for uploaded files
CHUNK_DIR_NAME chunks Subdirectory for extracted text chunks
VECTOR_DIR_NAME vector Subdirectory for vector index data

Advanced PDF Parsing

Variable Default Description
ENABLE_COLUMN_DETECTION true Enable multi-column layout detection
ENABLE_STRUCTURED_TABLES true Extract tables as markdown
ENABLE_HIERARCHICAL_CHUNKING true Preserve section hierarchy in chunks
ENABLE_READING_ORDER_OPTIMIZATION true Fix reading order for multi-column docs
OCR_ENGINE paddle OCR engine: paddle or tesseract
OCR_THRESHOLD_CHARS 50 Minimum chars to consider PDF not scanned

Upload Limits

Variable Default Description
MAX_UPLOAD_BYTES 524288000 (500 MB) Maximum file size
MAX_DATASET_BYTES 107374182400 (100 GB) Maximum total dataset size
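To make the limit concrete, 524288000 bytes is exactly 500 MiB. A hypothetical size check (check_upload is illustrative, not RAGEve's actual upload route):

```python
# Sketch of enforcing MAX_UPLOAD_BYTES before accepting a file.
# check_upload is a hypothetical helper for illustration.

MAX_UPLOAD_BYTES = 524_288_000  # the documented default: 500 * 1024 * 1024

def check_upload(size_bytes, limit=MAX_UPLOAD_BYTES):
    """Reject files larger than the configured limit."""
    if size_bytes > limit:
        raise ValueError(f"file too large: {size_bytes} > {limit} bytes")
    return True

ok = check_upload(1_000_000)  # a 1 MB upload passes
rejected = False
try:
    check_upload(MAX_UPLOAD_BYTES + 1)  # one byte over the limit
except ValueError:
    rejected = True
```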

Scripts

Script Description
./scripts/run.sh Everything in one command - auto-installs on first run
./scripts/install.sh One-time setup only (called automatically by run.sh)
./scripts/backend.sh Backend only - for developers who run the frontend manually

Launch Service from Source for Development

For developers who want full control over startup and debugging.

Full Stack

# 1. Start infrastructure
docker compose -f docker/docker-compose.yml up -d qdrant mysql

# 2. Start Ollama (keep running in a terminal)
ollama serve

# 3. Pull required models (first time only)
ollama pull nomic-embed-text
ollama pull llama3.2:latest

# 4. Install Python dependencies
uv sync

# 5. Install frontend dependencies
cd frontend && npm install && cd ..

# 6. Start FastAPI backend (port 8000)
#    Do NOT use --reload: it interrupts in-flight uploads
uv run uvicorn backend.main:app --host 0.0.0.0 --port 8000

# 7. Start Next.js frontend (port 3000) β€” in another terminal
cd frontend && npm run dev

Open http://localhost:3000 (frontend) and http://localhost:8000/docs (API docs).

Backend Only

For developers who run the frontend manually (e.g. in an IDE with hot reload):

./scripts/backend.sh

Starts: Docker (Qdrant + MySQL) → Ollama → FastAPI. No frontend.

Run Tests

Store Unit Tests (100+ tests across 9 store services):

uv run python test/test_stores.py

Tests: TenantUserStore (14), KnowledgeBaseStore (21), DialogStore (6), ConversationStore (9), LLMStore (12), ConnectorStore (11), CanvasStore (10), EvaluationStore (12), SystemStore (15).

API Integration Tests (34+ tests):

# Run all API tests
uv run python -m pytest test/api/

# Or run individual test files
uv run python test/api/test_dialogs.py
uv run python test/api/test_conversations.py
uv run python test/api/test_knowledgebases.py
uv run python test/api/test_chat.py
uv run python test/api/test_ingestion.py

API tests use a SQLite database (./test_api.db) and bypass authentication via FastAPI dependency overrides.

Full End-to-End Test (requires running Qdrant + Ollama):

uv run python -m pytest test/integration/test_rag.py

Legacy Tests (pre-migration):

uv run python test/_test_e2e.py
uv run python test/_test_stress.py --test all --stream --keep-files

Community


Contributing

RAGEve grows through open-source collaboration. Contributions of all kinds are welcome: bug fixes, features, docs, tests, and feedback.

Before contributing:

  1. Fork the repository and create a feature branch from main
  2. Make your changes. All code must pass bash -n scripts/*.sh (shell scripts) and cd frontend && npx tsc --noEmit (TypeScript)
  3. Run the E2E test suite: uv run python test/_test_e2e.py
  4. Submit a pull request with a clear description of what changed and why

Development setup:

git clone https://github.com/bazzi24/RAGEve.git
cd RAGEve
cp .env.example .env    # optional: fill in HF_TOKEN, API_KEY, etc.
./scripts/install.sh  # one-time setup
./scripts/backend.sh  # backend only for iterative development

Built with ❤️ for local-first AI - RAGEve

About

Currently, there is a file conflict error; please use version v1.0.0. I will update and fix the error as soon as possible.
