
🦙 llama-pg

A production-ready multi-tenant RAG as a Service (RaaS) orchestrator for intelligent document parsing, vector embedding generation, and retrieval-augmented generation, enabling you to automate embeddings across all your projects in one place.

Interface

🚀 Features

  • PDF Processing: Automatic PDF parsing using LlamaParse (or any supported parser)
  • Vector Embeddings: Built-in support for any embedding model (e.g., BAAI/bge-m3 or OpenAI's text-embedding-3-small)
  • Admin Interface: Easy-to-use admin panel for organization, project, and document management
  • REST API: a FastAPI service built for multi-tenant, multi-project setups
  • Worker Architecture: Scalable background processing with ARQ workers
  • pgai Integration: Leverages TimescaleDB's pgai extension for vector operations
  • Easy Installation: via Helm or Docker

Pipeline: Document → Embeddings

📄 Document Upload
    ↓
🔄 QUEUED (Redis)
    ↓
📝 PARSING (Parser Worker)
    ↓
💾 PARSED (PostgreSQL - TimescaleDB)
    ↓
🤖 VECTORIZING (Vectorizer Worker - pgai)
    ↓
🔍 READY (PostgreSQL - TimescaleDB)

🛠️ Quick Start

Prerequisites

  • Docker & Docker Compose
  • LlamaCloud API key for document parsing (get from LlamaIndex)
  • OpenAI API key for embeddings (or vLLM with a deployed embedding model)

vLLM Prerequisites (optional)

If using vLLM, we recommend an embedding model that supports Matryoshka dimensions. We tested BAAI/bge-m3 with the following command:

vllm serve BAAI/bge-m3 \
  --task embed \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.8 \
  --no-enable-prefix-caching \
  --trust-remote-code \
  --enforce-eager \
  --host 0.0.0.0 \
  --port 8000 \
  --max-model-len 2048 \
  --override-pooler-config '{"normalize": true}' \
  --hf_overrides '{"is_matryoshka": true, "matryoshka_dimensions": [768,1024]}'
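
Once the server is up, you can sanity-check it with a request to vLLM's OpenAI-compatible embeddings endpoint. A minimal sketch, assuming the host and port from the command above; the dimensions field is honored only for Matryoshka-enabled models:

curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "BAAI/bge-m3", "input": "hello world", "dimensions": 1024}'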

1. Clone the Repository

git clone https://github.com/akvnn/llama-pg.git
cd llama-pg

2. Environment Configuration

Create a .env file in the root directory with the required variables:

OPENAI_API_KEY=<your_openai_key>
LLAMA_CLOUD_API_KEY=<your_llamaparse_api_key_here>

All supported .env variables (including optional ones) are listed below:

# PostgreSQL Configuration
DB_URL=<postgresql://{PG_USER}:{PG_PASSWORD}@{PG_HOST}:{PG_PORT}/{PG_DBNAME}>
DB_POOL_MIN_SIZE=5
DB_POOL_MAX_SIZE=10
DB_POOL_IDLE_TIMEOUT=300
DB_POOL_LIFETIME_TIMEOUT=1800

# Security Configuration
JWT_EXPIRES_IN=3600
JWT_SECRET_KEY=some_dummy_key

# Admin User Configuration
CREATE_DEFAULT_ADMIN_USER=True
ADMIN_USERNAME="admin"
ADMIN_PASSWORD="password"

# vLLM/OpenAI Configuration
OPENAI_API_KEY=<your_openai_key>
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
OPENAI_EMBEDDING_DIMENSIONS=1536
OPENAI_MODEL=gpt-5
USE_VLLM=False
OPENAI_HOST=https://api.openai.com/v1
# Note: to use vLLM, set USE_VLLM=True, point OPENAI_HOST to <host_ip:host_port>, and adjust OPENAI_EMBEDDING_MODEL and OPENAI_MODEL accordingly

# Parser Configuration
LLAMA_CLOUD_API_KEY=<your_llamaparse_api_key_here>
USE_LLAMA_PARSE=true
LLAMA_PARSE_AUTO_MODE=true

# API Configuration
API_PORT=8000

# Redis Configuration
REDIS_ARQ_HOST=redis
REDIS_ARQ_PORT=6379
REDIS_ARQ_DATABASE=1
REDIS_ARQ_MAX_JOBS=10
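
For example, a vLLM-backed configuration (per the note above) might look like the following; the host placeholder and model name here are illustrative, not defaults:

USE_VLLM=True
OPENAI_HOST=http://<vllm_host_ip>:<vllm_port>
OPENAI_EMBEDDING_MODEL=BAAI/bge-m3
OPENAI_EMBEDDING_DIMENSIONS=1024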

3. Run the Stack

Using Helm (recommended):

helm repo add akvnn https://akvnn.github.io/llama-pg

helm repo update

helm install llama-pg akvnn/llama-pg \
--set configMapEnv.VITE_API_URL=http://chart-example.local/api \
--set api.secretEnv.OPENAI_API_KEY=<your_openai_key> \
--set api.secretEnv.LLAMA_CLOUD_API_KEY=<your_llamaparse_api_key_here> \
--set worker.secretEnv.OPENAI_API_KEY=<your_openai_key> \
--set worker.secretEnv.LLAMA_CLOUD_API_KEY=<your_llamaparse_api_key_here>
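
After installing, you can verify the release and its pods with standard Helm and kubectl commands:

helm status llama-pg
kubectl get pods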

Adjust values.yaml as needed, or override individual values via the helm install command as shown above.

Note: the Helm chart comes pre-packaged with TimescaleDB and Redis dependencies. If you would like to use an external database and/or Redis, you can disable the bundled ones in values.yaml.
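
As a sketch, disabling the bundled dependencies usually amounts to flipping the dependency condition flags. The key names below (timescaledb.enabled, redis.enabled) are illustrative guesses, so confirm them against the chart's values.yaml:

# Hypothetical flag names: check values.yaml for the chart's actual keys
helm install llama-pg akvnn/llama-pg \
--set timescaledb.enabled=false \
--set redis.enabled=false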

Using Docker:

docker compose up --build -d

This will start:

  • PostgreSQL (TimescaleDB with pgai): localhost:5432
  • Redis: localhost:6379
  • API: FastAPI backend to manage all services, at localhost:8000
  • Admin Panel: Frontend to interact with the API, at localhost:5173
  • Worker: Background parsing and vector processing
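
To confirm the stack is healthy, check container status and probe the API docs endpoint; a 200 from /docs means the API is serving:

docker compose ps
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/docs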

For local development instructions, see the Development section below (using uv or docker-compose-dev.yml).

📚 Usage

Admin Interface

Access the admin panel at http://localhost:5173 to:

  • Create and manage organizations and projects
  • Upload and organize documents
  • Monitor processing status
  • Search and chat with your documents

API Endpoints

Navigate to http://localhost:8000/docs in your browser to access the full interactive API documentation.
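
With FastAPI's default settings, the raw OpenAPI schema is also served next to the interactive docs, which is handy for generating typed clients:

curl -s http://localhost:8000/openapi.json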

🔧 Development

Local Development Setup

API

Using uv (recommended):

uv sync
uv run -m src.server

Frontend

Using bun (recommended):

bun install
bun run dev

Development Docker Compose

Alternatively, you can use docker-compose-dev.yml for development, which uses bind mounts for hot reload. Note that the Docker images may take some time to build the first time due to pgai's dependency on torch and CUDA.

docker compose -f docker-compose-dev.yml up -d
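
While iterating, it helps to tail the API and worker logs to watch parse and vectorize jobs move through the pipeline (service names assumed to match the stack listing above):

docker compose -f docker-compose-dev.yml logs -f api worker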

🤝 Contributing

Contributions are welcome! Please open an issue or submit a pull request.

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

Third-Party License

This project uses pgai, which is licensed under the PostgreSQL License. Copyright (c) Timescale, Inc.
