DeepRAG is a lightweight, high-performance RAG framework designed for building advanced LLM-powered applications. It supports integration with large language models such as DeepSeek-v3, Qwen, Llama 3, and more, whether from cloud APIs or self-hosted backends like SGLang or vLLM.
With support for powerful vector databases like Zilliz/Milvus, efficient data parsing pipelines, and a modular architecture, DeepRAG makes it easy to build, scale, and customize retrieval-augmented generation systems for real-world applications.
A fast and efficient LLM-RAG Python project based on DeepSeek-v3 or other LLM models, managed with Poetry.
Let me introduce this RAG framework, which I developed to help you process your documents.
Step 1: Prepare the LLM and embedding API key and base URL, either from a cloud provider or from your self-hosted SGLang or vLLM backend.
For example, you can obtain them from the DeepSeek official website or the Aliyun website. Alternatively, if you have high-performance computing hardware such as Nvidia, Ascend, Google TPU, or Intel devices (listed in no particular order), you can self-host the LLM or embedding model with SGLang or vLLM. For further information about self-hosting, the vLLM and SGLang documentation sites are all you need. Trust me, it is easy.
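Most cloud providers and self-hosted SGLang/vLLM backends expose an OpenAI-compatible chat completions endpoint, so a quick sanity check for your base URL and API key is to assemble a request against it. The sketch below is mine, not part of DeepRAG: it only builds the target URL, headers, and JSON body (the endpoint path, model name, and prompt are placeholder assumptions), and leaves actually sending the request to your HTTP client of choice.

```python
import json

def build_chat_request(base_url: str, api_key: str, model: str, prompt: str):
    """Assemble an OpenAI-compatible chat completions request.

    Returns (url, headers, body); sending it over the network is
    intentionally left out of this sketch.
    """
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, headers, body

# Placeholder values -- substitute the ones you put in your .env file.
url, headers, body = build_chat_request(
    "https://api.deepseek.com/v1", "sk-...", "deepseek-chat", "Hello"
)
```

If this request succeeds with your real credentials, the same values will work in the `.env` file described below.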
You can self-host the Milvus vector database with Docker to obtain the milvus_uri and milvus_token that the .env file needs. Because the Docker commands for standalone and distributed Milvus deployments differ, no example commands are provided in this README. For further information on how to self-host Milvus, the Milvus documentation site is your best choice.
You may be wondering why I don't use a self-hosted open-source vector database. Because I want to build a high-performance RAG system, and a self-hosted vector database may bring you some annoying, unpredictable problems. For the Zilliz vector database, go to the Zilliz website and register your own account. For economic reasons, by the way, if you are just developing or learning a RAG system, subscribe to the free cluster on Zilliz: it is free, relatively high-performance, and has a low failure rate.
A PostgreSQL database is available from virtually every cloud provider, but it may cost some money. If you don't want to purchase a cloud database, just self-host it with Docker or Podman. Here is an example:
```shell
docker run --name my-postgres \
  -e POSTGRES_USER=myuser \
  -e POSTGRES_PASSWORD=mypassword \
  -e POSTGRES_DB=mydb \
  -v pg_data:/var/lib/postgresql/data \
  -p 5432:5432 \
  -d postgres
```

Then start MinIO for object storage:

```shell
docker run -d --name minio --restart always \
  -p 9002:9000 -p 9001:9001 \
  -v minio_data:/data \
  -e "MINIO_ROOT_USER=minioadmin" \
  -e "MINIO_ROOT_PASSWORD=minioadmin" \
  minio/minio server /data --console-address ":9001"
```

Next, clone the repository and create your environment file:

```shell
git clone https://github.com/fangyisheng/DeepRAG.git
cd DeepRAG/deeprag/src/deeprag
cp .env.example .env
```

Let me walk through the .env.example file to help you understand how to fill it in.
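Incidentally, the flags in the PostgreSQL docker run command above are exactly what the DATABASE_URL variable below is assembled from. A minimal sketch of that mapping (the helper function is mine, not part of DeepRAG):

```python
from urllib.parse import urlparse

def pg_url(user: str, password: str, host: str, port: int, db: str) -> str:
    """Build a PostgreSQL connection URL from the docker run settings."""
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"

# Values taken from the docker run flags above.
url = pg_url("myuser", "mypassword", "localhost", 5432, "mydb")
print(url)  # postgresql://myuser:mypassword@localhost:5432/mydb

# Sanity-check: the URL parses back into its component parts.
parts = urlparse(url)
assert parts.hostname == "localhost" and parts.port == 5432
```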
LLM_BASE_URL=
# This is the base communication URL for the LLM Model
LLM_API_KEY=
# This is the API Key for the LLM Model
LLM_MODEL=
# This is the name of the large language model you want to use
LLM_MODEL_MAX_TOKEN=
# This is the maximum context length of the selected large language model
LLM_MODEL_QPM=
# This is the maximum number of requests per minute for the large language model (15000 RPM)
LLM_MODEL_TPM=
# This is the maximum number of tokens processed per minute for the large language model (1,200,000 TPM)
EMBEDDING_BASE_URL=
# This is the base communication URL for the Embedding Model
EMBEDDING_API_KEY=
# This is the API Key for the Embedding Model
EMBEDDING_MODEL=
# This is the name of the general-purpose text embedding model you selected
EMBEDDING_DIMENSION=
# This is the embedding dimension of the text vectors generated by the selected embedding model
EMBEDDING_MODEL_MAX_TOKEN=
# This is the maximum context length of the selected text embedding model
EMBEDDING_MODEL_RPS=
# This is the maximum number of requests per second for the embedding model (30 RPS)
EMBEDDING_MODEL_TPM=
# This is the maximum number of tokens processed per minute for the embedding model (1,200,000 TPM)
EMBEDDING_MODEL_INPUT_STRING_ARRAY_LENGTH=
# This number represents how many text strings the cloud provider's embedding API can process in one batch. For example, if this value is 10 and the embedding model's max token is 8192, then the maximum supported text capacity is 8192*10=81920. If batch processing is not supported, set it to 1.
MILVUS_CLUSTER_ENDPOINT=
# This is the cluster endpoint URL of Milvus/Zilliz
MILVUS_CLUSTER_TOKEN=
# This is the authentication token for connecting to the Milvus/Zilliz cluster
DATABASE_URL=
# This is the PostgreSQL database URL used to store user spaces, knowledge bases, file metadata, etc.
MINIO_ENDPOINT=
# This is the backend IP address of your MinIO server (not the MinIO Console URL)
MINIO_ACCESS_KEY=
# This is the access key you created in the MinIO Console
MINIO_SECRET_KEY=
# This is the secret key you created in the MinIO Console
MINIO_SECURE=
# This indicates whether HTTPS should be used when connecting to MinIO; set to True or False

Then install the dependencies:

```shell
cd DeepRAG/deeprag
poetry install
```

If you don't have Poetry, please install it in your environment first.
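The interplay between EMBEDDING_MODEL_MAX_TOKEN and EMBEDDING_MODEL_INPUT_STRING_ARRAY_LENGTH described above can be sketched in a few lines. This is a minimal illustration (the function names are my own, not part of DeepRAG) of the per-call capacity arithmetic and of splitting text chunks into batches no larger than the provider's array limit:

```python
def batch_capacity(max_token: int, array_length: int) -> int:
    """Maximum tokens one embedding API call can accept:
    per-string token limit times the batch (array) length."""
    return max_token * array_length

def to_batches(texts: list[str], array_length: int) -> list[list[str]]:
    """Split texts into batches the embedding API can accept.
    With array_length = 1 (no batch support), each batch holds one string."""
    return [texts[i:i + array_length] for i in range(0, len(texts), array_length)]

# Using the numbers from the comment above: 8192 tokens per string,
# batches of 10 strings -> 81920 tokens per call.
print(batch_capacity(8192, 10))  # 81920
```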
Then initialize the ORM and the database:
```shell
cd Deeprag/deeprag/src/deeprag/db/prisma
cp origin_schema.prisma schema.prisma
cd Deeprag/deeprag/src/deeprag
prisma db push --schema ./db/prisma/schema.prisma
prisma db execute --file ./db/prisma/create_auto_increment.sql --schema ./db/prisma/schema.prisma
prisma db pull --schema ./db/prisma/schema.prisma
prisma generate --schema ./db/prisma/schema.prisma
```

Then start the API server:

```shell
cd /DeepRAG/deeprag/src/deeprag/api
uvicorn main:app --host 0.0.0.0 --port 8000
```

If you want to self-host the DeepRAG system with Docker, follow the instructions below.
This is the first env file you need to fill out.
```shell
cd /DeepRAG
cp docker.env.example docker.env
```

POSTGRES_USER=
# Specifies the username for the PostgreSQL database
POSTGRES_PASSWORD=
# Specifies the password for the PostgreSQL database
POSTGRES_DB=
# Specifies the name of the database to be created in PostgreSQL
PG_EXPOSED_PORT=
# Specifies the port on the host machine that PostgreSQL will listen on
MINIO_EXPOSED_PORT=
# Specifies the port on the host machine that MinIO object storage will listen on
MINIO_CONSOLE_EXPOSED_PORT=
# Specifies the port on the host machine for accessing the MinIO management console
MINIO_ROOT_USER=
# Specifies the root username for MinIO object storage
MINIO_ROOT_PASSWORD=
# Specifies the root password for MinIO object storage
DEEPRAG_APP_PORT=
# Specifies the port on the host machine for the DeepRAG application

This is the second env file you need to fill out.
```shell
cd /DeepRAG/deeprag/src/deeprag
cp .env.example .env
```

LLM_BASE_URL=
# This is the base communication URL for the LLM Model
LLM_API_KEY=
# This is the API Key for the LLM Model
LLM_MODEL=
# This is the name of the large language model you want to use
LLM_MODEL_MAX_TOKEN=
# This is the maximum context length of the selected large language model
LLM_MODEL_QPM=
# This is the maximum number of requests per minute for the large language model (15000 RPM)
LLM_MODEL_TPM=
# This is the maximum number of tokens processed per minute for the large language model (1,200,000 TPM)
EMBEDDING_BASE_URL=
# This is the base communication URL for the Embedding Model
EMBEDDING_API_KEY=
# This is the API Key for the Embedding Model
EMBEDDING_MODEL=
# This is the name of the general-purpose text embedding model you selected
EMBEDDING_DIMENSION=
# This is the embedding dimension of the text vectors generated by the selected embedding model
EMBEDDING_MODEL_MAX_TOKEN=
# This is the maximum context length of the selected text embedding model
EMBEDDING_MODEL_RPS=
# This is the maximum number of requests per second for the embedding model (30 RPS)
EMBEDDING_MODEL_TPM=
# This is the maximum number of tokens processed per minute for the embedding model (1,200,000 TPM)
EMBEDDING_MODEL_INPUT_STRING_ARRAY_LENGTH=
# This number represents how many text strings the cloud provider's embedding API can process in one batch. For example, if this value is 10 and the embedding model's max token is 8192, then the maximum supported text capacity is 8192*10=81920. If batch processing is not supported, set it to 1.
MILVUS_CLUSTER_ENDPOINT=
# This is the cluster endpoint URL of Milvus/Zilliz
MILVUS_CLUSTER_TOKEN=
# This is the authentication token for connecting to the Milvus/Zilliz cluster
DATABASE_URL=
# This is the PostgreSQL database URL used to store user spaces, knowledge bases, file metadata, etc.
MINIO_ENDPOINT=
# This is the backend IP address of your MinIO server (not the MinIO Console URL)
MINIO_ACCESS_KEY=
# This is the access key you created in the MinIO Console
MINIO_SECRET_KEY=
# This is the secret key you created in the MinIO Console
MINIO_SECURE=
# This indicates whether HTTPS should be used when connecting to MinIO; set to True or False

As you may have noticed, two variables (DATABASE_URL and MINIO_ENDPOINT) do not need to be filled in.

Why not? Because these two variables are already set by docker-compose.yml.
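As an illustration of how that works, a docker-compose.yml can inject both values through the app service's environment section, pointing at the sibling services by their service names (Compose puts all services on a shared network where service names resolve as hostnames). The service names, ports, and variable interpolation below are placeholder assumptions sketched from the container names used later in this README; check the actual docker-compose.yml in the repository for the real values.

```yaml
services:
  deeprag-app:
    environment:
      # Hostnames resolve to the other compose services,
      # so no manual value is needed in .env.
      DATABASE_URL: postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@deeprag-db:5432/${POSTGRES_DB}
      MINIO_ENDPOINT: deeprag-minio:9000
```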
```shell
cd Deeprag/deeprag/src/deeprag/db/prisma
cp origin_schema.prisma schema.prisma
cd /DeepRAG
docker compose --env-file docker.env up -d
```

Then run the following three docker commands to check the container services.
```shell
docker logs -f deeprag-app
docker logs -f deeprag-db
docker logs -f deeprag-minio
```

If you see output similar to the following, the deployment was successful.




