DeepRAG is a lightweight, high-performance RAG framework designed for building advanced LLM-powered applications. It supports integration with large language models such as DeepSeek-v3, Qwen, Llama 3, and more, whether from cloud APIs or self-hosted backends like SGLang or vLLM.
With support for powerful vector databases like Zilliz/Milvus, efficient data parsing pipelines, and a modular architecture, DeepRAG makes it easy to build, scale, and customize retrieval-augmented generation systems for real-world applications.
A fast and efficient LLM-RAG Python project based on DeepSeek-v3 or other LLM models, managed with Poetry.
Let me introduce this RAG framework, which I developed to help you process your documents.
Step 1: Prepare the LLM and embedding API key and base URL, either from a cloud provider or from your self-hosted SGLang or vLLM backend.
For example, you can obtain them from the DeepSeek official website or the Aliyun website. Alternatively, if you have high-performance computing hardware such as Nvidia, Ascend, Google TPU, or Intel devices (listed in no particular order), you can self-host the LLM or embedding model with SGLang or vLLM. For further information about self-hosting, the vLLM and SGLang documentation sites are all you need. Trust me, it is easy.
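Most cloud providers and self-hosted SGLang/vLLM backends expose an OpenAI-compatible chat completions endpoint, so a quick sanity check for your base URL and API key is to assemble a request against it. The sketch below is mine, not part of DeepRAG: it only builds the target URL, headers, and JSON body (the endpoint path, model name, and prompt are placeholder assumptions), and leaves actually sending the request to your HTTP client of choice.

```python
import json

def build_chat_request(base_url: str, api_key: str, model: str, prompt: str):
    """Assemble an OpenAI-compatible chat completions request.

    Returns (url, headers, body); sending it over the network is
    intentionally left out of this sketch.
    """
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, headers, body

# Placeholder values -- substitute the ones you put in your .env file.
url, headers, body = build_chat_request(
    "https://api.deepseek.com/v1", "sk-...", "deepseek-chat", "Hello"
)
```

If this request succeeds with your real credentials, the same values will work in the `.env` file described below.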
You can self-host the Milvus vector database with Docker to obtain the milvus_uri and milvus_token that the .env file needs. Because the Docker commands for standalone and distributed Milvus deployments differ, no example commands are provided in this README. For further information on how to self-host Milvus, the Milvus documentation site is your best choice.
You may be wondering why I don't use a self-hosted open-source vector database. Because I want to build a high-performance RAG system, and a self-hosted vector database may bring you some annoying, unpredictable problems. For the Zilliz vector database, go to the Zilliz website and register your own account. For economic reasons, by the way, if you are just developing or learning a RAG system, subscribe to the free cluster on Zilliz: it is free, relatively high-performance, and has a low failure rate.
A PostgreSQL database is available from virtually every cloud provider, but it may cost some money. If you don't want to purchase a cloud database, just self-host it with Docker or Podman. Here is an example:
```shell
docker run --name my-postgres \
  -e POSTGRES_USER=myuser \
  -e POSTGRES_PASSWORD=mypassword \
  -e POSTGRES_DB=mydb \
  -v pg_data:/var/lib/postgresql/data \
  -p 5432:5432 \
  -d postgres
```

Then start MinIO for object storage:

```shell
docker run -d --name minio --restart always \
  -p 9002:9000 -p 9001:9001 \
  -v minio_data:/data \
  -e "MINIO_ROOT_USER=minioadmin" \
  -e "MINIO_ROOT_PASSWORD=minioadmin" \
  minio/minio server /data --console-address ":9001"
```

Next, clone the repository and create your environment file:

```shell
git clone https://github.com/fangyisheng/DeepRAG.git
cd DeepRAG/deeprag/src/deeprag
cp .env.example .env
```

Let me walk through the .env.example file to help you understand how to fill it in.
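Incidentally, the flags in the PostgreSQL docker run command above are exactly what the DATABASE_URL variable below is assembled from. A minimal sketch of that mapping (the helper function is mine, not part of DeepRAG):

```python
from urllib.parse import urlparse

def pg_url(user: str, password: str, host: str, port: int, db: str) -> str:
    """Build a PostgreSQL connection URL from the docker run settings."""
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"

# Values taken from the docker run flags above.
url = pg_url("myuser", "mypassword", "localhost", 5432, "mydb")
print(url)  # postgresql://myuser:mypassword@localhost:5432/mydb

# Sanity-check: the URL parses back into its component parts.
parts = urlparse(url)
assert parts.hostname == "localhost" and parts.port == 5432
```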
LLM_BASE_URL=
# This is the base communication URL for the LLM Model
LLM_API_KEY=
# This is the API Key for the LLM Model
LLM_MODEL=
# This is the name of the large language model you want to use
LLM_MODEL_MAX_TOKEN=
# This is the maximum context length of the selected large language model
LLM_MODEL_QPM=
# This is the maximum number of requests per minute for the large language model (15000 RPM)
LLM_MODEL_TPM=
# This is the maximum number of tokens processed per minute for the large language model (1,200,000 TPM)
EMBEDDING_BASE_URL=
# This is the base communication URL for the Embedding Model
EMBEDDING_API_KEY=
# This is the API Key for the Embedding Model
EMBEDDING_MODEL=
# This is the name of the general-purpose text embedding model you selected
EMBEDDING_DIMENSION=
# This is the embedding dimension of the text vectors generated by the selected embedding model
EMBEDDING_MODEL_MAX_TOKEN=
# This is the maximum context length of the selected text embedding model
EMBEDDING_MODEL_RPS=
# This is the maximum number of requests per second for the embedding model (30 RPS)
EMBEDDING_MODEL_TPM=
# This is the maximum number of tokens processed per minute for the embedding model (1,200,000 TPM)
EMBEDDING_MODEL_INPUT_STRING_ARRAY_LENGTH=
# This number represents how many text strings the cloud provider's embedding API can process in one batch. For example, if this value is 10 and the embedding model's max token is 8192, then the maximum supported text capacity is 8192*10=81920. If batch processing is not supported, set it to 1.
MILVUS_CLUSTER_ENDPOINT=
# This is the cluster endpoint URL of Milvus/Zilliz
MILVUS_CLUSTER_TOKEN=
# This is the authentication token for connecting to the Milvus/Zilliz cluster
DATABASE_URL=
# This is the PostgreSQL database URL used to store user spaces, knowledge bases, file metadata, etc.
MINIO_ENDPOINT=
# This is the backend IP address of your MinIO server (not the MinIO Console URL)
MINIO_ACCESS_KEY=
# This is the access key you created in the MinIO Console
MINIO_SECRET_KEY=
# This is the secret key you created in the MinIO Console
MINIO_SECURE=
# This indicates whether HTTPS should be used when connecting to MinIO; set to True or False

Then install the dependencies:

```shell
cd DeepRAG/deeprag
poetry install
```

If you don't have Poetry, please install it in your environment first.
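The interplay between EMBEDDING_MODEL_MAX_TOKEN and EMBEDDING_MODEL_INPUT_STRING_ARRAY_LENGTH described above can be sketched in a few lines. This is a minimal illustration (the function names are my own, not part of DeepRAG) of the per-call capacity arithmetic and of splitting text chunks into batches no larger than the provider's array limit:

```python
def batch_capacity(max_token: int, array_length: int) -> int:
    """Maximum tokens one embedding API call can accept:
    per-string token limit times the batch (array) length."""
    return max_token * array_length

def to_batches(texts: list[str], array_length: int) -> list[list[str]]:
    """Split texts into batches the embedding API can accept.
    With array_length = 1 (no batch support), each batch holds one string."""
    return [texts[i:i + array_length] for i in range(0, len(texts), array_length)]

# Using the numbers from the comment above: 8192 tokens per string,
# batches of 10 strings -> 81920 tokens per call.
print(batch_capacity(8192, 10))  # 81920
```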
Then initialize the ORM and the database:
```shell
cd Deeprag/deeprag/src/deeprag/db/prisma
cp origin_schema.prisma schema.prisma
cd Deeprag/deeprag/src/deeprag
prisma db push --schema ./db/prisma/schema.prisma
prisma db execute --file ./db/prisma/create_auto_increment.sql --schema ./db/prisma/schema.prisma
prisma db pull --schema ./db/prisma/schema.prisma
prisma generate --schema ./db/prisma/schema.prisma
```

Then start the API server:

```shell
cd /DeepRAG/deeprag/src/deeprag/api
uvicorn main:app --host 0.0.0.0 --port 8000
```

If you want to self-host the DeepRAG system with Docker, follow the instructions below.
This is the first env file you need to fill out.
```shell
cd /DeepRAG
cp docker.env.example docker.env
```

POSTGRES_USER=
# Specifies the username for the PostgreSQL database
POSTGRES_PASSWORD=
# Specifies the password for the PostgreSQL database
POSTGRES_DB=
# Specifies the name of the database to be created in PostgreSQL
PG_EXPOSED_PORT=
# Specifies the port on the host machine that PostgreSQL will listen on
MINIO_EXPOSED_PORT=
# Specifies the port on the host machine that MinIO object storage will listen on
MINIO_CONSOLE_EXPOSED_PORT=
# Specifies the port on the host machine for accessing the MinIO management console
MINIO_ROOT_USER=
# Specifies the root username for MinIO object storage
MINIO_ROOT_PASSWORD=
# Specifies the root password for MinIO object storage
DEEPRAG_APP_PORT=
# Specifies the port on the host machine for the DeepRAG application

This is the second env file you need to fill out.
```shell
cd /DeepRAG/deeprag/src/deeprag
cp .env.example .env
```

LLM_BASE_URL=
# This is the base communication URL for the LLM Model
LLM_API_KEY=
# This is the API Key for the LLM Model
LLM_MODEL=
# This is the name of the large language model you want to use
LLM_MODEL_MAX_TOKEN=
# This is the maximum context length of the selected large language model
LLM_MODEL_QPM=
# This is the maximum number of requests per minute for the large language model (15000 RPM)
LLM_MODEL_TPM=
# This is the maximum number of tokens processed per minute for the large language model (1,200,000 TPM)
EMBEDDING_BASE_URL=
# This is the base communication URL for the Embedding Model
EMBEDDING_API_KEY=
# This is the API Key for the Embedding Model
EMBEDDING_MODEL=
# This is the name of the general-purpose text embedding model you selected
EMBEDDING_DIMENSION=
# This is the embedding dimension of the text vectors generated by the selected embedding model
EMBEDDING_MODEL_MAX_TOKEN=
# This is the maximum context length of the selected text embedding model
EMBEDDING_MODEL_RPS=
# This is the maximum number of requests per second for the embedding model (30 RPS)
EMBEDDING_MODEL_TPM=
# This is the maximum number of tokens processed per minute for the embedding model (1,200,000 TPM)
EMBEDDING_MODEL_INPUT_STRING_ARRAY_LENGTH=
# This number represents how many text strings the cloud provider's embedding API can process in one batch. For example, if this value is 10 and the embedding model's max token is 8192, then the maximum supported text capacity is 8192*10=81920. If batch processing is not supported, set it to 1.
MILVUS_CLUSTER_ENDPOINT=
# This is the cluster endpoint URL of Milvus/Zilliz
MILVUS_CLUSTER_TOKEN=
# This is the authentication token for connecting to the Milvus/Zilliz cluster
DATABASE_URL=
# This is the PostgreSQL database URL used to store user spaces, knowledge bases, file metadata, etc.
MINIO_ENDPOINT=
# This is the backend IP address of your MinIO server (not the MinIO Console URL)
MINIO_ACCESS_KEY=
# This is the access key you created in the MinIO Console
MINIO_SECRET_KEY=
# This is the secret key you created in the MinIO Console
MINIO_SECURE=
# This indicates whether HTTPS should be used when connecting to MinIO; set to True or False

As you may have noticed, two variables (DATABASE_URL and MINIO_ENDPOINT) do not need to be filled in.

Why not? Because these two variables are already set by docker-compose.yml.
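As an illustration of how that works, a docker-compose.yml can inject both values through the app service's environment section, pointing at the sibling services by their service names (Compose puts all services on a shared network where service names resolve as hostnames). The service names, ports, and variable interpolation below are placeholder assumptions sketched from the container names used later in this README; check the actual docker-compose.yml in the repository for the real values.

```yaml
services:
  deeprag-app:
    environment:
      # Hostnames resolve to the other compose services,
      # so no manual value is needed in .env.
      DATABASE_URL: postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@deeprag-db:5432/${POSTGRES_DB}
      MINIO_ENDPOINT: deeprag-minio:9000
```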
```shell
cd Deeprag/deeprag/src/deeprag/db/prisma
cp origin_schema.prisma schema.prisma
cd /DeepRAG
docker compose --env-file docker.env up -d
```

Then run the following three docker commands to check the container services.
```shell
docker logs -f deeprag-app
docker logs -f deeprag-db
docker logs -f deeprag-minio
```

If you see output similar to the following, the deployment was successful.




