DeepRAG: A Fast and Efficient Retrieval-Augmented Generation (RAG) Framework for LLMs

DeepRAG is a lightweight, high-performance RAG framework designed for building advanced LLM-powered applications. It supports integration with large language models such as DeepSeek-v3, Qwen, Llama 3, and more, whether from cloud APIs or self-hosted backends like SGLang or vLLM.

With support for powerful vector databases like Zilliz/Milvus, efficient data-parsing pipelines, and a modular architecture, DeepRAG makes it easy to build, scale, and customize retrieval-augmented generation systems for real-world applications.

DeepRAG Logo

A fast and efficient LLM-RAG Python project based on DeepSeek-v3 or other LLM models, built with Poetry.

Initialization of DeepRAG

Let me introduce the RAG framework I developed to help you process your documents.

Deploying from Source

Step 1: Prepare an API key and base URL for both the LLM and the embedding model, either from a cloud provider or from a self-hosted backend such as SGLang or vLLM.

For example, you can obtain these from the official DeepSeek website or the Aliyun website. Alternatively, if you have access to high-performance computing hardware such as NVIDIA, Ascend, Google TPU, or Intel devices (listed in no particular order), you can self-host the LLM or embedding model with SGLang or vLLM. For further information about self-hosting, the vLLM and SGLang documentation sites are all you need. Trust me, it is easy.
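As a sketch of the self-hosting path, assuming vLLM is installed and a GPU is available, an OpenAI-compatible endpoint can be started like this (the model name and port here are illustrative, not a DeepRAG requirement):

```shell
# Serve a model with an OpenAI-compatible API on port 8000 (requires a GPU)
vllm serve Qwen/Qwen2.5-7B-Instruct --port 8000

# The matching values for the .env file in Step 4 would then be:
#   LLM_BASE_URL=http://localhost:8000/v1
#   LLM_API_KEY=EMPTY   # vLLM does not check the key unless --api-key is set
```

SGLang exposes the same OpenAI-compatible interface, so the .env values follow the same pattern.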

Step 2: A Zilliz/Milvus vector database is required

You can self-host the Milvus vector database with Docker and then set the MILVUS_CLUSTER_ENDPOINT and MILVUS_CLUSTER_TOKEN values that the .env file needs. Because the Docker commands for standalone and distributed Milvus deployments differ, example commands are not provided in this README. For further information on self-hosting Milvus, the Milvus documentation site is your best resource.

You may be wondering: why not use a self-hosted open-source vector database? Because the goal is a high-performance RAG system, and a self-hosted vector database may bring some unpredictable, annoying problems. To use the Zilliz cloud vector database instead, register an account on the Zilliz website. For cost reasons, if you are just developing or learning a RAG system, subscribe to the Free cluster on Zilliz: free, relatively high-performance, and with a low failure rate.

Step 3: A PostgreSQL database and MinIO are required

PostgreSQL is available from almost every cloud provider, but it may cost some money. If you don't want to purchase a cloud database, just self-host it with Docker or Podman. Here is an example:

docker run --name my-postgres \
  -e POSTGRES_USER=myuser \
  -e POSTGRES_PASSWORD=mypassword \
  -e POSTGRES_DB=mydb \
  -v pg_data:/var/lib/postgresql/data \
  -p 5432:5432 \
  -d postgres
docker run -d --name minio --restart always \
  -p 9002:9000 -p 9001:9001 \
  -v minio_data:/data \
  -e "MINIO_ROOT_USER=minioadmin" \
  -e "MINIO_ROOT_PASSWORD=minioadmin" \
  minio/minio server /data --console-address ":9001"
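Assuming the two containers above are running locally, the matching connection values for the .env file in Step 4 would look like this (the credentials are the example ones from the commands, and localhost assumes you deploy on the same machine; change both for production):

```shell
DATABASE_URL=postgresql://myuser:mypassword@localhost:5432/mydb
MINIO_ENDPOINT=localhost:9002
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin
MINIO_SECURE=False
```

Note that MINIO_ENDPOINT points at host port 9002, which the command above maps to MinIO's API port 9000; port 9001 is only the web console.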

Step 4: Complete the .env file

git clone https://github.com/fangyisheng/DeepRAG.git
cd DeepRAG/deeprag/src/deeprag
cp .env.example .env

Let me walk through the .env.example file to help you understand how to fill in each variable:

LLM_BASE_URL=
# This is the base communication URL for the LLM Model
LLM_API_KEY=
# This is the API Key for the LLM Model
LLM_MODEL=
# This is the name of the large language model you want to use
LLM_MODEL_MAX_TOKEN=
# This is the maximum context length of the selected large language model
LLM_MODEL_QPM=
# This is the maximum number of requests per minute for the large language model (15000 RPM)
LLM_MODEL_TPM=
# This is the maximum number of tokens processed per minute for the large language model (1,200,000 TPM)
EMBEDDING_BASE_URL=
# This is the base communication URL for the Embedding Model
EMBEDDING_API_KEY=
# This is the API Key for the Embedding Model
EMBEDDING_MODEL=
# This is the name of the general-purpose text embedding model you selected
EMBEDDING_DIMENSION=
# This is the embedding dimension of the text vectors generated by the selected embedding model
EMBEDDING_MODEL_MAX_TOKEN=
# This is the maximum context length of the selected text embedding model
EMBEDDING_MODEL_RPS=
# This is the maximum number of requests per second for the embedding model (30 RPS)
EMBEDDING_MODEL_TPM=
# This is the maximum number of tokens processed per minute for the embedding model (1,200,000 TPM)
EMBEDDING_MODEL_INPUT_STRING_ARRAY_LENGTH=
# This number represents how many text strings the cloud provider's embedding API can process in one batch. For example, if this value is 10 and the embedding model's max token is 8192, then the maximum supported text capacity is 8192*10=81920. If batch processing is not supported, set it to 1.
MILVUS_CLUSTER_ENDPOINT=
# This is the cluster endpoint URL of Milvus/Zilliz
MILVUS_CLUSTER_TOKEN=
# This is the authentication token for connecting to the Milvus/Zilliz cluster
DATABASE_URL=
# This is the PostgreSQL database URL used to store user spaces, knowledge bases, file metadata, etc.
MINIO_ENDPOINT=
# This is the backend IP address of your MinIO server (not the MinIO Console URL)
MINIO_ACCESS_KEY=
# This is the access key you created in the MinIO Console
MINIO_SECRET_KEY=
# This is the secret key you created in the MinIO Console
MINIO_SECURE=
# This indicates whether HTTPS should be used when connecting to MinIO, set to True or False
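Before moving on, it can save debugging time to check that nothing was left empty. A minimal sketch of such a check, assuming the simple KEY=VALUE format shown above (DeepRAG itself may load the file differently, e.g. via python-dotenv):

```python
REQUIRED_KEYS = [
    "LLM_BASE_URL", "LLM_API_KEY", "LLM_MODEL",
    "EMBEDDING_BASE_URL", "EMBEDDING_API_KEY", "EMBEDDING_MODEL",
    "MILVUS_CLUSTER_ENDPOINT", "MILVUS_CLUSTER_TOKEN",
    "DATABASE_URL", "MINIO_ENDPOINT", "MINIO_ACCESS_KEY", "MINIO_SECRET_KEY",
]

def parse_env_file(path: str) -> dict[str, str]:
    """Parse a simple KEY=VALUE .env file, skipping blanks and # comments."""
    values = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values

def missing_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not values.get(k)]
```

Running `missing_keys(parse_env_file(".env"))` should return an empty list once the file is complete.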

Step 5: Initialize the deeprag module, then the ORM and database

cd DeepRAG/deeprag
poetry install

If you don't have Poetry, install it in your environment first.
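For reference, Poetry can be installed with its official installer script (see the Poetry documentation for pipx and other alternatives):

```shell
curl -sSL https://install.python-poetry.org | python3 -
```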

Then initialize the ORM and database:

cd DeepRAG/deeprag/src/deeprag/db/prisma
cp origin_schema.prisma schema.prisma
cd DeepRAG/deeprag/src/deeprag
prisma db push --schema ./db/prisma/schema.prisma  # create the tables defined in the schema
prisma db execute --file ./db/prisma/create_auto_increment.sql --schema ./db/prisma/schema.prisma  # apply the auto-increment SQL
prisma db pull --schema ./db/prisma/schema.prisma  # introspect the database back into the schema
prisma generate --schema ./db/prisma/schema.prisma  # generate the Prisma client

Step 6: Start the Uvicorn service if you want to access the DeepRAG system via the HTTP API.

cd DeepRAG/deeprag/src/deeprag/api
uvicorn main:app --host 0.0.0.0 --port 8000
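A quick way to confirm the service is up, assuming the app served by Uvicorn is FastAPI-based (FastAPI exposes interactive API docs by default):

```shell
# Open http://localhost:8000/docs in a browser, or probe from the shell:
curl -i http://localhost:8000/docs
```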

Self-hosting with Docker

If you want to self-host the DeepRAG system with Docker, follow the instructions below.

Step 1: Fill out the two env files

This is the first env file you need to fill out.

cd /DeepRAG
cp docker.env.example docker.env
POSTGRES_USER=
# Specifies the username for the PostgreSQL database
POSTGRES_PASSWORD=
# Specifies the password for the PostgreSQL database
POSTGRES_DB=
# Specifies the name of the database to be created in PostgreSQL
PG_EXPOSED_PORT=
# Specifies the port on the host machine that PostgreSQL will listen on
MINIO_EXPOSED_PORT=
# Specifies the port on the host machine that MinIO object storage will listen on
MINIO_CONSOLE_EXPOSED_PORT=
# Specifies the port on the host machine for accessing the MinIO management console
MINIO_ROOT_USER=
# Specifies the root username for MinIO object storage
MINIO_ROOT_PASSWORD=
# Specifies the root password for MinIO object storage
DEEPRAG_APP_PORT=
# Specifies the port on the host machine for the DeepRAG application

This is the second env file you need to fill out.

cd /DeepRAG/deeprag/src/deeprag
cp .env.example .env
LLM_BASE_URL=
# This is the base communication URL for the LLM Model
LLM_API_KEY=
# This is the API Key for the LLM Model
LLM_MODEL=
# This is the name of the large language model you want to use
LLM_MODEL_MAX_TOKEN=
# This is the maximum context length of the selected large language model
LLM_MODEL_QPM=
# This is the maximum number of requests per minute for the large language model (15000 RPM)
LLM_MODEL_TPM=
# This is the maximum number of tokens processed per minute for the large language model (1,200,000 TPM)
EMBEDDING_BASE_URL=
# This is the base communication URL for the Embedding Model
EMBEDDING_API_KEY=
# This is the API Key for the Embedding Model
EMBEDDING_MODEL=
# This is the name of the general-purpose text embedding model you selected
EMBEDDING_DIMENSION=
# This is the embedding dimension of the text vectors generated by the selected embedding model
EMBEDDING_MODEL_MAX_TOKEN=
# This is the maximum context length of the selected text embedding model
EMBEDDING_MODEL_RPS=
# This is the maximum number of requests per second for the embedding model (30 RPS)
EMBEDDING_MODEL_TPM=
# This is the maximum number of tokens processed per minute for the embedding model (1,200,000 TPM)
EMBEDDING_MODEL_INPUT_STRING_ARRAY_LENGTH=
# This number represents how many text strings the cloud provider's embedding API can process in one batch. For example, if this value is 10 and the embedding model's max token is 8192, then the maximum supported text capacity is 8192*10=81920. If batch processing is not supported, set it to 1.
MILVUS_CLUSTER_ENDPOINT=
# This is the cluster endpoint URL of Milvus/Zilliz
MILVUS_CLUSTER_TOKEN=
# This is the authentication token for connecting to the Milvus/Zilliz cluster
DATABASE_URL=
# This is the PostgreSQL database URL used to store user spaces, knowledge bases, file metadata, etc.
MINIO_ENDPOINT=
# This is the backend IP address of your MinIO server (not the MinIO Console URL)
MINIO_ACCESS_KEY=
# This is the access key you created in the MinIO Console
MINIO_SECRET_KEY=
# This is the secret key you created in the MinIO Console
MINIO_SECURE=
# This indicates whether HTTPS should be used when connecting to MinIO, set to True or False

As you may have noticed, two variables (DATABASE_URL and MINIO_ENDPOINT) do not need to be filled in.

Why not? Because those two variables are already set by docker-compose.yml.

Step 2: Create the Prisma schema

cd /DeepRAG/deeprag/src/deeprag/db/prisma
cp origin_schema.prisma schema.prisma

Step 3: Start the DeepRAG container services using the Docker Compose configuration file.

cd /DeepRAG
docker compose --env-file docker.env up -d

Then run the following three Docker commands to check the container services:

docker logs -f deeprag-app
docker logs -f deeprag-db
docker logs -f deeprag-minio

If the log output looks similar to the example screenshots (omitted here), the deployment was successful.
