| Title: | Doc Bot |
| Goal: | Deploy the RHOAI LLM tutorial and learn the inner workings of the system. |
| Output: | A deployed, RAG-based chatbot fed with documents, or at least its backend/API. |
| Timing: | 2 to 3h |
| Notes: | We'll start with the RHOAI Insurance Claim RHPDS demo. |
Based on the Chat with your documentation lab from AI on OpenShift.
Deploy a vLLM Model Serving instance in OpenAI-compatible API mode, either from the RHOAI Dashboard or from the command line.
You must first make sure that you have properly installed the necessary components of the Single-Model Serving stack, as documented here.
From the documentation:
For deploying large models such as large language models (LLMs), OpenShift AI includes a single-model serving platform that is based on the KServe component. Because each model is deployed on its own model server, the single-model serving platform helps you to deploy, monitor, scale, and maintain large models that require increased resources.
- KServe: A Kubernetes custom resource definition (CRD) that orchestrates model serving for all types of models. KServe includes model-serving runtimes that implement the loading of given types of model servers. KServe also handles the lifecycle of the deployment object, storage access, and networking setup.
- Red Hat OpenShift Serverless: A cloud-native development model that allows for serverless deployments of models. OpenShift Serverless is based on the open source Knative project.
- Red Hat OpenShift Service Mesh: A service mesh networking layer that manages traffic flows and enforces access policies. OpenShift Service Mesh is based on the open source Istio project.
TODO
If you need to delete RHOAI, go here.
Install the following operators prior to installing the OpenShift AI operator:
- OpenShift Serverless
- OpenShift Service Mesh
Now, in order to deploy KServe using the same certificate the OpenShift cluster uses for its routers, do the following:
Instructions from: https://ai-on-openshift.io/odh-rhoai/single-stack-serving-certificate/
Get the name of the secret used for routing tasks in your OpenShift cluster:
export INGRESS_SECRET_NAME=$(oc get ingresscontroller default -n openshift-ingress-operator -o json | jq -r .spec.defaultCertificate.name)
oc get secret ${INGRESS_SECRET_NAME} -n openshift-ingress -o yaml > rhods-internal-primary-cert-bundle-secret.yaml

Clean rhods-internal-primary-cert-bundle-secret.yaml: change the name to rhods-internal-primary-cert-bundle-secret and the type to kubernetes.io/tls. It should look like:
kind: Secret
apiVersion: v1
metadata:
  name: rhods-internal-primary-cert-bundle-secret
data:
  tls.crt: >-
    LS0tLS1CRUd...
  tls.key: >-
    LS0tLS1CRUd...
type: kubernetes.io/tls

Create the secret in istio-system:
oc apply -n istio-system -f rhods-internal-primary-cert-bundle-secret.yaml

Check the secret is in place:
oc get secret rhods-internal-primary-cert-bundle-secret -n istio-system

Install the OpenShift AI operator. If you want to use stable versions, use the stable channel:
cat << EOF | oc create -f -
---
apiVersion: v1
kind: Namespace
metadata:
  name: redhat-ods-operator
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: rhods-operator
  namespace: redhat-ods-operator
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: rhods-operator
  namespace: redhat-ods-operator
spec:
  name: rhods-operator
  channel: stable-2.10
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual
EOF

Alternatively, install the OpenShift AI operator from the fast channel if you want to use the latest versions:
cat << EOF | oc create -f -
---
apiVersion: v1
kind: Namespace
metadata:
  name: redhat-ods-operator
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: rhods-operator
  namespace: redhat-ods-operator
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: rhods-operator
  namespace: redhat-ods-operator
spec:
  name: rhods-operator
  channel: fast
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual
  startingCSV: rhods-operator.2.11.0
EOF

There should be only one install plan! If there are two, delete one of them...
oc get installplans -n redhat-ods-operator -o name

Patch it in order to approve it:
for installplan in $(oc get installplans -n redhat-ods-operator -o name); do
  oc patch $installplan -n redhat-ods-operator --type merge --patch '{"spec":{"approved":true}}'
done

Finally, create the DSC:
- Use this one if you want KServe to use the same certificate as the OpenShift routes in your cluster.
cat << EOF | oc create -f -
---
kind: DataScienceCluster
apiVersion: datasciencecluster.opendatahub.io/v1
metadata:
  name: default-dsc
  labels:
    app.kubernetes.io/name: datasciencecluster
    app.kubernetes.io/instance: default-dsc
    app.kubernetes.io/part-of: rhods-operator
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/created-by: rhods-operator
spec:
  components:
    codeflare:
      managementState: Managed
    dashboard:
      managementState: Managed
    datasciencepipelines:
      managementState: Managed
    kserve:
      managementState: Managed
      serving:
        ingressGateway:
          certificate:
            type: OpenshiftDefaultIngress
        managementState: Managed
        name: knative-serving
    modelmeshserving:
      managementState: Managed
    kueue:
      managementState: Managed
    ray:
      managementState: Managed
    workbenches:
      managementState: Managed
EOF

- Or set the certificate manually if you want to use any other certificate.
cat << EOF | oc create -f -
---
kind: DataScienceCluster
apiVersion: datasciencecluster.opendatahub.io/v1
metadata:
  name: default-dsc
  labels:
    app.kubernetes.io/name: datasciencecluster
    app.kubernetes.io/instance: default-dsc
    app.kubernetes.io/part-of: rhods-operator
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/created-by: rhods-operator
spec:
  components:
    codeflare:
      managementState: Managed
    dashboard:
      managementState: Managed
    datasciencepipelines:
      managementState: Managed
    kserve:
      managementState: Managed
      serving:
        ingressGateway:
          certificate:
            # Instructions from: https://ai-on-openshift.io/odh-rhoai/single-stack-serving-certificate/
            # You have to copy the secret and create it in the istio-system namespace
            secretName: rhods-internal-primary-cert-bundle-secret
            type: Provided
        managementState: Managed
        name: knative-serving
    modelmeshserving:
      managementState: Managed
    kueue:
      managementState: Managed
    ray:
      managementState: Managed
    workbenches:
      managementState: Managed
EOF

Once the stack is installed, adding the runtime is pretty straightforward:
- As an admin, in the OpenShift AI Dashboard, open the menu Settings -> Serving runtimes.
- Click on Add serving runtime.
- For the type of model serving platforms this runtime supports, select Single model serving platform.
- Upload the file ./vllm_runtime/vllm-runtime.yaml, or click Start from scratch and copy/paste its content.
The runtime is now available when deploying a model.
In both cases, deploy the model mistralai/Mistral-7B-Instruct-v0.2. The model can be found here.
Some hints:
Get a PAT from HF and put it in .hf-token.
export HF_USERNAME=<YOUR_USER_NAME>
export HF_TOKEN=$(cat .hf-token)
export MODEL_ROOT="mistralai"
export MODEL_ID="Mistral-7B-Instruct-v0.2"
export HF_TMPDIR="tmp"
mkdir -p ${HF_TMPDIR} && cd ${HF_TMPDIR}
git lfs install
git clone https://${HF_USERNAME}:${HF_TOKEN}@huggingface.co/${MODEL_ROOT}/${MODEL_ID}
rm -rf ${MODEL_ID}/.git

Flan T5
export HF_USERNAME=<YOUR_USER_NAME>
export HF_TOKEN=$(cat .hf-token)
export MODEL_ROOT="google"
export MODEL_ID="flan-t5-small"
export HF_TMPDIR="tmp"
mkdir -p ${HF_TMPDIR} && cd ${HF_TMPDIR}
git lfs install
git clone https://${HF_USERNAME}:${HF_TOKEN}@huggingface.co/${MODEL_ROOT}/${MODEL_ID}
rm -rf ${MODEL_ID}/.git

Upload to an S3 bucket. Run these commands from ${HF_TMPDIR}:
export AWS_S3_BUCKET=models
export AWS_ACCESS_KEY_ID=minio
export AWS_SECRET_ACCESS_KEY=minio123
export AWS_DEFAULT_REGION=none # Any value is fine
export AWS_S3_ENDPOINT=localhost:9000 # e.g., http://localhost:9000
export AWS_S3_CUSTOM_DOMAIN=${AWS_S3_ENDPOINT}
export AWS_S3_USE_PATH_STYLE=1
# Create a bucket
aws s3api create-bucket --bucket ${AWS_S3_BUCKET} --endpoint-url "https://${AWS_S3_ENDPOINT}"
# Upload the model to files folder in the bucket
aws s3 sync ${MODEL_ID} s3://${AWS_S3_BUCKET}/${MODEL_ROOT}/${MODEL_ID}/ --endpoint-url "https://${AWS_S3_ENDPOINT}"

This runtime can be used in exactly the same way as the out-of-the-box ones. This is the sequence of actions to use our custom vLLM runtime:
Remember that the custom runtime has to be installed before proceeding with these instructions.
- Create a project in RHOAI
- Create a connection to the S3 bucket that contains the model files.
- Deploy the model from the Dashboard.
Make sure you have added a GPU to your GPU configuration, that you have enough VRAM (GPU memory) to load the model, and that you have enough standard memory (RAM). Although the model loads into the GPU, RAM is still used for the pre-loading operations.
- Once the model is loaded, you can access the inference endpoint provided through the dashboard.
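As a rule of thumb for the VRAM check above: a dense model served in fp16/bf16 needs roughly two bytes per parameter for the weights alone, plus headroom for the KV cache and activations. A quick back-of-the-envelope sketch (the ~7.2e9 parameter count matches Mistral-7B; the 20% headroom figure is an assumption, not a vLLM guarantee):

```python
def min_vram_gib(n_params: float, bytes_per_param: int = 2, headroom: float = 0.2) -> float:
    """Rough lower bound on GPU memory needed to serve a dense model.

    bytes_per_param=2 corresponds to fp16/bf16 weights; `headroom` is an
    assumed 20% extra for KV cache and activations.
    """
    weights_bytes = n_params * bytes_per_param
    return weights_bytes * (1 + headroom) / 2**30

# Mistral-7B-Instruct in fp16: ~7.2e9 parameters.
# The result lands above a single T4's ~15 GiB, which is why this
# deployment targets an A10G (24 GiB) or splits across two T4s.
print(f"{min_vram_gib(7.2e9):.1f} GiB")
```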
Click on Create data science project.
Give it a name.
Or run this:
PROJECT_NAME="vllm-mistral-7b"
DISPLAY_NAME="vLLM Mistral 7B"
DESCRIPTION="Mistral Model run using vLLM"
oc new-project $PROJECT_NAME --display-name="$DISPLAY_NAME" --description="$DESCRIPTION"
oc label namespace $PROJECT_NAME opendatahub.io/dashboard=true

Click on Add data connection.
Name for your connection: mistral
Or run this:
CONNECTION_NAME=mistral
AWS_S3_ENDPOINT=http://minio.ic-shared-minio.svc:9000
oc create secret generic -n ${PROJECT_NAME} aws-connection-${CONNECTION_NAME} \
--from-literal=AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} \
--from-literal=AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} \
--from-literal=AWS_DEFAULT_REGION=${AWS_DEFAULT_REGION} \
--from-literal=AWS_S3_ENDPOINT=${AWS_S3_ENDPOINT} \
--from-literal=AWS_S3_BUCKET=${AWS_S3_BUCKET}
oc label secret aws-connection-${CONNECTION_NAME} -n $PROJECT_NAME opendatahub.io/dashboard=true
oc label secret aws-connection-${CONNECTION_NAME} -n $PROJECT_NAME opendatahub.io/managed=true
oc annotate secret aws-connection-${CONNECTION_NAME} -n $PROJECT_NAME opendatahub.io/connection-type=s3
oc annotate secret aws-connection-${CONNECTION_NAME} -n $PROJECT_NAME openshift.io/display-name=${CONNECTION_NAME}

Click on Deploy Model in Single-model serving platform.
Use the following data to deploy your model:
- Model name: mistral-7b
- Serving runtime: mistral-7b
- Model framework (name - version): pytorch
- Model server size: Custom
- CPUs requested: 6 cores
- Memory requested: 24 Gi
- CPUs limit: 8 cores
- Memory limit: 24 Gi
- Accelerator: NVIDIA GPU
- Number of accelerators: 1
- Existing data connection:
- Name: mistral
- Path: mistralai/Mistral-7B-Instruct-v0.2
Path should be ${MODEL_ROOT}/${MODEL_ID}.
Or you can run this command:
oc create -n ${PROJECT_NAME} -f ./vllm_runtime/vllm-mistral-7b-instance.yaml

It's worth mentioning that we have added a node selector to move this workload to a node with an A10G GPU.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  annotations:
    openshift.io/display-name: mistral-7b
    serving.knative.openshift.io/enablePassthrough: 'true'
    sidecar.istio.io/inject: 'true'
    sidecar.istio.io/rewriteAppHTTPProbers: 'true'
  name: mistral-7b
  labels:
    opendatahub.io/dashboard: 'true'
spec:
  predictor:
    maxReplicas: 1
    minReplicas: 1
    model:
      modelFormat:
        name: pytorch
      name: ''
      resources:
        limits:
          nvidia.com/gpu: '1'
        requests:
          nvidia.com/gpu: '1'
      runtime: mistral-7b
      storage:
        key: aws-connection-mistral
        path: mistralai/Mistral-7B-Instruct-v0.2
    tolerations:
      - effect: NoSchedule
        key: nvidia.com/gpu
        operator: Exists
    nodeSelector:
      nvidia.com/gpu.product: NVIDIA-A10G # HERE
Wait until the model has been correctly deployed.
Check the storage initializer (S3 download) logs:
oc logs deploy/mistral-7b-predictor-00001-deployment -c storage-initializer -n ${PROJECT_NAME}
Check if the predictor url is ready:
oc get inferenceservice/mistral-7b -n $PROJECT_NAME -o json | jq -r .status.url

This implementation of the runtime provides an OpenAI-compatible API, so any tool or library that can connect to OpenAI services will be able to consume the endpoint.
Python and Curl examples are provided here.
You can also find a notebook example using Langchain to query vLLM in this repo here.
Also, vLLM provides a full Swagger UI where you can get the full documentation of the API (methods, parameters) and try it directly without any coding. It is accessible at https://your-endpoint-address/docs.
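Because the endpoint speaks the OpenAI completions protocol, nothing beyond Python's standard library is strictly required to call it. A minimal sketch (the predictor URL and model ID are placeholders; in practice they come from the InferenceService status URL and the /v1/models listing):

```python
import json
import urllib.request

def completion_request(url: str, model_id: str, prompt: str,
                       max_tokens: int = 25, temperature: float = 0.0) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    body = json.dumps({
        "model": model_id,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json", "accept": "application/json"},
        method="POST",
    )

# Placeholder values -- substitute your predictor URL and the model id
# returned by GET /v1/models:
req = completion_request("https://mistral-7b-predictor.example.com",
                         "mistral-7b", "San Francisco is a")
# resp = json.load(urllib.request.urlopen(req))   # uncomment against a live endpoint
# print(resp["choices"][0]["text"])
```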
Example tested on vLLM on RHOAI 2.8:
Get model ID:
MODEL_NAME="mistral-7b"
VLLM_PREDICTOR_URL=$(oc get inferenceservice/${MODEL_NAME} -n $PROJECT_NAME -o json | jq -r .status.url)
RUNTIME_MODEL_ID=$(curl -ks -X 'GET' "${VLLM_PREDICTOR_URL}/v1/models" -H 'accept: application/json' | jq -r .data[0].id )
echo ${RUNTIME_MODEL_ID}

Prompt:
curl -s -X 'POST' \
"${VLLM_PREDICTOR_URL}/v1/completions" \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"model": "'${RUNTIME_MODEL_ID}'",
"prompt": "San Francisco is a",
"max_tokens": 25,
"temperature": 0
}'

Run ./bootstrap/gpu-machineset.sh and choose (2) Tesla T4 Multi GPU; make sure you get this after typing 2.
You have to be cluster-admin for this to succeed.
### Selected GPU instance type: Tesla T4 Multi GPU
### Selected GPU instance type: g4dn.12xlarge
Now choose the AWS region and AZ you want the machine set to deploy workers with GPUs:
If starting from the RHPDS Claim Service demo, you should probably use us-east-2 and az1.
### Enter the AWS region (default: us-west-2): us-east-2
### Select the availability zone (az1, az2, az3):
1) az1
2) az2
3) az3
Please enter your choice: 1
### Creating new machineset worker-gpu-g4dn.12xlarge-us-east-2a.
machineset.machine.openshift.io/worker-gpu-g4dn.12xlarge-us-east-2a created
--- New machineset worker-gpu-g4dn.12xlarge-us-east-2a created.
Run this command to check if the machineset is in place.
MACHINESET_NAME=$(oc get machineset -n openshift-machine-api -o json | jq -r '.items[] | select(.metadata.name | test("^worker-gpu")) | .metadata.name')
echo ${MACHINESET_NAME}

It should return something like: "worker-gpu-g4dn.12xlarge-us-east-2a"
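If jq is not available, the same ^worker-gpu name filter used above can be reproduced in plain Python; the listing below is a made-up, trimmed-down sample of `oc get machineset -o json` output:

```python
import json
import re

# Hypothetical, trimmed-down `oc get machineset -n openshift-machine-api -o json` output:
listing = json.loads("""
{"items": [
  {"metadata": {"name": "cluster-abc-worker-us-east-2a"}},
  {"metadata": {"name": "worker-gpu-g4dn.12xlarge-us-east-2a"}}
]}
""")

# Equivalent of: jq '.items[] | select(.metadata.name | test("^worker-gpu")) | .metadata.name'
gpu_machinesets = [item["metadata"]["name"]
                   for item in listing["items"]
                   if re.match(r"^worker-gpu", item["metadata"]["name"])]
print(gpu_machinesets)  # only the GPU machineset name(s)
```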
Check if our new nodes are ready:
oc get machineset/${MACHINESET_NAME} -n openshift-machine-api

You should get something like this:
NAME DESIRED CURRENT READY AVAILABLE AGE
worker-gpu-g4dn.12xlarge-us-east-2a 1 1 1 1 8m31s
PROJECT_NAME="vllm-mistral-2x-t4"
DISPLAY_NAME="vLLM Mistral 2x T4"
DESCRIPTION="Mistral Model run using vLLM with 2x T4"
oc new-project $PROJECT_NAME --display-name="$DISPLAY_NAME" --description="$DESCRIPTION"
oc label namespace $PROJECT_NAME opendatahub.io/dashboard=true

Click on Add data connection.
Name for your connection: mistral
Or run this:
CONNECTION_NAME=mistral
AWS_S3_ENDPOINT=http://minio.ic-shared-minio.svc:9000
oc create secret generic -n ${PROJECT_NAME} aws-connection-${CONNECTION_NAME} \
--from-literal=AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} \
--from-literal=AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} \
--from-literal=AWS_DEFAULT_REGION=${AWS_DEFAULT_REGION} \
--from-literal=AWS_S3_ENDPOINT=${AWS_S3_ENDPOINT} \
--from-literal=AWS_S3_BUCKET=${AWS_S3_BUCKET}
oc label secret aws-connection-${CONNECTION_NAME} -n $PROJECT_NAME opendatahub.io/dashboard=true
oc label secret aws-connection-${CONNECTION_NAME} -n $PROJECT_NAME opendatahub.io/managed=true
oc annotate secret aws-connection-${CONNECTION_NAME} -n $PROJECT_NAME opendatahub.io/connection-type=s3
oc annotate secret aws-connection-${CONNECTION_NAME} -n $PROJECT_NAME openshift.io/display-name=${CONNECTION_NAME}

oc create -n ${PROJECT_NAME} -f ./vllm_runtime/vllm-mistral-7b-multigpu-instance.yaml

Wait until the model has been correctly deployed.
Check the storage initializer (S3 download) logs:
oc logs deploy/mistral-7b-2x-t4-predictor-00001-deployment -c storage-initializer -n ${PROJECT_NAME}
Check if the predictor url is ready:
oc get inferenceservice/mistral-7b-2x-t4 -n $PROJECT_NAME -o json | jq -r .status.url

Run this command to check if the pod has two T4s:
T4_POD=$(oc get pod -n $PROJECT_NAME -o json | jq -r .items[0].metadata.name)
oc rsh ${T4_POD} nvidia-smi

You should get something like this:
Thu Jun 6 17:37:44 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla T4 On | 00000000:00:1C.0 Off | 0 |
| N/A 37C P0 26W / 70W | 11145MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 Tesla T4 On | 00000000:00:1D.0 Off | 0 |
| N/A 37C P0 26W / 70W | 11145MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
This implementation of the runtime provides an OpenAI-compatible API, so any tool or library that can connect to OpenAI services will be able to consume the endpoint.
Python and Curl examples are provided here.
You can also find a notebook example using Langchain to query vLLM in this repo here.
Also, vLLM provides a full Swagger UI where you can get the full documentation of the API (methods, parameters) and try it directly without any coding. It is accessible at https://your-endpoint-address/docs.
Example tested on vLLM on RHOAI 2.8:
Get model ID:
MODEL_NAME="mistral-7b-2x-t4"
VLLM_PREDICTOR_URL=$(oc get inferenceservice/${MODEL_NAME} -n $PROJECT_NAME -o json | jq -r .status.url)
RUNTIME_MODEL_ID=$(curl -ks -X 'GET' "${VLLM_PREDICTOR_URL}/v1/models" -H 'accept: application/json' | jq -r .data[0].id )
echo ${RUNTIME_MODEL_ID}

Prompt:
curl -s -X 'POST' \
"${VLLM_PREDICTOR_URL}/v1/completions" \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"model": "'${RUNTIME_MODEL_ID}'",
"prompt": "San Francisco is a",
"max_tokens": 25,
"temperature": 0
}'

For our RAG we will need a vector database to store the embeddings of the different documents. In this example we are using Milvus.
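At query time, what the vector database provides is essentially nearest-neighbor search over embedding vectors, with cosine similarity as a common metric. A toy standard-library sketch of that core idea, using made-up 3-dimensional "embeddings" (real ones come from an embedding model and have hundreds of dimensions):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up document embeddings; in the real pipeline these live in Milvus.
docs = {
    "claims-policy.pdf": [0.9, 0.1, 0.0],
    "hr-handbook.pdf":   [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # made-up embedding of the user's question

# Nearest neighbor = document whose embedding is most similar to the query.
best = max(docs, key=lambda name: cosine(query, docs[name]))
print(best)
```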
Deployment instructions specific to OpenShift are available here.
After you follow those instructions you should have a Milvus instance ready to be populated with documents.
- Access to the OpenShift cluster.
- A default StorageClass must be configured.
We'll deploy a default installation of Milvus, either standalone or in cluster mode, with authentication enabled. The provided openshift-values.yaml file can be changed to configure the installation.
- The default Milvus deployment leverages Minio to store logs and index files. This can be replaced by another S3 storage system
- Default configuration uses Pulsar for managing logs of recent changes, outputting stream logs, and providing log subscriptions. This can be replaced by Kafka
To modify those components, as well as many other configuration parameters, please refer to the configuration documentation and modify the values file according to your needs.
Milvus can be deployed in Standalone or Cluster mode. Cluster mode, leveraging Pulsar, etcd and Minio for data persistency, will bring redundancy, as well as easy scale up and down of the different components.
Although Milvus features an operator to easily deploy it in a Kubernetes environment, this method has not been tested yet, pending corrections to the deployment code for OpenShift specifics.
Instead, this deployment method is based on the offline installation, which relies purely on Helm charts.
- Log into your OpenShift cluster, and create a new project to host your Milvus installation:
MILVUS_PROJECT_NAME=milvus
oc new-project ${MILVUS_PROJECT_NAME}

- Add and update the Milvus Helm repository locally:
helm repo add milvus https://zilliztech.github.io/milvus-helm/
helm repo update

- Fetch the file openshift-values.yaml from this repo. This file is really important, as it sets specific values for OpenShift compatibility. You can also modify some of the values in this file to adapt the deployment to your requirements, notably the Minio admin user and password.

wget https://raw.githubusercontent.com/alpha-hack-program/doc-bot/main/vector-databases/milvus/openshift-values.yaml
- Create the manifest:
  - For Milvus standalone:
    helm template -f openshift-values.yaml vectordb --set cluster.enabled=false --set etcd.replicaCount=1 --set minio.mode=standalone --set pulsar.enabled=false milvus/milvus > milvus_manifest_standalone.yaml
  - For Milvus cluster:
    helm template -f openshift-values.yaml vectordb milvus/milvus > milvus_manifest_cluster.yaml
- VERY IMPORTANT: you must patch the generated manifest, as some settings are incompatible with OpenShift. The following commands use the yq tool (beware, the real one, not the Python version):
  - For Milvus Standalone:
    yq '(select(.kind == "StatefulSet" and .metadata.name == "vectordb-etcd") | .spec.template.spec.securityContext) = {}' -i milvus_manifest_standalone.yaml
    yq '(select(.kind == "StatefulSet" and .metadata.name == "vectordb-etcd") | .spec.template.spec.containers[0].securityContext) = {"capabilities": {"drop": ["ALL"]}, "runAsNonRoot": true, "allowPrivilegeEscalation": false}' -i milvus_manifest_standalone.yaml
    yq '(select(.kind == "Deployment" and .metadata.name == "vectordb-minio") | .spec.template.spec.securityContext) = {"capabilities": {"drop": ["ALL"]}, "runAsNonRoot": true, "allowPrivilegeEscalation": false}' -i milvus_manifest_standalone.yaml
  - For Milvus Cluster:
    yq '(select(.kind == "StatefulSet" and .metadata.name == "vectordb-etcd") | .spec.template.spec.securityContext) = {}' -i milvus_manifest_cluster.yaml
    yq '(select(.kind == "StatefulSet" and .metadata.name == "vectordb-etcd") | .spec.template.spec.containers[0].securityContext) = {"capabilities": {"drop": ["ALL"]}, "runAsNonRoot": true, "allowPrivilegeEscalation": false}' -i milvus_manifest_cluster.yaml
    yq '(select(.kind == "StatefulSet" and .metadata.name == "vectordb-minio") | .spec.template.spec.securityContext) = {"capabilities": {"drop": ["ALL"]}, "runAsNonRoot": true, "allowPrivilegeEscalation": false}' -i milvus_manifest_cluster.yaml
- Deploy Milvus (adjust the manifest name if needed):
  - For Milvus Standalone:
    oc apply -n ${MILVUS_PROJECT_NAME} -f milvus_manifest_standalone.yaml
  - For Milvus Cluster:
    oc apply -n ${MILVUS_PROJECT_NAME} -f milvus_manifest_cluster.yaml
- To deploy the management UI for Milvus, called Attu, apply the file attu-deployment.yaml:
  oc apply -f https://raw.githubusercontent.com/alpha-hack-program/doc-bot/main/vector-databases/milvus/attu-deployment.yaml

NOTE: The Attu deployment could have been done through the Helm chart, but this would not properly create the access Route.
Milvus is now deployed, with authentication enabled. The default and only admin user is root, with the default password Milvus. Please see the following section to modify this root access and create the necessary users and roles.
Milvus implements a full RBAC system to control access to its databases and collections. It is recommended to:
- Change the default password
- Create collections
- Create users and roles to give read/write or read access to the different collections
All of this can be done either using the PyMilvus library, or the Attu UI deployed in the last step.
Milvus features excellent documentation detailing all configuration, maintenance, and operations procedures. It is a must-read to grasp all aspects of its architecture and usage.
Several example notebooks are available to show how to use Milvus:
- Collection creation and document ingestion using Langchain: Langchain-Milvus-Ingest.ipynb
- Collection creation and document ingestion using Langchain with Nomic AI Embeddings: Langchain-Milvus-Ingest-nomic.ipynb
- Query a collection using Langchain: Langchain-Milvus-Query.ipynb
- Query a collection using Langchain with Nomic AI Embeddings: Langchain-Milvus-Query-nomic.ipynb
NOTE: Install OpenShift GitOps prior to running the following commands.
TODO! NOTE: This procedure deploys Milvus and all the other assets needed, but the models bucket needs to be in place in the Minio instance in the ic-shared-minio namespace...
- How do you know how many GPUs are consumed?
- NVIDIA GPU Operator
- How to install vLLM as a RHOAI runtime
- How to install vLLM as deployment
- https://www.3blue1brown.com/lessons/attention
- https://huggingface.co/learn/cookbook/en/advanced_rag
- https://github.com/alvarolop/ocp-ai/blob/main/openshift/rhoai-configuration/OdhDashboardConfig.yaml#L32-L34
- https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_self-managed/2.8/html-single/serving_models/index#about-model-serving_about-model-serving
- https://aws.amazon.com/ec2/instance-types/
- https://aws.amazon.com/ec2/pricing/on-demand/
- https://github.com/alvarolop/rhoai-gitops/tree/main
RAG:
- https://huggingface.co/learn/cookbook/advanced_rag
- https://huggingface.co/blog/ray-rag
- https://ai.meta.com/blog/retrieval-augmented-generation-streamlining-the-creation-of-intelligent-natural-language-processing-models/
Infra:
RAG Evaluation:
- https://docs.ragas.io/en/latest/index.html
- https://mlflow.org/docs/latest/llms/llm-evaluate/notebooks/rag-evaluation.html
- https://www.trulens.org/
- https://docs.ragas.io/en/latest/getstarted/evaluation.html#get-started-evaluation
- https://github.com/rcarrat-AI/unir_master_thesis/blob/main/notebooks/TFM_Experiments_LLM_vs_RAG.ipynb
GPUs:
- https://github.com/rh-aiservices-bu/gpu-partitioning-guide
- https://github.com/rh-aiservices-bu/multi-gpu-llms
GitOps Initiatives:
- https://gitlab.consulting.redhat.com/denizbank-ai/mlops-cluster-config
- https://github.com/rh-aiservices-bu/parasol-insurance
- https://github.com/redhat-na-ssa/demo-ai-gitops-catalog
KServe certs:
Embeddings:
Dataset Embeddings:
Uninstall RHOAI:
Milvus:
python3 -m venv .venv
source .venv/bin/activate

export NAMESPACE_NAME="vllm-mistral-7b"
export PIPELINE_NAME="rag-ingestion"
# NAMESPACE_NAME="fraud-detection"
# export PIPELINE_NAME="fraud"
export SVC="https://ds-pipeline-dspa-${NAMESPACE_NAME}.apps.cluster-hxbsq.sandbox1380.opentlc.com"
export AUTH_HEADER="Bearer $(oc whoami -t)"
export METHOD="GET"
export CONTENT_TYPE="application/json"
export REQUEST_BODY="{}"
export PIPELINES=$(curl -s -X "${METHOD}" -H "Authorization: ${AUTH_HEADER}" ${SVC}/apis/v2beta1/pipelines)
echo $PIPELINES | jq .
PIPELINE=$(echo ${PIPELINES} | jq --arg name ${PIPELINE_NAME} '.pipelines[] | select(.display_name == $name)')
echo ${PIPELINE} | jq .
export PIPELINE_ID=$(echo ${PIPELINE} | jq -r .pipeline_id)
echo ${PIPELINE_ID}

CONTENT_TYPE="application/json"
REQUEST_BODY="{}"
# PIPELINE=$(curl -s -X "GET" -H "Authorization: ${AUTH_HEADER}" ${SVC}/apis/v2beta1/pipelines/${PIPELINE_ID} | jq)
# PIPELINE_NAME=$(echo ${PIPELINE} | jq -r .display_name)
PIPELINE_RUN=$(
curl -s -H "Content-Type: ${CONTENT_TYPE}" -H "Authorization: ${AUTH_HEADER}" -X POST ${SVC}/apis/v1beta1/runs \
-d @- << EOF
{
"name":"${PIPELINE_NAME}_run",
"pipeline_spec":{
"pipeline_id":"${PIPELINE_ID}"
}
}
EOF
)
PIPELINE_RUN=$(
curl -s -H "Content-Type: ${CONTENT_TYPE}" -H "Authorization: ${AUTH_HEADER}" -X POST ${SVC}/apis/v2beta1/runs \
-d @- << EOF
{
"display_name":"${PIPELINE_NAME}_run",
"description":"This is run from curl",
"runtime_config": {
"parameters": {}
},
"pipeline_version_reference":{
"pipeline_id":"${PIPELINE_ID}"
}
}
EOF
)
echo ${PIPELINE_RUN} | jq .
PIPELINE_RUN_ID=$(echo ${PIPELINE_RUN} | jq -r .run_id)

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  annotations:
    argocd.argoproj.io/compare-options: IgnoreExtraneous
  name: doc-bot
  namespace: openshift-gitops
spec:
  destination:
    namespace: vllm-mistral-7b
    server: 'https://kubernetes.default.svc'
  project: default
  source:
    helm:
      parameters:
        - name: instanceName
          value: vllm-mistral-7b
        - name: dataScienceProjectDisplayName
          value: vllm-mistral-7b
        - name: dataScienceProjectNamespace
          value: vllm-mistral-7b
    path: gitops/doc-bot
    repoURL: 'git@github.com:alpha-hack-program/doc-bot.git'
    targetRevision: main
  syncPolicy:
    automated:
      selfHeal: true
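The curl-based Data Science Pipelines calls shown earlier (list pipelines, select one by display name, POST a v2beta1 run) can also be sketched with Python's standard library. The service URL and token below are placeholder assumptions standing in for the SVC and AUTH_HEADER shell variables:

```python
import json
import urllib.request
from typing import Optional

def ds_pipeline_request(svc: str, token: str, path: str,
                        body: Optional[dict] = None) -> urllib.request.Request:
    """Build an authenticated request against the Data Science Pipelines REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        f"{svc}{path}",
        data=data,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST" if body is not None else "GET",
    )

def pick_pipeline_id(pipelines: dict, display_name: str) -> str:
    """Mirror of the jq filter: .pipelines[] | select(.display_name == $name)."""
    return next(p["pipeline_id"] for p in pipelines["pipelines"]
                if p["display_name"] == display_name)

# Made-up listing, shaped like the GET /apis/v2beta1/pipelines response:
listing = {"pipelines": [{"pipeline_id": "abc-123", "display_name": "rag-ingestion"}]}
run_body = {
    "display_name": "rag-ingestion_run",
    "description": "This is a run from Python",
    "runtime_config": {"parameters": {}},
    "pipeline_version_reference": {"pipeline_id": pick_pipeline_id(listing, "rag-ingestion")},
}
req = ds_pipeline_request("https://ds-pipeline-dspa.example.com", "sha256~TOKEN",
                          "/apis/v2beta1/runs", run_body)
# urllib.request.urlopen(req)  # uncomment against a live cluster
```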



