diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 120000
index 0000000..47dc3e3
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1 @@
+AGENTS.md
\ No newline at end of file
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
deleted file mode 100644
index 8b5a062..0000000
--- a/CONTRIBUTING.md
+++ /dev/null
@@ -1 +0,0 @@
-Refer to the [AGENTS.md](./AGENTS.md) file for instructions.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 120000
index 0000000..47dc3e3
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1 @@
+AGENTS.md
\ No newline at end of file
diff --git a/website/.prettierrc b/website/.prettierrc
index f1ec27f..d84176e 100644
--- a/website/.prettierrc
+++ b/website/.prettierrc
@@ -4,6 +4,6 @@
   "bracketSameLine": true,
   "printWidth": 80,
   "proseWrap": "preserve",
-  "singleQuote": true,
+  "singleQuote": false,
   "trailingComma": "all"
 }
diff --git a/website/AGENTS.md b/website/AGENTS.md
index 6c7fcc9..3312e61 100644
--- a/website/AGENTS.md
+++ b/website/AGENTS.md
@@ -59,6 +59,9 @@ image:
 - **Variables**:
   - Format as `` (camelCase, no quotes).
   - Highlight lines containing variables in code blocks (e.g., ` ```yaml {2} `).
+- **No line number references in text**:
+  - Do not refer to specific line numbers in the descriptive text (e.g., avoid "Replace the value on line 4").
+  - Instead, refer to the content or field names (e.g., "Replace the `tags` value").

 ### Admonitions

diff --git a/website/docs/getting-started/prerequisites.mdx b/website/docs/getting-started/prerequisites.mdx
index 609542f..bf5cacf 100644
--- a/website/docs/getting-started/prerequisites.mdx
+++ b/website/docs/getting-started/prerequisites.mdx
@@ -1,10 +1,10 @@
 ---
-title: 'Prerequisites'
+title: Prerequisites
 sidebar_position: 1
 ---

-import Tabs from '@theme/Tabs';
-import TabItem from '@theme/TabItem';
+import Tabs from "@theme/Tabs";
+import TabItem from "@theme/TabItem";

 This document introduces the prerequisites for the MoAI Inference Framework and provides instructions on how to install them.

@@ -16,11 +16,11 @@ To follow this document, you need to understand the configuration of the Kuberne

 To install the MoAI Inference Framework, you must have

-* Kubernetes 1.29 or later
-* At least one worker node equipped with accelerators supported by the MoAI Inference Framework (e.g., AMD GPUs)
-* `cluster-admin` privilege for the Kubernetes cluster
-* A StorageClass defined in the Kubernetes cluster (required for storing the monitoring metrics, model weights, etc.)
-* A Docker private registry accessible from the Kubernetes cluster
+- Kubernetes 1.29 or later
+- At least one worker node equipped with accelerators supported by the MoAI Inference Framework (e.g., AMD GPUs)
+- `cluster-admin` privilege for the Kubernetes cluster
+- A StorageClass defined in the Kubernetes cluster (required for storing the monitoring metrics, model weights, etc.)
+- A Docker private registry accessible from the Kubernetes cluster

 ---

@@ -124,7 +124,7 @@ kubectl create secret -n amd-gpu \
   --docker-password=
 ```

-Then, create a `gpu-operator-values.yaml` file with the following content. **Please replace `` on line 7 with the URL of your private registry**. You may also change the image name `amdgpu-driver`, if necessary, according to your private registry's policies.
+Then, create a `gpu-operator-values.yaml` file with the following content. **Please replace `` with the URL of your private registry**. You may also change the image name `amdgpu-driver`, if necessary, according to your private registry's policies.

 ```yaml title="gpu-operator-values.yaml" {7}
 deviceConfig:
@@ -224,7 +224,7 @@ device node GUID

 This section describes how to install the **rdma-shared-device-plugin**. See [k8s-rdma-shared-dev-plugin / README](https://github.com/Mellanox/k8s-rdma-shared-dev-plugin/blob/master/README.md) for more details.

-First, create a `rdma-shared-device-plugin.yaml` file as follows. **You need to replace `` on line 21 with your RDMA NIC's network interface name**. If multiple NICs are installed on the server, you must list all interface names (e.g., `"devices": ["ib0", "ib1"]`).
+First, create a `rdma-shared-device-plugin.yaml` file as follows. **You need to replace `` with your RDMA NIC's network interface name**. If multiple NICs are installed on the server, you must list all interface names (e.g., `"devices": ["ib0", "ib1"]`).

 :::info
 You can check the network interface names using the `ip addr` command.
diff --git a/website/docs/getting-started/quickstart.mdx b/website/docs/getting-started/quickstart.mdx
index 9be9e02..c3f8bec 100644
--- a/website/docs/getting-started/quickstart.mdx
+++ b/website/docs/getting-started/quickstart.mdx
@@ -1,10 +1,10 @@
 ---
-title: 'Quickstart'
+title: Quickstart
 sidebar_position: 2
 ---

-import Tabs from '@theme/Tabs';
-import TabItem from '@theme/TabItem';
+import Tabs from "@theme/Tabs";
+import TabItem from "@theme/TabItem";

 This quickstart launches two vLLM instances (pods) of the Llama 3.2 1B Instruct model and serves them through a single endpoint as an example. Please make sure to install all [prerequisites](./prerequisites.mdx), including the following versions of the components, before starting this quickstart guide.

@@ -164,7 +164,7 @@ spec:
   gatewayClassName: istio
   infrastructure:
     parametersRef:
-      group: ''
+      group: ""
       kind: ConfigMap
       name: mif-gateway-infrastructure
   listeners:
@@ -241,7 +241,7 @@ helm repo add moreh https://moreh-dev.github.io/helm-charts
 helm repo update moreh
 ```

-This quickstart uses a simple scheduling rule that selects the vLLM pod with fewer queued requests between the two pods. Create a `heimdall-values.yaml` file as shown below and deploy the Heimdall scheduler using this file. **Note that you need to set `gatewayClassName` on line 20 to `kgateway` if you are using Kgateway as the gateway controller**.
+This quickstart uses a simple scheduling rule that selects the vLLM pod with fewer queued requests between the two pods. Create a `heimdall-values.yaml` file as shown below and deploy the Heimdall scheduler using this file. **Note that you need to set `gatewayClassName` to `kgateway` if you are using Kgateway as the gateway controller**.

 ```yaml title="heimdall-values.yaml"
 global:
@@ -313,7 +313,7 @@ To enable the vLLM pods to download model parameters from Hugging Face, you must
 In production environments, it is common to download the model parameters to a storage volume in advance and load them at runtime. Refer to the [Hugging Face model management with persistent volume](/best_practices/hf_model_management_with_pv) for more details.
 :::

-Create a `vllm-llama3-1b-instruct-tp2.yaml` file with the following contents. **Please replace `` on line 20 with your Hugging Face token that has accepted the model license**.
+Create a `vllm-llama3-1b-instruct-tp2.yaml` file with the following contents. **Please replace `` with your Hugging Face token that has accepted the model license**.

 ```yaml title="vllm-llama3-1b-instruct-tp2.yaml" {20}
 apiVersion: odin.moreh.io/v1alpha1
@@ -338,9 +338,9 @@ spec:
       value:
 ```

-- `replicas` on line 6 specifies the number of vLLM pods.
-- `inferencePoolRefs` on line 7-8 specifies the Heimdall's InferencePool where this vLLM pod will register to.
-- `templateRefs` on line 9-11 specifies the Odin Template resources; `vllm` is a runtime base, and `vllm-meta-llama-llama-3.2-1b-instruct-amd-mi250-tp2` is a model-specific template.
+- The `replicas` field specifies the number of vLLM pods.
+- The `inferencePoolRefs` field specifies the Heimdall's InferencePool where this vLLM pod will register to.
+- The `templateRefs` field specifies the Odin Template resources; `vllm` is a runtime base, and `vllm-meta-llama-llama-3.2-1b-instruct-amd-mi250-tp2` is a model-specific template.

 After that, you can deploy the Odin InferenceService by running the following command:

diff --git a/website/docs/reference/supported-devices.mdx b/website/docs/reference/supported-devices.mdx
index f5ee36e..acd656d 100644
--- a/website/docs/reference/supported-devices.mdx
+++ b/website/docs/reference/supported-devices.mdx
@@ -1,5 +1,5 @@
 ---
-title: 'Supported devices'
+title: Supported devices
 sidebar_position: 3
 ---

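Taken together, the diff replaces CONTRIBUTING.md with a symlink, adds a CLAUDE.md symlink, switches the website's Prettier config to double quotes, and removes line-number references from the docs. A minimal sketch of how the symlink and reformatting steps could be reproduced locally, assuming a POSIX shell, a Node.js toolchain with `npx` available, and that `AGENTS.md` already exists at the repository root:

```sh
# Point both conventional filenames at the single AGENTS.md source of truth
rm CONTRIBUTING.md
ln -s AGENTS.md CLAUDE.md
ln -s AGENTS.md CONTRIBUTING.md

# Re-run Prettier in the website workspace so existing files pick up "singleQuote": false
cd website
npx prettier --write .
```

The symlink approach keeps AGENTS.md as the one maintained instruction file (the deleted CONTRIBUTING.md only pointed readers to it) while still exposing the filenames that contributors and tooling conventionally look for.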