1 change: 1 addition & 0 deletions CLAUDE.md
1 change: 0 additions & 1 deletion CONTRIBUTING.md

This file was deleted.

1 change: 1 addition & 0 deletions CONTRIBUTING.md
2 changes: 1 addition & 1 deletion website/.prettierrc
@@ -4,6 +4,6 @@
"bracketSameLine": true,
"printWidth": 80,
"proseWrap": "preserve",
"singleQuote": true,
"singleQuote": false,
"trailingComma": "all"
}
3 changes: 3 additions & 0 deletions website/AGENTS.md
@@ -59,6 +59,9 @@ image:
- **Variables**:
- Format as `<variableName>` (camelCase, no quotes).
- Highlight lines containing variables in code blocks (e.g., ` ```yaml {2} `).
- **No line number references in text**:
- Do not refer to specific line numbers in the descriptive text (e.g., avoid "Replace the value on line 4").
- Instead, refer to the content or field names (e.g., "Replace the `tags` value").

### Admonitions

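As a hypothetical illustration of the two conventions added above (the field names below are invented for the example), a docs snippet would highlight the line containing the variable, and the surrounding prose would say "replace the `name` value" rather than "replace the value on line 2":

```yaml {2}
model:
  name: <modelName> # refer to this as "the `name` value", not "line 2"
  replicas: 2
```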
20 changes: 10 additions & 10 deletions website/docs/getting-started/prerequisites.mdx
@@ -1,10 +1,10 @@
---
title: 'Prerequisites'
title: Prerequisites
sidebar_position: 1
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";

This document introduces the prerequisites for the MoAI Inference Framework and provides instructions on how to install them.

@@ -16,11 +16,11 @@ To follow this document, you need to understand the configuration of the Kuberne

To install the MoAI Inference Framework, you must have

* Kubernetes 1.29 or later
* At least one worker node equipped with accelerators supported by the MoAI Inference Framework (e.g., AMD GPUs)
* `cluster-admin` privilege for the Kubernetes cluster
* A StorageClass defined in the Kubernetes cluster (required for storing the monitoring metrics, model weights, etc.)
* A Docker private registry accessible from the Kubernetes cluster
- Kubernetes 1.29 or later
- At least one worker node equipped with accelerators supported by the MoAI Inference Framework (e.g., AMD GPUs)
- `cluster-admin` privilege for the Kubernetes cluster
- A StorageClass defined in the Kubernetes cluster (required for storing the monitoring metrics, model weights, etc.)
- A Docker private registry accessible from the Kubernetes cluster

---

@@ -124,7 +124,7 @@ kubectl create secret -n amd-gpu \
--docker-password=<password>
```

Then, create a `gpu-operator-values.yaml` file with the following content. **Please replace `<registry>` on line 7 with the URL of your private registry**. You may also change the image name `amdgpu-driver`, if necessary, according to your private registry's policies.
Then, create a `gpu-operator-values.yaml` file with the following content. **Please replace `<registry>` with the URL of your private registry**. You may also change the image name `amdgpu-driver`, if necessary, according to your private registry's policies.

```yaml title="gpu-operator-values.yaml" {7}
deviceConfig:
@@ -224,7 +224,7 @@ device node GUID

This section describes how to install the **rdma-shared-device-plugin**. See [k8s-rdma-shared-dev-plugin / README](https://github.com/Mellanox/k8s-rdma-shared-dev-plugin/blob/master/README.md) for more details.

First, create a `rdma-shared-device-plugin.yaml` file as follows. **You need to replace `<device>` on line 21 with your RDMA NIC's network interface name**. If multiple NICs are installed on the server, you must list all interface names (e.g., `"devices": ["ib0", "ib1"]`).
First, create a `rdma-shared-device-plugin.yaml` file as follows. **You need to replace `<device>` with your RDMA NIC's network interface name**. If multiple NICs are installed on the server, you must list all interface names (e.g., `"devices": ["ib0", "ib1"]`).

:::info
You can check the network interface names using the `ip addr` command.
18 changes: 9 additions & 9 deletions website/docs/getting-started/quickstart.mdx
@@ -1,10 +1,10 @@
---
title: 'Quickstart'
title: Quickstart
sidebar_position: 2
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";

This quickstart launches two vLLM instances (pods) of the Llama 3.2 1B Instruct model and serves them through a single endpoint as an example. Please make sure to install all [prerequisites](./prerequisites.mdx), including the following versions of the components, before starting this quickstart guide.

@@ -164,7 +164,7 @@ spec:
gatewayClassName: istio
infrastructure:
parametersRef:
group: ''
group: ""
kind: ConfigMap
name: mif-gateway-infrastructure
listeners:
@@ -241,7 +241,7 @@ helm repo add moreh https://moreh-dev.github.io/helm-charts
helm repo update moreh
```

This quickstart uses a simple scheduling rule that selects the vLLM pod with fewer queued requests between the two pods. Create a `heimdall-values.yaml` file as shown below and deploy the Heimdall scheduler using this file. **Note that you need to set `gatewayClassName` on line 20 to `kgateway` if you are using Kgateway as the gateway controller**.
This quickstart uses a simple scheduling rule that selects the vLLM pod with fewer queued requests between the two pods. Create a `heimdall-values.yaml` file as shown below and deploy the Heimdall scheduler using this file. **Note that you need to set `gatewayClassName` to `kgateway` if you are using Kgateway as the gateway controller**.

```yaml title="heimdall-values.yaml"
global:
@@ -313,7 +313,7 @@ To enable the vLLM pods to download model parameters from Hugging Face, you must
In production environments, it is common to download the model parameters to a storage volume in advance and load them at runtime. Refer to the [Hugging Face model management with persistent volume](/best_practices/hf_model_management_with_pv) for more details.
:::

Create a `vllm-llama3-1b-instruct-tp2.yaml` file with the following contents. **Please replace `<huggingFaceToken>` on line 20 with your Hugging Face token that has accepted the model license**.
Create a `vllm-llama3-1b-instruct-tp2.yaml` file with the following contents. **Please replace `<huggingFaceToken>` with your Hugging Face token that has accepted the model license**.

```yaml title="vllm-llama3-1b-instruct-tp2.yaml" {20}
apiVersion: odin.moreh.io/v1alpha1
@@ -338,9 +338,9 @@ spec:
value: <huggingFaceToken>
```

- `replicas` on line 6 specifies the number of vLLM pods.
- `inferencePoolRefs` on line 7-8 specifies the Heimdall's InferencePool where this vLLM pod will register to.
- `templateRefs` on line 9-11 specifies the Odin Template resources; `vllm` is a runtime base, and `vllm-meta-llama-llama-3.2-1b-instruct-amd-mi250-tp2` is a model-specific template.
- The `replicas` field specifies the number of vLLM pods.
- The `inferencePoolRefs` field specifies the Heimdall InferencePool that this vLLM pod registers with.
- The `templateRefs` field specifies the Odin Template resources; `vllm` is a runtime base, and `vllm-meta-llama-llama-3.2-1b-instruct-amd-mi250-tp2` is a model-specific template.

After that, you can deploy the Odin InferenceService by running the following command:

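For orientation, here is a rough sketch of how the fields described above could fit together in `vllm-llama3-1b-instruct-tp2.yaml`. Only `apiVersion`, `replicas`, `inferencePoolRefs`, `templateRefs`, the template names, and the `<huggingFaceToken>` value come from the diff; the kind, the metadata name, the environment variable name, and the exact nesting are assumptions, not the authoritative schema.

```yaml
apiVersion: odin.moreh.io/v1alpha1
kind: InferenceService                  # assumed from "deploy the Odin InferenceService"
metadata:
  name: vllm-llama3-1b-instruct-tp2     # assumed from the file name
spec:
  replicas: 2                           # number of vLLM pods
  inferencePoolRefs:                    # Heimdall InferencePool the pods register with
    - name: <inferencePoolName>         # hypothetical name
  templateRefs:
    - name: vllm                        # runtime base template
    - name: vllm-meta-llama-llama-3.2-1b-instruct-amd-mi250-tp2 # model-specific template
  env:
    - name: HF_TOKEN                    # hypothetical variable name for the token
      value: <huggingFaceToken>
```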
2 changes: 1 addition & 1 deletion website/docs/reference/supported-devices.mdx
@@ -1,5 +1,5 @@
---
title: 'Supported devices'
title: Supported devices
sidebar_position: 3
---
