1 change: 1 addition & 0 deletions CLAUDE.md
1 change: 0 additions & 1 deletion CONTRIBUTING.md

This file was deleted.

1 change: 1 addition & 0 deletions CONTRIBUTING.md
2 changes: 1 addition & 1 deletion website/.prettierrc
@@ -4,6 +4,6 @@
"bracketSameLine": true,
"printWidth": 80,
"proseWrap": "preserve",
"singleQuote": true,
"singleQuote": false,
"trailingComma": "all"
}
3 changes: 3 additions & 0 deletions website/AGENTS.md
@@ -59,6 +59,9 @@ image:
- **Variables**:
- Format as `<variableName>` (camelCase, no quotes).
- Highlight lines containing variables in code blocks (e.g., ` ```yaml {2} `).
- **No line number references in text**:
- Do not refer to specific line numbers in the descriptive text (e.g., avoid "Replace the value on line 4").
- Instead, refer to the content or field names (e.g., "Replace the `tags` value").

### Admonitions

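As a hypothetical illustration of the two conventions added above (the field names below are invented for the example), a docs snippet would highlight the line containing the variable, and the surrounding prose would say "replace the `name` value" rather than "replace the value on line 2":

```yaml {2}
model:
  name: <modelName> # refer to this as "the `name` value", not "line 2"
  replicas: 2
```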
20 changes: 10 additions & 10 deletions website/docs/getting-started/prerequisites.mdx
@@ -1,10 +1,10 @@
---
title: 'Prerequisites'
title: Prerequisites
sidebar_position: 1
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";

This document introduces the prerequisites for the MoAI Inference Framework and provides instructions on how to install them.

@@ -16,11 +16,11 @@ To follow this document, you need to understand the configuration of the Kuberne

To install the MoAI Inference Framework, you must have

* Kubernetes 1.29 or later
* At least one worker node equipped with accelerators supported by the MoAI Inference Framework (e.g., AMD GPUs)
* `cluster-admin` privilege for the Kubernetes cluster
* A StorageClass defined in the Kubernetes cluster (required for storing the monitoring metrics, model weights, etc.)
* A Docker private registry accessible from the Kubernetes cluster
- Kubernetes 1.29 or later
- At least one worker node equipped with accelerators supported by the MoAI Inference Framework (e.g., AMD GPUs)
- `cluster-admin` privilege for the Kubernetes cluster
- A StorageClass defined in the Kubernetes cluster (required for storing the monitoring metrics, model weights, etc.)
- A Docker private registry accessible from the Kubernetes cluster

---

@@ -124,7 +124,7 @@ kubectl create secret -n amd-gpu \
--docker-password=<password>
```

Then, create a `gpu-operator-values.yaml` file with the following content. **Please replace `<registry>` on line 7 with the URL of your private registry**. You may also change the image name `amdgpu-driver`, if necessary, according to your private registry's policies.
Then, create a `gpu-operator-values.yaml` file with the following content. **Please replace `<registry>` with the URL of your private registry**. You may also change the image name `amdgpu-driver`, if necessary, according to your private registry's policies.

```yaml title="gpu-operator-values.yaml" {7}
deviceConfig:
@@ -224,7 +224,7 @@ device node GUID

This section describes how to install the **rdma-shared-device-plugin**. See [k8s-rdma-shared-dev-plugin / README](https://github.com/Mellanox/k8s-rdma-shared-dev-plugin/blob/master/README.md) for more details.

First, create a `rdma-shared-device-plugin.yaml` file as follows. **You need to replace `<device>` on line 21 with your RDMA NIC's network interface name**. If multiple NICs are installed on the server, you must list all interface names (e.g., `"devices": ["ib0", "ib1"]`).
First, create a `rdma-shared-device-plugin.yaml` file as follows. **You need to replace `<device>` with your RDMA NIC's network interface name**. If multiple NICs are installed on the server, you must list all interface names (e.g., `"devices": ["ib0", "ib1"]`).

:::info
You can check the network interface names using the `ip addr` command.
18 changes: 9 additions & 9 deletions website/docs/getting-started/quickstart.mdx
@@ -1,10 +1,10 @@
---
title: 'Quickstart'
title: Quickstart
sidebar_position: 2
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";

This quickstart launches two vLLM instances (pods) of the Llama 3.2 1B Instruct model and serves them through a single endpoint as an example. Please make sure to install all [prerequisites](./prerequisites.mdx), including the following versions of the components, before starting this quickstart guide.

@@ -164,7 +164,7 @@ spec:
gatewayClassName: istio
infrastructure:
parametersRef:
group: ''
group: ""
kind: ConfigMap
name: mif-gateway-infrastructure
listeners:
@@ -241,7 +241,7 @@ helm repo add moreh https://moreh-dev.github.io/helm-charts
helm repo update moreh
```

This quickstart uses a simple scheduling rule that selects the vLLM pod with fewer queued requests between the two pods. Create a `heimdall-values.yaml` file as shown below and deploy the Heimdall scheduler using this file. **Note that you need to set `gatewayClassName` on line 20 to `kgateway` if you are using Kgateway as the gateway controller**.
This quickstart uses a simple scheduling rule that selects the vLLM pod with fewer queued requests between the two pods. Create a `heimdall-values.yaml` file as shown below and deploy the Heimdall scheduler using this file. **Note that you need to set `gatewayClassName` to `kgateway` if you are using Kgateway as the gateway controller**.

```yaml title="heimdall-values.yaml"
global:
@@ -313,7 +313,7 @@ To enable the vLLM pods to download model parameters from Hugging Face, you must
In production environments, it is common to download the model parameters to a storage volume in advance and load them at runtime. Refer to the [Hugging Face model management with persistent volume](/best_practices/hf_model_management_with_pv) for more details.
:::

Create a `vllm-llama3-1b-instruct-tp2.yaml` file with the following contents. **Please replace `<huggingFaceToken>` on line 20 with your Hugging Face token that has accepted the model license**.
Create a `vllm-llama3-1b-instruct-tp2.yaml` file with the following contents. **Please replace `<huggingFaceToken>` with your Hugging Face token that has accepted the model license**.

```yaml title="vllm-llama3-1b-instruct-tp2.yaml" {20}
apiVersion: odin.moreh.io/v1alpha1
@@ -338,9 +338,9 @@ spec:
value: <huggingFaceToken>
```

- `replicas` on line 6 specifies the number of vLLM pods.
- `inferencePoolRefs` on line 7-8 specifies the Heimdall's InferencePool where this vLLM pod will register to.
- `templateRefs` on line 9-11 specifies the Odin Template resources; `vllm` is a runtime base, and `vllm-meta-llama-llama-3.2-1b-instruct-amd-mi250-tp2` is a model-specific template.
- The `replicas` field specifies the number of vLLM pods.
- The `inferencePoolRefs` field specifies the Heimdall InferencePool that this vLLM pod registers with.
- The `templateRefs` field specifies the Odin Template resources; `vllm` is a runtime base, and `vllm-meta-llama-llama-3.2-1b-instruct-amd-mi250-tp2` is a model-specific template.

After that, you can deploy the Odin InferenceService by running the following command:

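For orientation, here is a rough sketch of how the fields described above could fit together in `vllm-llama3-1b-instruct-tp2.yaml`. Only `apiVersion`, `replicas`, `inferencePoolRefs`, `templateRefs`, the template names, and the `<huggingFaceToken>` value come from the diff; the kind, the metadata name, the environment variable name, and the exact nesting are assumptions, not the authoritative schema.

```yaml
apiVersion: odin.moreh.io/v1alpha1
kind: InferenceService                  # assumed from "deploy the Odin InferenceService"
metadata:
  name: vllm-llama3-1b-instruct-tp2     # assumed from the file name
spec:
  replicas: 2                           # number of vLLM pods
  inferencePoolRefs:                    # Heimdall InferencePool the pods register with
    - name: <inferencePoolName>         # hypothetical name
  templateRefs:
    - name: vllm                        # runtime base template
    - name: vllm-meta-llama-llama-3.2-1b-instruct-amd-mi250-tp2 # model-specific template
  env:
    - name: HF_TOKEN                    # hypothetical variable name for the token
      value: <huggingFaceToken>
```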
2 changes: 1 addition & 1 deletion website/docs/reference/supported-devices.mdx
@@ -1,5 +1,5 @@
---
title: 'Supported devices'
title: Supported devices
sidebar_position: 3
---
