Merged
5 changes: 2 additions & 3 deletions docs/developers/dynamic-mig.md
@@ -1,9 +1,8 @@
 ---
-title: Dynamic MIG Implementation
+title: NVIDIA GPU MPS and MIG dynamic slice plugin
+linktitle: Dynamic MIG Implementation
 ---
 
-## NVIDIA GPU MPS and MIG dynamic slice plugin
-
 ## Special Thanks
 
 This feature will not be implemented without the help of @sailorvii.
2 changes: 0 additions & 2 deletions docs/developers/mindmap.md
@@ -2,6 +2,4 @@
 title: HAMi mind map
 ---
 
-## Mind map
-
 ![HAMi VGPU mind map showing project structure and components](../resources/HAMI-VGPU-mind-map-English.png)
2 changes: 0 additions & 2 deletions docs/developers/protocol.md
@@ -2,8 +2,6 @@
 title: Protocol design
 ---
 
-## Protocol Implementation
-
 ### Device Registration
 
 <img src="https://github.com/Project-HAMi/HAMi/raw/master/docs/develop/imgs/protocol_register.png" width="600px" alt="HAMi device registration protocol diagram showing node annotation process" />
6 changes: 3 additions & 3 deletions docs/developers/scheduling.md
@@ -15,7 +15,7 @@ use can set Pod annotation to change this default policy, use `hami.io/node-sche
 
 This is a GPU cluster, having two node, the following story takes this cluster as a prerequisite.
 
-![scheduler-policy-story.png](../resources/scheduler-policy-story.png)
+![HAMi scheduler policy story diagram, showing node and GPU resource distribution](../resources/scheduler-policy-story.png)
 
 #### Story 1
 
@@ -83,7 +83,7 @@ GPU spread, use different GPU cards when possible, egs:
 
 ### Node-scheduler-policy
 
-![node-scheduler-policy-demo.png](../resources/node-scheduler-policy-demo.png)
+![HAMi node scheduler policy diagram, showing Binpack and Spread node selection](../resources/node-scheduler-policy-demo.png)
 
 #### Binpack
 
@@ -131,7 +131,7 @@ So, in `Spread` policy we can select `Node2`.
 
 ### GPU-scheduler-policy
 
-![gpu-scheduler-policy-demo.png](../resources/gpu-scheduler-policy-demo.png)
+![HAMi GPU scheduler policy diagram, comparing Binpack and Spread scores on each card](../resources/gpu-scheduler-policy-demo.png)
 
 #### Binpack
 
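For reference, the node scheduling policy this file discusses is set through a Pod annotation. A minimal sketch follows; the annotation key is assumed to be `hami.io/node-scheduling-policy` (the hunk header truncates it after `hami.io/node-sche`), and the image and names are illustrative, not part of this diff:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spread-policy-demo                     # hypothetical name
  annotations:
    hami.io/node-scheduling-policy: spread     # assumed key; "binpack" is the other policy
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04   # placeholder image
      command: ["sleep", "infinity"]
      resources:
        limits:
          nvidia.com/gpu: 1
```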
2 changes: 1 addition & 1 deletion docs/installation/how-to-use-volcano-vgpu.md
@@ -1,6 +1,6 @@
 ---
-linktitle: Volcano vGPU
 title: Volcano vGPU device plugin for Kubernetes
+linktitle: Use Volcano vGPU
 ---
 
 :::note
3 changes: 3 additions & 0 deletions docs/userguide/ascend-device/device-template.md
@@ -2,6 +2,9 @@
 title: Ascend device template
 ---
 
+Ascend device templates define how a physical Ascend card is sliced into virtual instances that HAMi can schedule.
+Each template describes the available memory, AI cores and optional CPU resources for a given card model.
+When a Pod requests Ascend resources, HAMi selects a suitable template according to the requested memory and compute.
+
 ```yaml
 vnpus:
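For readers unfamiliar with the template format, a fuller `vnpus` entry might look like the sketch below. All field values and card names here are assumptions for illustration, not content from this diff:

```yaml
vnpus:
  - chipName: 910B                       # assumed card model
    commonWord: Ascend910A
    resourceName: huawei.com/Ascend910A
    resourceMemoryName: huawei.com/Ascend910A-memory
    memoryAllocatable: 32768             # MiB available for slicing (illustrative)
    memoryCapacity: 32768
    aiCore: 30
    templates:
      - name: vir02                      # slice with 2 AI cores
        memory: 2184
        aiCore: 2
      - name: vir04
        memory: 4369
        aiCore: 4
```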
5 changes: 2 additions & 3 deletions docs/userguide/configure.md
@@ -1,9 +1,8 @@
 ---
-title: Configuration
+title: Global Config
+linktitle: Configuration
 ---
 
-## Global Config
-
 ## Device Configs: ConfigMap
 
 :::note
5 changes: 2 additions & 3 deletions docs/userguide/hygon-device/specify-device-core-usage.md
@@ -1,9 +1,8 @@
 ---
-title: Allocate device core usage
+title: Allocate device core to container
+linktitle: Allocate device core usage
 ---
 
-## Allocate device core to container
-
 Allocate a percentage of device core resources by specify resource `hygon.com/dcucores`.
 Optional, each unit of `hygon.com/dcucores` equals to 1% device cores.
 
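A minimal Pod sketch for the `hygon.com/dcucores` resource described in the hunk above (the image, name, and value are hypothetical; other Hygon resource fields are omitted):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dcu-cores-demo            # hypothetical name
spec:
  containers:
    - name: worker
      image: ubuntu:22.04         # placeholder image
      command: ["sleep", "infinity"]
      resources:
        limits:
          hygon.com/dcucores: 30  # 30% of the device cores
```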
2 changes: 0 additions & 2 deletions docs/userguide/hygon-device/specify-device-uuid-to-use.md
@@ -2,8 +2,6 @@
 title: Assign to certain device
 ---
 
-## Assign to certain device type
-
 Sometimes a task may wish to run on a certain DCU, it can fill the `hygon.com/use-gpuuuid` field in pod annotation. HAMi scheduler will try to fit in device with that uuid.
 
 For example, a task with the following annotation will be assigned to the device with uuid `DCU-123456`
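A sketch of such an annotated Pod, using the uuid from the example above (image and name are placeholders, not part of this diff):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dcu-uuid-demo                    # hypothetical name
  annotations:
    hygon.com/use-gpuuuid: "DCU-123456"  # uuid from the example above
spec:
  containers:
    - name: worker
      image: ubuntu:22.04                # placeholder image
      command: ["sleep", "infinity"]
```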
2 changes: 0 additions & 2 deletions docs/userguide/kueue/how-to-use-kueue.md
@@ -2,8 +2,6 @@
 title: How to use kueue on HAMi
 ---
 
-## Using Kueue with HAMi
-
 This guide will help you use Kueue to manage HAMi vGPU resources, including enabling Deployment support, configuring ResourceTransformation, and creating workloads that request vGPU resources.
 
 ## Prerequisites
@@ -2,8 +2,6 @@
 title: Allocate a whole xpu card
 ---
 
-## Allocate exclusive device
-
 To allocate a whole xpu device, you need to only assign `kunlunxin.com/xpu` without other fields. You can allocate multiple XPUs for a container.
 
 ```yaml
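A minimal Pod sketch for whole-XPU allocation as described above (image and name are placeholders, not part of this diff):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: xpu-demo                  # hypothetical name
spec:
  containers:
    - name: worker
      image: ubuntu:22.04         # placeholder image
      command: ["sleep", "infinity"]
      resources:
        limits:
          kunlunxin.com/xpu: 2    # two whole XPUs, no other fields
```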
5 changes: 2 additions & 3 deletions docs/userguide/monitoring/device-allocation.md
@@ -1,9 +1,8 @@
 ---
-title: Cluster device allocation
+title: Cluster device allocation endpoint
+linktitle: Cluster device allocation
 ---
 
-## Cluster device allocation endpoint
-
 You can get the overview of cluster device allocation and limit by visiting `{scheduler node ip}:31993/metrics`, or add it to a prometheus endpoint, as the command below:
 
 ```bash
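The `:31993/metrics` endpoint mentioned above can be scraped by Prometheus with a static config along these lines (job name and target IP are placeholders):

```yaml
scrape_configs:
  - job_name: hami-device-allocation    # hypothetical job name
    static_configs:
      - targets:
          - "10.0.0.1:31993"            # replace with {scheduler node ip}
```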
5 changes: 2 additions & 3 deletions docs/userguide/monitoring/real-time-device-usage.md
@@ -1,9 +1,8 @@
 ---
-title: Real-time device usage
+title: Real-time device usage endpoint
+linktitle: Real-time device usage
 ---
 
-## Real-time device usage endpoint
-
 You can get the real-time device memory and core utilization by visiting `{GPU node node ip}:31992/metrics`, or add it to a prometheus endpoint, as the command below:
 
 ```bash
5 changes: 2 additions & 3 deletions docs/userguide/mthreads-device/specify-device-core-usage.md
@@ -1,9 +1,8 @@
 ---
-title: Allocate device core usage
+title: Allocate device core to container
+linktitle: Allocate device core usage
 ---
 
-## Allocate device core to container
-
 Allocate a part of device core resources by specify resource `mthreads.com/sgpu-core`.
 Optional, each unit of `mthreads.com/smlu-core` equals to 1/16 device cores.
 
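A minimal Pod sketch for this resource. Note the hunk above names both `mthreads.com/sgpu-core` and `mthreads.com/smlu-core`; the sketch assumes `sgpu-core`, and the image, name, and value are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sgpu-core-demo               # hypothetical name
spec:
  containers:
    - name: worker
      image: ubuntu:22.04            # placeholder image
      command: ["sleep", "infinity"]
      resources:
        limits:
          mthreads.com/sgpu-core: 8  # 8 units = 8/16 of the device cores
```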
@@ -1,5 +1,6 @@
 ---
-title: Allocate device memory
+title: Allocate device memory to container
+linktitle: Allocate device memory
 ---
 
 Allocate a percentage size of device memory by specify resources such as `mthreads.com/sgpu-memory`.
5 changes: 2 additions & 3 deletions docs/userguide/nvidia-device/examples/allocate-device-core.md
@@ -1,9 +1,8 @@
 ---
-title: Allocate device core resource
+title: Allocate device core to container
+linktitle: Allocate device core resource
 ---
 
-## Allocate device core to container
-
 To allocate a certain part of device core resource, you need only to assign the `nvidia.com/gpucores` without other resource fields.
 
 ```yaml
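A minimal Pod sketch following that description, assigning only `nvidia.com/gpucores` (image, name, and value are placeholders, not part of this diff):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-cores-demo              # hypothetical name
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04   # placeholder image
      command: ["sleep", "infinity"]
      resources:
        limits:
          nvidia.com/gpucores: 50   # 50% of one GPU's cores
```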
@@ -1,9 +1,8 @@
 ---
-title: Allocate certain device memory
+title: Allocate certain device memory to container
+linktitle: Allocate certain device memory
 ---
 
-## Allocate certain device memory to container
-
 To allocate a certain size of GPU device memory, you need only to assign `nvidia.com/gpumem` besides `nvidia.com/gpu`.
 
 ```yaml
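A minimal Pod sketch combining `nvidia.com/gpu` and `nvidia.com/gpumem` as described (image, name, and values are placeholders; one `gpumem` unit equals 1M according to the companion doc further down):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-mem-demo              # hypothetical name
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04   # placeholder image
      command: ["sleep", "infinity"]
      resources:
        limits:
          nvidia.com/gpu: 1
          nvidia.com/gpumem: 3000   # ~3000M of device memory
```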
@@ -1,9 +1,8 @@
 ---
-title: Allocate device memory by percentage
+title: Allocate a part of device memory by percentage to container
+linktitle: Allocate device memory by percentage
 ---
 
-## Allocate a part of device memory by percentage to container
-
 To allocate a certain size of GPU device memory by percentage, you need only to assign `nvidia.com/gpumem-percentage` besides `nvidia.com/gpu`.
 
 ```yaml
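A minimal Pod sketch for the percentage variant (image, name, and values are placeholders, not part of this diff):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-mem-percentage-demo     # hypothetical name
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04   # placeholder image
      command: ["sleep", "infinity"]
      resources:
        limits:
          nvidia.com/gpu: 1
          nvidia.com/gpumem-percentage: 50   # half of one GPU's memory
```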
@@ -2,8 +2,6 @@
 title: Assign task to a certain type
 ---
 
-## Overview
-
 To assign a task to a certain GPU type, you need only to assign the `nvidia.com/use-gputype` in annotations field.
 
 ```yaml
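A sketch of a Pod pinned to a GPU type via that annotation (the type string, image, and name are hypothetical examples):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-type-demo                  # hypothetical name
  annotations:
    nvidia.com/use-gputype: "A100"     # hypothetical GPU type
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04   # placeholder image
      command: ["sleep", "infinity"]
      resources:
        limits:
          nvidia.com/gpu: 1
```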
5 changes: 2 additions & 3 deletions docs/userguide/nvidia-device/examples/use-exclusive-card.md
@@ -1,9 +1,8 @@
 ---
-title: Use exclusive GPU
+title: Allocate device core to container
+linktitle: Use exclusive GPU
 ---
 
-## Allocate device core to container
-
 To use GPU in an exclusive mode, which is the default behaviour of nvidia-k8s-device-plugin, you need only to assign the `nvidia.com/gpu` without other resource fields.
 
 ```yaml
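A minimal Pod sketch for exclusive mode, assigning only `nvidia.com/gpu` as described (image and name are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: exclusive-gpu-demo        # hypothetical name
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04   # placeholder image
      command: ["sleep", "infinity"]
      resources:
        limits:
          nvidia.com/gpu: 1       # one whole GPU, no other fields
```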
5 changes: 2 additions & 3 deletions docs/userguide/nvidia-device/specify-device-core-usage.md
@@ -1,9 +1,8 @@
 ---
-title: Allocate device core usage
+title: Allocate device core to container
+linktitle: Allocate device core usage
 ---
 
-## Allocate device core to container
-
 Allocate a percentage of device core resources by specify resource `nvidia.com/gpucores`.
 Optional, each unit of `nvidia.com/gpucores` equals to 1% device cores.
 
5 changes: 2 additions & 3 deletions docs/userguide/nvidia-device/specify-device-memory-usage.md
@@ -1,9 +1,8 @@
 ---
-title: Allocate device memory
+title: Allocate device memory to container
+linktitle: Allocate device memory
 ---
 
-## Allocate device memory to container
-
 Allocate a certain size of device memory by specify resources such as `nvidia.com/gpumem`.
 Optional, Each unit of `nvidia.com/gpumem` equals to 1M.
 
5 changes: 2 additions & 3 deletions docs/userguide/nvidia-device/specify-device-uuid-to-use.md
@@ -1,9 +1,8 @@
 ---
-title: Assign to certain device
+title: Assign to certain device type
+linktitle: Assign to certain device
 ---
 
-## Assign to certain device type
-
 Sometimes a task may wish to run on a certain GPU, it can fill the `nvidia.com/use-gpuuuid` field in pod annotation. HAMi scheduler will try to fit in device with that uuid.
 
 For example, a task with the following annotation will be assigned to the device with uuid `GPU-123456`
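A sketch of such an annotated Pod, using the uuid from the example above (image and name are placeholders, not part of this diff):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-uuid-demo                     # hypothetical name
  annotations:
    nvidia.com/use-gpuuuid: "GPU-123456"  # uuid from the example above
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04   # placeholder image
      command: ["sleep", "infinity"]
      resources:
        limits:
          nvidia.com/gpu: 1
```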
1 change: 1 addition & 0 deletions docs/userguide/nvidia-device/using-resourcequota.md
@@ -1,5 +1,6 @@
 ---
 title: Using Extended ResourceQuota for NVIDIA Devices
+linktitle: ResourceQuota
 translated: true
 ---
 
4 changes: 4 additions & 0 deletions docs/userguide/vastai/examples/default-use.md
@@ -2,6 +2,10 @@
 title: Allocate Vastai Device
 ---
 
+This example shows how to request a single Vastai device in a plain Kubernetes Pod.
+The Pod simply runs a long-running container image provided by Vastaitech and asks for one `vastaitech.com/va` device through the `resources.limits` section.
+You can use this as a starting point and adjust the image and resource limits to fit your own workloads.
+
 ```yaml
 apiVersion: v1
 kind: Pod
@@ -2,8 +2,6 @@
 title: Default vGPU Job
 ---
 
-## Job description
-
 vGPU can be requested by both set "volcano.sh/vgpu-number", "volcano.sh/vgpu-cores" and "volcano.sh/vgpu-memory" in resources.limits
 
 ```yaml
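A sketch of a Volcano Job requesting all three vGPU resources named above (the image, job name, and values are hypothetical, not part of this diff):

```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: vgpu-demo                     # hypothetical name
spec:
  schedulerName: volcano
  tasks:
    - replicas: 1
      name: main
      template:
        spec:
          containers:
            - name: cuda
              image: nvidia/cuda:12.2.0-base-ubuntu22.04   # placeholder image
              command: ["sleep", "infinity"]
              resources:
                limits:
                  volcano.sh/vgpu-number: 1
                  volcano.sh/vgpu-cores: 50    # illustrative values
                  volcano.sh/vgpu-memory: 3000
          restartPolicy: Never
```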
@@ -1,9 +1,7 @@
 ---
-title: Exclusive gpu usage
+title: Exclusive GPU usage
 ---
 
-## Job description
-
 To allocate an exclusive GPU, you need only assign `volcano.sh/vgpu-number` without any other `volcano.sh/xxx` fields, as the example below:
 
 ```yaml
@@ -1,9 +1,8 @@
 ---
-title: Use Volcano vGPU
+title: Volcano vGPU device plugin for Kubernetes
+linktitle: Use Volcano vGPU
 ---
 
-## Volcano vGPU device plugin for Kubernetes
-
 :::note
 
 You *DON'T* need to install HAMi when using volcano-vgpu, only use
@@ -3,6 +3,4 @@ title: HAMi 路线图
 translated: true
 ---
 
-## 思维导图
-
 ![HAMi VGPU 思维导图,显示项目结构和组件](https://github.com/Project-HAMi/HAMi/blob/master/docs/mind-map/HAMI-VGPU-mind-map-Chinese.png?raw=true)
@@ -3,8 +3,6 @@ title: 协议设计
 translated: true
 ---
 
-## 协议实现
-
 ### 设备注册
 
 为了进行更准确的调度,HAMi 调度器需要在设备注册时感知设备的规格,包括 UUID、显存、计算能力、型号、numa 数量等。
@@ -15,7 +15,7 @@ translated: true
 
 这是一个 GPU 集群,拥有两个节点,以下故事以此集群为前提。
 
-![scheduler-policy-story.png](https://github.com/Project-HAMi/HAMi/raw/master/docs/develop/imgs/scheduler-policy-story.png)
+![HAMi 调度策略故事示意图,展示节点与 GPU 资源分布](https://github.com/Project-HAMi/HAMi/raw/master/docs/develop/imgs/scheduler-policy-story.png)
 
 #### 故事 1
 
@@ -82,7 +82,7 @@ GPU spread,尽可能使用不同的 GPU 卡,例如:
 
 ### Node-scheduler-policy
 
-![node-scheduler-policy-demo.png](https://github.com/Project-HAMi/HAMi/raw/master/docs/develop/imgs/node-scheduler-policy-demo.png)
+![HAMi 节点调度策略示意图,展示 Binpack 与 Spread 节点选择流程](https://github.com/Project-HAMi/HAMi/raw/master/docs/develop/imgs/node-scheduler-policy-demo.png)
 
 #### Binpack
 
@@ -128,7 +128,7 @@ Node2 score: ((1+2)/4) * 10= 7.5
 
 ### GPU-scheduler-policy
 
-![gpu-scheduler-policy-demo.png](https://github.com/Project-HAMi/HAMi/raw/master/docs/develop/imgs/gpu-scheduler-policy-demo.png)
+![HAMi GPU 调度策略示意图,展示在单卡上的 Binpack 与 Spread 评分对比](https://github.com/Project-HAMi/HAMi/raw/master/docs/develop/imgs/gpu-scheduler-policy-demo.png)
 
 #### Binpack
 
@@ -1,10 +1,9 @@
 ---
-title: HAMi DRA
+title: Kubernetes 的 HAMi DRA
+linktitle: HAMi DRA
 translated: true
 ---
 
-## Kubernetes 的 HAMi DRA
-
 ## 介绍
 
 HAMi 已经提供了对 K8s [DRA](https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/)(动态资源分配)功能的支持。
@@ -1,5 +1,6 @@
 ---
 title: Volcano Ascend vNPU 使用指南
+linktitle: Volcano Ascend vNPU
 translated: true
 ---
 
@@ -1,9 +1,10 @@
 ---
 title: Volcano vGPU 使用指南
+linktitle: Volcano vGPU
 translated: true
 ---
 
-## Kubernetes 的 Volcano vgpu 设备插件
+:::note
 
 **注意**:
 
@@ -14,6 +15,8 @@ translated: true
 
 Volcano vgpu 仅在 volcano > 1.9 中可用
 
+:::
+
 ## 快速开始
 
 ### 配置调度器