diff --git a/docs/developers/dynamic-mig.md b/docs/developers/dynamic-mig.md index b58571a1..ebe587f2 100644 --- a/docs/developers/dynamic-mig.md +++ b/docs/developers/dynamic-mig.md @@ -1,9 +1,8 @@ --- -title: Dynamic MIG Implementation +title: NVIDIA GPU MPS and MIG dynamic slice plugin +linktitle: Dynamic MIG Implementation --- -## NVIDIA GPU MPS and MIG dynamic slice plugin - ## Special Thanks This feature will not be implemented without the help of @sailorvii. diff --git a/docs/developers/mindmap.md b/docs/developers/mindmap.md index 54c2b90c..9d27905e 100644 --- a/docs/developers/mindmap.md +++ b/docs/developers/mindmap.md @@ -2,6 +2,4 @@ title: HAMi mind map --- -## Mind map - ![HAMi VGPU mind map showing project structure and components](../resources/HAMI-VGPU-mind-map-English.png) \ No newline at end of file diff --git a/docs/developers/protocol.md b/docs/developers/protocol.md index e33281ee..514e59ae 100644 --- a/docs/developers/protocol.md +++ b/docs/developers/protocol.md @@ -2,8 +2,6 @@ title: Protocol design --- -## Protocol Implementation - ### Device Registration HAMi device registration protocol diagram showing node annotation process diff --git a/docs/developers/scheduling.md b/docs/developers/scheduling.md index aebd30d4..dfc70442 100644 --- a/docs/developers/scheduling.md +++ b/docs/developers/scheduling.md @@ -15,7 +15,7 @@ use can set Pod annotation to change this default policy, use `hami.io/node-sche This is a GPU cluster, having two node, the following story takes this cluster as a prerequisite. 
-![scheduler-policy-story.png](../resources/scheduler-policy-story.png) +![HAMi scheduler policy story diagram, showing node and GPU resource distribution](../resources/scheduler-policy-story.png) #### Story 1 @@ -83,7 +83,7 @@ GPU spread, use different GPU cards when possible, egs: ### Node-scheduler-policy -![node-scheduler-policy-demo.png](../resources/node-scheduler-policy-demo.png) +![HAMi node scheduler policy diagram, showing Binpack and Spread node selection](../resources/node-scheduler-policy-demo.png) #### Binpack @@ -131,7 +131,7 @@ So, in `Spread` policy we can select `Node2`. ### GPU-scheduler-policy -![gpu-scheduler-policy-demo.png](../resources/gpu-scheduler-policy-demo.png) +![HAMi GPU scheduler policy diagram, comparing Binpack and Spread scores on each card](../resources/gpu-scheduler-policy-demo.png) #### Binpack diff --git a/docs/installation/how-to-use-volcano-vgpu.md b/docs/installation/how-to-use-volcano-vgpu.md index 36879d6f..80167c76 100644 --- a/docs/installation/how-to-use-volcano-vgpu.md +++ b/docs/installation/how-to-use-volcano-vgpu.md @@ -1,6 +1,6 @@ --- -linktitle: Volcano vGPU title: Volcano vGPU device plugin for Kubernetes +linktitle: Use Volcano vGPU --- :::note diff --git a/docs/userguide/ascend-device/device-template.md b/docs/userguide/ascend-device/device-template.md index 2a3f39f2..14022d30 100644 --- a/docs/userguide/ascend-device/device-template.md +++ b/docs/userguide/ascend-device/device-template.md @@ -2,6 +2,9 @@ title: Ascend device template --- +Ascend device templates define how a physical Ascend card is sliced into virtual instances that HAMi can schedule. +Each template describes the available memory, AI cores and optional CPU resources for a given card model. +When a Pod requests Ascend resources, HAMi selects a suitable template according to the requested memory and compute. 
```yaml vnpus: diff --git a/docs/userguide/configure.md b/docs/userguide/configure.md index 8b674129..08ca757a 100644 --- a/docs/userguide/configure.md +++ b/docs/userguide/configure.md @@ -1,9 +1,8 @@ --- -title: Configuration +title: Global Config +linktitle: Configuration --- -## Global Config - ## Device Configs: ConfigMap :::note diff --git a/docs/userguide/hygon-device/specify-device-core-usage.md b/docs/userguide/hygon-device/specify-device-core-usage.md index 8cb91247..7e03433a 100644 --- a/docs/userguide/hygon-device/specify-device-core-usage.md +++ b/docs/userguide/hygon-device/specify-device-core-usage.md @@ -1,9 +1,8 @@ --- -title: Allocate device core usage +title: Allocate device core to container +linktitle: Allocate device core usage --- -## Allocate device core to container - Allocate a percentage of device core resources by specify resource `hygon.com/dcucores`. Optional, each unit of `hygon.com/dcucores` equals to 1% device cores. diff --git a/docs/userguide/hygon-device/specify-device-uuid-to-use.md b/docs/userguide/hygon-device/specify-device-uuid-to-use.md index 2b7fa485..1ed2fee7 100644 --- a/docs/userguide/hygon-device/specify-device-uuid-to-use.md +++ b/docs/userguide/hygon-device/specify-device-uuid-to-use.md @@ -2,8 +2,6 @@ title: Assign to certain device --- -## Assign to certain device type - Sometimes a task may wish to run on a certain DCU, it can fill the `hygon.com/use-gpuuuid` field in pod annotation. HAMi scheduler will try to fit in device with that uuid. 
For example, a task with the following annotation will be assigned to the device with uuid `DCU-123456` diff --git a/docs/userguide/kueue/how-to-use-kueue.md b/docs/userguide/kueue/how-to-use-kueue.md index b67a3a97..065571fd 100644 --- a/docs/userguide/kueue/how-to-use-kueue.md +++ b/docs/userguide/kueue/how-to-use-kueue.md @@ -2,8 +2,6 @@ title: How to use kueue on HAMi --- -## Using Kueue with HAMi - This guide will help you use Kueue to manage HAMi vGPU resources, including enabling Deployment support, configuring ResourceTransformation, and creating workloads that request vGPU resources. ## Prerequisites diff --git a/docs/userguide/kunlunxin-device/examples/allocate-whole-xpu.md b/docs/userguide/kunlunxin-device/examples/allocate-whole-xpu.md index f234b79c..a0d004cb 100644 --- a/docs/userguide/kunlunxin-device/examples/allocate-whole-xpu.md +++ b/docs/userguide/kunlunxin-device/examples/allocate-whole-xpu.md @@ -2,8 +2,6 @@ title: Allocate a whole xpu card --- -## Allocate exclusive device - To allocate a whole xpu device, you need to only assign `kunlunxin.com/xpu` without other fields. You can allocate multiple XPUs for a container. 
```yaml diff --git a/docs/userguide/monitoring/device-allocation.md b/docs/userguide/monitoring/device-allocation.md index 7d31ec36..0304cf1a 100644 --- a/docs/userguide/monitoring/device-allocation.md +++ b/docs/userguide/monitoring/device-allocation.md @@ -1,9 +1,8 @@ --- -title: Cluster device allocation +title: Cluster device allocation endpoint +linktitle: Cluster device allocation --- -## Cluster device allocation endpoint - You can get the overview of cluster device allocation and limit by visiting `{scheduler node ip}:31993/metrics`, or add it to a prometheus endpoint, as the command below: ```bash diff --git a/docs/userguide/monitoring/real-time-device-usage.md b/docs/userguide/monitoring/real-time-device-usage.md index 338e09e9..45fb67ef 100644 --- a/docs/userguide/monitoring/real-time-device-usage.md +++ b/docs/userguide/monitoring/real-time-device-usage.md @@ -1,9 +1,8 @@ --- -title: Real-time device usage +title: Real-time device usage endpoint +linktitle: Real-time device usage --- -## Real-time device usage endpoint - You can get the real-time device memory and core utilization by visiting `{GPU node node ip}:31992/metrics`, or add it to a prometheus endpoint, as the command below: ```bash diff --git a/docs/userguide/mthreads-device/specify-device-core-usage.md b/docs/userguide/mthreads-device/specify-device-core-usage.md index 5c3312f9..a982cfe6 100644 --- a/docs/userguide/mthreads-device/specify-device-core-usage.md +++ b/docs/userguide/mthreads-device/specify-device-core-usage.md @@ -1,9 +1,8 @@ --- -title: Allocate device core usage +title: Allocate device core to container +linktitle: Allocate device core usage --- -## Allocate device core to container - Allocate a part of device core resources by specify resource `mthreads.com/sgpu-core`. Optional, each unit of `mthreads.com/smlu-core` equals to 1/16 device cores. 
diff --git a/docs/userguide/mthreads-device/specify-device-memory-usage.md b/docs/userguide/mthreads-device/specify-device-memory-usage.md index 6a696da1..fe2629b1 100644 --- a/docs/userguide/mthreads-device/specify-device-memory-usage.md +++ b/docs/userguide/mthreads-device/specify-device-memory-usage.md @@ -1,5 +1,6 @@ --- -title: Allocate device memory +title: Allocate device memory to container +linktitle: Allocate device memory --- Allocate a percentage size of device memory by specify resources such as `mthreads.com/sgpu-memory`. diff --git a/docs/userguide/nvidia-device/examples/allocate-device-core.md b/docs/userguide/nvidia-device/examples/allocate-device-core.md index 54e7338b..442f0b3f 100644 --- a/docs/userguide/nvidia-device/examples/allocate-device-core.md +++ b/docs/userguide/nvidia-device/examples/allocate-device-core.md @@ -1,9 +1,8 @@ --- -title: Allocate device core resource +title: Allocate device core to container +linktitle: Allocate device core resource --- -## Allocate device core to container - To allocate a certain part of device core resource, you need only to assign the `nvidia.com/gpucores` without other resource fields. ```yaml diff --git a/docs/userguide/nvidia-device/examples/allocate-device-memory.md b/docs/userguide/nvidia-device/examples/allocate-device-memory.md index 30982d88..37a30eb4 100644 --- a/docs/userguide/nvidia-device/examples/allocate-device-memory.md +++ b/docs/userguide/nvidia-device/examples/allocate-device-memory.md @@ -1,9 +1,8 @@ --- -title: Allocate certain device memory +title: Allocate certain device memory to container +linktitle: Allocate certain device memory --- -## Allocate certain device memory to container - To allocate a certain size of GPU device memory, you need only to assign `nvidia.com/gpumem` besides `nvidia.com/gpu`. 
```yaml diff --git a/docs/userguide/nvidia-device/examples/allocate-device-memory2.md b/docs/userguide/nvidia-device/examples/allocate-device-memory2.md index e874b7a7..5a318774 100644 --- a/docs/userguide/nvidia-device/examples/allocate-device-memory2.md +++ b/docs/userguide/nvidia-device/examples/allocate-device-memory2.md @@ -1,9 +1,8 @@ --- -title: Allocate device memory by percentage +title: Allocate a part of device memory by percentage to container +linktitle: Allocate device memory by percentage --- -## Allocate a part of device memory by percentage to container - To allocate a certain size of GPU device memory by percentage, you need only to assign `nvidia.com/gpumem-percentage` besides `nvidia.com/gpu`. ```yaml diff --git a/docs/userguide/nvidia-device/examples/specify-card-type-to-use.md b/docs/userguide/nvidia-device/examples/specify-card-type-to-use.md index ccd14669..a2f0072a 100644 --- a/docs/userguide/nvidia-device/examples/specify-card-type-to-use.md +++ b/docs/userguide/nvidia-device/examples/specify-card-type-to-use.md @@ -2,8 +2,6 @@ title: Assign task to a certain type --- -## Overview - To assign a task to a certain GPU type, you need only to assign the `nvidia.com/use-gputype` in annotations field. ```yaml diff --git a/docs/userguide/nvidia-device/examples/use-exclusive-card.md b/docs/userguide/nvidia-device/examples/use-exclusive-card.md index 86bd89d2..255b26e3 100644 --- a/docs/userguide/nvidia-device/examples/use-exclusive-card.md +++ b/docs/userguide/nvidia-device/examples/use-exclusive-card.md @@ -1,9 +1,8 @@ --- -title: Use exclusive GPU +title: Use GPU in exclusive mode +linktitle: Use exclusive GPU --- -## Allocate device core to container - To use GPU in an exclusive mode, which is the default behaviour of nvidia-k8s-device-plugin, you need only to assign the `nvidia.com/gpu` without other resource fields.
```yaml diff --git a/docs/userguide/nvidia-device/specify-device-core-usage.md b/docs/userguide/nvidia-device/specify-device-core-usage.md index 1642dffc..4ce9e11e 100644 --- a/docs/userguide/nvidia-device/specify-device-core-usage.md +++ b/docs/userguide/nvidia-device/specify-device-core-usage.md @@ -1,9 +1,8 @@ --- -title: Allocate device core usage +title: Allocate device core to container +linktitle: Allocate device core usage --- -## Allocate device core to container - Allocate a percentage of device core resources by specify resource `nvidia.com/gpucores`. Optional, each unit of `nvidia.com/gpucores` equals to 1% device cores. diff --git a/docs/userguide/nvidia-device/specify-device-memory-usage.md b/docs/userguide/nvidia-device/specify-device-memory-usage.md index cea6eed3..5c272f4e 100644 --- a/docs/userguide/nvidia-device/specify-device-memory-usage.md +++ b/docs/userguide/nvidia-device/specify-device-memory-usage.md @@ -1,9 +1,8 @@ --- -title: Allocate device memory +title: Allocate device memory to container +linktitle: Allocate device memory --- -## Allocate device memory to container - Allocate a certain size of device memory by specify resources such as `nvidia.com/gpumem`. Optional, Each unit of `nvidia.com/gpumem` equals to 1M. diff --git a/docs/userguide/nvidia-device/specify-device-uuid-to-use.md b/docs/userguide/nvidia-device/specify-device-uuid-to-use.md index 67a28ee9..aec20a82 100644 --- a/docs/userguide/nvidia-device/specify-device-uuid-to-use.md +++ b/docs/userguide/nvidia-device/specify-device-uuid-to-use.md @@ -1,9 +1,8 @@ --- -title: Assign to certain device +title: Assign task to a certain device by UUID +linktitle: Assign to certain device --- -## Assign to certain device type - Sometimes a task may wish to run on a certain GPU, it can fill the `nvidia.com/use-gpuuuid` field in pod annotation. HAMi scheduler will try to fit in device with that uuid.
For example, a task with the following annotation will be assigned to the device with uuid `GPU-123456` diff --git a/docs/userguide/nvidia-device/using-resourcequota.md b/docs/userguide/nvidia-device/using-resourcequota.md index f4ef4e76..5c1a5dd8 100644 --- a/docs/userguide/nvidia-device/using-resourcequota.md +++ b/docs/userguide/nvidia-device/using-resourcequota.md @@ -1,5 +1,6 @@ --- title: Using Extended ResourceQuota for NVIDIA Devices +linktitle: ResourceQuota translated: true --- diff --git a/docs/userguide/vastai/examples/default-use.md b/docs/userguide/vastai/examples/default-use.md index b8aebcae..25e6f41f 100644 --- a/docs/userguide/vastai/examples/default-use.md +++ b/docs/userguide/vastai/examples/default-use.md @@ -2,6 +2,10 @@ title: Allocate Vastai Device --- +This example shows how to request a single Vastai device in a plain Kubernetes Pod. +The Pod simply runs a long-running container image provided by Vastaitech and asks for one `vastaitech.com/va` device through the `resources.limits` section. +You can use this as a starting point and adjust the image and resource limits to fit your own workloads.
+ ```yaml apiVersion: v1 kind: Pod diff --git a/docs/userguide/volcano-vgpu/nvidia-gpu/examples/default-use.md b/docs/userguide/volcano-vgpu/nvidia-gpu/examples/default-use.md index cd585796..db3416b4 100644 --- a/docs/userguide/volcano-vgpu/nvidia-gpu/examples/default-use.md +++ b/docs/userguide/volcano-vgpu/nvidia-gpu/examples/default-use.md @@ -2,8 +2,6 @@ title: Default vGPU Job --- -## Job description - vGPU can be requested by both set "volcano.sh/vgpu-number", "volcano.sh/vgpu-cores" and "volcano.sh/vgpu-memory" in resources.limits ```yaml diff --git a/docs/userguide/volcano-vgpu/nvidia-gpu/examples/use-exclusive-gpu.md b/docs/userguide/volcano-vgpu/nvidia-gpu/examples/use-exclusive-gpu.md index 21feb72c..f47185f4 100644 --- a/docs/userguide/volcano-vgpu/nvidia-gpu/examples/use-exclusive-gpu.md +++ b/docs/userguide/volcano-vgpu/nvidia-gpu/examples/use-exclusive-gpu.md @@ -1,9 +1,7 @@ --- -title: Exclusive gpu usage +title: Exclusive GPU usage --- -## Job description - To allocate an exclusive GPU, you need only assign `volcano.sh/vgpu-number` without any other `volcano.sh/xxx` fields, as the example below: ```yaml diff --git a/docs/userguide/volcano-vgpu/nvidia-gpu/how-to-use-volcano-vgpu.md b/docs/userguide/volcano-vgpu/nvidia-gpu/how-to-use-volcano-vgpu.md index c7d8fd84..dc9e6597 100644 --- a/docs/userguide/volcano-vgpu/nvidia-gpu/how-to-use-volcano-vgpu.md +++ b/docs/userguide/volcano-vgpu/nvidia-gpu/how-to-use-volcano-vgpu.md @@ -1,9 +1,8 @@ --- -title: Use Volcano vGPU +title: Volcano vGPU device plugin for Kubernetes +linktitle: Use Volcano vGPU --- -## Volcano vGPU device plugin for Kubernetes - :::note You *DON'T* need to install HAMi when using volcano-vgpu, only use diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/developers/mindmap.md b/i18n/zh/docusaurus-plugin-content-docs/current/developers/mindmap.md index fbe20073..b177ed83 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/developers/mindmap.md +++ 
b/i18n/zh/docusaurus-plugin-content-docs/current/developers/mindmap.md @@ -3,6 +3,4 @@ title: HAMi 路线图 translated: true --- -## 思维导图 - ![HAMi VGPU 思维导图,显示项目结构和组件](https://github.com/Project-HAMi/HAMi/blob/master/docs/mind-map/HAMI-VGPU-mind-map-Chinese.png?raw=true) \ No newline at end of file diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/developers/protocol.md b/i18n/zh/docusaurus-plugin-content-docs/current/developers/protocol.md index b35cd2f6..9d1af29b 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/developers/protocol.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/developers/protocol.md @@ -3,8 +3,6 @@ title: 协议设计 translated: true --- -## 协议实现 - ### 设备注册 为了进行更准确的调度,HAMi 调度器需要在设备注册时感知设备的规格,包括 UUID、显存、计算能力、型号、numa 数量等。 diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/developers/scheduling.md b/i18n/zh/docusaurus-plugin-content-docs/current/developers/scheduling.md index a3dcb751..f57b5699 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/developers/scheduling.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/developers/scheduling.md @@ -15,7 +15,7 @@ translated: true 这是一个 GPU 集群,拥有两个节点,以下故事以此集群为前提。 -![scheduler-policy-story.png](https://github.com/Project-HAMi/HAMi/raw/master/docs/develop/imgs/scheduler-policy-story.png) +![HAMi 调度策略故事示意图,展示节点与 GPU 资源分布](https://github.com/Project-HAMi/HAMi/raw/master/docs/develop/imgs/scheduler-policy-story.png) #### 故事 1 @@ -82,7 +82,7 @@ GPU spread,尽可能使用不同的 GPU 卡,例如: ### Node-scheduler-policy -![node-scheduler-policy-demo.png](https://github.com/Project-HAMi/HAMi/raw/master/docs/develop/imgs/node-scheduler-policy-demo.png) +![HAMi 节点调度策略示意图,展示 Binpack 与 Spread 节点选择流程](https://github.com/Project-HAMi/HAMi/raw/master/docs/develop/imgs/node-scheduler-policy-demo.png) #### Binpack @@ -128,7 +128,7 @@ Node2 score: ((1+2)/4) * 10= 7.5 ### GPU-scheduler-policy 
-![gpu-scheduler-policy-demo.png](https://github.com/Project-HAMi/HAMi/raw/master/docs/develop/imgs/gpu-scheduler-policy-demo.png) +![HAMi GPU 调度策略示意图,展示在单卡上的 Binpack 与 Spread 评分对比](https://github.com/Project-HAMi/HAMi/raw/master/docs/develop/imgs/gpu-scheduler-policy-demo.png) #### Binpack diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/installation/how-to-use-hami-dra.md b/i18n/zh/docusaurus-plugin-content-docs/current/installation/how-to-use-hami-dra.md index 6835d0c5..9da3d16d 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/installation/how-to-use-hami-dra.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/installation/how-to-use-hami-dra.md @@ -1,10 +1,9 @@ --- -title: HAMi DRA +title: Kubernetes 的 HAMi DRA +linktitle: HAMi DRA translated: true --- -## Kubernetes 的 HAMi DRA - ## 介绍 HAMi 已经提供了对 K8s [DRA](https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/)(动态资源分配)功能的支持。 diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/installation/how-to-use-volcano-ascend.md b/i18n/zh/docusaurus-plugin-content-docs/current/installation/how-to-use-volcano-ascend.md index b113d0f8..7fec497c 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/installation/how-to-use-volcano-ascend.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/installation/how-to-use-volcano-ascend.md @@ -1,5 +1,6 @@ --- title: Volcano Ascend vNPU 使用指南 +linktitle: Volcano Ascend vNPU translated: true --- diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/installation/how-to-use-volcano-vgpu.md b/i18n/zh/docusaurus-plugin-content-docs/current/installation/how-to-use-volcano-vgpu.md index e1202bed..03a0a883 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/installation/how-to-use-volcano-vgpu.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/installation/how-to-use-volcano-vgpu.md @@ -1,9 +1,10 @@ --- title: Volcano vGPU 使用指南 +linktitle: Volcano vGPU translated: true --- -## Kubernetes 的 Volcano vgpu 
设备插件 +:::note **注意**: @@ -14,6 +15,8 @@ translated: true Volcano vgpu 仅在 volcano > 1.9 中可用 +::: + ## 快速开始 ### 配置调度器 diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/installation/prerequisites.md b/i18n/zh/docusaurus-plugin-content-docs/current/installation/prerequisites.md index 67cdc13e..34ed2f87 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/installation/prerequisites.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/installation/prerequisites.md @@ -3,7 +3,7 @@ title: 前提条件 translated: true --- -## 先决条件 +在安装 HAMi 之前,请确保您的环境中已正确安装以下工具和依赖项: - NVIDIA 驱动版本 >= 440 - nvidia-docker 版本 > 2.0 diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/ascend-device/device-template.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/ascend-device/device-template.md index a0d3792d..98d462b8 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/ascend-device/device-template.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/ascend-device/device-template.md @@ -3,6 +3,10 @@ title: Ascend 设备模板 translated: true --- +Ascend 设备模板用于定义一块物理 Ascend 卡如何被切分成多个可租用的虚拟实例,供 HAMi 调度使用。 +每个模板描述了对应卡型的可用显存、AI Core 以及可选的 CPU 资源。 +当 Pod 申请 Ascend 相关资源时,HAMi 会根据请求的显存和算力,从这些模板中选择最合适的一种进行分配。 + ```yaml vnpus: - chipName: 910B diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/ascend-device/enable-ascend-sharing.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/ascend-device/enable-ascend-sharing.md index ed124de7..2a5f9f59 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/ascend-device/enable-ascend-sharing.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/ascend-device/enable-ascend-sharing.md @@ -4,8 +4,6 @@ linktitle: Ascend 共享 translated: true --- -## 启用 Ascend 共享 - 基于虚拟化模板支持显存切片,自动使用可用的租赁模板。有关详细信息,请查看[设备模板](./device-template.md)。 ## 先决条件 diff --git 
a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/awsneuron-device/examples/allocate-neuron-core.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/awsneuron-device/examples/allocate-neuron-core.md index 0ea745ce..acd2d506 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/awsneuron-device/examples/allocate-neuron-core.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/awsneuron-device/examples/allocate-neuron-core.md @@ -1,5 +1,5 @@ --- -title: 分配AWS Neuron核心资源 +title: 分配 AWS Neuron 核心资源 --- 如需分配1/2个neuron设备,您可以通过分配neuroncore来实现,如下例所示: diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/awsneuron-device/examples/allocate-neuron-device.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/awsneuron-device/examples/allocate-neuron-device.md index 4af17cad..2793e7d4 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/awsneuron-device/examples/allocate-neuron-device.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/awsneuron-device/examples/allocate-neuron-device.md @@ -1,5 +1,5 @@ --- -title: 分配AWS Neuron核心 +title: 分配 AWS Neuron 核心 --- 如需独占分配一个或多个aws neuron设备,可通过`aws.amazon.com/neuron`进行资源分配: diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/cambricon-device/examples/allocate-core-and-memory.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/cambricon-device/examples/allocate-core-and-memory.md index 1de75829..11163b13 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/cambricon-device/examples/allocate-core-and-memory.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/cambricon-device/examples/allocate-core-and-memory.md @@ -4,8 +4,6 @@ linktitle: 分配核心和显存 translated: true --- -## 为容器分配设备核心和显存资源 - 要分配设备核心资源的某一部分,您只需在容器中使用 `cambricon.com/vmlu` 指定所需的寒武纪 MLU 数量,并分配 `cambricon.com/mlu370.smlu.vmemory` 和 `cambricon.com/mlu370.smlu.vcore`。 ```yaml diff --git 
a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/cambricon-device/examples/allocate-exclusive.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/cambricon-device/examples/allocate-exclusive.md index 8f2ad239..f1b4ce64 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/cambricon-device/examples/allocate-exclusive.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/cambricon-device/examples/allocate-exclusive.md @@ -4,8 +4,6 @@ linktitle: 独占设备 translated: true --- -## 分配独占设备 - 要分配整个寒武纪设备,您只需分配 `cambricon.com/vmlu`,无需其他字段。 ``` diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/cambricon-device/specify-device-memory-usage.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/cambricon-device/specify-device-memory-usage.md index c92bc4b5..c525c7e0 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/cambricon-device/specify-device-memory-usage.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/cambricon-device/specify-device-memory-usage.md @@ -4,8 +4,6 @@ linktitle: 指定显存 translated: true --- -## 为容器分配设备显存 - 通过指定资源如 `cambricon.com/mlu.smlu.vmemory` 来分配设备显存的百分比大小。可选项,每个 `cambricon.com/mlu.smlu.vmemory` 单位等于设备显存的 1%。 ```yaml diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/cambricon-device/specify-device-type-to-use.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/cambricon-device/specify-device-type-to-use.md index d6f61651..e1929a09 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/cambricon-device/specify-device-type-to-use.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/cambricon-device/specify-device-type-to-use.md @@ -4,8 +4,6 @@ linktitle: 指定设备类型 translated: true --- -## 分配到特定设备类型 - 您需要在 `cambricon-device-plugin` 中添加参数 `- --enable-device-type` 以支持设备类型规范。当设置此选项时,不同类型的 MLU 将生成不同的资源名称,例如 `cambricon.com/mlu370.smlu.vcore` 和 `cambricon.com/mlu370.smlu.vmemory`。 ```yaml diff --git 
a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/configure.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/configure.md index ba2e147c..ee09096c 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/configure.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/configure.md @@ -1,5 +1,6 @@ --- title: 全局配置 +linktitle: 配置 translated: true --- diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/hygon-device/examples/allocate-core-and-memory.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/hygon-device/examples/allocate-core-and-memory.md index e3ea48b5..b9dbed18 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/hygon-device/examples/allocate-core-and-memory.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/hygon-device/examples/allocate-core-and-memory.md @@ -4,8 +4,6 @@ linktitle: 分配核心和显存 translated: true --- -## 为容器分配设备核心和显存资源 - 要分配设备核心资源的某一部分,您只需在容器中使用 `hygon.com/dcunum` 请求的海光 DCU 数量,并分配 `hygon.com/dcucores` 和 `hygon.com/dcumem`。 ```yaml diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/hygon-device/examples/allocate-exclusive.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/hygon-device/examples/allocate-exclusive.md index dc526e6c..2f1cc689 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/hygon-device/examples/allocate-exclusive.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/hygon-device/examples/allocate-exclusive.md @@ -4,8 +4,6 @@ linktitle: 独占设备 translated: true --- -## 分配独占设备 - 要分配整个海光 DCU 设备,您只需分配 `hygon.com/dcunum`,无需其他字段。 ```yaml diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/hygon-device/examples/specify-certain-cards.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/hygon-device/examples/specify-certain-cards.md index 53d825a8..4e81504a 100644 --- 
a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/hygon-device/examples/specify-certain-cards.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/hygon-device/examples/specify-certain-cards.md @@ -4,8 +4,6 @@ linktitle: 指定 DCU translated: true --- -## 将任务分配给特定的 DCU - 要将任务分配给特定的 DCU,只需在注释字段中分配 `hygon.com/use-gpuuuid` ```yaml diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/hygon-device/specify-device-memory-usage.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/hygon-device/specify-device-memory-usage.md index e56642ad..a819449c 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/hygon-device/specify-device-memory-usage.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/hygon-device/specify-device-memory-usage.md @@ -4,8 +4,6 @@ linktitle: 指定显存 translated: true --- -## 为容器分配设备显存 - 通过指定诸如 `hygon.com/dcumem` 之类的资源来分配设备显存的百分比大小。可选项,每个 `hygon.com/dcumem` 单位等于 1M 设备显存。 ```yaml diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/hygon-device/specify-device-uuid-to-use.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/hygon-device/specify-device-uuid-to-use.md index bd521c69..2e9dc4a5 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/hygon-device/specify-device-uuid-to-use.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/hygon-device/specify-device-uuid-to-use.md @@ -4,8 +4,6 @@ linktitle: 指定设备 translated: true --- -## 分配到特定设备 - 有时任务可能希望在某个特定的DCU上运行,可以在pod注释中填写`hygon.com/use-gpuuuid`字段。HAMi调度器将尝试匹配具有该UUID的设备。 例如,具有以下注释的任务将被分配到UUID为`DCU-123456`的设备上 diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/iluvatar-device/enable-illuvatar-gpu-sharing.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/iluvatar-device/enable-illuvatar-gpu-sharing.md index 884c82f7..365ebdc4 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/iluvatar-device/enable-illuvatar-gpu-sharing.md +++ 
b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/iluvatar-device/enable-illuvatar-gpu-sharing.md @@ -4,7 +4,7 @@ linktitle: GPU 共享 translated: true --- -## 启用天数智芯 GPU 共享 +## 简介 本组件支持复用天数智芯 GPU 设备 (MR-V100、BI-V150、BI-V100),并为此提供以下几种与 vGPU 类似的复用功能,包括: diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/kueue/how-to-use-kueue.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/kueue/how-to-use-kueue.md index 1244e9d3..98429de4 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/kueue/how-to-use-kueue.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/kueue/how-to-use-kueue.md @@ -2,8 +2,6 @@ title: 如何在 HAMi 上使用 Kueue --- -## 在 HAMi 中使用 Kueue - 本指南将帮助你使用 Kueue 来管理 HAMi vGPU 资源,包括启用 Deployment 支持、配置 ResourceTransformation,以及创建请求 vGPU 资源的工作负载。 diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/kunlunxin-device/enable-kunlunxin-schedule.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/kunlunxin-device/enable-kunlunxin-schedule.md index 326f6ec1..4cf23d6b 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/kunlunxin-device/enable-kunlunxin-schedule.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/kunlunxin-device/enable-kunlunxin-schedule.md @@ -3,8 +3,6 @@ title: 启用昆仑芯 GPU 拓扑感知调度 linktitle: 拓扑感知调度 --- -## 启用昆仑芯 GPU 拓扑感知调度 - **昆仑芯 GPU 拓扑感知调度现在通过 `kunlunxin.com/xpu` 资源得到支持。** 当在单个 P800 服务器上配置多个 XPU 时,当 XPU 卡连接到同一 NUMA 节点或互相之间可以直接连接时,性能会显著提升。从而在服务器上的所有 XPU 之间形成拓扑,如下所示: diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/kunlunxin-device/examples/allocate-whole-xpu.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/kunlunxin-device/examples/allocate-whole-xpu.md index 7265088d..5e1fb3da 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/kunlunxin-device/examples/allocate-whole-xpu.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/kunlunxin-device/examples/allocate-whole-xpu.md 
@@ -1,10 +1,7 @@ --- title: 分配整个 xpu 卡 -linktitle: 分配整个卡 --- -## 分配整个 xpu 卡 - 要分配整个 xpu 设备,您只需要分配 `kunlunxin.com/xpu`,无需其他字段。您可以为容器分配多个 XPU。 ```yaml diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/metax-device/metax-gpu/enable-metax-gpu-schedule.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/metax-device/metax-gpu/enable-metax-gpu-schedule.md index 0c05a0a5..941215d9 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/metax-device/metax-gpu/enable-metax-gpu-schedule.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/metax-device/metax-gpu/enable-metax-gpu-schedule.md @@ -4,8 +4,6 @@ linktitle: 拓扑感知调度 translated: true --- -## 启用沐曦 GPU 拓扑感知调度 - **HAMi 现在通过在沐曦 GPU 之间实现拓扑感知来支持 metax.com/gpu**: 当在单个服务器上配置多个 GPU 时,GPU 卡根据它们是否连接到同一个 PCIe 交换机或 MetaXLink 而存在远近关系。这在服务器上的所有卡之间形成了一个拓扑,如下图所示: diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/metax-device/metax-gpu/specify-spread-task.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/metax-device/metax-gpu/specify-spread-task.md index ca9489c7..8c95b3b1 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/metax-device/metax-gpu/specify-spread-task.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/metax-device/metax-gpu/specify-spread-task.md @@ -1,5 +1,6 @@ --- title: 扩展调度策略 +linktitle: Spread 策略 translated: true --- diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/metax-device/metax-sgpu/enable-metax-gpu-sharing.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/metax-device/metax-sgpu/enable-metax-gpu-sharing.md index 37964b11..f5d93464 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/metax-device/metax-sgpu/enable-metax-gpu-sharing.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/metax-device/metax-sgpu/enable-metax-gpu-sharing.md @@ -4,8 +4,6 @@ linktitle: GPU 共享 translated: true --- -## 启用沐曦 GPU 共享 - **HAMi 目前支持复用沐曦 GPU 
设备,提供与 vGPU 类似的复用功能**,包括: - **GPU 共享**: 每个任务可以只占用一部分显卡,多个任务可以共享一张显卡 diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/metax-device/metax-sgpu/examples/default-use.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/metax-device/metax-sgpu/examples/default-use.md index 3ee67a64..d615f06d 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/metax-device/metax-sgpu/examples/default-use.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/metax-device/metax-sgpu/examples/default-use.md @@ -1,5 +1,6 @@ --- title: 为容器分配设备核心和显存资源 +linktitle: 分配核心和显存 translated: true --- diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/monitoring/device-allocation.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/monitoring/device-allocation.md index 187d6afc..972bd392 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/monitoring/device-allocation.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/monitoring/device-allocation.md @@ -1,10 +1,9 @@ --- -title: 集群设备分配 +title: 集群设备分配端点 +linktitle: 集群设备分配 translated: true --- -## 集群设备分配端点 - 您可以通过访问 `{scheduler node ip}:31993/metrics` 获取集群设备分配和限制的概览,或者将其添加到 Prometheus 端点,如下命令所示: ```bash diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/monitoring/real-time-device-usage.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/monitoring/real-time-device-usage.md index d4fa0771..7a420d1d 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/monitoring/real-time-device-usage.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/monitoring/real-time-device-usage.md @@ -1,10 +1,9 @@ --- -title: 实时设备使用 +title: 实时设备使用端点 +linktitle: 实时设备使用 translated: true --- -## 实时设备使用端点 - 您可以通过访问 `{GPU 节点 IP}:31992/metrics` 获取实时设备显存和核心使用情况,或者将其添加到 Prometheus 端点,如下命令所示: ```bash diff --git 
a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/mthreads-device/examples/allocate-core-and-memory.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/mthreads-device/examples/allocate-core-and-memory.md index f3b555fe..9db0fd48 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/mthreads-device/examples/allocate-core-and-memory.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/mthreads-device/examples/allocate-core-and-memory.md @@ -4,8 +4,6 @@ linktitle: 分配核心和显存 translated: true --- -## 为容器分配设备核心和显存资源 - 要分配设备核心资源的一部分,您只需在容器中使用 `mthreads.com/vgpu` 请求的寒武纪 MLU 数量的同时,分配 `mthreads.com/sgpu-memory` 和 `mthreads.com/sgpu-core`。 ```yaml diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/mthreads-device/examples/allocate-exclusive.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/mthreads-device/examples/allocate-exclusive.md index a2e37e0e..a6aa958e 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/mthreads-device/examples/allocate-exclusive.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/mthreads-device/examples/allocate-exclusive.md @@ -4,8 +4,6 @@ linktitle: 独占设备 translated: true --- -## 分配独占设备 - 要分配整个寒武纪设备,您只需分配 `mthreads.com/vgpu` 而无需其他字段。您可以为一个容器分配多个 GPU。 ``` diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/mthreads-device/specify-device-memory-usage.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/mthreads-device/specify-device-memory-usage.md index ccc6aa0f..bd897ca0 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/mthreads-device/specify-device-memory-usage.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/mthreads-device/specify-device-memory-usage.md @@ -4,8 +4,6 @@ linktitle: 指定显存 translated: true --- -## 为容器分配设备显存 - 通过指定诸如 `mthreads.com/sgpu-memory` 之类的资源来分配设备显存的百分比大小。可选项,每个 `mthreads.com/sgpu-memory` 单位等于 512M 的设备显存。 ```yaml diff --git 
a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/dynamic-resource-allocation.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/dynamic-resource-allocation.md index 20d09ad7..68701886 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/dynamic-resource-allocation.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/dynamic-resource-allocation.md @@ -3,8 +3,6 @@ title: 动态资源分配 translated: true --- -## 动态资源分配 - ## 介绍 HAMi 已经在 NVIDIA 设备上支持了 K8s [DRA](https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/)(动态资源分配)功能。 diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/examples/allocate-device-core.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/examples/allocate-device-core.md index 14798629..25ff3891 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/examples/allocate-device-core.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/examples/allocate-device-core.md @@ -4,8 +4,6 @@ linktitle: 分配核心 translated: true --- -## 为容器分配设备核心资源 - 要分配设备核心资源的某一部分,您只需分配 `nvidia.com/gpucores`,无需其他资源字段。 ```yaml diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/examples/allocate-device-memory.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/examples/allocate-device-memory.md index 5c50a51c..4e2c35c3 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/examples/allocate-device-memory.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/examples/allocate-device-memory.md @@ -4,8 +4,6 @@ linktitle: 分配显存 translated: true --- -## 为容器分配特定设备显存 - 要分配特定大小的 GPU 设备显存,您只需在 `nvidia.com/gpu` 之外分配 `nvidia.com/gpumem`。 ```yaml diff --git 
a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/examples/allocate-device-memory2.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/examples/allocate-device-memory2.md index d2bf9d69..8ae30f69 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/examples/allocate-device-memory2.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/examples/allocate-device-memory2.md @@ -4,8 +4,6 @@ linktitle: 按百分比分配显存 translated: true --- -## 按百分比分配设备显存给容器 - 要按百分比分配一定大小的 GPU 设备显存,您只需在 `nvidia.com/gpu` 之外分配 `nvidia.com/gpumem-percentage`。 ```yaml diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/examples/dynamic-mig-example.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/examples/dynamic-mig-example.md index 2db4add3..51516e21 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/examples/dynamic-mig-example.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/examples/dynamic-mig-example.md @@ -3,7 +3,7 @@ title: 将任务分配给 mig 实例 translated: true --- -## 此示例将为 A100-40GB-PCIE 设备分配 2g.10gb * 2 或为 A100-80GB-XSM 设备分配 1g.10gb * 2。 +此示例将为 A100-40GB-PCIE 设备分配 `2g.10gb * 2` 或为 A100-80GB-SXM 设备分配 `1g.10gb * 2`。 ```yaml apiVersion: v1 kind: Pod diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/examples/specify-card-type-to-use.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/examples/specify-card-type-to-use.md index eea7c36c..e873411a 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/examples/specify-card-type-to-use.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/examples/specify-card-type-to-use.md @@ -4,8 +4,6 @@ linktitle: 指定卡类型 translated: true --- -## 分配任务到特定类型 - 要将任务分配到特定的 GPU 类型,只需在注释字段中分配 `nvidia.com/use-gputype`。 ```yaml diff --git
a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/examples/specify-certain-card.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/examples/specify-certain-card.md index ee795553..b3844cb2 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/examples/specify-certain-card.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/examples/specify-certain-card.md @@ -4,8 +4,6 @@ linktitle: 指定 GPU translated: true --- -## 将任务分配给特定的 GPU - 要将任务分配给特定的 GPU,只需在注释字段中分配 `nvidia.com/use-gpuuuid`。 ```yaml diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/examples/use-exclusive-card.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/examples/use-exclusive-card.md index 4a0717bf..6f46cf20 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/examples/use-exclusive-card.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/examples/use-exclusive-card.md @@ -1,11 +1,9 @@ --- -title: 使用独占 GPU -linktitle: 独占 GPU +title: 使用独占 GPU +linktitle: 独占 GPU translated: true --- -## 使用独占 GPU - 要以独占模式使用 GPU,这是 nvidia-k8s-device-plugin 的默认行为,您只需分配 `nvidia.com/gpu` 而无需其他资源字段。 ```yaml diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/specify-device-memory-usage.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/specify-device-memory-usage.md index 2ce679b2..29d34b66 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/specify-device-memory-usage.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/specify-device-memory-usage.md @@ -4,8 +4,6 @@ linktitle: 指定显存 translated: true --- -## 为容器分配设备显存 - 通过指定资源如 `nvidia.com/gpumem` 来分配一定大小的设备显存。可选项,每个 `nvidia.com/gpumem` 单位等于 1M。 ```yaml diff --git
a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/specify-device-type-to-use.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/specify-device-type-to-use.md index 89956045..5886d67a 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/specify-device-type-to-use.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/specify-device-type-to-use.md @@ -4,8 +4,6 @@ linktitle: 指定设备类型 translated: true --- -## 分配到特定设备类型 - 有时任务可能希望在某种类型的 GPU 上运行,可以在 Pod 注释中填写 `nvidia.com/use-gputype` 字段。HAMi 调度器将检查 `nvidia-smi -L` 返回的设备类型是否包含注释的内容。 例如,具有以下注释的任务将被分配到 A100 或 V100 GPU: diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/specify-device-uuid-to-use.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/specify-device-uuid-to-use.md index a4cef143..bc89f00a 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/specify-device-uuid-to-use.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/specify-device-uuid-to-use.md @@ -4,8 +4,6 @@ linktitle: 指定设备 translated: true --- -## 分配到特定设备 - 有时任务可能希望在某个特定的GPU上运行,可以在pod注释中填写`nvidia.com/use-gpuuuid`字段。HAMi调度器将尝试匹配具有该UUID的设备。 例如,具有以下注释的任务将被分配到UUID为`GPU-123456`的设备上 diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/using-resourcequota.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/using-resourcequota.md index 8e28e02b..1f07e766 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/using-resourcequota.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/nvidia-device/using-resourcequota.md @@ -1,10 +1,9 @@ --- title: 为 NVIDIA 设备使用扩展的 resourcequota +linktitle: ResourceQuota translated: true --- -## 扩展的 resourcequota - HAMi 基于原生调度器扩展,你可以使用原生的 resourcequota 对资源进行限制。对于 NVIDIA 设备,HAMi 支持了在扩展场景下的 resourcequota。对于请求多个设备的任务,原生 
resourcequota 会单独计算每个资源的请求量,而扩展的 resourcequota 会根据设备数量计算实际的资源请求量。例如,以下任务请求两个 GPU 和 2000MB 的 GPU 显存,它在 HAMi scheduler 中会被正确计算为 4000MB 的资源请求量。 ```yaml diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/vastai/enable-vastai-sharing.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/vastai/enable-vastai-sharing.md new file mode 100644 index 00000000..3c9a02a8 --- /dev/null +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/vastai/enable-vastai-sharing.md @@ -0,0 +1,235 @@ +--- +title: 启用 Vastai 设备共享 +--- + +## 介绍 + +HAMi 现在支持共享 `vastaitech.com/va`(Vastaitech)设备,并提供以下能力: + +- ***支持整卡模式和 Die 模式***:当前仅支持整卡模式(Full-Card mode)和 Die 模式(Die mode)。 +- ***Die 模式拓扑感知***:在 Die 模式下申请多个资源时,调度器会尽量将它们分配到同一块 AIC 上。 +- ***设备 UUID 选择***:可以通过注解指定或排除某些特定设备。 + +## 使用 Vastai 设备 + +### 启用 Vastai 设备共享 + +#### 给节点打标签 + +``` +kubectl label node {vastai-node} vastai=on +``` + +#### 部署 `vastai-device-plugin` + +##### 整卡模式(Full Card Mode) + +``` +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: hami-vastai +rules: + - apiGroups: [""] + resources: ["pods"] + verbs: ["get", "list", "update", "watch", "patch"] + - apiGroups: [""] + resources: ["nodes"] + verbs: ["get", "update", "patch"] +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: hami-vastai +subjects: + - kind: ServiceAccount + name: hami-vastai + namespace: kube-system +roleRef: + kind: ClusterRole + name: hami-vastai + apiGroup: rbac.authorization.k8s.io +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: hami-vastai + namespace: kube-system + labels: + app.kubernetes.io/component: "hami-vastai" +--- +apiVersion: apps/v1 +kind: DaemonSet +metadata: + name: vastai-device-plugin-daemonset + namespace: kube-system + labels: + app.kubernetes.io/component: hami-vastai-device-plugin +spec: + selector: + matchLabels: + app.kubernetes.io/component: hami-vastai-device-plugin + hami.io/webhook: ignore + updateStrategy: + 
type: RollingUpdate + template: + metadata: + labels: + app.kubernetes.io/component: hami-vastai-device-plugin + hami.io/webhook: ignore + spec: + priorityClassName: "system-node-critical" + serviceAccountName: hami-vastai + containers: + - image: projecthami/vastai-device-plugin:latest + imagePullPolicy: Always + name: vastai-device-plugin-dp + env: + - name: NODE_NAME + valueFrom: + fieldRef: + fieldPath: spec.nodeName + args: ["--fail-on-init-error=false", "--pass-device-specs=true"] + securityContext: + privileged: true + volumeMounts: + - name: device-plugin + mountPath: /var/lib/kubelet/device-plugins + - name: libvaml-lib + mountPath: /usr/lib/libvaml.so + - name: libvaml-lib64 + mountPath: /usr/lib64/libvaml.so + volumes: + - name: device-plugin + hostPath: + path: /var/lib/kubelet/device-plugins + - name: libvaml-lib + hostPath: + path: /usr/lib/libvaml.so + - name: libvaml-lib64 + hostPath: + path: /usr/lib64/libvaml.so + nodeSelector: + vastai: "on" +``` + +##### Die 模式(Die Mode) + +``` +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: hami-vastai +rules: + - apiGroups: [""] + resources: ["pods"] + verbs: ["get", "list", "update", "watch", "patch"] + - apiGroups: [""] + resources: ["nodes"] + verbs: ["get", "update", "patch"] +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: hami-vastai +subjects: + - kind: ServiceAccount + name: hami-vastai + namespace: kube-system +roleRef: + kind: ClusterRole + name: hami-vastai + apiGroup: rbac.authorization.k8s.io +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: hami-vastai + namespace: kube-system + labels: + app.kubernetes.io/component: "hami-vastai" +--- +apiVersion: apps/v1 +kind: DaemonSet +metadata: + name: vastai-device-plugin-daemonset + namespace: kube-system + labels: + app.kubernetes.io/component: hami-vastai-device-plugin +spec: + selector: + matchLabels: + 
app.kubernetes.io/component: hami-vastai-device-plugin + hami.io/webhook: ignore + updateStrategy: + type: RollingUpdate + template: + metadata: + labels: + app.kubernetes.io/component: hami-vastai-device-plugin + hami.io/webhook: ignore + spec: + priorityClassName: "system-node-critical" + serviceAccountName: hami-vastai + containers: + - image: projecthami/vastai-device-plugin:latest + imagePullPolicy: Always + name: vastai-device-plugin-dp + env: + - name: NODE_NAME + valueFrom: + fieldRef: + fieldPath: spec.nodeName + args: ["--fail-on-init-error=false", "--pass-device-specs=true", "--device-strategy=die", "--rename-on-die=false"] + securityContext: + privileged: true + volumeMounts: + - name: device-plugin + mountPath: /var/lib/kubelet/device-plugins + - name: libvaml-lib + mountPath: /usr/lib/libvaml.so + - name: libvaml-lib64 + mountPath: /usr/lib64/libvaml.so + volumes: + - name: device-plugin + hostPath: + path: /var/lib/kubelet/device-plugins + - name: libvaml-lib + hostPath: + path: /usr/lib/libvaml.so + - name: libvaml-lib64 + hostPath: + path: /usr/lib64/libvaml.so + nodeSelector: + vastai: "on" +``` + +### 运行 Vastai 作业 + +``` +apiVersion: v1 +kind: Pod +metadata: + name: vastai-pod +spec: + restartPolicy: Never + containers: + - name: vastai-container + image: harbor.vastaitech.com/ai_deliver/vllm_vacc:VVI-25.12.SP2 + command: ["sleep", "infinity"] + resources: + limits: + vastaitech.com/va: "1" +``` + +## 注意事项 + +1. 在申请 Vastai 资源时,**不能**指定显存大小。 +2. 
`vastai-device-plugin` **不会**自动把 `vasmi` 挂载到容器内。如果你需要在容器内使用 `vasmi` 命令,请手动将其挂载到容器中。 + diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/vastai/examples/default-use.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/vastai/examples/default-use.md new file mode 100644 index 00000000..738aecec --- /dev/null +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/vastai/examples/default-use.md @@ -0,0 +1,23 @@ +--- +title: 申请 Vastai 设备 +--- + +下面的示例展示了如何在一个普通的 Kubernetes Pod 中申请一个 Vastai 设备。 +该 Pod 使用 Vastaitech 提供的镜像,以长时间运行的方式启动容器,并通过 `resources.limits` 中声明一个 `vastaitech.com/va` 设备。 +你可以在此基础上替换镜像、命令或资源配额,以适配自己的业务场景。 + +```yaml +apiVersion: v1 +kind: Pod +metadata: + name: vastai-pod +spec: + containers: + - name: test + image: harbor.vastaitech.com/ai_deliver/vllm_vacc:VVI-25.12.SP2 + command: ["sleep", "infinity"] + resources: + limits: + vastaitech.com/va: 1 +``` + diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/volcano-vgpu/nvidia-gpu/examples/default-use.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/volcano-vgpu/nvidia-gpu/examples/default-use.md index 3aba9d07..799a3528 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/volcano-vgpu/nvidia-gpu/examples/default-use.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/volcano-vgpu/nvidia-gpu/examples/default-use.md @@ -1,10 +1,9 @@ --- -title: 默认 vGPU Job +title: 默认 vgpu 作业 +linktitle: 默认作业 translated: true --- -## Job 描述 - vGPU 可以通过在 resource.limit 中设置 "volcano.sh/vgpu-number"、"volcano.sh/vgpu-cores" 和 "volcano.sh/vgpu-memory" 来请求。 ```yaml diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/volcano-vgpu/nvidia-gpu/examples/use-exclusive-gpu.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/volcano-vgpu/nvidia-gpu/examples/use-exclusive-gpu.md index 1fbf4c25..91cc1a4f 100644 --- 
a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/volcano-vgpu/nvidia-gpu/examples/use-exclusive-gpu.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/volcano-vgpu/nvidia-gpu/examples/use-exclusive-gpu.md @@ -1,10 +1,9 @@ --- title: 使用独占 GPU +linktitle: 独占 GPU translated: true --- -## Job 描述 - 要分配一个独占的 GPU,您只需分配 `volcano.sh/vgpu-number`,而无需其他 `volcano.sh/xxx` 字段,如下例所示: ```yaml diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/volcano-vgpu/nvidia-gpu/how-to-use-volcano-vgpu.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/volcano-vgpu/nvidia-gpu/how-to-use-volcano-vgpu.md index 40b9a3ca..ed13deb4 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/volcano-vgpu/nvidia-gpu/how-to-use-volcano-vgpu.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/volcano-vgpu/nvidia-gpu/how-to-use-volcano-vgpu.md @@ -1,10 +1,9 @@ --- -title: 如何使用 Volcano vGPU +title: 用于 Kubernetes 的 Volcano vGPU 设备插件 +linktitle: 如何使用 Volcano vGPU translated: true --- -## Volcano vgpu 设备插件用于 Kubernetes - :::note 使用 volcano-vgpu 时,**不需要** 安装 HAMi,仅使用 diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/volcano-vgpu/nvidia-gpu/monitor.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/volcano-vgpu/nvidia-gpu/monitor.md index b9ef8d32..f2513033 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/volcano-vgpu/nvidia-gpu/monitor.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/volcano-vgpu/nvidia-gpu/monitor.md @@ -1,5 +1,6 @@ --- title: 监控 Volcano vGPU +linktitle: 监控 translated: true --- diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/contributor/ladder.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/contributor/ladder.md index c6a03484..072249a5 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/contributor/ladder.md +++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/contributor/ladder.md @@ -3,8 +3,6
@@ title: 贡献者阶梯 translated: true --- -## 贡献者阶梯 - 您好!我们很高兴您想了解更多关于我们项目贡献者阶梯的信息!这个贡献者阶梯概述了项目中的不同贡献者角色,以及与之相关的责任和特权。社区成员通常从“阶梯”的第一级开始,并随着他们在项目中的参与度增加而逐步提升。我们的项目成员乐于帮助您在贡献者阶梯上进步。 以下每个贡献者角色都分为三种类型的列表。“责任”是指贡献者应履行的事项。“要求”是指一个人需要满足的资格条件,而“特权”是指该级别的贡献者有权享有的事项。 diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/developers/mindmap.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/developers/mindmap.md index 77ba70c2..16060bed 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/developers/mindmap.md +++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/developers/mindmap.md @@ -3,6 +3,4 @@ title: HAMi 路线图 translated: true --- -## 思维导图 - ![HAMi VGPU 思维导图,显示项目结构和组件](https://github.com/Project-HAMi/HAMi/blob/master/docs/mind-map/HAMI-VGPU-mind-map-Chinese.png?raw=true) diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/developers/protocol.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/developers/protocol.md index 6d15a2e6..2309089d 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/developers/protocol.md +++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/developers/protocol.md @@ -3,8 +3,6 @@ title: 协议设计 translated: true --- -## 协议实现 - ### 设备注册 为了进行更准确的调度,HAMi 调度器需要在设备注册时感知设备的规格,包括 UUID、显存、计算能力、型号、numa 数量等。 diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/developers/scheduling.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/developers/scheduling.md index a3dcb751..f57b5699 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/developers/scheduling.md +++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/developers/scheduling.md @@ -15,7 +15,7 @@ translated: true 这是一个 GPU 集群,拥有两个节点,以下故事以此集群为前提。 -![scheduler-policy-story.png](https://github.com/Project-HAMi/HAMi/raw/master/docs/develop/imgs/scheduler-policy-story.png) +![HAMi 调度策略故事示意图,展示节点与 GPU 
资源分布](https://github.com/Project-HAMi/HAMi/raw/master/docs/develop/imgs/scheduler-policy-story.png) #### 故事 1 @@ -82,7 +82,7 @@ GPU spread,尽可能使用不同的 GPU 卡,例如: ### Node-scheduler-policy -![node-scheduler-policy-demo.png](https://github.com/Project-HAMi/HAMi/raw/master/docs/develop/imgs/node-scheduler-policy-demo.png) +![HAMi 节点调度策略示意图,展示 Binpack 与 Spread 节点选择流程](https://github.com/Project-HAMi/HAMi/raw/master/docs/develop/imgs/node-scheduler-policy-demo.png) #### Binpack @@ -128,7 +128,7 @@ Node2 score: ((1+2)/4) * 10= 7.5 ### GPU-scheduler-policy -![gpu-scheduler-policy-demo.png](https://github.com/Project-HAMi/HAMi/raw/master/docs/develop/imgs/gpu-scheduler-policy-demo.png) +![HAMi GPU 调度策略示意图,展示在单卡上的 Binpack 与 Spread 评分对比](https://github.com/Project-HAMi/HAMi/raw/master/docs/develop/imgs/gpu-scheduler-policy-demo.png) #### Binpack diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/installation/how-to-use-hami-dra.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/installation/how-to-use-hami-dra.md index 6835d0c5..9da3d16d 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/installation/how-to-use-hami-dra.md +++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/installation/how-to-use-hami-dra.md @@ -1,10 +1,9 @@ --- -title: HAMi DRA +title: Kubernetes 的 HAMi DRA +linktitle: HAMi DRA translated: true --- -## Kubernetes 的 HAMi DRA - ## 介绍 HAMi 已经提供了对 K8s [DRA](https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/)(动态资源分配)功能的支持。 diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/installation/how-to-use-volcano-ascend.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/installation/how-to-use-volcano-ascend.md index 4ff3e22e..a3d93461 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/installation/how-to-use-volcano-ascend.md +++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/installation/how-to-use-volcano-ascend.md @@ -1,10 +1,9 @@ --- 
-title: Volcano Ascend vNPU +title: Volcano Ascend vNPU 使用指南 +linktitle: Volcano Ascend vNPU translated: true --- -## Volcano 中 Ascend 设备使用指南 - ## 介绍 Volcano 通过 `ascend-device-plugin` 支持 Ascend 310 和 Ascend 910 的 vNPU 功能。同时支持管理异构 Ascend 集群(包含多种 Ascend 类型的集群,例如 910A、910B2、910B3、310p)。 diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/installation/how-to-use-volcano-vgpu.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/installation/how-to-use-volcano-vgpu.md index f6c3efb2..f5c91aa0 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/installation/how-to-use-volcano-vgpu.md +++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/installation/how-to-use-volcano-vgpu.md @@ -1,8 +1,11 @@ --- -title: Volcano vGPU +title: Volcano vGPU 使用指南 +linktitle: Volcano vGPU translated: true --- +:::note + **注意**: 使用 volcano-vgpu 时,您*不需要*安装 HAMi,只需使用 @@ -12,6 +15,9 @@ translated: true Volcano vgpu 仅在 volcano > 1.9 中可用 +::: + + ## 快速开始 ### 配置调度器 diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/installation/prerequisites.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/installation/prerequisites.md index 67cdc13e..34ed2f87 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/installation/prerequisites.md +++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/installation/prerequisites.md @@ -3,7 +3,7 @@ title: 前提条件 translated: true --- -## 先决条件 +在安装HAMi之前,请确保您的环境中已正确安装以下工具和依赖项: - NVIDIA 驱动版本 >= 440 - nvidia-docker 版本 > 2.0 diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/ascend-device/device-template.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/ascend-device/device-template.md index 9261be9c..f56bc6ff 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/ascend-device/device-template.md +++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/ascend-device/device-template.md @@ -3,6 +3,10 @@ title: Ascend 
设备模板 translated: true --- +Ascend 设备模板用于定义一块物理 Ascend 卡如何被切分成多个虚拟实例,供 HAMi 调度使用。 +每个模板描述了对应卡型的可用显存、AI Core 以及可选的 CPU 资源。 +当 Pod 申请 Ascend 相关资源时,HAMi 会根据请求的显存和算力,从这些模板中选择最合适的一种进行分配。 + ```yaml vnpus: - chipName: 910B diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/ascend-device/enable-ascend-sharing.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/ascend-device/enable-ascend-sharing.md index 7a7a48fe..9e2ace61 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/ascend-device/enable-ascend-sharing.md +++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/ascend-device/enable-ascend-sharing.md @@ -1,5 +1,6 @@ --- title: 启用 Ascend 共享 +linktitle: Ascend 共享 translated: true --- diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/awsneuron-device/examples/allocate-neuron-core.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/awsneuron-device/examples/allocate-neuron-core.md index 0ea745ce..acd2d506 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/awsneuron-device/examples/allocate-neuron-core.md +++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/awsneuron-device/examples/allocate-neuron-core.md @@ -1,5 +1,5 @@ --- -title: 分配AWS Neuron核心资源 +title: 分配 AWS Neuron 核心资源 --- 如需分配1/2个neuron设备,您可以通过分配neuroncore来实现,如下例所示: diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/awsneuron-device/examples/allocate-neuron-device.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/awsneuron-device/examples/allocate-neuron-device.md index 4af17cad..2793e7d4 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/awsneuron-device/examples/allocate-neuron-device.md +++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/awsneuron-device/examples/allocate-neuron-device.md @@ -1,5 +1,5 @@ --- -title: 分配AWS Neuron核心 +title: 分配 AWS Neuron 设备
 ---
 如需独占分配一个或多个aws neuron设备,可通过`aws.amazon.com/neuron`进行资源分配:
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/cambricon-device/examples/allocate-core-and-memory.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/cambricon-device/examples/allocate-core-and-memory.md
index fa08c0f3..11163b13 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/cambricon-device/examples/allocate-core-and-memory.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/cambricon-device/examples/allocate-core-and-memory.md
@@ -1,10 +1,9 @@
 ---
 title: 为容器分配设备核心和显存资源
+linktitle: 分配核心和显存
 translated: true
 ---
-## 为容器分配设备核心和显存
-
 要分配设备核心资源的某一部分,您只需在容器中使用 `cambricon.com/vmlu` 指定所需的寒武纪 MLU 数量,并分配 `cambricon.com/mlu370.smlu.vmemory` 和 `cambricon.com/mlu370.smlu.vcore`。
 ```yaml
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/cambricon-device/examples/allocate-exclusive.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/cambricon-device/examples/allocate-exclusive.md
index b2fcd8b4..f1b4ce64 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/cambricon-device/examples/allocate-exclusive.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/cambricon-device/examples/allocate-exclusive.md
@@ -1,10 +1,9 @@
 ---
 title: 分配独占设备
+linktitle: 独占设备
 translated: true
 ---
-## 分配独占设备
-
 要分配整个寒武纪设备,您只需分配 `cambricon.com/vmlu`,无需其他字段。
 ```
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/cambricon-device/specify-device-memory-usage.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/cambricon-device/specify-device-memory-usage.md
index c92bc4b5..c525c7e0 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/cambricon-device/specify-device-memory-usage.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/cambricon-device/specify-device-memory-usage.md
@@ -4,8 +4,6 @@ linktitle: 指定显存
 translated: true
 ---
-## 为容器分配设备显存
-
 通过指定资源如 `cambricon.com/mlu.smlu.vmemory` 来分配设备显存的百分比大小。可选项,每个 `cambricon.com/mlu.smlu.vmemory` 单位等于设备显存的 1%。
 ```yaml
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/cambricon-device/specify-device-type-to-use.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/cambricon-device/specify-device-type-to-use.md
index d6f61651..e1929a09 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/cambricon-device/specify-device-type-to-use.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/cambricon-device/specify-device-type-to-use.md
@@ -4,8 +4,6 @@ linktitle: 指定设备类型
 translated: true
 ---
-## 分配到特定设备类型
-
 您需要在 `cambricon-device-plugin` 中添加参数 `- --enable-device-type` 以支持设备类型规范。当设置此选项时,不同类型的 MLU 将生成不同的资源名称,例如 `cambricon.com/mlu370.smlu.vcore` 和 `cambricon.com/mlu370.smlu.vmemory`。
 ```yaml
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/configure.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/configure.md
index 7cd5501a..345b03a4 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/configure.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/configure.md
@@ -1,10 +1,9 @@
 ---
-title: 配置
+title: 全局配置
+linktitle: 配置
 translated: true
 ---
-## 全局配置
-
 ## 设备配置:ConfigMap
 :::note
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/device-supported.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/device-supported.md
index bd82234c..d11cb466 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/device-supported.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/device-supported.md
@@ -1,5 +1,5 @@
 ---
-title: 支持HAMi的设备
+title: HAMi 支持的设备
 translated: true
 ---
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/hygon-device/examples/allocate-core-and-memory.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/hygon-device/examples/allocate-core-and-memory.md
index fa7cd013..b9dbed18 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/hygon-device/examples/allocate-core-and-memory.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/hygon-device/examples/allocate-core-and-memory.md
@@ -1,10 +1,9 @@
 ---
 title: 为容器分配设备核心和显存资源
+linktitle: 分配核心和显存
 translated: true
 ---
-## 为容器分配设备核心和显存
-
 要分配设备核心资源的某一部分,您只需在容器中使用 `hygon.com/dcunum` 请求的海光 DCU 数量,并分配 `hygon.com/dcucores` 和 `hygon.com/dcumem`。
 ```yaml
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/hygon-device/examples/allocate-exclusive.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/hygon-device/examples/allocate-exclusive.md
index f79ce319..2f1cc689 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/hygon-device/examples/allocate-exclusive.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/hygon-device/examples/allocate-exclusive.md
@@ -1,10 +1,9 @@
 ---
 title: 分配独占设备
+linktitle: 独占设备
 translated: true
 ---
-## 分配独占设备
-
 要分配整个海光 DCU 设备,您只需分配 `hygon.com/dcunum`,无需其他字段。
 ```yaml
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/hygon-device/examples/specify-certain-cards.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/hygon-device/examples/specify-certain-cards.md
index afa0045b..4e81504a 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/hygon-device/examples/specify-certain-cards.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/hygon-device/examples/specify-certain-cards.md
@@ -1,10 +1,9 @@
 ---
 title: 将任务分配给特定的 DCU
+linktitle: 指定 DCU
 translated: true
 ---
-## 将任务分配给特定的 DCU
-
 要将任务分配给特定的 DCU,只需在注释字段中分配 `hygon.com/use-gpuuuid`
 ```yaml
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/hygon-device/specify-device-memory-usage.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/hygon-device/specify-device-memory-usage.md
index 780bc5ba..e9c0b852 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/hygon-device/specify-device-memory-usage.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/hygon-device/specify-device-memory-usage.md
@@ -4,8 +4,6 @@ linktitle: 指定显存
 translated: true
 ---
-## 为容器分配设备显存
-
 通过指定诸如 `hygon.com/dcumem` 之类的资源来分配设备显存的百分比大小。可选项,每个 `hygon.com/dcumem` 单位等于 1M 设备显存。
 ```yaml
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/hygon-device/specify-device-uuid-to-use.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/hygon-device/specify-device-uuid-to-use.md
index bd521c69..2e9dc4a5 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/hygon-device/specify-device-uuid-to-use.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/hygon-device/specify-device-uuid-to-use.md
@@ -4,8 +4,6 @@ linktitle: 指定设备
 translated: true
 ---
-## 分配到特定设备
-
 有时任务可能希望在某个特定的DCU上运行,可以在pod注释中填写`hygon.com/use-gpuuuid`字段。HAMi调度器将尝试匹配具有该UUID的设备。
 例如,具有以下注释的任务将被分配到UUID为`DCU-123456`的设备上
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/kueue/how-to-use-kueue.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/kueue/how-to-use-kueue.md
index 1244e9d3..98429de4 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/kueue/how-to-use-kueue.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/kueue/how-to-use-kueue.md
@@ -2,8 +2,6 @@ title: 如何在 HAMi 上使用 Kueue
 ---
-## 在 HAMi 中使用 Kueue
-
 本指南将帮助你使用 Kueue 来管理 HAMi vGPU 资源,包括启用 Deployment 支持、配置 ResourceTransformation,以及创建请求 vGPU 资源的工作负载。
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/kunlunxin-device/enable-kunlunxin-schedule.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/kunlunxin-device/enable-kunlunxin-schedule.md
index d5d2add6..4cf23d6b 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/kunlunxin-device/enable-kunlunxin-schedule.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/kunlunxin-device/enable-kunlunxin-schedule.md
@@ -1,5 +1,6 @@
 ---
 title: 启用昆仑芯 GPU 拓扑感知调度
+linktitle: 拓扑感知调度
 ---
 **昆仑芯 GPU 拓扑感知调度现在通过 `kunlunxin.com/xpu` 资源得到支持。**
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/kunlunxin-device/examples/allocate-whole-xpu.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/kunlunxin-device/examples/allocate-whole-xpu.md
index b0400544..5e1fb3da 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/kunlunxin-device/examples/allocate-whole-xpu.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/kunlunxin-device/examples/allocate-whole-xpu.md
@@ -2,8 +2,6 @@ title: 分配整个 xpu 卡
 ---
-## 分配独占设备
-
 要分配整个 xpu 设备,您只需要分配 `kunlunxin.com/xpu`,无需其他字段。您可以为容器分配多个 XPU。
 ```yaml
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/metax-device/metax-gpu/enable-metax-gpu-schedule.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/metax-device/metax-gpu/enable-metax-gpu-schedule.md
index 0c05a0a5..941215d9 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/metax-device/metax-gpu/enable-metax-gpu-schedule.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/metax-device/metax-gpu/enable-metax-gpu-schedule.md
@@ -4,8 +4,6 @@ linktitle: 拓扑感知调度
 translated: true
 ---
-## 启用沐曦 GPU 拓扑感知调度
-
 **HAMi 现在通过在沐曦 GPU 之间实现拓扑感知来支持 metax.com/gpu**:
 当在单个服务器上配置多个 GPU 时,GPU 卡根据它们是否连接到同一个 PCIe 交换机或 MetaXLink 而存在远近关系。这在服务器上的所有卡之间形成了一个拓扑,如下图所示:
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/metax-device/metax-gpu/specify-binpack-task.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/metax-device/metax-gpu/specify-binpack-task.md
index a97ad224..b6f89214 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/metax-device/metax-gpu/specify-binpack-task.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/metax-device/metax-gpu/specify-binpack-task.md
@@ -4,8 +4,6 @@ linktitle: Binpack 策略
 translated: true
 ---
-## Binpack 调度策略
-
 为了在最小化拓扑损失的情况下分配 沐曦设备,您只需将 `metax-tech.com/gpu` 与注释 `hami.io/node-scheduler-policy: "binpack"` 一起分配。
 ```yaml
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/metax-device/metax-gpu/specify-spread-task.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/metax-device/metax-gpu/specify-spread-task.md
index 9440a48b..8c95b3b1 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/metax-device/metax-gpu/specify-spread-task.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/metax-device/metax-gpu/specify-spread-task.md
@@ -4,8 +4,6 @@ linktitle: Spread 策略
 translated: true
 ---
-## 扩展调度策略
-
 为了分配性能最佳的 沐曦设备,您只需将 `metax-tech.com/gpu` 与注释 `hami.io/node-scheduler-policy: "spread"` 一起分配
 ```yaml
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/metax-device/metax-sgpu/enable-metax-gpu-sharing.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/metax-device/metax-sgpu/enable-metax-gpu-sharing.md
index 37964b11..f5d93464 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/metax-device/metax-sgpu/enable-metax-gpu-sharing.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/metax-device/metax-sgpu/enable-metax-gpu-sharing.md
@@ -4,8 +4,6 @@ linktitle: GPU 共享
 translated: true
 ---
-## 启用沐曦 GPU 共享
-
 **HAMi 目前支持复用沐曦 GPU 设备,提供与 vGPU 类似的复用功能**,包括:
 - **GPU 共享**: 每个任务可以只占用一部分显卡,多个任务可以共享一张显卡
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/metax-device/metax-sgpu/examples/default-use.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/metax-device/metax-sgpu/examples/default-use.md
index 3ee67a64..d615f06d 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/metax-device/metax-sgpu/examples/default-use.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/metax-device/metax-sgpu/examples/default-use.md
@@ -1,5 +1,6 @@
 ---
 title: 为容器分配设备核心和显存资源
+linktitle: 分配核心和显存
 translated: true
 ---
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/monitoring/device-allocation.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/monitoring/device-allocation.md
index 98ac7b40..6831a098 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/monitoring/device-allocation.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/monitoring/device-allocation.md
@@ -1,10 +1,9 @@
 ---
-title: 集群设备分配
+title: 集群设备分配端点
+linktitle: 集群设备分配
 translated: true
 ---
-## 集群设备分配端点
-
 您可以通过访问 `{scheduler node ip}:31993/metrics` 获取集群设备分配和限制的概览,或者将其添加到 Prometheus 端点,如下命令所示:
 ```bash
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/monitoring/real-time-device-usage.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/monitoring/real-time-device-usage.md
index 2342d6fe..bcb85724 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/monitoring/real-time-device-usage.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/monitoring/real-time-device-usage.md
@@ -1,10 +1,9 @@
 ---
-title: 实时设备使用
+title: 实时设备使用端点
+linktitle: 实时设备使用
 translated: true
 ---
-## 实时设备使用端点
-
 您可以通过访问 `{GPU 节点 IP}:31992/metrics` 获取实时设备显存和核心使用情况,或者将其添加到 Prometheus 端点,如下命令所示:
 ```bash
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/mthreads-device/examples/allocate-core-and-memory.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/mthreads-device/examples/allocate-core-and-memory.md
index 545df768..9db0fd48 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/mthreads-device/examples/allocate-core-and-memory.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/mthreads-device/examples/allocate-core-and-memory.md
@@ -1,10 +1,9 @@
 ---
 title: 为容器分配设备核心和显存资源
+linktitle: 分配核心和显存
 translated: true
 ---
-## 为容器分配设备核心和显存
-
 要分配设备核心资源的一部分,您只需在容器中使用 `mthreads.com/vgpu` 请求的寒武纪 MLU 数量的同时,分配 `mthreads.com/sgpu-memory` 和 `mthreads.com/sgpu-core`。
 ```yaml
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/mthreads-device/examples/allocate-exclusive.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/mthreads-device/examples/allocate-exclusive.md
index 94d15cdf..a6aa958e 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/mthreads-device/examples/allocate-exclusive.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/mthreads-device/examples/allocate-exclusive.md
@@ -1,10 +1,9 @@
 ---
 title: 分配独占设备
+linktitle: 独占设备
 translated: true
 ---
-## 分配独占设备
-
 要分配整个寒武纪设备,您只需分配 `mthreads.com/vgpu` 而无需其他字段。您可以为一个容器分配多个 GPU。
 ```
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/mthreads-device/specify-device-memory-usage.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/mthreads-device/specify-device-memory-usage.md
index ccc6aa0f..bd897ca0 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/mthreads-device/specify-device-memory-usage.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/mthreads-device/specify-device-memory-usage.md
@@ -4,8 +4,6 @@ linktitle: 指定显存
 translated: true
 ---
-## 为容器分配设备显存
-
 通过指定诸如 `mthreads.com/sgpu-memory` 之类的资源来分配设备显存的百分比大小。可选项,每个 `mthreads.com/sgpu-memory` 单位等于 512M 的设备显存。
 ```yaml
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/nvidia-device/examples/allocate-device-core.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/nvidia-device/examples/allocate-device-core.md
index 3a5f7db2..ebc5c53d 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/nvidia-device/examples/allocate-device-core.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/nvidia-device/examples/allocate-device-core.md
@@ -1,10 +1,9 @@
 ---
 title: 为容器分配设备核心资源
+linktitle: 分配核心
 translated: true
 ---
-## 将设备核心分配给容器
-
 要分配设备核心资源的某一部分,您只需分配 `nvidia.com/gpucores`,无需其他资源字段。
 ```yaml
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/nvidia-device/examples/allocate-device-memory.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/nvidia-device/examples/allocate-device-memory.md
index b6b0d8e8..4e2c35c3 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/nvidia-device/examples/allocate-device-memory.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/nvidia-device/examples/allocate-device-memory.md
@@ -1,5 +1,6 @@
 ---
 title: 为容器分配特定设备显存
+linktitle: 分配显存
 translated: true
 ---
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/nvidia-device/examples/allocate-device-memory2.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/nvidia-device/examples/allocate-device-memory2.md
index 0726c515..8ae30f69 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/nvidia-device/examples/allocate-device-memory2.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/nvidia-device/examples/allocate-device-memory2.md
@@ -1,5 +1,6 @@
 ---
 title: 按百分比分配设备显存给容器
+linktitle: 按百分比分配显存
 translated: true
 ---
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/nvidia-device/examples/specify-card-type-to-use.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/nvidia-device/examples/specify-card-type-to-use.md
index 862f8f88..e873411a 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/nvidia-device/examples/specify-card-type-to-use.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/nvidia-device/examples/specify-card-type-to-use.md
@@ -1,5 +1,6 @@
 ---
 title: 分配任务到特定类型
+linktitle: 指定卡类型
 translated: true
 ---
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/nvidia-device/examples/specify-certain-card.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/nvidia-device/examples/specify-certain-card.md
index 5e22e532..b3844cb2 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/nvidia-device/examples/specify-certain-card.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/nvidia-device/examples/specify-certain-card.md
@@ -1,5 +1,6 @@
 ---
 title: 将任务分配给特定的 GPU
+linktitle: 指定 GPU
 translated: true
 ---
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/nvidia-device/examples/use-exclusive-card.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/nvidia-device/examples/use-exclusive-card.md
index 2d505377..6f46cf20 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/nvidia-device/examples/use-exclusive-card.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/nvidia-device/examples/use-exclusive-card.md
@@ -1,7 +1,7 @@
 ---
+title: 将设备核心分配给容器
 linktitle: 使用独占 GPU
 translated: true
-title: 将设备核心分配给容器
 ---
 要以独占模式使用 GPU,这是 nvidia-k8s-device-plugin 的默认行为,您只需分配 `nvidia.com/gpu` 而无需其他资源字段。
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/volcano-vgpu/nvidia-gpu/examples/default-use.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/volcano-vgpu/nvidia-gpu/examples/default-use.md
index 27e08c41..f81fdc68 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/volcano-vgpu/nvidia-gpu/examples/default-use.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/volcano-vgpu/nvidia-gpu/examples/default-use.md
@@ -4,8 +4,6 @@ linktitle: 默认作业
 translated: true
 ---
-## 默认 vgpu 作业
-
 VGPU 可以通过在 resource.limit 中设置 "volcano.sh/vgpu-number"、"volcano.sh/vgpu-cores" 和 "volcano.sh/vgpu-memory" 来请求。
 ```yaml
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/volcano-vgpu/nvidia-gpu/examples/use-exclusive-gpu.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/volcano-vgpu/nvidia-gpu/examples/use-exclusive-gpu.md
index a2db67d9..51fb3671 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/volcano-vgpu/nvidia-gpu/examples/use-exclusive-gpu.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/volcano-vgpu/nvidia-gpu/examples/use-exclusive-gpu.md
@@ -4,8 +4,6 @@ linktitle: 独占 GPU
 translated: true
 ---
-## 使用独占 GPU
-
 要分配一个独占的GPU,您只需分配`volcano.sh/vgpu-number`,而无需其他`volcano.sh/xxx`字段,如下例所示:
 ```yaml
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/volcano-vgpu/nvidia-gpu/how-to-use-volcano-vgpu.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/volcano-vgpu/nvidia-gpu/how-to-use-volcano-vgpu.md
index 337e5591..a67d805c 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/volcano-vgpu/nvidia-gpu/how-to-use-volcano-vgpu.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/volcano-vgpu/nvidia-gpu/how-to-use-volcano-vgpu.md
@@ -1,11 +1,10 @@
 ---
-title: 如何使用 Volcano vGPU
+title: Volcano vgpu 设备插件用于 Kubernetes
+linktitle: 如何使用 Volcano vGPU
 translated: true
 ---
-## Volcano vgpu 设备插件用于 Kubernetes
-
-**注意**:
+:::note
 使用 volcano-vgpu 时,**不需要** 安装 HAMi,仅使用 [Volcano vgpu device-plugin](https://github.com/Project-HAMi/volcano-vgpu-device-plugin) 即可。它可以为由 volcano 管理的 NVIDIA 设备提供设备共享机制。
@@ -14,6 +13,8 @@ translated: true
 Volcano vgpu 仅在 volcano > 1.9 版本中可用。
+:::
+
 ## 快速开始
 ### 安装 Volcano
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/volcano-vgpu/nvidia-gpu/monitor.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/volcano-vgpu/nvidia-gpu/monitor.md
index 287a4127..395f77c7 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/volcano-vgpu/nvidia-gpu/monitor.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.8.0/userguide/volcano-vgpu/nvidia-gpu/monitor.md
@@ -4,8 +4,6 @@ linktitle: 监控
 translated: true
 ---
-## 监控 Volcano vGPU
-
 volcano-scheduler-metrics 记录每个 GPU 的使用情况和限制,访问以下地址获取这些指标。
 ```bash
diff --git a/versioned_docs/version-v2.8.0/developers/dynamic-mig.md b/versioned_docs/version-v2.8.0/developers/dynamic-mig.md
index 6577277a..ebe587f2 100644
--- a/versioned_docs/version-v2.8.0/developers/dynamic-mig.md
+++ b/versioned_docs/version-v2.8.0/developers/dynamic-mig.md
@@ -1,9 +1,8 @@
 ---
-title: Dynamic MIG Implementation
+title: NVIDIA GPU MPS and MIG dynamic slice plugin
+linktitle: Dynamic MIG Implementation
 ---
-# NVIDIA GPU MPS and MIG dynamic slice plugin
-
 ## Special Thanks
 This feature will not be implemented without the help of @sailorvii.
diff --git a/versioned_docs/version-v2.8.0/developers/mindmap.md b/versioned_docs/version-v2.8.0/developers/mindmap.md
index 583e1ec2..70cc811c 100644
--- a/versioned_docs/version-v2.8.0/developers/mindmap.md
+++ b/versioned_docs/version-v2.8.0/developers/mindmap.md
@@ -2,6 +2,4 @@ title: HAMi mind map
 ---
-## Mind map
-
 ![HAMi VGPU mind map showing project structure and components](../resources/HAMI-VGPU-mind-map-English.png)
diff --git a/versioned_docs/version-v2.8.0/developers/protocol.md b/versioned_docs/version-v2.8.0/developers/protocol.md
index 24f805df..c3dec99d 100644
--- a/versioned_docs/version-v2.8.0/developers/protocol.md
+++ b/versioned_docs/version-v2.8.0/developers/protocol.md
@@ -2,8 +2,6 @@ title: Protocol design
 ---
-## Protocol Implementation
-
 ### Device Registration
 HAMi project diagram
diff --git a/versioned_docs/version-v2.8.0/developers/scheduling.md b/versioned_docs/version-v2.8.0/developers/scheduling.md
index 41065d4e..b67fcb9c 100644
--- a/versioned_docs/version-v2.8.0/developers/scheduling.md
+++ b/versioned_docs/version-v2.8.0/developers/scheduling.md
@@ -15,7 +15,7 @@ use can set Pod annotation to change this default policy, use `hami.io/node-sche
 This is a GPU cluster, having two node, the following story takes this cluster as a prerequisite.
-![scheduler-policy-story.png](../resources/scheduler-policy-story.png)
+![HAMi scheduler policy story diagram, showing node and GPU resource distribution](../resources/scheduler-policy-story.png)
 #### Story 1
@@ -83,7 +83,7 @@ GPU spread, use different GPU cards when possible, egs:
 ### Node-scheduler-policy
-![node-scheduler-policy-demo.png](../resources/node-scheduler-policy-demo.png)
+![HAMi node scheduler policy diagram, showing Binpack and Spread node selection](../resources/node-scheduler-policy-demo.png)
 #### Binpack
@@ -131,7 +131,7 @@ So, in `Spread` policy we can select `Node2`.
 ### GPU-scheduler-policy
-![gpu-scheduler-policy-demo.png](../resources/gpu-scheduler-policy-demo.png)
+![HAMi GPU scheduler policy diagram, comparing Binpack and Spread scores on each card](../resources/gpu-scheduler-policy-demo.png)
 #### Binpack
diff --git a/versioned_docs/version-v2.8.0/installation/how-to-use-volcano-vgpu.md b/versioned_docs/version-v2.8.0/installation/how-to-use-volcano-vgpu.md
index 89e51274..80167c76 100644
--- a/versioned_docs/version-v2.8.0/installation/how-to-use-volcano-vgpu.md
+++ b/versioned_docs/version-v2.8.0/installation/how-to-use-volcano-vgpu.md
@@ -1,9 +1,8 @@
 ---
-title: Volcano vGPU
+title: Volcano vGPU device plugin for Kubernetes
+linktitle: Use Volcano vGPU
 ---
-# Volcano vGPU device plugin for Kubernetes
-
 :::note
 You *DON'T* need to install HAMi when using volcano-vgpu, only use
diff --git a/versioned_docs/version-v2.8.0/userguide/ascend-device/device-template.md b/versioned_docs/version-v2.8.0/userguide/ascend-device/device-template.md
index 01e5c34f..9fc849bf 100644
--- a/versioned_docs/version-v2.8.0/userguide/ascend-device/device-template.md
+++ b/versioned_docs/version-v2.8.0/userguide/ascend-device/device-template.md
@@ -2,6 +2,9 @@ title: Ascend device template
 ---
+Ascend device templates define how a physical Ascend card is sliced into virtual instances that HAMi can schedule.
+Each template describes the available memory, AI cores and optional CPU resources for a given card model.
+When a Pod requests Ascend resources, HAMi selects a suitable template according to the requested memory and compute.
 ```yaml
 vnpus:
diff --git a/versioned_docs/version-v2.8.0/userguide/configure.md b/versioned_docs/version-v2.8.0/userguide/configure.md
index ab90dfcd..3e49e8cf 100644
--- a/versioned_docs/version-v2.8.0/userguide/configure.md
+++ b/versioned_docs/version-v2.8.0/userguide/configure.md
@@ -1,9 +1,8 @@
 ---
-title: Configuration
+title: Global Config
+linktitle: Configuration
 ---
-# Global Config
-
 ## Device Configs: ConfigMap
 :::note
diff --git a/versioned_docs/version-v2.8.0/userguide/hygon-device/specify-device-core-usage.md b/versioned_docs/version-v2.8.0/userguide/hygon-device/specify-device-core-usage.md
index 86596e90..7e03433a 100644
--- a/versioned_docs/version-v2.8.0/userguide/hygon-device/specify-device-core-usage.md
+++ b/versioned_docs/version-v2.8.0/userguide/hygon-device/specify-device-core-usage.md
@@ -1,5 +1,6 @@
 ---
-title: Allocate device core usage
+title: Allocate device core to container
+linktitle: Allocate device core usage
 ---
 Allocate a percentage of device core resources by specify resource `hygon.com/dcucores`.
diff --git a/versioned_docs/version-v2.8.0/userguide/kueue/how-to-use-kueue.md b/versioned_docs/version-v2.8.0/userguide/kueue/how-to-use-kueue.md
index 396785c5..065571fd 100644
--- a/versioned_docs/version-v2.8.0/userguide/kueue/how-to-use-kueue.md
+++ b/versioned_docs/version-v2.8.0/userguide/kueue/how-to-use-kueue.md
@@ -2,8 +2,6 @@ title: How to use kueue on HAMi
 ---
-# Using Kueue with HAMi
-
 This guide will help you use Kueue to manage HAMi vGPU resources, including enabling Deployment support, configuring ResourceTransformation, and creating workloads that request vGPU resources.
 ## Prerequisites
diff --git a/versioned_docs/version-v2.8.0/userguide/kunlunxin-device/examples/allocate-whole-xpu.md b/versioned_docs/version-v2.8.0/userguide/kunlunxin-device/examples/allocate-whole-xpu.md
index fa27fdc6..ab101943 100644
--- a/versioned_docs/version-v2.8.0/userguide/kunlunxin-device/examples/allocate-whole-xpu.md
+++ b/versioned_docs/version-v2.8.0/userguide/kunlunxin-device/examples/allocate-whole-xpu.md
@@ -2,8 +2,6 @@ title: Allocate a whole xpu card
 ---
-## Allocate exclusive device
-
 To allocate a whole xpu device, you need to only assign `kunlunxin.com/xpu` without other fields. You can allocate multiple XPUs for a container.
 ```yaml
diff --git a/versioned_docs/version-v2.8.0/userguide/monitoring/device-allocation.md b/versioned_docs/version-v2.8.0/userguide/monitoring/device-allocation.md
index b7daa4fa..8fb12d60 100644
--- a/versioned_docs/version-v2.8.0/userguide/monitoring/device-allocation.md
+++ b/versioned_docs/version-v2.8.0/userguide/monitoring/device-allocation.md
@@ -1,9 +1,8 @@
 ---
-title: Cluster device allocation
+title: Cluster device allocation endpoint
+linktitle: Cluster device allocation
 ---
-## Cluster device allocation endpoint
-
 You can get the overview of cluster device allocation and limit by visiting `{scheduler node ip}:31993/metrics`, or add it to a prometheus endpoint, as the command below:
 ```bash
diff --git a/versioned_docs/version-v2.8.0/userguide/monitoring/real-time-device-usage.md b/versioned_docs/version-v2.8.0/userguide/monitoring/real-time-device-usage.md
index 7587e5df..0f8d75f9 100644
--- a/versioned_docs/version-v2.8.0/userguide/monitoring/real-time-device-usage.md
+++ b/versioned_docs/version-v2.8.0/userguide/monitoring/real-time-device-usage.md
@@ -1,9 +1,8 @@
 ---
-title: Real-time device usage
+title: Real-time device usage endpoint
+linktitle: Real-time device usage
 ---
-## Real-time device usage endpoint
-
 You can get the real-time device memory and core utilization by visiting `{GPU node ip}:31992/metrics`, or add it to a prometheus endpoint, as the command below:
 ```bash
diff --git a/versioned_docs/version-v2.8.0/userguide/mthreads-device/specify-device-core-usage.md b/versioned_docs/version-v2.8.0/userguide/mthreads-device/specify-device-core-usage.md
index 5c3312f9..a982cfe6 100644
--- a/versioned_docs/version-v2.8.0/userguide/mthreads-device/specify-device-core-usage.md
+++ b/versioned_docs/version-v2.8.0/userguide/mthreads-device/specify-device-core-usage.md
@@ -1,9 +1,8 @@
 ---
-title: Allocate device core usage
+title: Allocate device core to container
+linktitle: Allocate device core usage
 ---
-## Allocate device core to container
-
 Allocate a part of device core resources by specify resource `mthreads.com/sgpu-core`.
 Optional, each unit of `mthreads.com/smlu-core` equals to 1/16 device cores.
diff --git a/versioned_docs/version-v2.8.0/userguide/mthreads-device/specify-device-memory-usage.md b/versioned_docs/version-v2.8.0/userguide/mthreads-device/specify-device-memory-usage.md
index 9774203d..fe2629b1 100644
--- a/versioned_docs/version-v2.8.0/userguide/mthreads-device/specify-device-memory-usage.md
+++ b/versioned_docs/version-v2.8.0/userguide/mthreads-device/specify-device-memory-usage.md
@@ -1,9 +1,8 @@
 ---
-title: Allocate device memory
+title: Allocate device memory to container
+linktitle: Allocate device memory
 ---
-## Allocate device memory to container
-
 Allocate a percentage size of device memory by specify resources such as `mthreads.com/sgpu-memory`.
 Optional, Each unit of `mthreads.com/sgpu-memory` equals to 512M of device memory.
diff --git a/versioned_docs/version-v2.8.0/userguide/nvidia-device/examples/allocate-device-core.md b/versioned_docs/version-v2.8.0/userguide/nvidia-device/examples/allocate-device-core.md
index 679b9935..442f0b3f 100644
--- a/versioned_docs/version-v2.8.0/userguide/nvidia-device/examples/allocate-device-core.md
+++ b/versioned_docs/version-v2.8.0/userguide/nvidia-device/examples/allocate-device-core.md
@@ -1,5 +1,6 @@
 ---
-title: Allocate device core resource
+title: Allocate device core to container
+linktitle: Allocate device core resource
 ---
 To allocate a certain part of device core resource, you need only to assign the `nvidia.com/gpucores` without other resource fields.
diff --git a/versioned_docs/version-v2.8.0/userguide/nvidia-device/examples/allocate-device-memory.md b/versioned_docs/version-v2.8.0/userguide/nvidia-device/examples/allocate-device-memory.md
index 21bfa34f..37a30eb4 100644
--- a/versioned_docs/version-v2.8.0/userguide/nvidia-device/examples/allocate-device-memory.md
+++ b/versioned_docs/version-v2.8.0/userguide/nvidia-device/examples/allocate-device-memory.md
@@ -1,5 +1,6 @@
 ---
-title: Allocate certain device memory
+title: Allocate certain device memory to container
+linktitle: Allocate certain device memory
 ---
 To allocate a certain size of GPU device memory, you need only to assign `nvidia.com/gpumem` besides `nvidia.com/gpu`.
diff --git a/versioned_docs/version-v2.8.0/userguide/nvidia-device/examples/allocate-device-memory2.md b/versioned_docs/version-v2.8.0/userguide/nvidia-device/examples/allocate-device-memory2.md
index 61ee6b53..5a318774 100644
--- a/versioned_docs/version-v2.8.0/userguide/nvidia-device/examples/allocate-device-memory2.md
+++ b/versioned_docs/version-v2.8.0/userguide/nvidia-device/examples/allocate-device-memory2.md
@@ -1,5 +1,6 @@
 ---
-title: Allocate device memory by percentage
+title: Allocate a part of device memory by percentage to container
+linktitle: Allocate device memory by percentage
 ---
 To allocate a certain size of GPU device memory by percentage, you need only to assign `nvidia.com/gpumem-percentage` besides `nvidia.com/gpu`.
diff --git a/versioned_docs/version-v2.8.0/userguide/nvidia-device/examples/dynamic-mig-example.md b/versioned_docs/version-v2.8.0/userguide/nvidia-device/examples/dynamic-mig-example.md
index 8d17b685..bd4907b6 100644
--- a/versioned_docs/version-v2.8.0/userguide/nvidia-device/examples/dynamic-mig-example.md
+++ b/versioned_docs/version-v2.8.0/userguide/nvidia-device/examples/dynamic-mig-example.md
@@ -2,7 +2,7 @@ title: Assign task to mig instance
 ---
-This example will allocate 2g.10gb x 2 for A100-40GB-PCIE device or 1g.10gb x 2 for A100-80GB-XSM device.
+This example will allocate `2g.10gb x 2` for A100-40GB-PCIE device or `1g.10gb x 2` for A100-80GB-XSM device.
 
 ```yaml
 apiVersion: v1
diff --git a/versioned_docs/version-v2.8.0/userguide/nvidia-device/examples/use-exclusive-card.md b/versioned_docs/version-v2.8.0/userguide/nvidia-device/examples/use-exclusive-card.md
index 86bd89d2..255b26e3 100644
--- a/versioned_docs/version-v2.8.0/userguide/nvidia-device/examples/use-exclusive-card.md
+++ b/versioned_docs/version-v2.8.0/userguide/nvidia-device/examples/use-exclusive-card.md
@@ -1,9 +1,8 @@
 ---
-title: Use exclusive GPU
+title: Allocate device core to container
+linktitle: Use exclusive GPU
 ---
 
-## Allocate device core to container
-
 To use GPU in an exclusive mode, which is the default behaviour of nvidia-k8s-device-plugin, you need only to assign the `nvidia.com/gpu` without other resource fields.
 
 ```yaml
diff --git a/versioned_docs/version-v2.8.0/userguide/nvidia-device/specify-device-core-usage.md b/versioned_docs/version-v2.8.0/userguide/nvidia-device/specify-device-core-usage.md
index 1642dffc..4ce9e11e 100644
--- a/versioned_docs/version-v2.8.0/userguide/nvidia-device/specify-device-core-usage.md
+++ b/versioned_docs/version-v2.8.0/userguide/nvidia-device/specify-device-core-usage.md
@@ -1,9 +1,8 @@
 ---
-title: Allocate device core usage
+title: Allocate device core to container
+linktitle: Allocate device core usage
 ---
 
-## Allocate device core to container
-
 Allocate a percentage of device core resources by specify resource `nvidia.com/gpucores`.
 Optional, each unit of `nvidia.com/gpucores` equals to 1% device cores.
 
diff --git a/versioned_docs/version-v2.8.0/userguide/nvidia-device/specify-device-memory-usage.md b/versioned_docs/version-v2.8.0/userguide/nvidia-device/specify-device-memory-usage.md
index 99de00b5..e24b5c93 100644
--- a/versioned_docs/version-v2.8.0/userguide/nvidia-device/specify-device-memory-usage.md
+++ b/versioned_docs/version-v2.8.0/userguide/nvidia-device/specify-device-memory-usage.md
@@ -1,9 +1,8 @@
 ---
-title: Allocate device memory
+title: Allocate device memory to container
+linktitle: Allocate device memory
 ---
 
-## Allocate device memory to container
-
 Allocate a certain size of device memory by specify resources such as `nvidia.com/gpumem`.
 Optional, Each unit of `nvidia.com/gpumem` equals to 1M.
 
diff --git a/versioned_docs/version-v2.8.0/userguide/nvidia-device/specify-device-uuid-to-use.md b/versioned_docs/version-v2.8.0/userguide/nvidia-device/specify-device-uuid-to-use.md
index f6648d80..0247c504 100644
--- a/versioned_docs/version-v2.8.0/userguide/nvidia-device/specify-device-uuid-to-use.md
+++ b/versioned_docs/version-v2.8.0/userguide/nvidia-device/specify-device-uuid-to-use.md
@@ -1,5 +1,6 @@
 ---
-title: Assign to certain device
+title: Assign to certain device type
+linktitle: Assign to certain device
 ---
 
 Sometimes a task may wish to run on a certain GPU, it can fill the `nvidia.com/use-gpuuuid` field in pod annotation. HAMi scheduler will try to fit in device with that uuid.
diff --git a/versioned_docs/version-v2.8.0/userguide/nvidia-device/using-resourcequota.md b/versioned_docs/version-v2.8.0/userguide/nvidia-device/using-resourcequota.md
index f4ef4e76..5c1a5dd8 100644
--- a/versioned_docs/version-v2.8.0/userguide/nvidia-device/using-resourcequota.md
+++ b/versioned_docs/version-v2.8.0/userguide/nvidia-device/using-resourcequota.md
@@ -1,5 +1,6 @@
 ---
 title: Using Extended ResourceQuota for NVIDIA Devices
+linktitle: ResourceQuota
 translated: true
 ---
 
diff --git a/versioned_docs/version-v2.8.0/userguide/volcano-vgpu/nvidia-gpu/examples/default-use.md b/versioned_docs/version-v2.8.0/userguide/volcano-vgpu/nvidia-gpu/examples/default-use.md
index 81e4ac50..386b5607 100644
--- a/versioned_docs/version-v2.8.0/userguide/volcano-vgpu/nvidia-gpu/examples/default-use.md
+++ b/versioned_docs/version-v2.8.0/userguide/volcano-vgpu/nvidia-gpu/examples/default-use.md
@@ -1,9 +1,7 @@
 ---
-title: Default vgpu job
+title: Default vGPU job
 ---
 
-## Job description
-
 vGPU can be requested by both set "volcano.sh/vgpu-number", "volcano.sh/vgpu-cores" and "volcano.sh/vgpu-memory" in resources.limits
 
 ```yaml
diff --git a/versioned_docs/version-v2.8.0/userguide/volcano-vgpu/nvidia-gpu/examples/use-exclusive-gpu.md b/versioned_docs/version-v2.8.0/userguide/volcano-vgpu/nvidia-gpu/examples/use-exclusive-gpu.md
index 21feb72c..f47185f4 100644
--- a/versioned_docs/version-v2.8.0/userguide/volcano-vgpu/nvidia-gpu/examples/use-exclusive-gpu.md
+++ b/versioned_docs/version-v2.8.0/userguide/volcano-vgpu/nvidia-gpu/examples/use-exclusive-gpu.md
@@ -1,9 +1,7 @@
 ---
-title: Exclusive gpu usage
+title: Exclusive GPU usage
 ---
 
-## Job description
-
 To allocate an exclusive GPU, you need only assign `volcano.sh/vgpu-number` without any other `volcano.sh/xxx` fields, as the example below:
 
 ```yaml
diff --git a/versioned_docs/version-v2.8.0/userguide/volcano-vgpu/nvidia-gpu/how-to-use-volcano-vgpu.md b/versioned_docs/version-v2.8.0/userguide/volcano-vgpu/nvidia-gpu/how-to-use-volcano-vgpu.md
index 1658ca2b..dc9e6597 100644
--- a/versioned_docs/version-v2.8.0/userguide/volcano-vgpu/nvidia-gpu/how-to-use-volcano-vgpu.md
+++ b/versioned_docs/version-v2.8.0/userguide/volcano-vgpu/nvidia-gpu/how-to-use-volcano-vgpu.md
@@ -1,9 +1,8 @@
 ---
-title: Use Volcano vGPU
+title: Volcano vGPU device plugin for Kubernetes
+linktitle: Use Volcano vGPU
 ---
 
-# Volcano vGPU device plugin for Kubernetes
-
 :::note
 You *DON'T* need to install HAMi when using volcano-vgpu, only use