`或Markdown文章中的`# `。
-- 每个文档都有一个唯一的ID。默认情况下,文档ID是与根文档目录相关的文档名称(不带扩展名)。
+- 标题相当于 HTML 文档中的 `<h1>` 或 Markdown 文章中的`# `。
+- 每个文档都有一个唯一的 ID。默认情况下,文档 ID 是与根文档目录相关的文档名称(不带扩展名)。
### 链接到其他文档
您可以通过添加以下任何链接轻松路由到其他地方:
-- 指向外部站点的绝对URL,如`https://github.com`或`https://k8s.io` - 您可以使用任何Markdown标记来实现这一点,因此
- - ``或
+- 指向外部站点的绝对 URL,如`https://github.com`或`https://k8s.io` - 您可以使用任何 Markdown 标记来实现这一点,因此
+ - `` 或
- `[kubernetes](https://k8s.io)`都可以。
-- 链接到Markdown文件或生成的路径。您可以使用相对路径索引相应的文件。
-- 链接到图片或其他资源。如果您的文章包含图片或其他资源,您可以在`/docs/resources`中创建相应的目录,并将文章相关文件放在该目录中。现在我们将关于HAMi的公共图片存储在`/docs/resources/general`中。您可以使用以下方式链接图片:
+- 链接到 Markdown 文件或生成的路径。您可以使用相对路径索引相应的文件。
+- 链接到图片或其他资源。如果您的文章包含图片或其他资源,您可以在`/docs/resources`中创建相应的目录,并将文章相关文件放在该目录中。现在我们将关于 HAMi 的公共图片存储在`/docs/resources/general`中。您可以使用以下方式链接图片:
- ``
### 目录组织
-Docusaurus 2使用侧边栏来管理文档。
+Docusaurus 2 使用侧边栏来管理文档。
创建侧边栏有助于:
@@ -148,18 +148,18 @@ items: [
],
```
-如果您添加了文档,您必须将其添加到`sidebars.js`中以使其正确显示。如果您不确定您的文档位于何处,可以在PR中询问社区成员。
+如果您添加了文档,您必须将其添加到`sidebars.js`中以使其正确显示。如果您不确定您的文档位于何处,可以在 PR 中询问社区成员。
### 关于中文文档
关于文档的中文版有两种情况:
- 您想将我们现有的英文文档翻译成中文。在这种情况下,您需要修改相应文件的内容,路径为[https://github.com/Project-HAMi/website/tree/main/i18n/zh/docusaurus-plugin-content-docs/current](https://github.com/Project-HAMi/website/tree/main/i18n/zh/docusaurus-plugin-content-docs/current)。该目录的组织与外层完全相同。`current.json`保存了文档目录的翻译。如果您想翻译目录名称,可以编辑它。
-- 您想贡献没有英文版的中文文档。欢迎任何类型的文章。在这种情况下,您可以先将文章和标题添加到主目录。文章内容可以先标记为TBD。然后将相应的中文内容添加到中文目录中。
+- 您想贡献没有英文版的中文文档。欢迎任何类型的文章。在这种情况下,您可以先将文章和标题添加到主目录。文章内容可以先标记为 TBD。然后将相应的中文内容添加到中文目录中。
## 调试文档
-现在您已经完成了文档。在您向`Project-HAMi/website`发起PR后,如果通过CI,您可以在网站上预览您的文档。
+现在您已经完成了文档。在您向`Project-HAMi/website`发起 PR 后,如果通过 CI,您可以在网站上预览您的文档。
点击红色标记的**Details**,您将进入网站的预览视图。
@@ -171,4 +171,4 @@ items: [
### 版本控制
-对于每个版本的新补充文档,我们将在每个版本的发布日期同步到最新版本,旧版本的文档将不再修改。对于文档中发现的勘误,我们将在每次发布时修复。
\ No newline at end of file
+对于新补充的文档,我们会在每个版本的发布日期将其同步到最新版本;旧版本的文档发布后将不再修改。对于文档中发现的勘误,我们会在每次发布时修复。
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/contributor/ladder.md b/i18n/zh/docusaurus-plugin-content-docs/current/contributor/ladder.md
index 12a28af0..6b297eed 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/contributor/ladder.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/contributor/ladder.md
@@ -5,7 +5,7 @@ translated: true
本文档介绍了在项目中参与和提升的不同方式。您可以在贡献者角色中看到项目中的不同角色。
-## 贡献者阶梯
+### 贡献者阶梯
您好!我们很高兴您想了解更多关于我们项目贡献者阶梯的信息!这个贡献者阶梯概述了项目中的不同贡献者角色,以及与之相关的责任和特权。社区成员通常从“阶梯”的第一级开始,并随着他们在项目中的参与度增加而逐步提升。我们的项目成员乐于帮助您在贡献者阶梯上进步。
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/core-concepts/architecture.md b/i18n/zh/docusaurus-plugin-content-docs/current/core-concepts/architecture.md
index 7c0ec0ef..7c53878e 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/core-concepts/architecture.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/core-concepts/architecture.md
@@ -5,7 +5,7 @@ translated: true
HAMi 的整体架构如下所示:
-
+
HAMi 由以下组件组成:
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/core-concepts/introduction.md b/i18n/zh/docusaurus-plugin-content-docs/current/core-concepts/introduction.md
index 54f507fa..68901dc2 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/core-concepts/introduction.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/core-concepts/introduction.md
@@ -4,38 +4,41 @@ translated: true
slug: /
---
-## HAMi:异构 AI 计算虚拟化中间件 {#hami-heterogeneous-ai-computing-virtualization-middleware}
+HAMi(异构 AI 计算虚拟化中间件)是一个用于管理 Kubernetes 集群中异构 AI 计算设备的开源平台,前身为 k8s-vGPU-scheduler,可在多个容器和工作负载之间实现设备共享。
-异构 AI 计算虚拟化中间件(HAMi),前身为 k8s-vGPU-scheduler,是一个专为管理 k8s 集群中异构 AI 计算设备而设计的"一体化"Helm Chart。它能够实现异构 AI 设备在多个任务间的共享能力。
+HAMi 是[云原生计算基金会(CNCF)](https://cncf.io/)的 [Sandbox 项目](https://landscape.cncf.io/?item=orchestration-management--scheduling-orchestration--hami),并被收录于 [CNCF 技术全景图](https://landscape.cncf.io/?item=orchestration-management--scheduling-orchestration--hami)和 [CNAI 技术全景图](https://landscape.cncf.io/?group=cnai&item=orchestration-management--scheduling-orchestration--hami)。
-HAMi 是[云原生计算基金会(CNCF)](https://cncf.io/)的 SandBox 项目,同时被收录于[CNCF 技术全景图 - 编排与调度类目](https://landscape.cncf.io/?item=orchestration-management--scheduling-orchestration--hami)及[CNAI 技术全景图](https://landscape.cncf.io/?group=cnai&item=cnai--general-orchestration--hami)。
+## 核心特性
-## 为什么选择 HAMi {#why-hami}
+### 设备共享
-- **设备共享**
- - 支持多种异构 AI 计算设备(如 NVIDIA GPU/CUDA)
- - 支持多设备容器的设备共享
+- **多设备支持**:兼容多种异构 AI 计算设备(GPU、NPU 等)
+- **共享访问**:多个容器可同时共享设备,提高资源利用率
-- **设备显存控制**
- - 容器内硬性显存限制
- - 支持动态设备显存分配
- - 支持按 MB 或百分比分配内存
+### 内存管理
-- **设备规格指定**
- - 支持指定特定类型的异构 AI 计算设备
- - 支持通过设备 UUID 指定具体设备
+- **硬限制**:在容器内强制执行严格的内存限制,防止资源冲突
+- **动态分配**:根据工作负载需求按需分配设备内存
+- **灵活单位**:支持按 MB 或占总设备内存百分比的方式指定内存分配
-- **开箱即用**
- - 对容器内任务透明无感
- - 通过 helm 一键安装/卸载,简洁环保
+### 设备规格
-- **开放中立**
- - 由互联网、金融、制造业、云服务商等多领域联合发起
- - 以 CNCF 开放治理为目标
+- **类型选择**:可请求特定类型的异构 AI 计算设备
+- **UUID 定向**:使用设备 UUID 精确指定特定设备
-## 后续步骤 {#whats-next}
+### 易用性
+
+- **对工作负载透明**:容器内无需修改代码
+- **简单部署**:使用 Helm 轻松安装和卸载,配置简单
+
+### 开放治理
+
+- **社区驱动**:由互联网、金融、制造业、云服务等多个领域的组织联合发起
+- **中立发展**:作为开源项目由 CNCF 管理
+
+## 后续步骤
推荐继续了解:
- 学习 HAMi 的[架构设计](./architecture.md)
-- 开始[安装 HAMi](../installation/prequisities.md)
+- 在您的 Kubernetes 集群中[安装 HAMi](../installation/prerequisites.md)
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/developers/Dynamic-mig.md b/i18n/zh/docusaurus-plugin-content-docs/current/developers/Dynamic-mig.md
index c0dfd957..87b27308 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/developers/Dynamic-mig.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/developers/Dynamic-mig.md
@@ -7,7 +7,7 @@ translated: true
没有 @sailorvii 的帮助,这个功能将无法实现。
-## 介绍
+### 介绍
NVIDIA GPU 内置的共享方法包括:时间片、MPS 和 MIG。时间片共享的上下文切换会浪费一些时间,所以我们选择了 MPS 和 MIG。GPU MIG 配置是可变的,用户可以在配置定义中获取 MIG 设备,但当前实现仅在用户需求之前定义了专用配置。这限制了 MIG 的使用。我们希望开发一个自动切片插件,并在用户需要时创建切片。
对于调度方法,将支持节点级别的 binpack 和 spread。参考 binpack 插件,我们考虑了 CPU、内存、GPU 显存和其他用户定义的资源。
@@ -100,7 +100,7 @@ data:
## 结构
-
+
## 示例
@@ -147,7 +147,7 @@ spec:
使用动态-mig 的 vGPU 任务的流程如下所示:
-
+
请注意,在提交任务后,deviceshare 插件将遍历 configMap `hami-scheduler-device` 中定义的模板,并找到第一个可用的模板来适配。您可以随时更改该 configMap 的内容,并重新启动 vc-scheduler 进行自定义。
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/developers/HAMi-core-design.md b/i18n/zh/docusaurus-plugin-content-docs/current/developers/HAMi-core-design.md
index 67fb9670..37a52125 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/developers/HAMi-core-design.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/developers/HAMi-core-design.md
@@ -6,7 +6,7 @@ HAMi-core是一个为 CUDA 环境设计的 hook 库,作为容器内的 GPU 资
[HAMi](https://github.com/HAMi-project/HAMi) 和
[Volcano](https://github.com/volcano-sh/devices) 等项目采用。
-
+
## 功能特性
@@ -14,7 +14,7 @@ HAMi-core 提供以下核心功能:
1. 设备显存虚拟化
- 
+ 
2. 限制设备使用率
@@ -27,4 +27,4 @@ HAMi-core 提供以下核心功能:
HAMi-core 通过劫持 CUDA 运行时库(`libcudart.so`)与 CUDA 驱动库(`libcuda.so`)之间的
API 调用来实现其功能,如下图所示:
-
+
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/developers/kunlunxin-topology.md b/i18n/zh/docusaurus-plugin-content-docs/current/developers/kunlunxin-topology.md
index 13eae7f9..05b51b9f 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/developers/kunlunxin-topology.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/developers/kunlunxin-topology.md
@@ -4,39 +4,39 @@ title: 昆仑芯拓扑感知调度
## 背景
-当单个P800服务器配置多块XPU时,若GPU连接或位于同一NUMA节点内(如下图所示),可获得最优性能表现。这种配置会在服务器内所有GPU之间形成特定拓扑关系。
+当单个 P800 服务器配置多块 XPU 时,若 GPU 之间互联或位于同一 NUMA 节点内(如下图所示),可获得最优性能表现。这种配置会在服务器内所有 GPU 之间形成特定拓扑关系。
-
+
-当用户作业申请特定数量的`kunlunxin.com/xpu`资源时,Kubernetes会将pod调度到合适节点以最小化资源碎片并保持高性能。选定节点后,XPU设备会根据以下规则进行细粒度资源分配:
+当用户作业申请特定数量的`kunlunxin.com/xpu`资源时,Kubernetes 会将 pod 调度到合适节点以最小化资源碎片并保持高性能。选定节点后,XPU 设备会根据以下规则进行细粒度资源分配:
-1. 仅允许1、2、4或8卡分配方案
-2. 1/2/4卡分配不得跨NUMA节点
+1. 仅允许 1、2、4 或 8 卡分配方案
+2. 1/2/4 卡分配不得跨 NUMA 节点
3. 分配后应最小化资源碎片
## 过滤阶段
-过滤阶段识别所有符合分配条件的节点。针对每个节点,系统会筛选最优XPU组合方案并缓存,供评分阶段使用。筛选流程如下图所示:
+过滤阶段识别所有符合分配条件的节点。针对每个节点,系统会筛选最优 XPU 组合方案并缓存,供评分阶段使用。筛选流程如下图所示:
-
+
## 评分阶段
在评分阶段,所有通过过滤的节点会接受评估并打分以选择最优调度目标。我们引入**MTF**(最小填充分任务数)指标,用于量化节点在分配后容纳未来任务的能力。
-下表展示了XPU占用情况与对应MTF值的示例:
+下表展示了 XPU 占用情况与对应 MTF 值的示例:
-| XPU占用状态 | MTF | 说明 |
+| XPU 占用状态 | MTF | 说明 |
|----------------|-----|-------------|
| 11111111 | 0 | 完全占用,无法调度新任务 |
-| 00000000 | 1 | 可被一个8-XPU任务完全占用 |
-| 00000011 | 2 | 可调度一个4-XPU任务和一个2-XPU任务 |
-| 00000001 | 3 | 可容纳一个4-XPU、一个2-XPU和一个1-XPU任务 |
-| 00010001 | 4 | 可容纳两个2-XPU任务和两个1-XPU任务 |
+| 00000000 | 1 | 可被一个 8-XPU 任务完全占用 |
+| 00000011 | 2 | 可调度一个 4-XPU 任务和一个 2-XPU 任务 |
+| 00000001 | 3 | 可容纳一个 4-XPU、一个 2-XPU 和一个 1-XPU 任务 |
+| 00010001 | 4 | 可容纳两个 2-XPU 任务和两个 1-XPU 任务 |
-节点得分基于分配前后的**MTF差值**计算。差值越小表示适配度越高,得分也越高。具体评分逻辑如下:
+节点得分基于分配前后的**MTF 差值**计算。差值越小表示适配度越高,得分也越高。具体评分逻辑如下:
-| MTF差值 | 得分 | 示例 |
+| MTF 差值 | 得分 | 示例 |
|------------|-------|---------|
| -1 | 2000 | 00000111->00001111 |
| 0 | 1000 | 00000111->00110111 |
@@ -45,8 +45,8 @@ title: 昆仑芯拓扑感知调度
## 绑定阶段
-在绑定阶段,分配结果会以注解形式注入pod。例如:
+在绑定阶段,分配结果会以注解形式注入 pod。例如:
-```
+```text
BAIDU_COM_DEVICE_IDX=0,1,2,3
```
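
上表中的 MTF 指标可以用如下 Python 草稿计算(仅为示意实现,假设服务器为两个各含 4 块 XPU 的 NUMA 节点;函数名 `mtf` 为假设,并非 HAMi 调度器的实际代码):

```python
def mtf(occupancy: str) -> int:
    """计算上表中的 MTF:将剩余空闲 XPU 完全填满所需的最少任务数。
    假设:8 卡分两个 NUMA 节点(前 4 卡 / 后 4 卡);8 卡任务需要整机空闲,
    1/2/4 卡任务不得跨 NUMA。'0' 表示空闲,'1' 表示已占用。"""
    assert len(occupancy) == 8
    if occupancy == "00000000":
        return 1  # 整机空闲:可被一个 8-XPU 任务占满
    total = 0
    for numa in (occupancy[:4], occupancy[4:]):
        free = numa.count("0")
        # 可用任务规格为 1/2/4(均为 2 的幂),因此填满 free 张卡
        # 所需的最少任务数等于 free 的二进制表示中 1 的个数
        total += bin(free).count("1")
    return total

print(mtf("00010001"))  # 两个 NUMA 节点各剩 3 卡,各需 2+1 两个任务 → 4
```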
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/developers/mindmap.md b/i18n/zh/docusaurus-plugin-content-docs/current/developers/mindmap.md
index 8b5ea4e7..fbe20073 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/developers/mindmap.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/developers/mindmap.md
@@ -5,4 +5,4 @@ translated: true
## 思维导图
-
\ No newline at end of file
+
\ No newline at end of file
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/developers/protocol.md b/i18n/zh/docusaurus-plugin-content-docs/current/developers/protocol.md
index f196092c..b35cd2f6 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/developers/protocol.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/developers/protocol.md
@@ -11,11 +11,11 @@ translated: true
然而,device-plugin 设备注册 API 并未提供相应的参数获取,因此 HAMi-device-plugin 在注册时将这些补充信息存储在节点的注释中,以供调度器读取,如下图所示:
-
+
这里需要使用两个注释,其中一个是时间戳,如果超过指定的阈值,则认为对应节点上的设备无效。另一个是设备注册信息。一个具有 2 个 32G-V100 GPU 的节点可以注册如下所示:
-```
+```yaml
hami.io/node-handshake: Requesting_2024.05.14 07:07:33
hami.io/node-nvidia-register: 'GPU-00552014-5c87-89ac-b1a6-7b53aa24b0ec,10,32768,100,NVIDIA-Tesla V100-PCIE-32GB,0,true:GPU-0fc3eda5-e98b-a25b-5b0d-cf5c855d1448,10,32768,100,NVIDIA-Tesla V100-PCIE-32GB,0,true:'
```
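
该注册注解中的设备列表可以按如下方式解析(示意草稿:各字段含义系根据上例推断,函数名 `parse_register` 为假设,并非 HAMi 的实际代码):

```python
def parse_register(value: str):
    """解析 hami.io/node-nvidia-register 注解的值。
    字段含义(UUID、可切分数量、显存 MB、核心 %、型号、NUMA、健康状态)
    为根据上例推断,仅作示意。设备记录之间以 ':' 分隔。"""
    devices = []
    for record in value.split(":"):
        if not record.strip():
            continue
        uuid, count, mem, core, dtype, numa, healthy = record.split(",")
        devices.append({
            "uuid": uuid,
            "count": int(count),
            "memory_mb": int(mem),
            "core_percent": int(core),
            "type": dtype,
            "numa": int(numa),
            "healthy": healthy == "true",
        })
    return devices

anno = ("GPU-00552014-5c87-89ac-b1a6-7b53aa24b0ec,10,32768,100,"
        "NVIDIA-Tesla V100-PCIE-32GB,0,true:"
        "GPU-0fc3eda5-e98b-a25b-5b0d-cf5c855d1448,10,32768,100,"
        "NVIDIA-Tesla V100-PCIE-32GB,0,true:")
for d in parse_register(anno):
    print(d["uuid"], d["memory_mb"])
```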
@@ -26,12 +26,13 @@ kube-scheduler 在 `bind` 过程中调用 device-plugin 挂载设备,但仅向
因此,有必要开发一个协议,使调度器层与 device-plugin 进行通信以传递任务调度信息。调度器通过将调度结果补丁到 Pod 的注释中并在 device-plugin 中读取它来传递此信息,如下图所示:
-
+
在此过程中,需要设置 3 个注释,分别是 `时间戳`、`待分配设备` 和 `已分配设备`。调度器创建时,`待分配设备` 和 `已分配设备` 的内容相同,但 device-plugin 将根据 `待分配设备` 的内容确定当前设备分配情况,当分配成功时,相应设备将从注释中移除,因此当任务成功运行时,`待分配设备` 的内容将为空。
一个请求 3000M 设备显存的 GPU 任务的示例将生成如下的相应注释:
-```
+
+```yaml
hami.io/bind-time: 1716199325
hami.io/vgpu-devices-allocated: GPU-0fc3eda5-e98b-a25b-5b0d-cf5c855d1448,NVIDIA,3000,0:;
-hami.io/vgpu-devices-to-allocate: ;
\ No newline at end of file
+hami.io/vgpu-devices-to-allocate: ;
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/get-started/deploy-with-helm.md b/i18n/zh/docusaurus-plugin-content-docs/current/get-started/deploy-with-helm.md
index 6fe6c9f2..7aa5ea89 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/get-started/deploy-with-helm.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/get-started/deploy-with-helm.md
@@ -18,7 +18,7 @@ title: 使用 Helm 部署 HAMi
## 安装步骤 {#installation}
-### 1. 配置 nvidia-container-toolkit {#configure-nvidia-container-toolkit}
+### 配置 nvidia-container-toolkit {#configure-nvidia-container-toolkit}
配置 nvidia-container-toolkit
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/installation/aws-installation.md b/i18n/zh/docusaurus-plugin-content-docs/current/installation/aws-installation.md
index ee68b65a..127070b9 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/installation/aws-installation.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/installation/aws-installation.md
@@ -1,5 +1,6 @@
---
-title: HAMi on AWS
+title: 在 AWS 上安装与使用 HAMi
+sidebar_label: HAMi on AWS
translated: true
---
@@ -49,7 +50,7 @@ kubectl get pods -n kube-system
### NVIDIA 设备
-[使用独占 GPU](https://project-hami.io/zh/docs/userguide/NVIDIA-device/examples/use-exclusive-card)
-[为容器分配特定设备显存](https://project-hami.io/zh/docs/userguide/NVIDIA-device/examples/allocate-device-memory)
-[为容器分配设备核心资源](https://project-hami.io/zh/docs/userguide/NVIDIA-device/examples/allocate-device-core)
-[将任务分配给 mig 实例](https://project-hami.io/zh/docs/userguide/NVIDIA-device/examples/dynamic-mig-example)
+- [使用独占 GPU](https://project-hami.io/zh/docs/userguide/nvidia-device/examples/use-exclusive-card)
+- [为容器分配特定设备显存](https://project-hami.io/zh/docs/userguide/nvidia-device/examples/allocate-device-memory)
+- [为容器分配设备核心资源](https://project-hami.io/zh/docs/userguide/nvidia-device/examples/allocate-device-core)
+- [将任务分配给 MIG 实例](https://project-hami.io/zh/docs/userguide/nvidia-device/examples/dynamic-mig-example)
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/installation/how-to-use-hami-dra.md b/i18n/zh/docusaurus-plugin-content-docs/current/installation/how-to-use-hami-dra.md
index e2b8fc99..6835d0c5 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/installation/how-to-use-hami-dra.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/installation/how-to-use-hami-dra.md
@@ -3,13 +3,15 @@ title: HAMi DRA
translated: true
---
-# Kubernetes 的 HAMi DRA
+## Kubernetes 的 HAMi DRA
## 介绍
+
HAMi 已经提供了对 K8s [DRA](https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/)(动态资源分配)功能的支持。
通过在集群中安装 [HAMi Dra webhook](https://github.com/Project-HAMi/HAMi-DRA) 你可以在 DRA 模式下获得与传统使用方式一致的使用体验。
## 前提条件
+
* Kubernetes 版本 >= 1.34 并且 DRA Consumable Capacity [featuregate](https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/) 启用
## 安装
@@ -22,6 +24,7 @@ helm dependency build
```
然后用以下命令进行安装:
+
```bash
helm install hami hami-charts/hami --set dra.enable=true -n hami-system
```
@@ -29,8 +32,10 @@ helm install hami hami-charts/hami --set dra.enable=true -n hami-system
> **注意:** *DRA 模式与传统模式不兼容,请勿同时启用。*
## 支持的设备
+
DRA 功能的实现需要对应设备的 DRA Driver 提供支持,目前支持的设备包括:
-* [NVIDIA GPU](../userguide/NVIDIA-device/dynamic-resource-allocation.md)
+
+* [NVIDIA GPU](../userguide/nvidia-device/dynamic-resource-allocation)
请参照对应的页面安装设备驱动。
@@ -38,6 +43,6 @@ DRA 功能的实现需要对应设备的 DRA Driver 提供支持,目前支持
HAMi DRA 提供了与传统模式相同的监控功能,安装 HAMi DRA 时会默认启用监控服务,你可以将监控服务通过 NodePort 暴露到本地,或者添加 Prometheus 采集来访问监控指标。
-你可以在 [这里](../userguide/monitoring/device-allocation.md) 查看 HAMi DRA 提供的监控指标。
+你可以在 [这里](../userguide/monitoring/device-allocation) 查看 HAMi DRA 提供的监控指标。
-更多信息请参考 [HAMi DRA monitor](https://github.com/Project-HAMi/HAMi-DRA/blob/main/docs/MONITOR.md)
\ No newline at end of file
+更多信息请参考 [HAMi DRA monitor](https://github.com/Project-HAMi/HAMi-DRA/blob/main/docs/MONITOR.md)
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/installation/how-to-use-volcano-ascend.md b/i18n/zh/docusaurus-plugin-content-docs/current/installation/how-to-use-volcano-ascend.md
index 531a5a71..b113d0f8 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/installation/how-to-use-volcano-ascend.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/installation/how-to-use-volcano-ascend.md
@@ -1,10 +1,8 @@
---
-title: Volcano Ascend vNPU
+title: Volcano Ascend vNPU 使用指南
translated: true
---
-# Volcano 中 Ascend 设备使用指南
-
## 介绍
Volcano 通过 `ascend-device-plugin` 支持 Ascend 310 和 Ascend 910 的 vNPU 功能。同时支持管理异构 Ascend 集群(包含多种 Ascend 类型的集群,例如 910A、910B2、910B3、310p)。
@@ -15,7 +13,7 @@ Volcano 通过 `ascend-device-plugin` 支持 Ascend 310 和 Ascend 910 的 vNPU
- Ascend 310 系列的 NPU 和 vNPU 集群
- 异构 Ascend 集群
-此功能仅在Volcano 1.14及以上版本中可用。
+此功能仅在 Volcano 1.14 及以上版本中可用。
## 快速开始
@@ -23,7 +21,7 @@ Volcano 通过 `ascend-device-plugin` 支持 Ascend 310 和 Ascend 910 的 vNPU
[ascend-docker-runtime](https://gitcode.com/Ascend/mind-cluster/tree/master/component/ascend-docker-runtime)
-### 安装Volcano
+### 安装 Volcano
```shell
helm repo add volcano-sh https://volcano-sh.github.io/helm-charts
@@ -36,7 +34,7 @@ helm install volcano volcano-sh/volcano -n volcano-system --create-namespace
```shell
kubectl label node {ascend-node} ascend=on
-```
+```
### 部署 hami-scheduler-device ConfigMap
@@ -49,6 +47,7 @@ kubectl apply -f https://raw.githubusercontent.com/Project-HAMi/ascend-device-pl
```shell
kubectl apply -f https://raw.githubusercontent.com/Project-HAMi/ascend-device-plugin/refs/heads/main/ascend-device-plugin.yaml
```
+
更多信息请参考 [ascend-device-plugin 文档](https://github.com/Project-HAMi/ascend-device-plugin)。
### 更新调度器配置
@@ -103,7 +102,9 @@ spec:
huawei.com/Ascend310P-memory: "4096"
```
-支持的 Ascend 芯片及其对应的资源名称如下表所示:
+
+支持的 Ascend 芯片及其对应的资源名称如下表所示:
+
| ChipName | ResourceName | ResourceMemoryName |
|-------|-------|-------|
| 910A | huawei.com/Ascend910A | huawei.com/Ascend910A-memory |
@@ -111,4 +112,4 @@ spec:
| 910B3 | huawei.com/Ascend910B3 | huawei.com/Ascend910B3-memory |
| 910B4 | huawei.com/Ascend910B4 | huawei.com/Ascend910B4-memory |
| 910B4-1 | huawei.com/Ascend910B4-1 | huawei.com/Ascend910B4-1-memory |
-| 310P3 | huawei.com/Ascend310P | huawei.com/Ascend310P-memory |
\ No newline at end of file
+| 310P3 | huawei.com/Ascend310P | huawei.com/Ascend310P-memory |
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/installation/how-to-use-volcano-vgpu.md b/i18n/zh/docusaurus-plugin-content-docs/current/installation/how-to-use-volcano-vgpu.md
index 2ba6b308..e1202bed 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/installation/how-to-use-volcano-vgpu.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/installation/how-to-use-volcano-vgpu.md
@@ -1,9 +1,9 @@
---
-title: Volcano vGPU
+title: Volcano vGPU 使用指南
translated: true
---
-# Kubernetes 的 Volcano vgpu 设备插件
+## Kubernetes 的 Volcano vgpu 设备插件
**注意**:
@@ -53,8 +53,8 @@ data:
一旦您在*所有*希望使用的 GPU 节点上启用了此选项,您就可以通过部署以下 Daemonset 在集群中启用 GPU 支持:
-```
-$ kubectl create -f https://raw.githubusercontent.com/Project-HAMi/volcano-vgpu-device-plugin/main/volcano-vgpu-device-plugin.yml
+```bash
+kubectl create -f https://raw.githubusercontent.com/Project-HAMi/volcano-vgpu-device-plugin/main/volcano-vgpu-device-plugin.yml
```
### 验证环境是否准备好
@@ -122,5 +122,6 @@ EOF
volcano-scheduler-metrics 记录每个 GPU 的使用和限制,访问以下地址以获取这些指标。
+```bash
+curl {volcano scheduler cluster ip}:8080/metrics
```
-curl {volcano scheduler cluster ip}:8080/metrics
\ No newline at end of file
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/installation/online-installation.md b/i18n/zh/docusaurus-plugin-content-docs/current/installation/online-installation.md
index 7758f7b5..ff2585f1 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/installation/online-installation.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/installation/online-installation.md
@@ -1,15 +1,16 @@
---
+sidebar_label: 通过 Helm 在线安装
title: 通过 Helm 在线安装(推荐)
translated: true
---
-最佳实践是使用 helm 部署 HAMi。
+推荐使用 Helm 部署 HAMi。
## 添加 HAMi 仓库
您可以使用以下命令添加 HAMi 图表仓库:
-```
+```bash
helm repo add hami-charts https://project-hami.github.io/HAMi/
```
@@ -17,26 +18,27 @@ helm repo add hami-charts https://project-hami.github.io/HAMi/
安装时需要 Kubernetes 版本。您可以使用以下命令获取此信息:
-```
-kubectl version
+```bash
+kubectl version --short
```
## 安装
-在安装过程中,将 Kubernetes 调度器镜像版本设置为与您的 Kubernetes 服务器版本匹配。例如,如果您的集群服务器版本是 1.16.8,请使用以下命令进行部署:
+确保 `scheduler.kubeScheduler.imageTag` 与您的 Kubernetes 服务器版本匹配。
+例如,如果您的集群服务器版本是 v1.16.8,请使用以下命令进行部署:
-```
+```bash
helm install hami hami-charts/hami --set scheduler.kubeScheduler.imageTag=v1.16.8 -n kube-system
```
-您可以通过调整[配置](../userguide/configure.md)来自定义安装。
+您可以通过编辑[配置](../userguide/configure.md)来自定义安装。
## 验证您的安装
您可以使用以下命令验证您的安装:
-```
+```bash
kubectl get pods -n kube-system
```
-如果 hami-device-plugin 和 hami-scheduler pods 都处于 Running 状态,则说明您的安装成功。
\ No newline at end of file
+如果 hami-device-plugin 和 hami-scheduler 这两个 Pod 都处于 Running 状态,则说明您的安装成功。
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/installation/prequisities.md b/i18n/zh/docusaurus-plugin-content-docs/current/installation/prerequisites.md
similarity index 100%
rename from i18n/zh/docusaurus-plugin-content-docs/current/installation/prequisities.md
rename to i18n/zh/docusaurus-plugin-content-docs/current/installation/prerequisites.md
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/installation/upgrade.md b/i18n/zh/docusaurus-plugin-content-docs/current/installation/upgrade.md
index 43f4be20..b9c53d53 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/installation/upgrade.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/installation/upgrade.md
@@ -3,12 +3,12 @@ title: 升级 HAMi
translated: true
---
-将HAMi升级到最新版本是一个简单的过程,更新仓库并重新启动图表:
+将 HAMi 升级到最新版本很简单:更新仓库并重新安装 Chart:
-```
+```bash
helm uninstall hami -n kube-system
helm repo update
helm install hami hami-charts/hami -n kube-system
```
-> **警告:** *如果在不清除已提交任务的情况下升级HAMi,可能会导致分段错误。*
\ No newline at end of file
+> **警告:** *如果在不清除已提交任务的情况下升级 HAMi,可能会导致分段错误。*
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/installation/webui-installation.md b/i18n/zh/docusaurus-plugin-content-docs/current/installation/webui-installation.md
index 50ee122e..1c3bce04 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/installation/webui-installation.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/installation/webui-installation.md
@@ -1,10 +1,9 @@
---
-title: WebUI
+sidebar_label: 安装 WebUI
translated: true
+title: 使用 Helm Charts 部署 HAMi-WebUI
---
-# 使用 Helm Charts 部署 HAMi-WebUI
-
本主题包含在 Kubernetes 上使用 Helm Charts 安装和运行 HAMi-WebUI 的说明。
WebUI 只能通过本地主机访问,因此您需要通过配置 `~/.kube/config` 将本地主机连接到集群。
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/key-features/device-resource-isolation.md b/i18n/zh/docusaurus-plugin-content-docs/current/key-features/device-resource-isolation.md
index 9d711d06..39eba3b2 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/key-features/device-resource-isolation.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/key-features/device-resource-isolation.md
@@ -6,13 +6,13 @@ translated: true
一个用于设备隔离的简单演示:
一个具有以下资源的任务。
-```
+```yaml
resources:
limits:
- nvidia.com/gpu: 1 # 请求1个vGPU
- nvidia.com/gpumem: 3000 # 每个vGPU包含3000m设备显存
+ nvidia.com/gpu: 1 # 请求 1 个 vGPU
+ nvidia.com/gpumem: 3000 # 每个 vGPU 包含 3000m 设备显存
```
将在容器内看到 3G 设备显存
-
+
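
将上面的资源片段放入完整的 Pod 规约后大致如下(仅为示意:Pod 名称与镜像均为假设的示例值):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-limit-demo            # 示例名称
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.0-base-ubuntu22.04   # 示例镜像
      command: ["sleep", "infinity"]
      resources:
        limits:
          nvidia.com/gpu: 1        # 请求 1 个 vGPU
          nvidia.com/gpumem: 3000  # 每个 vGPU 包含 3000m 设备显存
```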
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/key-features/device-sharing.md b/i18n/zh/docusaurus-plugin-content-docs/current/key-features/device-sharing.md
index 9e9393f0..4dcff58f 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/key-features/device-sharing.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/key-features/device-sharing.md
@@ -3,10 +3,10 @@ title: 设备共享
translated: true
---
-- 通过设置核心使用率(百分比),进行设备的部分分配
-- 通过设置显存(单位:MB),进行设备的部分分配
-- 对流式多处理器进行硬限制
-- 无需对现有程序进行任何修改
-- 支持动态MIG切片能力,样例
+- 通过设置核心使用率(百分比),进行设备的部分分配
+- 通过设置显存(单位:MB),进行设备的部分分配
+- 对流式多处理器进行硬限制
+- 无需对现有程序进行任何修改
+- 支持动态 MIG 切片能力,样例
-
+
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/releases.md b/i18n/zh/docusaurus-plugin-content-docs/current/releases.md
index d7becba5..dfd2a2ff 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/releases.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/releases.md
@@ -1,5 +1,5 @@
---
-title: Releases
+title: 发布记录
translated: true
---
@@ -64,4 +64,4 @@ HAMi 使用 GitHub 标签来管理版本。新版本和候选版本使用通配
主要功能遵循 HAMi 设计提案流程。您可以参考[此处](https://github.com/Project-HAMi/HAMi/tree/master/docs/proposals/resource-interpreter-webhook)作为提案示例。
-在发布开始时,可能会有许多问题分配给发布里程碑。发布的优先级在每两周一次的社区会议中讨论。随着发布的进展,几个问题可能会被移到下一个里程碑。因此,如果一个问题很重要,重要的是在发布周期的早期倡导其优先级。
\ No newline at end of file
+在发布开始时,可能会有许多问题分配给发布里程碑。发布的优先级在每两周一次的社区会议中讨论。随着发布的进展,几个问题可能会被移到下一个里程碑。因此,如果一个问题很重要,重要的是在发布周期的早期倡导其优先级。
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/AWSNeuron-device/enable-awsneuron-managing.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/AWSNeuron-device/enable-awsneuron-managing.md
index cc90c467..7ad3b966 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/AWSNeuron-device/enable-awsneuron-managing.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/AWSNeuron-device/enable-awsneuron-managing.md
@@ -1,8 +1,9 @@
---
title: 启用 AWS-Neuron 设备共享
+sidebar_label: AWS-Neuron 共享
---
-## 概述
+## 启用 AWS-Neuron 设备共享
AWS Neuron 设备是 AWS 专为机器学习工作负载设计的硬件加速器,特别针对深度学习推理和训练场景进行了优化。这些设备属于 AWS Inferentia 和 Trainium 产品家族,可在 AWS 云上为 AI 应用提供高性能、高性价比且可扩展的解决方案。
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Ascend-device/enable-ascend-sharing.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Ascend-device/enable-ascend-sharing.md
index 55dda920..ed124de7 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Ascend-device/enable-ascend-sharing.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Ascend-device/enable-ascend-sharing.md
@@ -1,8 +1,11 @@
---
title: 启用 Ascend 共享
+sidebar_label: Ascend 共享
translated: true
---
+## 启用 Ascend 共享
+
基于虚拟化模板支持显存切片,自动使用可用的租赁模板。有关详细信息,请查看[设备模板](./device-template.md)。
## 先决条件
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Cambricon-device/enable-cambricon-mlu-sharing.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Cambricon-device/enable-cambricon-mlu-sharing.md
index e030ac1d..de4dd535 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Cambricon-device/enable-cambricon-mlu-sharing.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Cambricon-device/enable-cambricon-mlu-sharing.md
@@ -1,5 +1,6 @@
---
title: 启用寒武纪 MLU 共享
+sidebar_label: MLU 共享
translated: true
---
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Cambricon-device/examples/allocate-core-and-memory.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Cambricon-device/examples/allocate-core-and-memory.md
index fa08c0f3..1de75829 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Cambricon-device/examples/allocate-core-and-memory.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Cambricon-device/examples/allocate-core-and-memory.md
@@ -1,9 +1,10 @@
---
title: 为容器分配设备核心和显存资源
+sidebar_label: 分配核心和显存
translated: true
---
-## 为容器分配设备核心和显存
+## 为容器分配设备核心和显存资源
要分配设备核心资源的某一部分,您只需在容器中使用 `cambricon.com/vmlu` 指定所需的寒武纪 MLU 数量,并分配 `cambricon.com/mlu370.smlu.vmemory` 和 `cambricon.com/mlu370.smlu.vcore`。
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Cambricon-device/examples/allocate-exclusive.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Cambricon-device/examples/allocate-exclusive.md
index b2fcd8b4..8f2ad239 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Cambricon-device/examples/allocate-exclusive.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Cambricon-device/examples/allocate-exclusive.md
@@ -1,5 +1,6 @@
---
title: 分配独占设备
+sidebar_label: 独占设备
translated: true
---
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Cambricon-device/specify-device-core-usage.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Cambricon-device/specify-device-core-usage.md
index 865097c5..d33e4264 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Cambricon-device/specify-device-core-usage.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Cambricon-device/specify-device-core-usage.md
@@ -1,14 +1,13 @@
---
title: 分配设备核心给容器
+sidebar_label: 指定核心
translated: true
---
-## 分配设备核心给容器
-
通过指定资源 `cambricon.com/mlu.smlu.vcore` 来分配设备核心资源的百分比。
可选项,每个 `cambricon.com/mlu.smlu.vcore` 单位等于设备核心的 1%。
-```
+```yaml
resources:
limits:
cambricon.com/vmlu: 1 # 请求 1 个 MLU
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Cambricon-device/specify-device-memory-usage.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Cambricon-device/specify-device-memory-usage.md
index 2ff691b4..c92bc4b5 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Cambricon-device/specify-device-memory-usage.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Cambricon-device/specify-device-memory-usage.md
@@ -1,5 +1,6 @@
---
title: 为容器分配设备显存
+sidebar_label: 指定显存
translated: true
---
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Cambricon-device/specify-device-type-to-use.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Cambricon-device/specify-device-type-to-use.md
index afaddfee..d6f61651 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Cambricon-device/specify-device-type-to-use.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Cambricon-device/specify-device-type-to-use.md
@@ -1,5 +1,6 @@
---
title: 分配到特定设备类型
+sidebar_label: 指定设备类型
translated: true
---
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Enflame-device/enable-enflame-gcu-sharing.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Enflame-device/enable-enflame-gcu-sharing.md
index f78912e4..e555567d 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Enflame-device/enable-enflame-gcu-sharing.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Enflame-device/enable-enflame-gcu-sharing.md
@@ -1,5 +1,6 @@
---
title: 启用燧原 GPU 共享
+sidebar_label: GPU 共享
---
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Hygon-device/enable-hygon-dcu-sharing.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Hygon-device/enable-hygon-dcu-sharing.md
index dbb0b0fa..a8ce70bc 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Hygon-device/enable-hygon-dcu-sharing.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Hygon-device/enable-hygon-dcu-sharing.md
@@ -1,5 +1,6 @@
---
title: 启用 Hygon DCU 共享
+sidebar_label: DCU 共享
translated: true
---
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Hygon-device/examples/allocate-core-and-memory.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Hygon-device/examples/allocate-core-and-memory.md
index fa7cd013..e3ea48b5 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Hygon-device/examples/allocate-core-and-memory.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Hygon-device/examples/allocate-core-and-memory.md
@@ -1,9 +1,10 @@
---
title: 为容器分配设备核心和显存资源
+sidebar_label: 分配核心和显存
translated: true
---
-## 为容器分配设备核心和显存
+## 为容器分配设备核心和显存资源
要分配设备核心资源的某一部分,您只需在容器中使用 `hygon.com/dcunum` 请求的海光 DCU 数量,并分配 `hygon.com/dcucores` 和 `hygon.com/dcumem`。
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Hygon-device/examples/allocate-exclusive.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Hygon-device/examples/allocate-exclusive.md
index f79ce319..dc526e6c 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Hygon-device/examples/allocate-exclusive.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Hygon-device/examples/allocate-exclusive.md
@@ -1,5 +1,6 @@
---
title: 分配独占设备
+sidebar_label: 独占设备
translated: true
---
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Hygon-device/examples/specify-certain-cards.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Hygon-device/examples/specify-certain-cards.md
index afa0045b..53d825a8 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Hygon-device/examples/specify-certain-cards.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Hygon-device/examples/specify-certain-cards.md
@@ -1,5 +1,6 @@
---
title: 将任务分配给特定的 DCU
+sidebar_label: 指定 DCU
translated: true
---
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Hygon-device/specify-device-core-usage.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Hygon-device/specify-device-core-usage.md
index 949b281d..776e833c 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Hygon-device/specify-device-core-usage.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Hygon-device/specify-device-core-usage.md
@@ -1,10 +1,9 @@
---
title: 分配设备核心给容器
+linktitle: 指定核心
translated: true
---
-## 分配设备核心给容器
-
通过指定资源 `hygon.com/dcucores` 来分配设备核心资源的百分比。
此项为可选;每个 `hygon.com/dcucores` 单位相当于设备核心的 1%。
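按上述说明,指定核心百分比的示意片段如下(数值仅作演示):

```yaml
resources:
  limits:
    hygon.com/dcunum: 1 # 请求 1 个 DCU
    hygon.com/dcucores: 50 # 分配 50% 的设备核心
```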
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Hygon-device/specify-device-memory-usage.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Hygon-device/specify-device-memory-usage.md
index 1ff094df..e56642ad 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Hygon-device/specify-device-memory-usage.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Hygon-device/specify-device-memory-usage.md
@@ -1,5 +1,6 @@
---
title: 为容器分配设备显存
+linktitle: 指定显存
translated: true
---
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Hygon-device/specify-device-uuid-to-use.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Hygon-device/specify-device-uuid-to-use.md
index 1d4544ae..bd521c69 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Hygon-device/specify-device-uuid-to-use.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Hygon-device/specify-device-uuid-to-use.md
@@ -1,9 +1,10 @@
---
title: 分配到特定设备
+linktitle: 指定设备
translated: true
---
-## 分配到特定设备类型
+## 分配到特定设备
有时任务可能希望在某个特定的 DCU 上运行,可以在 Pod 注释中填写 `hygon.com/use-gpuuuid` 字段。HAMi 调度器将尝试匹配具有该 UUID 的设备。
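结合上文,一个通过 Pod 注释指定设备 UUID 的示意片段如下(UUID 为假定占位符,请替换为实际设备 UUID):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dcu-uuid-pod
  annotations:
    hygon.com/use-gpuuuid: "DCU-UUID-PLACEHOLDER" # 假定占位符
spec:
  containers:
    - name: ubuntu-container
      image: ubuntu:22.04
      resources:
        limits:
          hygon.com/dcunum: 1
```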
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Iluvatar-device/enable-illuvatar-gpu-sharing.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Iluvatar-device/enable-illuvatar-gpu-sharing.md
index 63f8b1d4..884c82f7 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Iluvatar-device/enable-illuvatar-gpu-sharing.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Iluvatar-device/enable-illuvatar-gpu-sharing.md
@@ -1,8 +1,11 @@
---
title: 启用天数智芯 GPU 共享
+linktitle: GPU 共享
translated: true
---
+## 启用天数智芯 GPU 共享
+
本组件支持复用天数智芯 GPU 设备 (MR-V100、BI-V150、BI-V100),并为此提供以下几种与 vGPU 类似的复用功能,包括:
***GPU 共享***: 每个任务可以只占用一部分显卡,多个任务可以共享一张显卡
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Kueue/how-to-use-kueue.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Kueue/how-to-use-kueue.md
index 88a1527e..aba9d71a 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Kueue/how-to-use-kueue.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Kueue/how-to-use-kueue.md
@@ -2,7 +2,7 @@
title: 如何在 HAMi 上使用 Kueue
---
-# 在 HAMi 中使用 Kueue
+## 在 HAMi 中使用 Kueue
本指南将帮助你使用 Kueue 来管理 HAMi vGPU 资源,包括启用 Deployment 支持、配置
ResourceTransformation,以及创建请求 vGPU 资源的工作负载。
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Kunlunxin-device/enable-kunlunxin-schedule.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Kunlunxin-device/enable-kunlunxin-schedule.md
index f55a09d0..326f6ec1 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Kunlunxin-device/enable-kunlunxin-schedule.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Kunlunxin-device/enable-kunlunxin-schedule.md
@@ -1,12 +1,15 @@
---
title: 启用昆仑芯 GPU 拓扑感知调度
+linktitle: 拓扑感知调度
---
+## 启用昆仑芯 GPU 拓扑感知调度
+
**昆仑芯 GPU 拓扑感知调度现在通过 `kunlunxin.com/xpu` 资源得到支持。**
-当在单个P800服务器上配置多个XPU时,当XPU卡连接到同一NUMA节点或互相之间可以直接连接时,性能会显著提升。从而在服务器上的所有 XPU 之间形成拓扑,如下所示:
+在单个 P800 服务器上配置多个 XPU 时,若 XPU 卡连接到同一 NUMA 节点或可彼此直连,性能会显著提升。这样便在服务器上的所有 XPU 之间形成拓扑,如下所示:
-
+
当用户作业请求一定数量的 `kunlunxin.com/xpu` 资源时,
Kubernetes 将 Pod 调度到适当的节点上,目标是减少碎片化
@@ -30,9 +33,9 @@ Kubernetes 将 Pod 调度到适当的节点上,目标是减少碎片化
## 启用拓扑感知调度
-- 在 P800 节点上部署昆仑芯设备插件。
+* 在 P800 节点上部署昆仑芯设备插件。
(请联系您的设备供应商获取相应的软件包和文档。)
-- 按照 `README.md` 中的说明部署 HAMi。
+* 按照 `README.md` 中的说明部署 HAMi。
## 运行昆仑芯作业
@@ -53,4 +56,4 @@ spec:
resources:
limits:
kunlunxin.com/xpu: 4 # 请求 4 个 XPU
-```
\ No newline at end of file
+```
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Kunlunxin-device/examples/allocate_vxpu.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Kunlunxin-device/examples/allocate_vxpu.md
index bb8002cf..1a2d32ab 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Kunlunxin-device/examples/allocate_vxpu.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Kunlunxin-device/examples/allocate_vxpu.md
@@ -2,8 +2,6 @@
title: 分配 vxpu 设备
---
-## 分配 vxpu 设备
-
要分配特定显存大小的 vxpu,您只需要分配 `kunlunxin.com/vxpu` 以及 `kunlunxin.com/vxpu-memory`
```yaml
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Kunlunxin-device/examples/allocate_whole_xpu.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Kunlunxin-device/examples/allocate_whole_xpu.md
index b0400544..7265088d 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Kunlunxin-device/examples/allocate_whole_xpu.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Kunlunxin-device/examples/allocate_whole_xpu.md
@@ -1,8 +1,9 @@
---
title: 分配整个 xpu 卡
+linktitle: 分配整个卡
---
-## 分配独占设备
+## 分配整个 xpu 卡
要分配整个 xpu 设备,您只需要分配 `kunlunxin.com/xpu`,无需其他字段。您可以为容器分配多个 XPU。
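按上述说明,分配整卡的示意片段如下(数值仅作演示):

```yaml
resources:
  limits:
    kunlunxin.com/xpu: 2 # 请求 2 个完整 XPU,无需其他字段
```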
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Metax-device/Metax-GPU/enable-metax-gpu-schedule.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Metax-device/Metax-GPU/enable-metax-gpu-schedule.md
index c546ae31..0c05a0a5 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Metax-device/Metax-GPU/enable-metax-gpu-schedule.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Metax-device/Metax-GPU/enable-metax-gpu-schedule.md
@@ -1,13 +1,16 @@
---
title: 启用沐曦 GPU 拓扑感知调度
+linktitle: 拓扑感知调度
translated: true
---
+## 启用沐曦 GPU 拓扑感知调度
+
**HAMi 现在通过在沐曦 GPU 之间实现拓扑感知来支持 metax.com/gpu**:
当在单个服务器上配置多个 GPU 时,GPU 卡根据它们是否连接到同一个 PCIe 交换机或 MetaXLink 而存在远近关系。这在服务器上的所有卡之间形成了一个拓扑,如下图所示:
-
+
用户作业请求一定数量的 metax-tech.com/gpu 资源,Kubernetes 将 Pod 调度到适当的节点。gpu-device 进一步处理在资源节点上分配剩余资源的逻辑,遵循以下标准:
@@ -18,11 +21,11 @@ translated: true
2. 使用 `node-scheduler-policy=spread` 时,尽可能将 Metax 资源分配在同一个 MetaXLink 或 PCIe Switch 下,如下图所示:
- 
+ 
3. 使用 `node-scheduler-policy=binpack` 时,分配 GPU 资源,以尽量减少对 MetaXLink 拓扑的破坏,如下图所示:
- 
+ 
## 重要说明
@@ -32,14 +35,14 @@ translated: true
## 前提条件
-* 沐曦 GPU 插件 >= 0.8.0
-* Kubernetes 版本 >= 1.23
+- 沐曦 GPU 插件 >= 0.8.0
+- Kubernetes 版本 >= 1.23
## 启用拓扑感知调度
-* 在 metax 节点上部署沐曦 GPU 插件(请咨询您的设备提供商以获取其软件包和文档)
+- 在 metax 节点上部署沐曦 GPU 插件(请咨询您的设备提供商以获取其软件包和文档)
-* 根据 README.md 部署 HAMi
+- 根据 README.md 部署 HAMi
## 运行 Metax 作业
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Metax-device/Metax-sGPU/enable-metax-gpu-sharing.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Metax-device/Metax-sGPU/enable-metax-gpu-sharing.md
index 575b9d2c..9aa64ae6 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Metax-device/Metax-sGPU/enable-metax-gpu-sharing.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Metax-device/Metax-sGPU/enable-metax-gpu-sharing.md
@@ -1,8 +1,11 @@
---
title: 启用沐曦 GPU 共享
+linktitle: GPU 共享
translated: true
---
+## 启用沐曦 GPU 共享
+
**HAMi 目前支持复用沐曦 GPU 设备,提供与 vGPU 类似的复用功能**,包括:
- **GPU 共享**: 每个任务可以只占用一部分显卡,多个任务可以共享一张显卡
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Mthreads-device/enable-mthreads-gpu-sharing.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Mthreads-device/enable-mthreads-gpu-sharing.md
index b6034d4a..c22fe492 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Mthreads-device/enable-mthreads-gpu-sharing.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Mthreads-device/enable-mthreads-gpu-sharing.md
@@ -1,5 +1,6 @@
---
title: 启用 Mthreads GPU 共享
+linktitle: GPU 共享
translated: true
---
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Mthreads-device/examples/allocate-core-and-memory.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Mthreads-device/examples/allocate-core-and-memory.md
index 545df768..f3b555fe 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Mthreads-device/examples/allocate-core-and-memory.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Mthreads-device/examples/allocate-core-and-memory.md
@@ -1,9 +1,10 @@
---
title: 为容器分配设备核心和显存资源
+linktitle: 分配核心和显存
translated: true
---
-## 为容器分配设备核心和显存
+## 为容器分配设备核心和显存资源
要分配设备核心资源的一部分,您只需在通过 `mthreads.com/vgpu` 请求 Mthreads GPU 数量的同时,分配 `mthreads.com/sgpu-memory` 和 `mthreads.com/sgpu-core`。
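按上述说明,同时声明三个资源字段的示意片段如下(数值与单位均为假定,请以源文档中的完整示例为准):

```yaml
resources:
  limits:
    mthreads.com/vgpu: 1 # 请求 1 个 GPU
    mthreads.com/sgpu-core: 8 # 假定:核心份额
    mthreads.com/sgpu-memory: 4 # 假定:显存份额,单位以设备插件文档为准
```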
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Mthreads-device/examples/allocate-exclusive.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Mthreads-device/examples/allocate-exclusive.md
index 94d15cdf..a2e37e0e 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Mthreads-device/examples/allocate-exclusive.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Mthreads-device/examples/allocate-exclusive.md
@@ -1,5 +1,6 @@
---
title: 分配独占设备
+linktitle: 独占设备
translated: true
---
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Mthreads-device/specify-device-core-usage.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Mthreads-device/specify-device-core-usage.md
index 3b8111d7..c71ec2f5 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Mthreads-device/specify-device-core-usage.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Mthreads-device/specify-device-core-usage.md
@@ -1,13 +1,12 @@
---
title: 分配设备核心给容器
+linktitle: 指定核心
translated: true
---
-## 为容器分配设备核心
-
通过指定资源 `mthreads.com/sgpu-core` 来分配部分设备核心资源。此项为可选;每个 `mthreads.com/sgpu-core` 单位相当于 1/16 的设备核心。
-```
+```yaml
resources:
limits:
mthreads.com/vgpu: 1 # 请求 1 个 GPU
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Mthreads-device/specify-device-memory-usage.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Mthreads-device/specify-device-memory-usage.md
index 168223e1..ccc6aa0f 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Mthreads-device/specify-device-memory-usage.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/Mthreads-device/specify-device-memory-usage.md
@@ -1,5 +1,6 @@
---
title: 为容器分配设备显存
+linktitle: 指定显存
translated: true
---
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/dynamic-resource-allocation.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/dynamic-resource-allocation.md
index 6e086585..20d09ad7 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/dynamic-resource-allocation.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/dynamic-resource-allocation.md
@@ -3,7 +3,7 @@ title: 动态资源分配
translated: true
---
-# 动态资源分配
+## 动态资源分配
## 介绍
@@ -16,11 +16,12 @@ HAMi 已经在 NVIDIA 设备上支持了 K8s [DRA](https://kubernetes.io/docs/co
## 安装
-Nvidia dra driver 内置在 HAMi 中,无需单独安装,只需要在[安装 HAMi DRA](../../installation/how-to-use-hami-dra.md) 时指定 `--set hami-dra-webhook.drivers.nvidia.enabled=true` 参数即可。更多信息请参考[安装 Nvidia DRA driver](https://github.com/Project-HAMi/HAMi-DRA?tab=readme-ov-file#installation)
+Nvidia dra driver 内置在 HAMi 中,无需单独安装,只需要在[安装 HAMi DRA](../../installation/how-to-use-hami-dra) 时指定 `--set hami-dra-webhook.drivers.nvidia.enabled=true` 参数即可。更多信息请参考[安装 Nvidia DRA driver](https://github.com/Project-HAMi/HAMi-DRA?tab=readme-ov-file#installation)
## 验证安装
验证安装成功,请使用以下命令查看 GPU 设备:
+
```bash
kubectl get resourceslices.resource.k8s.io -A
-```
\ No newline at end of file
+```
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/examples/allocate-device-core.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/examples/allocate-device-core.md
index f8814a62..14798629 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/examples/allocate-device-core.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/examples/allocate-device-core.md
@@ -1,9 +1,10 @@
---
title: 为容器分配设备核心资源
+linktitle: 分配核心
translated: true
---
-## 将设备核心分配给容器
+## 为容器分配设备核心资源
要分配设备核心资源的某一部分,您只需分配 `nvidia.com/gpucores`,无需其他资源字段。
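按上述说明,指定核心百分比的示意片段如下(数值仅作演示;是否需同时声明 `nvidia.com/gpu` 请以源文档中的完整示例为准):

```yaml
resources:
  limits:
    nvidia.com/gpucores: 30 # 分配 30% 的设备核心
```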
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/examples/allocate-device-memory.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/examples/allocate-device-memory.md
index 95dc11f5..5c50a51c 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/examples/allocate-device-memory.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/examples/allocate-device-memory.md
@@ -1,5 +1,6 @@
---
title: 为容器分配特定设备显存
+linktitle: 分配显存
translated: true
---
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/examples/allocate-device-memory2.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/examples/allocate-device-memory2.md
index 8866ce17..d2bf9d69 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/examples/allocate-device-memory2.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/examples/allocate-device-memory2.md
@@ -1,5 +1,6 @@
---
title: 按百分比分配设备显存给容器
+linktitle: 按百分比分配显存
translated: true
---
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/examples/specify-card-type-to-use.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/examples/specify-card-type-to-use.md
index b8e77453..eea7c36c 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/examples/specify-card-type-to-use.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/examples/specify-card-type-to-use.md
@@ -1,5 +1,6 @@
---
title: 分配任务到特定类型
+linktitle: 指定卡类型
translated: true
---
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/examples/specify-certain-card.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/examples/specify-certain-card.md
index 841c6d0b..ee795553 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/examples/specify-certain-card.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/examples/specify-certain-card.md
@@ -1,5 +1,6 @@
---
title: 将任务分配给特定的 GPU
+linktitle: 指定 GPU
translated: true
---
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/examples/use-exclusive-card.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/examples/use-exclusive-card.md
index dd0d1126..4a0717bf 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/examples/use-exclusive-card.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/examples/use-exclusive-card.md
@@ -1,9 +1,10 @@
---
title: 使用独占 GPU
+linktitle: 独占 GPU
translated: true
---
-## 将设备核心分配给容器
+## 使用独占 GPU
要以独占模式使用 GPU,这是 nvidia-k8s-device-plugin 的默认行为,您只需分配 `nvidia.com/gpu` 而无需其他资源字段。
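按上述说明,独占模式的示意片段如下:

```yaml
resources:
  limits:
    nvidia.com/gpu: 1 # 独占 1 张 GPU,无需其他资源字段
```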
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/specify-device-core-usage.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/specify-device-core-usage.md
index bb3615db..5cc023f8 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/specify-device-core-usage.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/specify-device-core-usage.md
@@ -1,10 +1,9 @@
---
title: 分配设备核心给容器
+linktitle: 指定核心
translated: true
---
-## 分配设备核心给容器
-
通过指定资源 `nvidia.com/gpucores` 来分配设备核心资源的百分比。此项为可选;每个单位的 `nvidia.com/gpucores` 相当于设备核心的 1%。
```yaml
@@ -14,4 +13,4 @@ translated: true
nvidia.com/gpucores: 50 # 每个 GPU 分配 50% 的设备核心。
```
-> **注意:** *HAMi-core 使用时间片来限制设备核心的使用。因此,通过 nvidia-smi 查看 GPU 利用率时会有波动*
\ No newline at end of file
+> **注意:** *HAMi-core 使用时间片来限制设备核心的使用。因此,通过 nvidia-smi 查看 GPU 利用率时会有波动*
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/specify-device-memory-usage.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/specify-device-memory-usage.md
index bb04e1f5..2ce679b2 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/specify-device-memory-usage.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/specify-device-memory-usage.md
@@ -1,5 +1,6 @@
---
title: 为容器分配设备显存
+linktitle: 指定显存
translated: true
---
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/specify-device-type-to-use.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/specify-device-type-to-use.md
index 6c8986e5..89956045 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/specify-device-type-to-use.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/specify-device-type-to-use.md
@@ -1,5 +1,6 @@
---
title: 分配到特定设备类型
+linktitle: 指定设备类型
translated: true
---
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/specify-device-uuid-to-use.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/specify-device-uuid-to-use.md
index 876e6802..a4cef143 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/specify-device-uuid-to-use.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/specify-device-uuid-to-use.md
@@ -1,9 +1,10 @@
---
title: 分配到特定设备
+linktitle: 指定设备
translated: true
---
-## 分配到特定设备类型
+## 分配到特定设备
有时任务可能希望在某个特定的 GPU 上运行,可以在 Pod 注释中填写 `nvidia.com/use-gpuuuid` 字段。HAMi 调度器将尝试匹配具有该 UUID 的设备。
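结合上文,一个通过 Pod 注释指定设备 UUID 的示意片段如下(UUID 为假定占位符,请替换为实际设备 UUID):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-uuid-pod
  annotations:
    nvidia.com/use-gpuuuid: "GPU-UUID-PLACEHOLDER" # 假定占位符
spec:
  containers:
    - name: ubuntu-container
      image: ubuntu:22.04
      resources:
        limits:
          nvidia.com/gpu: 1
```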
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/using-resourcequota.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/using-resourcequota.md
index be05b5c1..8e28e02b 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/using-resourcequota.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/NVIDIA-device/using-resourcequota.md
@@ -34,4 +34,4 @@ spec:
## 监控扩展的 resourcequota
-HAMi 调度器提供了相关指标,用于帮助用户监控当前 ResourceQuota 的使用情况。您可以参考 [HAMi 监控](../../userguide/monitoring/device-allocation.md) 文档,查看指标的详细说明。
+HAMi 调度器提供了相关指标,用于帮助用户监控当前 ResourceQuota 的使用情况。您可以参考 [HAMi 监控](../monitoring/device-allocation) 文档,查看指标的详细说明。
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/configure.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/configure.md
index 6dd5fe41..ba2e147c 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/configure.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/configure.md
@@ -1,10 +1,8 @@
---
-title: 配置
+title: 全局配置
translated: true
---
-# 全局配置
-
## 设备配置:ConfigMap
:::note
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/monitoring/device-allocation.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/monitoring/device-allocation.md
index c90ced11..187d6afc 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/monitoring/device-allocation.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/monitoring/device-allocation.md
@@ -24,7 +24,7 @@ curl {scheduler node ip}:31993/metrics
| vGPUMemoryAllocated | 分配给某个容器的 vGPU 显存 | `{containeridx="Ascend310P",deviceuuid="aio-node74-arm-Ascend310P-0",nodename="aio-node74-arm",podname="ascend310p-pod",podnamespace="default",zone="vGPU"}` 3.221225472e+09 |
| QuotaUsed | resourcequota 的使用情况 | `{quotaName="nvidia.com/gpucores", quotanamespace="default",limit="200",zone="vGPU"}` 100 |
-如果你在使用 [HAMi DRA](../../installation/how-to-use-hami-dra.md), 它将暴露如下指标 :
+如果你在使用 [HAMi DRA](../../installation/how-to-use-hami-dra), 它将暴露如下指标 :
| 指标 | 描述 | 示例 |
|----------|-------------|---------|
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/volcano-vgpu/NVIDIA-GPU/how-to-use-volcano-vgpu.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/volcano-vgpu/NVIDIA-GPU/how-to-use-volcano-vgpu.md
index 5e7ba300..40b9a3ca 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/volcano-vgpu/NVIDIA-GPU/how-to-use-volcano-vgpu.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/volcano-vgpu/NVIDIA-GPU/how-to-use-volcano-vgpu.md
@@ -3,7 +3,7 @@ title: 如何使用 Volcano vGPU
translated: true
---
-# Volcano vgpu 设备插件用于 Kubernetes
+## Volcano vgpu 设备插件用于 Kubernetes
:::note
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/volcano-vgpu/NVIDIA-GPU/monitor.md b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/volcano-vgpu/NVIDIA-GPU/monitor.md
index ba9f01d5..b9ef8d32 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/userguide/volcano-vgpu/NVIDIA-GPU/monitor.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/userguide/volcano-vgpu/NVIDIA-GPU/monitor.md
@@ -3,8 +3,6 @@ title: 监控 Volcano vGPU
translated: true
---
-## 监控
-
volcano-scheduler-metrics 记录每个 GPU 的使用情况和限制,访问以下地址获取这些指标。
```bash
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0.json b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0.json
index 0406075b..d4d95047 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0.json
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0.json
@@ -5,31 +5,31 @@
},
"sidebar.docs.category.Core Concepts": {
"message": "核心概念",
- "description": "The label for category Core Concepts in sidebar docs"
+ "description": "The label for category 'Core Concepts' in sidebar 'docs'"
},
"sidebar.docs.category.Key Features": {
"message": "关键特性",
- "description": "The label for category Key Features in sidebar docs"
+ "description": "The label for category 'Key Features' in sidebar 'docs'"
},
"sidebar.docs.category.Get Started": {
- "message": "开始使用",
- "description": "The label for category Get Started in sidebar docs"
+ "message": "快速开始",
+ "description": "The label for category 'Get Started' in sidebar 'docs'"
},
"sidebar.docs.category.Installation": {
"message": "安装",
- "description": "The label for category Installation in sidebar docs"
+ "description": "The label for category 'Installation' in sidebar 'docs'"
},
"sidebar.docs.category.User Guide": {
"message": "用户指南",
- "description": "The label for category User Guide in sidebar docs"
+ "description": "The label for category 'User Guide' in sidebar 'docs'"
},
"sidebar.docs.category.Monitoring": {
"message": "监控",
- "description": "The label for category Monitoring in sidebar docs"
+ "description": "The label for category 'Monitoring' in sidebar 'docs'"
},
"sidebar.docs.category.Share NVIDIA GPU devices": {
"message": "共享 NVIDIA GPU 设备",
- "description": "The label for category Share NVIDIA GPU devices in sidebar docs"
+ "description": "The label for category 'Share NVIDIA GPU devices' in sidebar 'docs'"
},
"sidebar.docs.category.Examples": {
"message": "示例",
@@ -37,34 +37,66 @@
},
"sidebar.docs.category.Share Cambricon MLU devices": {
"message": "共享 Cambricon MLU 设备",
- "description": "The label for category Share Cambricon MLU devices in sidebar docs"
+ "description": "The label for category 'Share Cambricon MLU devices' in sidebar 'docs'"
},
"sidebar.docs.category.Share Mthreads GPU devices": {
"message": "共享 Mthreads GPU 设备",
- "description": "The label for category Share Mthreads GPU devices in sidebar docs"
+ "description": "The label for category 'Share Mthreads GPU devices' in sidebar 'docs'"
},
"sidebar.docs.category.Optimize Metax GPU scheduling": {
"message": "优化 Metax GPU 调度",
- "description": "The label for category Optimize Metax GPU scheduling in sidebar docs"
+ "description": "The label for category 'Optimize Metax GPU scheduling' in sidebar 'docs'"
},
"sidebar.docs.category.Share Ascend devices": {
"message": "共享 Ascend 设备",
- "description": "The label for category Share Ascend devices in sidebar docs"
+ "description": "The label for category 'Share Ascend devices' in sidebar 'docs'"
},
"sidebar.docs.category.Volcano vgpu support": {
"message": "Volcano vgpu 支持",
- "description": "The label for category Volcano vgpu support in sidebar docs"
+ "description": "The label for category 'Volcano vgpu support' in sidebar 'docs'"
},
"sidebar.docs.category.NVIDIA GPU": {
"message": "NVIDIA GPU",
- "description": "The label for category NVIDIA GPU in sidebar docs"
+ "description": "The label for category 'NVIDIA GPU' in sidebar 'docs'"
},
"sidebar.docs.category.Developer Guide": {
"message": "开发者指南",
- "description": "The label for category Developer Guide in sidebar docs"
+ "description": "The label for category 'Developer Guide' in sidebar 'docs'"
},
"sidebar.docs.category.Contributor Guide": {
"message": "贡献者指南",
- "description": "The label for category Contributor Guide in sidebar docs"
+ "description": "The label for category 'Contributor Guide' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.nvidia-examples": {
+ "message": "Examples",
+ "description": "The label for category 'Examples' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.cambricon-examples": {
+ "message": "Examples",
+ "description": "The label for category 'Examples' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.Share Hygon DCU devices": {
+ "message": "Share Hygon DCU devices",
+ "description": "The label for category 'Share Hygon DCU devices' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.hygon-examples": {
+ "message": "Examples",
+ "description": "The label for category 'Examples' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.mthreads-examples": {
+ "message": "Examples",
+ "description": "The label for category 'Examples' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.metax-examples": {
+ "message": "Examples",
+ "description": "The label for category 'Examples' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.ascend-examples": {
+ "message": "Examples",
+ "description": "The label for category 'Examples' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.volcano-vgpu-examples": {
+ "message": "Examples",
+ "description": "The label for category 'Examples' in sidebar 'docs'"
}
}
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/core-concepts/architecture.md b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/core-concepts/architecture.md
index 3f46d3aa..bf5db6f3 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/core-concepts/architecture.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/core-concepts/architecture.md
@@ -4,7 +4,7 @@ title: Architecture
The overall architecture of HAMi is shown below:
-
+
The HAMi consists of the following components:
@@ -17,9 +17,6 @@ HAMi MutatingWebhook checks if this task can be handled by HAMi, it scans the re
The HAMi scheduler is responsible for assigning tasks to the appropriate nodes and devices. At the same time, the scheduler needs to maintain a global view of heterogeneous computing devices for monitoring.
-The device-plugin layer obtains the scheduling result from the annotations field of the task and maps the corresponding device to the container。
+The device-plugin layer obtains the scheduling result from the annotations field of the task and maps the corresponding device to the container.
The In container resource control is responsible for monitoring the resource usage within the container and providing hard isolation capabilities.
-
-
-
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/core-concepts/introduction.md b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/core-concepts/introduction.md
index fe5e28d1..e6137247 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/core-concepts/introduction.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/core-concepts/introduction.md
@@ -8,34 +8,34 @@ slug: /
Heterogeneous AI Computing Virtualization Middleware (HAMi), formerly known as k8s-vGPU-scheduler, is an "all-in-one" chart designed to manage Heterogeneous AI Computing Devices in a k8s cluster. It can provide the ability to share Heterogeneous AI devices among tasks.
-HAMi is a [Cloud Native Computing Foundation](https://cncf.io/) sandbox project & [Landscape project](https://landscape.cncf.io/?item=orchestration-management--scheduling-orchestration--hami) & [CNAI Landscape project](https://landscape.cncf.io/?group=cnai&item=cnai--general-orchestration--hami).
+HAMi is a [Cloud Native Computing Foundation](https://cncf.io/) sandbox project & [Landscape project](https://landscape.cncf.io/?item=orchestration-management--scheduling-orchestration--hami) & [CNAI Landscape project](https://landscape.cncf.io/?group=cnai&item=orchestration-management--scheduling-orchestration--hami).
+
+## Why HAMi
-## Why HAMi:
- __Device sharing__
- - Support multiple Heterogeneous AI Computing devices
- - Support device-sharing for multi-device containers
+ - Support multiple Heterogeneous AI Computing devices
+ - Support device-sharing for multi-device containers
- __Device Memory Control__
- - Hard limit inside container
- - Support dynamic device memory allocation
- - Support memory allocation by MB or by percentage
+ - Hard limit inside container
+ - Support dynamic device memory allocation
+ - Support memory allocation by MB or by percentage
- __Device Specification__
- - Support specify a type of certain heterogeneous AI computing devices
- - Support specify a certain heterogeneous AI computing devices using device UUID
+  - Support specifying a certain type of heterogeneous AI computing device
+  - Support specifying a certain heterogeneous AI computing device by its UUID
- __Easy to try__
- - Transparent to tasks inside container
- - Install/Uninstall using helm, easy and green
+ - Transparent to tasks inside container
+ - Install/Uninstall using helm, easy and green
- __Open and Neutral__
- - Jointly initiated by Internet, finance, manufacturing, cloud providers, etc.
- - Target for open governance with CNCF
-
+ - Jointly initiated by Internet, finance, manufacturing, cloud providers, etc.
+ - Target for open governance with CNCF
## What's Next
Here are some recommended next steps:
- Learn HAMi's [architecture](./architecture.md).
-- Start to [install HAMi](../installation/prequisities.md).
\ No newline at end of file
+- Start to [install HAMi](../installation/prerequisites.md).
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/developers/Dynamic-mig.md b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/developers/Dynamic-mig.md
index 3111f346..a91bc837 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/developers/Dynamic-mig.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/developers/Dynamic-mig.md
@@ -1,8 +1,7 @@
-----
-Dynamic MIG Implementation
-----
-
-# NVIDIA GPU MPS and MIG dynamic slice plugin
+---
+linktitle: Dynamic MIG Implementation
+title: NVIDIA GPU MPS and MIG dynamic slice plugin
+---
## Special Thanks
@@ -23,6 +22,7 @@ HAMi is done by using [hami-core](https://github.com/Project-HAMi/HAMi-core), wh
- Tasks can choose to use MIG, use HAMi-core, or use both.
### Config maps
+
- hami-scheduler-device-configMap
This configMap defines the plugin configurations, including resourceName, MIG geometries, and node-level configurations.
@@ -100,11 +100,11 @@ data:
## Structure
-
+
## Examples
-Dynamic mig is compatible with hami tasks, as the example below:
+Dynamic MIG is compatible with HAMi tasks, as in the example below:
Just Setting `nvidia.com/gpu` and `nvidia.com/gpumem`.
```yaml
@@ -120,7 +120,7 @@ spec:
resources:
limits:
nvidia.com/gpu: 2 # requesting 2 vGPUs
- nvidia.com/gpumem: 8000 # Each vGPU contains 8000m device memory (Optional,Integer)
+ nvidia.com/gpumem: 8000 # Each vGPU contains 8000m device memory (Optional,Integer)
```
A task can decide only to use `mig` or `hami-core` by setting `annotations.nvidia.com/vgpu-mode` to corresponding value, as the example below shows:
@@ -140,14 +140,14 @@ spec:
resources:
limits:
nvidia.com/gpu: 2 # requesting 2 vGPUs
- nvidia.com/gpumem: 8000 # Each vGPU contains 8000m device memory (Optional,Integer
+          nvidia.com/gpumem: 8000 # Each vGPU contains 8000m device memory (Optional, Integer)
```
## Procedures
The procedure of a vGPU task that uses dynamic-mig is shown below:
-
+
Note that after a task is submitted, the deviceshare plugin iterates over the templates defined in the configMap `hami-scheduler-device` and picks the first available template that fits. You can always change the content of that configMap and restart vc-scheduler to customize it.
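The first-fit template selection described above can be sketched as follows; the template fields (`name`, `memory`, `count`) and the function name are illustrative assumptions, not HAMi's actual configMap schema.

```python
# Hypothetical sketch of "find the first available template to fit";
# field names are assumptions, not HAMi's real schema.
def first_fit_template(templates, requested_gpus, requested_mem_mb):
    """Return the name of the first MIG template that satisfies the request."""
    for tpl in templates:
        if tpl["count"] >= requested_gpus and tpl["memory"] >= requested_mem_mb:
            return tpl["name"]
    return None  # no template fits; the node is skipped

templates = [
    {"name": "1g.5gb", "memory": 5120, "count": 7},
    {"name": "2g.10gb", "memory": 10240, "count": 3},
]
# 2 vGPUs x 8000m device memory, as in the example task above
print(first_fit_template(templates, requested_gpus=2, requested_mem_mb=8000))
```

This mirrors why the example below ends up on `2g.10gb` instances: the first profile is too small, so iteration continues to the next geometry.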
@@ -159,4 +159,3 @@ If you submit the example on an empty A100-PCIE-40GB node, then it will select a
```
Then the container is started with two 2g.10gb instances.
-
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/developers/HAMi-core-design.md b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/developers/HAMi-core-design.md
index 9bd8b22b..ac99d951 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/developers/HAMi-core-design.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/developers/HAMi-core-design.md
@@ -6,21 +6,22 @@ title: HAMi-core design
HAMi-core is a hook library for the CUDA environment. It is the in-container GPU resource controller, and it has been adopted by [HAMi](https://github.com/HAMi-project/HAMi) and [volcano](https://github.com/volcano-sh/devices).
-
+
## Features
HAMi-core has the following features:
+
1. Virtualize device memory
-
+ 
-2. Limit device utilization by self-implemented time shard
+1. Limit device utilization by self-implemented time shard
-3. Real-time device utilization monitor
+1. Real-time device utilization monitor
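The self-implemented time-shard limiter can be pictured roughly as a per-slice compute budget; the slice model, numbers, and method names below are illustrative assumptions, not HAMi-core's actual algorithm.

```python
class TimeShard:
    """Toy model: within each time slice a task may spend at most
    `limit` units of compute; further launches are held back until
    the next slice refills the budget."""
    def __init__(self, limit):
        self.limit = limit
        self.budget = limit

    def try_launch(self, cost):
        if cost > self.budget:
            return False      # the hook would delay the kernel launch here
        self.budget -= cost
        return True

    def next_slice(self):
        self.budget = self.limit

shard = TimeShard(limit=50)    # e.g. a 50% utilization cap
print(shard.try_launch(30))    # fits in this slice
print(shard.try_launch(30))    # budget exhausted for this slice
shard.next_slice()
print(shard.try_launch(30))    # allowed again after the refill
```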
## Design
HAMi-core operates by hijacking the API calls between the CUDA Runtime (libcudart.so) and the CUDA Driver (libcuda.so), as the figure below shows:
-
+
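To illustrate the interposition idea in plain terms: a hook placed in front of a driver entry point can account for device memory before deciding whether to forward the call. This is a toy sketch with hypothetical names, not HAMi-core's real symbols; only the `CUresult`-style return codes (0 for success, 2 for out-of-memory) follow CUDA's conventions.

```python
QUOTA_MB = 3000      # assumed per-task device-memory limit
used_mb = 0

def real_mem_alloc(mb):
    # stand-in for the real driver entry point, e.g. cuMemAlloc
    return 0         # 0 = CUDA_SUCCESS

def hooked_mem_alloc(mb):
    """Account first; forward to the real call only if the quota holds."""
    global used_mb
    if used_mb + mb > QUOTA_MB:
        return 2     # mimic CUDA_ERROR_OUT_OF_MEMORY
    used_mb += mb
    return real_mem_alloc(mb)

print(hooked_mem_alloc(2000))  # fits within the quota
print(hooked_mem_alloc(2000))  # would exceed the quota
```

In the real library the interposition happens at link time, so the application inside the container needs no changes.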
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/developers/build.md b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/developers/build.md
index 22718078..64df7798 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/developers/build.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/developers/build.md
@@ -4,7 +4,7 @@ title: Build HAMi
## Make Binary
-### prequisities
+### Prerequisites
The following tools are required:
@@ -28,7 +28,7 @@ go build -ldflags '-s -w -X github.com/Project-HAMi/HAMi/pkg/version.version=v0.
## Make Image
-### prequisities
+### Prerequisites
The following tools are required:
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/developers/mindmap.md b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/developers/mindmap.md
index f206f26f..63172bad 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/developers/mindmap.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/developers/mindmap.md
@@ -4,4 +4,4 @@ title: HAMi mind map
## Mind map
-
\ No newline at end of file
+
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/developers/protocol.md b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/developers/protocol.md
index cbbbc8a0..4f98feeb 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/developers/protocol.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/developers/protocol.md
@@ -10,28 +10,28 @@ In order to perform more accurate scheduling, the HAMi scheduler needs to percei
However, the device-plugin device registration API does not provide a way to obtain these parameters, so HAMi-device-plugin stores this supplementary information in the node annotations during registration for the scheduler to read, as the following figure shows:
-
+
Two annotations are used here. One is a timestamp: if it exceeds the specified threshold, the devices on the corresponding node are considered invalid. The other holds the device registration information. A node with two 32G-V100 GPUs can be registered as shown below:
-```
+```yaml
hami.io/node-handshake: Requesting_2024.05.14 07:07:33
hami.io/node-nvidia-register: 'GPU-00552014-5c87-89ac-b1a6-7b53aa24b0ec,10,32768,100,NVIDIA-Tesla V100-PCIE-32GB,0,true:GPU-0fc3eda5-e98b-a25b-5b0d-cf5c855d1448,10,32768,100,NVIDIA-Tesla V100-PCIE-32GB,0,true:'
```
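The registration annotation above is a flat, delimiter-separated string. A rough parser for it, with field names inferred from the example (treat them as assumptions, not a documented schema), could look like:

```python
def parse_node_register(value):
    """Parse entries of 'uuid,count,memMB,cores,type,numa,healthy'
    joined by ':' (with a trailing ':'); field names are inferred."""
    devices = []
    for entry in value.split(":"):
        if not entry:
            continue
        uuid, count, mem, cores, dtype, numa, healthy = entry.split(",")
        devices.append({
            "uuid": uuid, "count": int(count), "memory_mb": int(mem),
            "cores": int(cores), "type": dtype, "numa": int(numa),
            "healthy": healthy == "true",
        })
    return devices

anno = ("GPU-00552014-5c87-89ac-b1a6-7b53aa24b0ec,10,32768,100,"
        "NVIDIA-Tesla V100-PCIE-32GB,0,true:"
        "GPU-0fc3eda5-e98b-a25b-5b0d-cf5c855d1448,10,32768,100,"
        "NVIDIA-Tesla V100-PCIE-32GB,0,true:")
devs = parse_node_register(anno)
print(len(devs), devs[0]["memory_mb"], devs[0]["healthy"])
```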
-
-### Schedule Decision Making
+### Schedule Decision Making
The kube-scheduler calls device-plugin to mount devices during the `bind` process, but only the `UUID` of the device is provided to device-plugin. Therefore, in the scenario of device-sharing, device-plugin cannot obtain the specifications of the corresponding device, such as the `device memory` and `computing cores` requested by the task.
Therefore, it is necessary to develop a protocol for the scheduler layer to communicate with device-plugin to pass information about task dispatch. The scheduler passes this information by patching the scheduling result to the pod's annotations and reading it in device-plugin, as the figure below:
-
+
In this process, there are 3 annotations that need to be set: the `timestamp`, the `devices to be assigned`, and the `devices allocated`. The contents of `devices to be assigned` and `devices allocated` are the same when the scheduler creates them, but device-plugin determines the current device allocation from the content of `devices to be assigned`; when an assignment succeeds, the corresponding device is removed from that annotation, so the content of `devices to be assigned` will be empty once the task is running successfully.
An example of a task requesting a GPU with 3000M device memory will generate the corresponding annotations as follows:
-```
+
+```yaml
hami.io/bind-time: 1716199325
hami.io/vgpu-devices-allocated: GPU-0fc3eda5-e98b-a25b-5b0d-cf5c855d1448,NVIDIA,3000,0:;
hami.io/vgpu-devices-to-allocate: ;
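The allocation annotations follow a similar delimiter scheme: entries of `uuid,vendor,memMB,cores` joined by `:` and terminated by `;`, with a bare `;` meaning the list is empty (as `vgpu-devices-to-allocate` becomes after a successful assignment). A hedged parsing sketch, with the field order inferred from the example:

```python
def parse_assigned(value):
    """Parse 'uuid,vendor,memMB,cores' entries joined by ':' and
    terminated by ';'; a bare ';' means no devices remain."""
    body = value.rstrip(";")
    out = []
    for entry in body.split(":"):
        if not entry:
            continue
        uuid, vendor, mem, cores = entry.split(",")
        out.append((uuid, vendor, int(mem), int(cores)))
    return out

allocated = parse_assigned(
    "GPU-0fc3eda5-e98b-a25b-5b0d-cf5c855d1448,NVIDIA,3000,0:;")
to_allocate = parse_assigned(";")
print(len(allocated), allocated[0][2], len(to_allocate))
```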
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/get-started/nginx-example.md b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/get-started/nginx-example.md
index 2cbd1183..3e5aa4cd 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/get-started/nginx-example.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/get-started/nginx-example.md
@@ -1,14 +1,16 @@
---
-title: Deploy HAMi using helm
+title: Deploy HAMi using Helm
---
This guide will cover:
+
- Configure the NVIDIA container runtime on each GPU node
- Install HAMi using helm
- Launch a vGPU task
- Check if the corresponding device resources are limited inside container
-### Prerequisites
+## Prerequisites
+
- [Helm](https://helm.sh/zh/docs/) version v3+
- [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) version v1.16+
- [CUDA](https://developer.nvidia.com/cuda-toolkit) version v10.2+
@@ -17,6 +19,7 @@ This guide will cover:
### Installation
#### 1. Configure nvidia-container-toolkit
+
Configure nvidia-container-toolkit
Execute the following steps on all your GPU nodes.
@@ -55,7 +58,7 @@ When running `Kubernetes` with `Docker`, edit the configuration file, typically
And then restart `Docker`:
-```
+```bash
sudo systemctl daemon-reload && systemctl restart docker
```
@@ -64,7 +67,7 @@ sudo systemctl daemon-reload && systemctl restart docker
When running `Kubernetes` with `containerd`, modify the configuration file, typically located at `/etc/containerd/config.toml`, to set up
`nvidia-container-runtime` as the default low-level runtime:
-```
+```toml
version = 2
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
@@ -83,34 +86,35 @@ version = 2
And then restart `containerd`:
-```
+```bash
sudo systemctl daemon-reload && systemctl restart containerd
```
#### 2. Label your nodes
+
Label your GPU nodes for scheduling with HAMi by adding the label "gpu=on". Without this label, the nodes cannot be managed by our scheduler.
-```
+```bash
kubectl label nodes {nodeid} gpu=on
```
-#### 3. Deploy HAMi using helm:
+#### 3. Deploy HAMi using Helm
First, you need to check your Kubernetes version by using the following command:
-```
+```bash
kubectl version
```
Then, add our repo to Helm:
-```
+```bash
helm repo add hami-charts https://project-hami.github.io/HAMi/
```
During installation, set the Kubernetes scheduler image version to match your Kubernetes server version. For instance, if your cluster server version is 1.16.8, use the following command for deployment:
-```
+```bash
helm install hami hami-charts/hami --set scheduler.kubeScheduler.imageTag=v1.16.8 -n kube-system
```
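The version-matching rule above can be captured in a small helper; the command template simply mirrors the one shown, and the helper name is hypothetical:

```python
def helm_install_cmd(server_version):
    """Build the install command, pinning the scheduler image tag to the
    cluster's server version (normalized to a leading 'v')."""
    tag = server_version if server_version.startswith("v") else "v" + server_version
    return ("helm install hami hami-charts/hami "
            f"--set scheduler.kubeScheduler.imageTag={tag} -n kube-system")

print(helm_install_cmd("1.16.8"))
```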
@@ -118,11 +122,11 @@ If everything goes well, you will see both vgpu-device-plugin and vgpu-scheduler
### Demo
-#### 1. Submit demo task:
+#### 1. Submit demo task
Containers can now request NVIDIA vGPUs using the `nvidia.com/gpu` resource type.
-```
+```yaml
apiVersion: v1
kind: Pod
metadata:
@@ -135,20 +139,20 @@ spec:
resources:
limits:
          nvidia.com/gpu: 1 # requesting 1 vGPU
- nvidia.com/gpumem: 10240 # Each vGPU contains 10240m device memory (Optional,Integer)
+ nvidia.com/gpumem: 10240 # Each vGPU contains 10240m device memory (Optional,Integer)
```
#### Verify in-container resource control
Execute the following query command:
-```
+```bash
kubectl exec -it gpu-pod -- nvidia-smi
```
-The result should be
+The result should be:
-```
+```text
[HAMI-core Msg(28:140561996502848:libvgpu.c:836)]: Initializing.....
Wed Apr 10 09:28:58 2024
+-----------------------------------------------------------------------------------------+
@@ -172,5 +176,3 @@ Wed Apr 10 09:28:58 2024
+-----------------------------------------------------------------------------------------+
[HAMI-core Msg(28:140561996502848:multiprocess_memory_limit.c:434)]: Calling exit handler 28
```
-
-
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/installation/online-installation.md b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/installation/online-installation.md
index a142c371..7893af3e 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/installation/online-installation.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/installation/online-installation.md
@@ -1,41 +1,44 @@
---
-title: Online Installation from Helm (Recommended)
+linktitle: 通过 Helm 在线安装
+title: 通过 Helm 在线安装(推荐)
+translated: true
---
-The best practice to deploy HAMi is using helm.
+推荐使用 Helm 来部署 HAMi。
-## Add HAMi repo
+## 添加 HAMi 仓库
-You can add HAMi chart repository using the following command:
+您可以使用以下命令添加 HAMi 图表仓库:
-```
+```bash
helm repo add hami-charts https://project-hami.github.io/HAMi/
```
-## Get your kubernetes version
+## 获取您的 Kubernetes 版本
-kubernetes version is needed for properly installation. You can get this information by using the following command:
+安装时需要 Kubernetes 版本。您可以使用以下命令获取此信息:
-```
+```bash
kubectl version
```
-## Installation
+## 安装
-During installation, set the Kubernetes scheduler image version to match your Kubernetes server version. For instance, if your cluster server version is 1.16.8, use the following command for deployment:
+在安装过程中,将 `scheduler.kubeScheduler.imageTag` 设置为与您的 Kubernetes 服务器版本匹配。
+例如,如果您的集群服务器版本是 v1.16.8,请使用以下命令进行部署:
-```
+```bash
helm install hami hami-charts/hami --set scheduler.kubeScheduler.imageTag=v1.16.8 -n kube-system
```
-You can customize your installation by adjusting the [configs](../userguide/configure.md).
+您可以通过调整[配置](../userguide/configure.md)来自定义安装。
-## Verify your installation
+## 验证您的安装
-You can verify your installation using the following command:
+您可以使用以下命令验证您的安装:
-```
+```bash
kubectl get pods -n kube-system
```
-If both hami-device-plugin and hami-scheduler pods are in the Running state, your installation is successful.
\ No newline at end of file
+如果 hami-device-plugin 和 hami-scheduler 这两个 Pod 都处于 Running 状态,则说明您的安装成功。
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/installation/prequisities.md b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/installation/prerequisites.md
similarity index 99%
rename from i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/installation/prequisities.md
rename to i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/installation/prerequisites.md
index 4b1ef701..13666bb8 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/installation/prequisities.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/installation/prerequisites.md
@@ -1,5 +1,5 @@
---
-title: Prequisities
+title: Prerequisites
---
## Prerequisites
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/installation/webui-installation.md b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/installation/webui-installation.md
index c1b5e202..704c4cc2 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/installation/webui-installation.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/installation/webui-installation.md
@@ -12,7 +12,7 @@ The WebUI can only be accessed by your localhost, so you need to connect your lo
The HAMi-WebUI open-source community offers Helm Charts for running it on Kubernetes. Please be aware that the code is provided without any warranties. If you encounter any problems, you can report them to the [Official GitHub repository](https://github.com/Project-HAMi/HAMi-WebUI/tree/main/charts/hami-webui).
-## Prequisities
+## Prerequisites
To install HAMi-WebUI using Helm, ensure you meet these requirements:
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/key-features/device-resource-isolation.md b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/key-features/device-resource-isolation.md
index 39f323ad..7b82c2f0 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/key-features/device-resource-isolation.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/key-features/device-resource-isolation.md
@@ -5,7 +5,7 @@ title: Device resource isolation
A simple demonstration for device isolation:
A task with the following resources.
-```
+```yaml
resources:
limits:
nvidia.com/gpu: 1 # requesting 1 vGPU
@@ -14,4 +14,4 @@ A task with the following resources.
will see 3G device memory inside container
-
\ No newline at end of file
+
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/key-features/device-sharing.md b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/key-features/device-sharing.md
index c84c2aba..c8cd8aba 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/key-features/device-sharing.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/key-features/device-sharing.md
@@ -7,4 +7,4 @@ title: Device sharing
- Permits partial device allocation by specifying device core usage.
- Requires zero changes to existing programs.
-
\ No newline at end of file
+
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/userguide/Metax-device/enable-metax-gpu-schedule.md b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/userguide/Metax-device/enable-metax-gpu-schedule.md
index 249b7dae..da715618 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/userguide/Metax-device/enable-metax-gpu-schedule.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v1.3.0/userguide/Metax-device/enable-metax-gpu-schedule.md
@@ -7,9 +7,10 @@ title: Enable Metax GPU topology-aware scheduling
When multiple GPUs are configured on a single server, the cards are connected to the same PCIe Switch or the same MetaXLink depending on how they are wired, so there is a near-far
relationship among them. This forms a topology among all the cards on the server, as shown in the following figure:
-
+
When a user job requests a certain number of `metax-tech.com/gpu` resources, Kubernetes schedules the pod to an appropriate node, and gpu-device then allocates the remaining resources on that node following the criteria below:
+
1. MetaXLink takes precedence over PCIe Switch in two ways:
– A connection is considered a MetaXLink connection when there is a MetaXLink connection and a PCIe Switch connection between the two cards.
– When both the MetaXLink and the PCIe Switch can meet the job request
@@ -17,11 +18,11 @@ Equipped with MetaXLink interconnected resources.
2. When using `node-scheduler-policy=spread`, allocate Metax resources under the same MetaXLink or PCIe Switch as much as possible, as the following figure shows:
-
+
-3. When using `node-scheduler-policy=binpack`, Assign GPU resources, so minimize the damage to MetaxXLink topology, as the following figure shows:
+1. When using `node-scheduler-policy=binpack`, assign GPU resources so as to minimize damage to the MetaXLink topology, as the following figure shows:
-
+
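A hypothetical sketch (not gpu-device's actual code) of how the two policies could rank candidate interconnect groups, assuming MetaXLink offers higher locality than a PCIe Switch:

```python
def rank_groups(groups, policy):
    """groups: list of (link_type, free_gpus) tuples.
    spread  -> prefer keeping a job inside one high-locality group;
    binpack -> fill lower-locality groups first to preserve MetaXLink."""
    locality = {"metaxlink": 2, "pcie-switch": 1}
    key = lambda g: locality[g[0]]
    if policy == "spread":
        return sorted(groups, key=key, reverse=True)
    return sorted(groups, key=key)

groups = [("pcie-switch", 4), ("metaxlink", 4)]
print(rank_groups(groups, "spread")[0][0])   # MetaXLink group first
print(rank_groups(groups, "binpack")[0][0])  # PCIe Switch group first
```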
## Important Notes
@@ -45,7 +46,7 @@ Equipped with MetaXLink interconnected resources.
Metax GPUs can now be requested by a container
using the `metax-tech.com/gpu` resource type:
-```
+```yaml
apiVersion: v1
kind: Pod
metadata:
@@ -63,5 +64,3 @@ spec:
```
> **NOTICE2:** You can find more examples in the examples folder.
-
-
\ No newline at end of file
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1.json b/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1.json
index f43ee7c8..3a9e181c 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1.json
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1.json
@@ -5,27 +5,27 @@
},
"sidebar.docs.category.Core Concepts": {
"message": "核心概念",
- "description": "The label for category Core Concepts in sidebar docs"
+ "description": "The label for category 'Core Concepts' in sidebar 'docs'"
},
"sidebar.docs.category.Get Started": {
- "message": "开始使用",
- "description": "The label for category Get Started in sidebar docs"
+ "message": "快速开始",
+ "description": "The label for category 'Get Started' in sidebar 'docs'"
},
"sidebar.docs.category.Installation": {
"message": "安装",
- "description": "The label for category Installation in sidebar docs"
+ "description": "The label for category 'Installation' in sidebar 'docs'"
},
"sidebar.docs.category.User Guide": {
"message": "用户指南",
- "description": "The label for category User Guide in sidebar docs"
+ "description": "The label for category 'User Guide' in sidebar 'docs'"
},
"sidebar.docs.category.Monitoring": {
"message": "监控",
- "description": "The label for category Monitoring in sidebar docs"
+ "description": "The label for category 'Monitoring' in sidebar 'docs'"
},
"sidebar.docs.category.Share NVIDIA GPU devices": {
"message": "共享 NVIDIA GPU 设备",
- "description": "The label for category Share NVIDIA GPU devices in sidebar docs"
+ "description": "The label for category 'Share NVIDIA GPU devices' in sidebar 'docs'"
},
"sidebar.docs.category.Examples": {
"message": "示例",
@@ -33,14 +33,70 @@
},
"sidebar.docs.category.Share Cambricon MLU devices": {
"message": "共享 Cambricon MLU 设备",
- "description": "The label for category Share Cambricon MLU devices in sidebar docs"
+ "description": "The label for category 'Share Cambricon MLU devices' in sidebar 'docs'"
},
"sidebar.docs.category.Developer Guide": {
"message": "开发者指南",
- "description": "The label for category Developer Guide in sidebar docs"
+ "description": "The label for category 'Developer Guide' in sidebar 'docs'"
},
"sidebar.docs.category.Contributor Guide": {
"message": "贡献者指南",
- "description": "The label for category Contributor Guide in sidebar docs"
+ "description": "The label for category 'Contributor Guide' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.Key Features": {
+ "message": "Key Features",
+ "description": "The label for category 'Key Features' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.nvidia-examples": {
+ "message": "Examples",
+ "description": "The label for category 'Examples' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.cambricon-examples": {
+ "message": "Examples",
+ "description": "The label for category 'Examples' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.Share Hygon DCU devices": {
+ "message": "Share Hygon DCU devices",
+ "description": "The label for category 'Share Hygon DCU devices' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.hygon-examples": {
+ "message": "Examples",
+ "description": "The label for category 'Examples' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.Share Mthreads GPU devices": {
+ "message": "Share Mthreads GPU devices",
+ "description": "The label for category 'Share Mthreads GPU devices' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.mthreads-examples": {
+ "message": "Examples",
+ "description": "The label for category 'Examples' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.Optimize Metax GPU scheduling": {
+ "message": "Optimize Metax GPU scheduling",
+ "description": "The label for category 'Optimize Metax GPU scheduling' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.metax-examples": {
+ "message": "Examples",
+ "description": "The label for category 'Examples' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.Share Ascend devices": {
+ "message": "Share Ascend devices",
+ "description": "The label for category 'Share Ascend devices' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.ascend-examples": {
+ "message": "Examples",
+ "description": "The label for category 'Examples' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.Volcano vgpu support": {
+ "message": "Volcano vgpu support",
+ "description": "The label for category 'Volcano vgpu support' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.NVIDIA GPU": {
+ "message": "NVIDIA GPU",
+ "description": "The label for category 'NVIDIA GPU' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.volcano-vgpu-examples": {
+ "message": "Examples",
+ "description": "The label for category 'Examples' in sidebar 'docs'"
}
}
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1/contributor/contribute-docs.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1/contributor/contribute-docs.md
index 469c542c..fa2c10a6 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1/contributor/contribute-docs.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1/contributor/contribute-docs.md
@@ -2,7 +2,7 @@
title: How to contribute docs
---
-Starting from version 1.3, the community documentation will be available on the Karmada website.
+Starting from version 1.3, the community documentation will be available on the HAMi website.
This document explains how to contribute docs to
the `Project-HAMi/website` repository.
@@ -13,7 +13,7 @@ the `Project-HAMi/website` repository.
- Docs need to be translated into multiple languages for readers from different regions.
The community now supports both Chinese and English.
English is the official language of documentation.
-- For our docs we use markdown. If you are unfamiliar with Markdown, please see https://guides.github.com/features/mastering-markdown/ or https://www.markdownguide.org/ if you are looking for something more substantial.
+- For our docs we use markdown. If you are unfamiliar with Markdown, please see [https://guides.github.com/features/mastering-markdown/](https://guides.github.com/features/mastering-markdown/) or [https://www.markdownguide.org/](https://www.markdownguide.org/) if you are looking for something more substantial.
- We get some additions through [Docusaurus 2](https://docusaurus.io/), a modern static website generator.
## Setup
@@ -27,7 +27,7 @@ cd website
Our website is organized like below:
-```
+```text
website
├── sidebars.json # sidebar for the current docs version
├── docs # docs directory for the current docs version
@@ -77,7 +77,7 @@ It's important for your article to specify metadata concerning an article at the
For now, let's take a look at a quick example which should explain the most relevant entries in **Front Matter**:
-```
+```yaml
---
title: A doc with tags
---
@@ -86,34 +86,37 @@ title: A doc with tags
```
The top section between two lines of --- is the Front Matter section. Here we define a couple of entries which tell Docusaurus how to handle the article:
-* Title is the equivalent of the `<h1>` in a HTML document or `# ` in a Markdown article.
-* Each document has a unique ID. By default, a document ID is the name of the document (without the extension) related to the root docs directory.
+
+- Title is the equivalent of the `<h1>` in an HTML document or `# ` in a Markdown article.
+- Each document has a unique ID. By default, a document ID is the name of the document (without the extension) relative to the root docs directory.
### Linking to other docs
You can easily route to other places by adding any of the following links:
-* Absolute URLs to external sites like `https://github.com` or `https://k8s.io` - you can use any of the Markdown notations for this, so
- * `` or
- * `[kubernetes](https://k8s.io)` will work.
-* Link to markdown files or the resulting path.
+
+- Absolute URLs to external sites like `https://github.com` or `https://k8s.io` - you can use any of the Markdown notations for this, so
+ - `` or
+ - `[kubernetes](https://k8s.io)` will work.
+- Link to markdown files or the resulting path.
You can use relative paths to index the corresponding files.
-* Link to pictures or other resources.
+- Link to pictures or other resources.
If your article contains images or other resources, you may create a corresponding directory in `/docs/resources`, and article related files are placed in that directory.
- Now we store public pictures about Karmada in `/docs/resources/general`. You can use the following to link the pictures:
- * ``
+ Now we store public pictures about HAMi in `/docs/resources/general`. You can use the following to link the pictures:
+ - ``
-### Directory organization
+### Directory organization
-Docusaurus 2 uses a sidebar to manage documents.
+Docusaurus 2 uses a sidebar to manage documents.
Creating a sidebar is useful to:
-* Group multiple related documents
-* Display a sidebar on each of those documents
-* Provide paginated navigation, with next/previous button
+
+- Group multiple related documents
+- Display a sidebar on each of those documents
+- Provide paginated navigation, with next/previous buttons
For our docs, you can know how our documents are organized from [https://github.com/Project-HAMi/website/blob/main/sidebars.js](https://github.com/Project-HAMi/website/blob/main/sidebars.js).
-```
+```js
module.exports = {
docs: [
{
@@ -141,7 +144,8 @@ module.exports = {
```
The order of documents in a directory strictly follows the order of `items`.
-```
+
+```js
type: "category",
label: "Core Concepts",
collapsed: false,
@@ -157,9 +161,10 @@ If you add a document, you must add it to `sidebars.js` to make it display prope
### About Chinese docs
There are two situations about the Chinese version of the document:
-* You want to translate our existing English docs to Chinese. In this case, you need to modify the corresponding file content from [https://github.com/Project-HAMi/website/tree/main/i18n/zh/docusaurus-plugin-content-docs/current](https://github.com/Project-HAMi/website/tree/main/i18n/zh/docusaurus-plugin-content-docs/current).
+
+- You want to translate our existing English docs to Chinese. In this case, you need to modify the corresponding file content from [https://github.com/Project-HAMi/website/tree/main/i18n/zh/docusaurus-plugin-content-docs/current](https://github.com/Project-HAMi/website/tree/main/i18n/zh/docusaurus-plugin-content-docs/current).
The organization of this directory is exactly the same as the outer layer. `current.json` holds translations for the documentation directory. You can edit it if you want to translate the directory names.
-* You want to contribute Chinese docs without English version. Any articles of any kind are welcomed. In this case, you can add articles and titles to the main directory first. Article content can be TBD first, like this.
+- You want to contribute Chinese docs that have no English version. Articles of any kind are welcome. In this case, you can add the article and its title to the main directory first, with the article content marked as TBD.
Then add the corresponding Chinese content to the Chinese directory.
## Debugging docs
@@ -177,4 +182,4 @@ If the previewed page is not what you expected, please check your docs again.
### Versioning
For the newly supplemented documents of each version, we will synchronize to the latest version on the release date of each version, and the documents of the old version will not be modified.
-For errata found in the documentation, we will fix it with every release.
\ No newline at end of file
+For errata found in the documentation, we will fix it with every release.
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1/contributor/github-workflow.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1/contributor/github-workflow.md
index 5940fd30..0c66d946 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1/contributor/github-workflow.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1/contributor/github-workflow.md
@@ -5,7 +5,7 @@ description: An overview of the GitHub workflow used by the Karmada project. It
> This doc is lifted from [Kubernetes github-workflow](https://github.com/kubernetes/community/blob/master/contributors/guide/github-workflow.md).
-
+
## 1 Fork in the cloud
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1/core-concepts/architecture.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1/core-concepts/architecture.md
index 3f46d3aa..bf5db6f3 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1/core-concepts/architecture.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1/core-concepts/architecture.md
@@ -4,7 +4,7 @@ title: Architecture
The overall architecture of HAMi is shown as below:
-
+
The HAMi consists of the following components:
@@ -17,9 +17,6 @@ HAMi MutatingWebhook checks if this task can be handled by HAMi, it scans the re
The HAMi scheduler is responsible for assigning tasks to the appropriate nodes and devices. At the same time, the scheduler needs to maintain a global view of heterogeneous computing devices for monitoring.
-The device-plugin layer obtains the scheduling result from the annotations field of the task and maps the corresponding device to the container。
+The device-plugin layer obtains the scheduling result from the annotations field of the task and maps the corresponding device to the container.
The in-container resource control is responsible for monitoring the resource usage within the container and providing hard isolation capabilities.
-
-
-
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1/core-concepts/introduction.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1/core-concepts/introduction.md
index fe5e28d1..e6137247 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1/core-concepts/introduction.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1/core-concepts/introduction.md
@@ -8,34 +8,34 @@ slug: /
Heterogeneous AI Computing Virtualization Middleware (HAMi), formerly known as k8s-vGPU-scheduler, is an "all-in-one" chart designed to manage Heterogeneous AI Computing Devices in a k8s cluster. It can provide the ability to share Heterogeneous AI devices among tasks.
-HAMi is a [Cloud Native Computing Foundation](https://cncf.io/) sandbox project & [Landscape project](https://landscape.cncf.io/?item=orchestration-management--scheduling-orchestration--hami) & [CNAI Landscape project](https://landscape.cncf.io/?group=cnai&item=cnai--general-orchestration--hami).
+HAMi is a [Cloud Native Computing Foundation](https://cncf.io/) sandbox project & [Landscape project](https://landscape.cncf.io/?item=orchestration-management--scheduling-orchestration--hami) & [CNAI Landscape project](https://landscape.cncf.io/?group=cnai&item=orchestration-management--scheduling-orchestration--hami).
+
+## Why HAMi
-## Why HAMi:
- __Device sharing__
- - Support multiple Heterogeneous AI Computing devices
- - Support device-sharing for multi-device containers
+ - Support multiple Heterogeneous AI Computing devices
+ - Support device-sharing for multi-device containers
- __Device Memory Control__
- - Hard limit inside container
- - Support dynamic device memory allocation
- - Support memory allocation by MB or by percentage
+ - Hard limit inside container
+ - Support dynamic device memory allocation
+ - Support memory allocation by MB or by percentage
- __Device Specification__
- - Support specify a type of certain heterogeneous AI computing devices
- - Support specify a certain heterogeneous AI computing devices using device UUID
+    - Support specifying a type of heterogeneous AI computing device
+    - Support specifying a certain heterogeneous AI computing device by its UUID
- __Easy to try__
- - Transparent to tasks inside container
- - Install/Uninstall using helm, easy and green
+ - Transparent to tasks inside container
+ - Install/Uninstall using helm, easy and green
- __Open and Neutral__
- - Jointly initiated by Internet, finance, manufacturing, cloud providers, etc.
- - Target for open governance with CNCF
-
+ - Jointly initiated by Internet, finance, manufacturing, cloud providers, etc.
+ - Target for open governance with CNCF
## What's Next
Here are some recommended next steps:
- Learn HAMi's [architecture](./architecture.md).
-- Start to [install HAMi](../installation/prequisities.md).
\ No newline at end of file
+- Start to [install HAMi](../installation/prerequisites.md).
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1/get-started/nginx-example.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1/get-started/nginx-example.md
index 2cbd1183..3e5aa4cd 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1/get-started/nginx-example.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1/get-started/nginx-example.md
@@ -1,14 +1,16 @@
---
-title: Deploy HAMi using helm
+title: Deploy HAMi using Helm
---
This guide will cover:
+
- Configure nvidia container runtime in each GPU nodes
- Install HAMi using helm
- Launch a vGPU task
- Check if the corresponding device resources are limited inside container
-### Prerequisites
+## Prerequisites
+
- [Helm](https://helm.sh/zh/docs/) version v3+
- [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) version v1.16+
- [CUDA](https://developer.nvidia.com/cuda-toolkit) version v10.2+
@@ -17,6 +19,7 @@ This guide will cover:
### Installation
#### 1. Configure nvidia-container-toolkit
+
Configure nvidia-container-toolkit
Execute the following steps on all your GPU nodes.
@@ -55,7 +58,7 @@ When running `Kubernetes` with `Docker`, edit the configuration file, typically
And then restart `Docker`:
-```
+```bash
sudo systemctl daemon-reload && systemctl restart docker
```
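
You can confirm that Docker picked up the new default runtime afterwards (an optional check; `docker info` lists both the available runtimes and the default one):

```bash
docker info | grep -i runtime
```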
@@ -64,7 +67,7 @@ sudo systemctl daemon-reload && systemctl restart docker
When running `Kubernetes` with `containerd`, modify the configuration file typically located at `/etc/containerd/config.toml`, to set up
`nvidia-container-runtime` as the default low-level runtime:
-```
+```toml
version = 2
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
@@ -83,34 +86,35 @@ version = 2
And then restart `containerd`:
-```
+```bash
sudo systemctl daemon-reload && systemctl restart containerd
```
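
To double-check that `containerd` now defaults to `nvidia-container-runtime`, you can dump the effective configuration and look for the nvidia entries (an optional check):

```bash
containerd config dump | grep -i nvidia
```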
#### 2. Label your nodes
+
Label your GPU nodes for scheduling with HAMi by adding the label "gpu=on". Without this label, the nodes cannot be managed by our scheduler.
-```
+```bash
kubectl label nodes {nodeid} gpu=on
```
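
To verify the label was applied, list the nodes that carry it (an optional check):

```bash
kubectl get nodes -l gpu=on
```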
-#### 3. Deploy HAMi using helm:
+#### 3. Deploy HAMi using Helm
First, you need to check your Kubernetes version by using the following command:
-```
+```bash
kubectl version
```
Then, add our repo in helm
-```
+```bash
helm repo add hami-charts https://project-hami.github.io/HAMi/
```
During installation, set the Kubernetes scheduler image version to match your Kubernetes server version. For instance, if your cluster server version is 1.16.8, use the following command for deployment:
-```
+```bash
helm install hami hami-charts/hami --set scheduler.kubeScheduler.imageTag=v1.16.8 -n kube-system
```
@@ -118,11 +122,11 @@ If everything goes well, you will see both vgpu-device-plugin and vgpu-scheduler
### Demo
-#### 1. Submit demo task:
+#### 1. Submit demo task
Containers can now request NVIDIA vGPUs using the `nvidia.com/gpu` resource type.
-```
+```yaml
apiVersion: v1
kind: Pod
metadata:
@@ -135,20 +139,20 @@ spec:
resources:
limits:
nvidia.com/gpu: 1 # requesting 1 vGPUs
- nvidia.com/gpumem: 10240 # Each vGPU contains 10240m device memory (Optional,Integer)
+ nvidia.com/gpumem: 10240 # Each vGPU contains 10240m device memory (Optional,Integer)
```
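
In addition to `nvidia.com/gpumem`, HAMi also supports capping a container's compute share with `nvidia.com/gpucores` (a percentage of the device's cores); a minimal sketch of the same limits block:

```yaml
resources:
  limits:
    nvidia.com/gpu: 1 # requesting 1 vGPUs
    nvidia.com/gpumem: 10240 # Each vGPU contains 10240m device memory (Optional, Integer)
    nvidia.com/gpucores: 30 # Each vGPU uses at most 30% of the device's compute cores (Optional, Integer)
```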
#### Verify in container resource control
Execute the following query command:
-```
+```bash
kubectl exec -it gpu-pod -- nvidia-smi
```
-The result should be
+The result should be
-```
+```text
[HAMI-core Msg(28:140561996502848:libvgpu.c:836)]: Initializing.....
Wed Apr 10 09:28:58 2024
+-----------------------------------------------------------------------------------------+
@@ -172,5 +176,3 @@ Wed Apr 10 09:28:58 2024
+-----------------------------------------------------------------------------------------+
[HAMI-core Msg(28:140561996502848:multiprocess_memory_limit.c:434)]: Calling exit handler 28
```
-
-
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1/installation/online-installation.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1/installation/online-installation.md
index 1d0546d2..7893af3e 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1/installation/online-installation.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1/installation/online-installation.md
@@ -1,53 +1,44 @@
---
-title: Online Installation from Helm (Recommended)
+linktitle: 通过 Helm 在线安装
+title: 通过 Helm 在线安装(推荐)
+translated: true
---
-You can install `kubectl-karmada` plug-in in any of the following ways:
+推荐使用 Helm 来部署 HAMi。
-- Download from the release.
-- Install using Krew.
-- Build from source code.
+## 添加 HAMi 仓库
-## Prerequisites
-
-### kubectl
-`kubectl` is the Kubernetes command line tool lets you control Kubernetes clusters.
-For installation instructions see [installing kubectl](https://kubernetes.io/docs/tasks/tools/#kubectl).
-
-## Download from the release
-
-Karmada provides `kubectl-karmada` plug-in download service since v0.9.0. You can choose proper plug-in version which fits your operator system form [karmada release](https://github.com/karmada-io/karmada/releases).
-
-Take v1.2.1 that working with linux-amd64 os as an example:
+您可以使用以下命令添加 HAMi 图表仓库:
```bash
-wget https://github.com/karmada-io/karmada/releases/download/v1.2.1/kubectl-karmada-linux-amd64.tgz
-
-tar -zxf kubectl-karmada-linux-amd64.tgz
+helm repo add hami-charts https://project-hami.github.io/HAMi/
```
-Next, move `kubectl-karmada` executable file to `PATH` path, reference from [Installing kubectl plugins](https://kubernetes.io/docs/tasks/extend-kubectl/kubectl-plugins/#installing-kubectl-plugins).
+## 获取您的 Kubernetes 版本
-## Install using Krew
+安装时需要 Kubernetes 版本。您可以使用以下命令获取此信息:
-Krew is the plugin manager for `kubectl` command-line tool.
+```bash
+kubectl version
+```
-[Install and set up](https://krew.sigs.k8s.io/docs/user-guide/setup/install/) Krew on your machine.
+## 安装
-Then install `kubectl-karmada` plug-in:
+在安装过程中,将 `scheduler.kubeScheduler.imageTag` 设置为与您的 Kubernetes 服务器版本匹配。
+例如,如果您的集群服务器版本是 v1.16.8,请使用以下命令进行部署:
```bash
-kubectl krew install karmada
+helm install hami hami-charts/hami --set scheduler.kubeScheduler.imageTag=v1.16.8 -n kube-system
```
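
安装完成后,可以先确认 release 已成功部署(标准 Helm 命令):

```bash
helm list -n kube-system
```

正常情况下,输出中应包含名为 hami、状态为 deployed 的条目。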
-You can refer to [Quickstart of Krew](https://krew.sigs.k8s.io/docs/user-guide/quickstart/) for more information.
+您可以通过调整[配置](../userguide/configure.md)来自定义安装。
-## Build from source code
+## 验证您的安装
-Clone karmada repo and run `make` cmd from the repository:
+您可以使用以下命令验证您的安装:
```bash
-make kubectl-karmada
+kubectl get pods -n kube-system
```
-Next, move the `kubectl-karmada` executable file under the `_output` folder in the project root directory to the `PATH` path.
+如果 hami-device-plugin 和 hami-scheduler 这两个 Pod 都处于 Running 状态,则说明您的安装成功。
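也可以直接过滤出这两个组件的 Pod(借助标准的 grep 命令,假设 Pod 名称以上述组件名为前缀):

```bash
kubectl get pods -n kube-system | grep -E 'hami-(device-plugin|scheduler)'
```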
diff --git a/versioned_docs/version-v2.4.1/installation/prequisities.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1/installation/prerequisites.md
similarity index 99%
rename from versioned_docs/version-v2.4.1/installation/prequisities.md
rename to i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1/installation/prerequisites.md
index 4b1ef701..13666bb8 100644
--- a/versioned_docs/version-v2.4.1/installation/prequisities.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1/installation/prerequisites.md
@@ -1,5 +1,5 @@
---
-title: Prequisities
+title: Prerequisites
---
## Prerequisites
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1/installation/webui-installation.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1/installation/webui-installation.md
index 821fc0ef..07a05dfb 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1/installation/webui-installation.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.4.1/installation/webui-installation.md
@@ -12,7 +12,7 @@ The WebUI can only be accessed by your localhost, so you need to connect your lo
The HAMi-WebUI open-source community offers Helm Charts for running it on Kubernetes. Please be aware that the code is provided without any warranties. If you encounter any problems, you can report them to the [Official GitHub repository](https://github.com/Project-HAMi/HAMi-WebUI/tree/main/charts/hami-webui).
-## Prequisities
+## Prerequisites
To install HAMi-WebUI using Helm, ensure you meet these requirements:
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.5.0.json b/i18n/zh/docusaurus-plugin-content-docs/version-v2.5.0.json
index f0a6ff90..829c6703 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.5.0.json
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.5.0.json
@@ -5,23 +5,23 @@
},
"sidebar.docs.category.Core Concepts": {
"message": "核心概念",
- "description": "The label for category Core Concepts in sidebar docs"
+ "description": "The label for category 'Core Concepts' in sidebar 'docs'"
},
"sidebar.docs.category.Get Started": {
- "message": "开始使用",
- "description": "The label for category Get Started in sidebar docs"
+ "message": "快速开始",
+ "description": "The label for category 'Get Started' in sidebar 'docs'"
},
"sidebar.docs.category.Installation": {
"message": "安装",
- "description": "The label for category Installation in sidebar docs"
+ "description": "The label for category 'Installation' in sidebar 'docs'"
},
"sidebar.docs.category.User Guide": {
"message": "用户指南",
- "description": "The label for category User Guide in sidebar docs"
+ "description": "The label for category 'User Guide' in sidebar 'docs'"
},
"sidebar.docs.category.Monitoring": {
"message": "监控",
- "description": "The label for category Monitoring in sidebar docs"
+ "description": "The label for category 'Monitoring' in sidebar 'docs'"
},
"sidebar.docs.category.Share NVIDIA GPU devices": {
"message": "共享 NVIDIA GPU 设备",
@@ -37,15 +37,15 @@
},
"sidebar.docs.category.Contributor Guide": {
"message": "贡献者指南",
- "description": "The label for category Contributor Guide in sidebar docs"
+ "description": "The label for category 'Contributor Guide' in sidebar 'docs'"
},
"sidebar.docs.category.Developer Guide": {
"message": "开发者指南",
- "description": "The label for category Developer Guide in sidebar docs"
+ "description": "The label for category 'Developer Guide' in sidebar 'docs'"
},
"sidebar.docs.category.Key Features": {
"message": "核心功能",
- "description": "The label for category Key Features in sidebar docs"
+ "description": "The label for category 'Key Features' in sidebar 'docs'"
},
"sidebar.docs.category.Share Hygon DCU devices": {
"message": "共享海光 DCU 设备",
@@ -61,10 +61,90 @@
},
"sidebar.docs.category.Volcano vgpu support": {
"message": "Volcano vGPU",
- "description": "The label for category Volcano vgpu support in sidebar docs"
+ "description": "The label for category 'Volcano vgpu support' in sidebar 'docs'"
},
"sidebar.docs.category.Share Ascend devices": {
"message": "共享昇腾 GPU 设备",
"description": "The label for category Share Ascend devices in sidebar docs"
+ },
+ "sidebar.docs.category.nvidia-gpu": {
+ "message": "NVIDIA GPU",
+ "description": "The label for category 'NVIDIA GPU' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.nvidia-examples": {
+ "message": "Examples",
+ "description": "The label for category 'Examples' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.Cambricon MLU": {
+ "message": "Cambricon MLU",
+ "description": "The label for category 'Cambricon MLU' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.cambricon-examples": {
+ "message": "Examples",
+ "description": "The label for category 'Examples' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.Hygon DCU": {
+ "message": "Hygon DCU",
+ "description": "The label for category 'Hygon DCU' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.hygon-examples": {
+ "message": "Examples",
+ "description": "The label for category 'Examples' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.Mthreads GPU": {
+ "message": "Mthreads GPU",
+ "description": "The label for category 'Mthreads GPU' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.mthreads-examples": {
+ "message": "Examples",
+ "description": "The label for category 'Examples' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.Metax GPU": {
+ "message": "Metax GPU",
+ "description": "The label for category 'Metax GPU' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.metax-examples": {
+ "message": "Examples",
+ "description": "The label for category 'Examples' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.gpu": {
+ "message": "gpu",
+ "description": "The label for category 'gpu' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.sgpu": {
+ "message": "sgpu",
+ "description": "The label for category 'sgpu' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.Ascend NPU": {
+ "message": "Ascend NPU",
+ "description": "The label for category 'Ascend NPU' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.ascend-examples": {
+ "message": "Examples",
+ "description": "The label for category 'Examples' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.Enflame GCU": {
+ "message": "Enflame GCU",
+ "description": "The label for category 'Enflame GCU' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.enflame-examples": {
+ "message": "Examples",
+ "description": "The label for category 'Examples' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.Iluvatar GPU": {
+ "message": "Iluvatar GPU",
+ "description": "The label for category 'Iluvatar GPU' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.iluvatar-examples": {
+ "message": "Examples",
+ "description": "The label for category 'Examples' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.volcano-nvidia-gpu": {
+ "message": "NVIDIA GPU",
+ "description": "The label for category 'NVIDIA GPU' in sidebar 'docs'"
+ },
+ "sidebar.docs.category.volcano-vgpu-examples": {
+ "message": "Examples",
+ "description": "The label for category 'Examples' in sidebar 'docs'"
}
}
diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-v2.5.0/contributor/contribute-docs.md b/i18n/zh/docusaurus-plugin-content-docs/version-v2.5.0/contributor/contribute-docs.md
index 0c09c730..c0d7cecc 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/version-v2.5.0/contributor/contribute-docs.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/version-v2.5.0/contributor/contribute-docs.md
@@ -3,13 +3,13 @@ title: 如何贡献文档
translated: true
---
-从1.3版本开始,社区文档将在HAMi网站上提供。本文件解释了如何向`Project-HAMi/website`仓库贡献文档。
+从 1.3 版本开始,社区文档将在 HAMi 网站上提供。本文件解释了如何向`Project-HAMi/website`仓库贡献文档。
## 前提条件
-- 文档和代码一样,也按版本分类和存储。1.3是我们归档的第一个版本。
+- 文档和代码一样,也按版本分类和存储。1.3 是我们归档的第一个版本。
- 文档需要翻译成多种语言,以便来自不同地区的读者阅读。社区现在支持中文和英文。英文是文档的官方语言。
-- 我们的文档使用Markdown。如果您不熟悉Markdown,请参阅https://guides.github.com/features/mastering-markdown/或https://www.markdownguide.org/以获取更详细的信息。
+- 我们的文档使用 Markdown。如果您不熟悉 Markdown,请参阅[https://guides.github.com/features/mastering-markdown/](https://guides.github.com/features/mastering-markdown/)或 [https://www.markdownguide.org/](https://www.markdownguide.org/)以获取更详细的信息。
- 我们通过[Docusaurus 2](https://docusaurus.io/)获得了一些附加功能,这是一个现代静态网站生成器。
## 设置
@@ -23,7 +23,7 @@ cd website
我们的网站组织如下:
-```
+```text
website
├── sidebars.json # 当前文档版本的侧边栏
├── docs # 当前文档版本的文档目录
@@ -47,13 +47,13 @@ website
└── package.json
```
-`versions.json`文件是一个版本列表,从最新到最早。下表解释了版本化文件如何映射到其版本和生成的URL。
+`versions.json`文件是一个版本列表,从最新到最早。下表解释了版本化文件如何映射到其版本和生成的 URL。
-| 路径 | 版本 | URL |
-| --------------------------------------- | -------------- | ----------------- |
-| `versioned_docs/version-1.0.0/hello.md` | 1.0.0 | /docs/1.0.0/hello |
-| `versioned_docs/version-1.1.0/hello.md` | 1.1.0 (最新) | /docs/hello |
-| `docs/hello.md` | 当前 | /docs/next/hello |
+| 路径 | 版本 | URL |
+| --- | --- | --- |
+| `versioned_docs/version-1.0.0/hello.md` | 1.0.0 | /docs/1.0.0/hello |
+| `versioned_docs/version-1.1.0/hello.md` | 1.1.0 (最新) | /docs/hello |
+| `docs/hello.md` | 当前 | /docs/next/hello |
:::tip 提示
@@ -68,11 +68,11 @@ website
### 在顶部开始一个标题
-在Markdown文件的顶部指定有关文章的元数据是很重要的,这个部分称为**Front Matter**。
+在 Markdown 文件的顶部指定有关文章的元数据是很重要的,这个部分称为**Front Matter**。
现在,让我们看一个快速示例,它应该解释**Front Matter**中最相关的条目:
-```
+```yaml
---
title: 带有标签的文档
---
@@ -80,25 +80,25 @@ title: 带有标签的文档
## 二级标题
```
-在两行---之间的顶部部分是Front Matter部分。在这里,我们定义了一些条目,告诉Docusaurus如何处理文章:
+在两行 `---` 之间的顶部部分是 Front Matter 部分。在这里,我们定义了一些条目,告诉 Docusaurus 如何处理文章:
-- 标题相当于HTML文档中的`<h1>`或Markdown文章中的`# `。
-- 每个文档都有一个唯一的ID。默认情况下,文档ID是与根文档目录相关的文档名称(不带扩展名)。
+- 标题相当于 HTML 文档中的`<h1>`或 Markdown 文章中的`# `。
+- 每个文档都有一个唯一的 ID。默认情况下,文档 ID 是与根文档目录相关的文档名称(不带扩展名)。
+          {isZh
+            ? '来自 CNCF 生态的真实落地案例。每篇案例展示了组织如何借助 HAMi 提升 GPU 利用率并扩展 AI 基础设施。'
+            : 'Real-world adoption stories from the CNCF ecosystem. Each case study highlights how organizations use HAMi to improve GPU utilization and scale AI infrastructure.'}
+          {isZh
+            ? '参与 HAMi 开源社区,通过讨论、会议和贡献推动异构 AI 基础设施的发展。'
+            : 'Join the HAMi open-source community and advance heterogeneous AI infrastructure through discussions, meetings, and contributions.'}
+          {isZh ? '维护者' : 'Maintainers'}
+          {isZh
+            ? 'HAMi 由以下维护者共同推进,负责项目方向、评审与版本发布。'
+            : 'HAMi is maintained by the people below, who help guide project direction, reviews, and releases.'}
+          {isZh
+            ? 'HAMi 是开源的云原生 GPU 虚拟化中间件,为 AI 工作负载提供异构加速器的共享、隔离与调度能力。'
+            : 'HAMi is an open-source, cloud-native GPU virtualization middleware that brings sharing, isolation and scheduling of heterogeneous accelerators to AI workloads on Kubernetes.'}
           {isZh ? '使用 HAMi 前后对比' : 'Before and After Using HAMi'}
+          {isZh
+            ? '相同工作负载下,对比传统整卡独占与 HAMi GPU 共享后的资源利用率变化。'
+            : 'Compare traditional whole-GPU allocation with HAMi GPU sharing under the same workloads.'}
           {isZh ? '生态与设备支持' : 'Ecosystem & Device Support'}
+          {isZh
+            ? '覆盖多厂商加速设备生态,详情和支持矩阵见文档。'
+            : 'Broad accelerator ecosystem across vendors. See docs for full support matrix.'}
+          {isZh
+            ? 'HAMi 由社区与企业贡献者共同推进,以下组织持续参与项目建设与生态协作。'
+            : 'HAMi is advanced by contributors from the community and industry. These organizations actively participate in project development and ecosystem collaboration.'}
+          {isZh ? '全球社区指标' : 'Global Community Metrics'}
+          {isZh
+            ? '实时展示 HAMi 社区增长与开源活跃度。'
+            : 'A live snapshot of HAMi community growth and open-source momentum.'}