diff --git a/blog/kcd-beijing-2026-dra-gpu-scheduling/index.md b/blog/kcd-beijing-2026-dra-gpu-scheduling/index.md
new file mode 100644
index 00000000..c4cf25ee
--- /dev/null
+++ b/blog/kcd-beijing-2026-dra-gpu-scheduling/index.md
@@ -0,0 +1,292 @@
+---
+title: "From Device Plugin to DRA: GPU Scheduling Paradigm Upgrade and HAMi-DRA Practice Review"
+date: "2026-03-23"
+description: "This post reviews the HAMi community's technical talk at KCD Beijing 2026, covering the paradigm shift from Device Plugin to DRA in GPU scheduling, along with HAMi-DRA's practical experience and performance results."
+tags: ["KCD", "DRA", "GPU", "Kubernetes", "AI", "scheduling"]
+authors: [hami_community]
+---
+
+[KCD Beijing 2026](https://community.cncf.io/events/details/cncf-kcd-beijing-presents-kcd-beijing-vllm-2026/) was one of the largest Kubernetes community events in recent years.
+
+**Over 1,000 people registered, setting a new record for KCD Beijing.**
+
+The HAMi community not only gave a technical talk but also set up a booth, engaging deeply with developers and enterprise users from the cloud-native and AI infrastructure fields.
+
+The talk was titled:
+
+> **From Device Plugin to DRA: GPU Scheduling Paradigm Upgrade and HAMi-DRA Practice**
+
+This article combines the on-site presentation and the slides into a more complete technical review. Slides download: [GitHub - HAMi-DRA KCD Beijing 2026](https://github.com/Project-HAMi/community/blob/main/talks/01-kcd-beijing-20260323/KCD-Beijing-2026-GPU-Scheduling-DRA-HAMi-Wang-Jifei-James-Deng.pdf).
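+For readers new to DRA, a quick primer before the recap: in the DRA model a Pod no longer requests `nvidia.com/gpu: 1` directly; it references a ResourceClaim that the scheduler allocates against published devices. A minimal sketch of that consumption model follows — the object names and the `gpu.example.com` device class are illustrative, not from the talk, and the shape assumes the `resource.k8s.io/v1` API (Kubernetes v1.34+):
+
+```yaml
+# Hedged sketch of the DRA consumption model; all names are illustrative.
+apiVersion: resource.k8s.io/v1
+kind: ResourceClaim
+metadata:
+  name: gpu-claim
+spec:
+  devices:
+    requests:
+    - name: gpu
+      exactly:
+        deviceClassName: gpu.example.com   # assumed device class name
+        allocationMode: ExactCount
+        count: 1
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  name: demo-pod
+spec:
+  resourceClaims:
+  - name: gpu
+    resourceClaimName: gpu-claim
+  containers:
+  - name: app
+    image: nginx
+    resources:
+      claims:
+      - name: gpu
+```
+
+Keep this two-object shape in mind: the complexity it adds over a one-line resource request is exactly what the talk below addresses.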
+
+
+## HAMi Community at the Event
+
+The talk was delivered by two core contributors of the HAMi community:
+
+- Wang Jifei (Dynamia; HAMi Approver and main HAMi-DRA contributor)
+- James Deng (4Paradigm; HAMi Reviewer)
+
+They have long focused on:
+
+- GPU scheduling and virtualization
+- Kubernetes resource models
+- Heterogeneous compute management
+
+At the booth, the HAMi community talked through questions like these with attendees:
+
+- Is Kubernetes really suitable for AI workloads?
+- Should GPUs be treated as "scheduling resources" rather than "devices"?
+- How can DRA be introduced without breaking the existing ecosystem?
+
+## Event Recap
+
+![Main conference hall](/img/kcd-beijing-2026/keynote.jpg)
+
+![Attendee registration](/img/kcd-beijing-2026/register.jpg)
+
+![Attendees visiting the HAMi booth](/img/kcd-beijing-2026/booth.jpg)
+
+![Volunteers stamping for attendees](/img/kcd-beijing-2026/booth2.jpg)
+
+![Wang Jifei presenting](/img/kcd-beijing-2026/wangjifei.jpg)
+
+![James Deng presenting](/img/kcd-beijing-2026/james.jpg)
+
+## The GPU Scheduling Paradigm Is Changing
+
+The core of this talk is not DRA itself, but a bigger shift:
+
+> **GPUs are evolving from "devices" into "resource objects".**
+
+## 1. The Ceiling of the Device Plugin Model
+
+The fundamental problem with the traditional model is limited expressiveness:
+
+- It can only describe quantity (`nvidia.com/gpu: 1`)
+- It cannot express:
+  - Multi-dimensional resources (memory / cores / slices)
+  - Multi-GPU combinations
+  - Topology (NUMA / NVLink)
+
+👉 This directly leads to:
+
+- Scheduling logic leaking out of the scheduler (extenders / sidecars)
+- Increased system complexity
+- Limited scheduling concurrency
+
+## 2. DRA: A Leap in Resource Modeling
+
+DRA's core advantages are:
+
+- **Multi-dimensional resource modeling**
+- **Complete device lifecycle management**
+- **Fine-grained resource allocation**
+
+The key change:
+
+> **Resource requests move from inline Pod fields to independent ResourceClaim objects.**
+
+## A Key Reality: DRA Is Too Complex to Use Directly
+
+One slide in the deck is easy to overlook, but crucial:
+
+### 👉 A DRA request looks like this
+
+```yaml
+spec:
+  devices:
+    requests:
+    - exactly:
+        allocationMode: ExactCount
+        capacity:
+          requests:
+            memory: 4194304k
+        count: 1
+```
+
+And you also need to write a CEL selector:
+
+```
+device.attributes["gpu.hami.io"].type == "hami-gpu"
+```
+
+### Compare that with Device Plugin
+
+```yaml
+resources:
+  limits:
+    nvidia.com/gpu: 1
+```
+
+👉 The conclusion is clear:
+
+> **DRA is an upgrade in capability, but a clear regression in user experience.**
+
+## HAMi-DRA's Key Breakthrough: Automation
+
+This is one of the most valuable parts of the talk:
+
+### 👉 A webhook automatically generates the ResourceClaim
+
+HAMi's approach is not to make users write DRA objects directly, but:
+
+> **Let users keep the familiar Device Plugin syntax, and have the system convert it to DRA automatically.**
+
+### How it works
+
+Input (user):
+
+```yaml
+nvidia.com/gpu: 1
+nvidia.com/gpumemory: 4000
+```
+
+↓
+
+Webhook conversion:
+
+- Generate a ResourceClaim
+- Build the CEL selector
+- Inject device constraints (UUID / GPU type)
+
+↓
+
+Output (system-internal):
+
+- Standard DRA objects
+- A schedulable resource expression
+
+### Core value
+
+> **Turn DRA from an "expert interface" into an interface ordinary users can use.**
+
+## DRA Driver: The Real Implementation Complexity
+
+A DRA driver does far more than "register resources"; it manages the full device lifecycle:
+
+### Three core interfaces
+
+- Publish Resources
+- Prepare Resources
+- Unprepare Resources
+
+### Real-world challenges
+
+- `libvgpu.so` injection
+- `ld.so.preload` handling
+- Environment variable management
+- Temporary directories (cache / lock)
+
+👉 This means:
+
+> **GPU scheduling has entered the runtime-orchestration layer; it is no longer simple resource allocation.**
+
+## Performance Comparison: DRA Is Not Just "More Elegant"
+
+A key benchmark from the slides:
+
+### Pod creation time comparison
+
+- HAMi (traditional): up to ~42,000
+- HAMi-DRA: significantly lower (a 30%+ improvement)
+
+👉 This shows that:
+
+> **DRA's resource pre-binding mechanism reduces scheduling-stage conflicts and retries.**
+
+## An Observability Paradigm Shift
+
+An underestimated change:
+
+### Traditional model
+
+- Resource information: from the Node
+- Usage: from Pods
+- → Requires aggregation and inference
+
+### DRA model
+
+- ResourceSlice: the device inventory
+- ResourceClaim: the resource allocation
+- → **The resource perspective is first-class**
+
+👉 The change:
+
+> **Observability shifts from "inference" to "direct modeling".**
+
+## Unified Modeling for Heterogeneous Devices
+
+A key future direction from the slides:
+
+> **If device attributes are standardized, a vendor-agnostic scheduling model becomes possible.**
+
+For example:
+
+- PCIe root
+- PCI bus ID
+- GPU attributes
+
+👉 This points to a bigger narrative:
+
+> **DRA is the starting point for heterogeneous compute abstraction.**
+
+## The Bigger Trend: Kubernetes Is Becoming the AI Control Plane
+
+Connecting these points reveals a larger trend:
+
+### 1. Node → Resource
+
+- From "scheduling machines"
+- To "scheduling resource objects"
+
+### 2. Device → Virtual Resource
+
+- A GPU is no longer just a card
+- But a divisible, composable resource
+
+### 3. Imperative → Declarative
+
+- Scheduling logic → resource declarations
+
+👉 In essence:
+
+> **Kubernetes is evolving into the AI infrastructure control plane.**
+
+## HAMi's Position
+
+HAMi's positioning is becoming clearer:
+
+> **The GPU resource layer on Kubernetes**
+
+- Downward: adapts to heterogeneous GPUs
+- Upward: supports AI workloads (training / inference / agents)
+- In between: scheduling + virtualization + abstraction
+
+And HAMi-DRA:
+
+> **is the key step that aligns this resource layer with Kubernetes-native models.**
+
+## Community Significance
+
+Another important point from this talk:
+
+- Contributors from different companies collaborated on the work
+- It has been validated in real production environments
+- The experience was shared back through the community
+
+This is the way HAMi has always worked:
+
+> **Advancing AI infrastructure through community, not closed systems.**
+
+## Summary
+
+The real value of this talk is not the introduction to DRA itself, but its answer to a key question:
+
+> **How do you turn a "correct but hard to use" model into a system you can adopt today?**
+
+HAMi-DRA's answer:
+
+- Don't change user habits
+- Absorb DRA's capabilities
+- Handle the complexity internally
diff --git a/i18n/zh/code.json b/i18n/zh/code.json
index 554ceba6..0010bc36 100644
--- a/i18n/zh/code.json
+++ b/i18n/zh/code.json
@@ -675,5 +675,17 @@
   "theme.SearchModal.footer.backToSearchText": {
     "message": "Back to search",
     "description": "The back to search text for footer"
+  },
+  "theme.hami.blog.meta.unknownAuthor": {
+    "message": "HAMi 社区",
+    "description": "Default author name for blog posts when no authors are specified"
+  },
+  "theme.hami.blog.meta.authorLabel": {
+    "message": "作者",
+    "description": "Label for blog post author metadata"
+  },
+  "theme.hami.blog.meta.publishedLabel": {
+    "message": "发布时间",
+    "description": "Label for blog post publish date metadata"
   }
 }
diff --git a/i18n/zh/docusaurus-plugin-content-blog/kcd-beijing-2026-dra-gpu-scheduling/index.md
b/i18n/zh/docusaurus-plugin-content-blog/kcd-beijing-2026-dra-gpu-scheduling/index.md new file mode 100644 index 00000000..3a66c562 --- /dev/null +++ b/i18n/zh/docusaurus-plugin-content-blog/kcd-beijing-2026-dra-gpu-scheduling/index.md @@ -0,0 +1,292 @@ +--- +title: "从 Device Plugin 到 DRA:GPU 调度范式升级与 HAMi-DRA 实践回顾" +date: "2026-03-23" +description: "本文回顾了 HAMi 社区在 KCD Beijing 2026 的技术分享,深入探讨了从 Device Plugin 到 DRA 的 GPU 调度范式升级,以及 HAMi-DRA 的实践经验和性能优化成果。" +tags: ["KCD", "DRA", "GPU", "Kubernetes", "AI", "调度"] +authors: [hami_community] +--- + +刚刚过去的 [KCD Beijing 2026](https://www.bagevent.com/event/kcd-beijing-2026),是近年来规模最大的一次 Kubernetes 社区大会之一。 + +**超过 1000 人报名参与,刷新历届 KCD 北京记录。** + +HAMi 社区不仅受邀进行了技术分享,也在现场设立了展台,与来自云原生与 AI 基础设施领域的开发者、企业用户进行了深入交流。 + +本次分享主题为: + +> **从 Device Plugin 到 DRA:GPU 调度范式升级与 HAMi-DRA 实践** + +本文结合现场分享内容与 PPT,做一次更完整的技术回顾。附幻灯片下载:[GitHub - HAMi-DRA KCD Beijing 2026](https://github.com/Project-HAMi/community/blob/main/talks/01-kcd-beijing-20260323/KCD-Beijing-2026-GPU-Scheduling-DRA-HAMi-Wang-Jifei-James-Deng.pdf)。 + + + +## HAMi 社区在现场 + +本次分享由两位 HAMi 社区核心贡献者完成: + +- 王纪飞(「Dynamia 密瓜智能」,HAMi Approver,HAMi-DRA 主要贡献者) +- James Deng(第四范式,HAMi Reviewer) + +他们长期专注于: + +- GPU 调度与虚拟化 +- Kubernetes 资源模型 +- 异构算力管理 + +同时,HAMi 社区在现场设有展台,与参会者围绕以下问题进行了大量交流: + +- Kubernetes 是否真的适合 AI workload? +- GPU 是否应该成为"调度资源",而不是"设备"? +- 如何在不破坏生态的情况下引入 DRA? + +## 现场回顾 + +![大会主会场](/img/kcd-beijing-2026/keynote.jpg) + +![观众注册中](/img/kcd-beijing-2026/register.jpg) + +![HAMi 展台前参会者前来交流打卡](/img/kcd-beijing-2026/booth.jpg) + +![志愿者在为观众盖章](/img/kcd-beijing-2026/booth2.jpg) + +![王纪飞正在分享中](/img/kcd-beijing-2026/wangjifei.jpg) + +![James Deng 正在分享](/img/kcd-beijing-2026/james.jpg) + +## GPU 调度范式正在发生变化 + +这次分享的核心,其实不是 DRA 本身,而是一个更大的转变: + +> **GPU 正在从"设备"变成"资源对象"。** + +## 1. 
Device Plugin 的天花板 + +传统模型的问题,本质上在于表达能力: + +- 只能描述"数量"(`nvidia.com/gpu: 1`) +- 无法表达: + - 多维资源(显存 / core / slice) + - 多卡组合 + - 拓扑(NUMA / NVLink) + +👉 这直接导致: + +- 调度逻辑外溢(extender / sidecar) +- 系统复杂度上升 +- 并发能力受限 + +## 2. DRA:资源建模能力的跃迁 + +DRA 的核心优势是: + +- **多维资源建模能力** +- **完整设备生命周期管理** +- **细粒度资源分配能力** + +关键变化: + +> **资源申请从 Pod 内嵌字段 → 独立 ResourceClaim 对象** + +## 关键现实问题:DRA 太复杂了 + +PPT 里有一页非常关键,很多人会忽略: + +### 👉 DRA 请求长这样 + +```yaml +spec: + devices: + requests: + - exactly: + allocationMode: ExactCount + capacity: + requests: + memory: 4194304k + count: 1 +``` + +同时还要写 CEL selector: + +```yaml +device.attributes["gpu.hami.io"].type == "hami-gpu" +``` + +### 对比 Device Plugin + +```yaml +resources: + limits: + nvidia.com/gpu: 1 +``` + +👉 结论非常明确: + +> **DRA 是能力升级,但 UX 明显退化。** + +## HAMi-DRA 的关键突破:自动化 + +这是这次分享最有价值的部分之一: + +### 👉 Webhook 自动生成 ResourceClaim + +HAMi 的做法不是让用户"直接用 DRA",而是: + +> **让用户继续用 Device Plugin,用系统自动转换成 DRA** + +### 工作机制 + +输入(用户): + +```yaml +nvidia.com/gpu: 1 +nvidia.com/gpumemory: 4000 +``` + +↓ + +Webhook 转换: + +- 生成 ResourceClaim +- 构造 CEL selector +- 注入设备约束(UUID / GPU 类型) + +↓ + +输出(系统内部): + +- 标准 DRA 对象 +- 可调度资源表达 + +### 核心价值 + +> **把 DRA 从"专家接口",变成"普通用户可用接口"。** + +## DRA Driver:真正的落地复杂度 + +DRA driver 并不只是"注册资源",而是完整 lifecycle 管理: + +### 三个核心接口 + +- Publish Resources +- Prepare Resources +- Unprepare Resources + +### 实际挑战 + +- `libvgpu.so` 注入 +- `ld.so.preload` +- 环境变量管理 +- 临时目录(cache / lock) + +👉 这意味着: + +> **GPU 调度已经进入 runtime orchestration 层,而不是简单资源分配。** + +## 性能对比:DRA 并不只是"更优雅" + +PPT 中给出了一个很关键的 benchmark: + +### Pod 创建时间对比 + +- HAMi(传统):最高 ~42,000 +- HAMi-DRA:显著下降(~30%+ 改善) + +👉 这说明: + +> **DRA 的资源预绑定机制,可以减少调度阶段冲突和重试** + +## 可观测性的范式变化 + +这是一个被低估的变化: + +### 传统模型 + +- 资源信息:来自 Node +- 使用情况:来自 Pod +- → 需要聚合、推导 + +### DRA 模型 + +- ResourceSlice:设备库存 +- ResourceClaim:资源分配 +- → **资源视角是第一等公民** + +👉 这带来的变化: + +> **Observability 从"推导"变成"直接建模"** + +## 异构设备的统一建模 + +PPT 提出了一个非常关键的未来方向: + +> **如果设备属性标准化,可以实现 vendor-agnostic 调度模型** + +例如: + +- PCIe root +- 
PCI bus ID +- GPU attributes + +👉 这其实是一个更大的叙事: + +> **DRA 是 heterogeneous compute abstraction 的起点** + +## 更大的趋势:Kubernetes 正在成为 AI 控制平面 + +把这些点串起来,其实可以看到一个更大的趋势: + +### 1. Node → Resource + +- 从"调度机器" +- 到"调度资源对象" + +### 2. Device → Virtual Resource + +- GPU 不再是卡 +- 而是可切分、组合的资源 + +### 3. Imperative → Declarative + +- 调度逻辑 → 资源声明 + +👉 本质上: + +> **Kubernetes 正在进化为 AI Infra Control Plane** + +## 🌱 HAMi 在其中的位置 + +HAMi 的定位正在逐渐清晰: + +> **Kubernetes 上的 GPU Resource Layer** + +- 向下:适配异构 GPU +- 向上:支撑 AI workload(训练 / 推理 / Agent) +- 中间:调度 + 虚拟化 + 抽象 + +而 HAMi-DRA: + +> **是这个资源层与 Kubernetes 原生模型对齐的关键一步** + +## 社区的意义 + +这次分享还有一个很重要的点: + +- 来自不同公司的贡献者共同完成 +- 在真实生产环境中验证 +- 通过社区分享经验 + +这也是 HAMi 一直坚持的方式: + +> **用社区推动 AI 基础设施,而不是封闭系统** + +## 总结 + +这次分享真正的价值不只是介绍 DRA,而是回答了一个关键问题: + +> **如何把一个"正确但难用"的模型,变成"今天就能用的系统"?** + +HAMi-DRA 给出的答案是: + +- 不改变用户习惯 +- 吸收 DRA 能力 +- 内部完成复杂性转化 diff --git a/src/theme/BlogPostItem/Header/index.js b/src/theme/BlogPostItem/Header/index.js index b3afae85..d8e1e9c4 100644 --- a/src/theme/BlogPostItem/Header/index.js +++ b/src/theme/BlogPostItem/Header/index.js @@ -1,25 +1,44 @@ import React from 'react'; import clsx from 'clsx'; import useBaseUrl from '@docusaurus/useBaseUrl'; +import useDocusaurusContext from '@docusaurus/useDocusaurusContext'; import {useBlogPost} from '@docusaurus/plugin-content-blog/client'; +import Translate from '@docusaurus/Translate'; import BlogPostItemHeaderTitle from '@theme/BlogPostItem/Header/Title'; import BlogPostItemHeaderInfo from '@theme/BlogPostItem/Header/Info'; import BlogPostItemHeaderAuthors from '@theme/BlogPostItem/Header/Authors'; import styles from './styles.module.css'; -function formatAuthors(metadata) { +function formatDate(date, locale) { + const language = locale === 'zh' ? 
'zh-CN' : 'en-US';
+  return new Date(date).toLocaleDateString(language, {
+    year: 'numeric',
+    month: 'numeric',
+    day: 'numeric',
+  });
+}
+
+function formatAuthors(metadata, fallbackLabel) {
   const names = (metadata.authors || [])
     .map((author) => author?.name)
     .filter(Boolean);
   if (names.length === 0) {
-    return 'HAMi Community';
+    return fallbackLabel;
   }
   return names.join(', ');
 }
 
 export default function BlogPostItemHeader() {
   const {metadata, frontMatter, isBlogPostPage} = useBlogPost();
+  const {i18n} = useDocusaurusContext();
   const cover = frontMatter.cover;
+  const fallbackLabel = (
+    <Translate id="theme.hami.blog.meta.unknownAuthor" description="Default author name for blog posts when no authors are specified">
+      HAMi Community
+    </Translate>
+  );
   return (
@@ -32,10 +51,26 @@ export default function BlogPostItemHeader() { {isBlogPostPage && (
-            Author: {formatAuthors(metadata)}
+            <Translate
+              id="theme.hami.blog.meta.authorLabel" description="Label for blog post author metadata">
+              Author
+            </Translate>
+            :
+            {' '}
+            {formatAuthors(metadata, fallbackLabel)}
-            Published: {new Date(metadata.date).toLocaleDateString()}
+            <Translate
+              id="theme.hami.blog.meta.publishedLabel" description="Label for blog post publish date metadata">
+              Published
+            </Translate>
+            :
+            {' '}
+            {formatDate(metadata.date, i18n.currentLocale)}
)} diff --git a/static/img/kcd-beijing-2026/booth.jpg b/static/img/kcd-beijing-2026/booth.jpg new file mode 100644 index 00000000..27752193 Binary files /dev/null and b/static/img/kcd-beijing-2026/booth.jpg differ diff --git a/static/img/kcd-beijing-2026/booth2.jpg b/static/img/kcd-beijing-2026/booth2.jpg new file mode 100644 index 00000000..7501161e Binary files /dev/null and b/static/img/kcd-beijing-2026/booth2.jpg differ diff --git a/static/img/kcd-beijing-2026/james.jpg b/static/img/kcd-beijing-2026/james.jpg new file mode 100644 index 00000000..0f507f7c Binary files /dev/null and b/static/img/kcd-beijing-2026/james.jpg differ diff --git a/static/img/kcd-beijing-2026/keynote.jpg b/static/img/kcd-beijing-2026/keynote.jpg new file mode 100644 index 00000000..9859fe96 Binary files /dev/null and b/static/img/kcd-beijing-2026/keynote.jpg differ diff --git a/static/img/kcd-beijing-2026/register.jpg b/static/img/kcd-beijing-2026/register.jpg new file mode 100644 index 00000000..321590bb Binary files /dev/null and b/static/img/kcd-beijing-2026/register.jpg differ diff --git a/static/img/kcd-beijing-2026/wangjifei.jpg b/static/img/kcd-beijing-2026/wangjifei.jpg new file mode 100644 index 00000000..67c5c75f Binary files /dev/null and b/static/img/kcd-beijing-2026/wangjifei.jpg differ