@vlerenc
Member

@vlerenc vlerenc commented Nov 12, 2025

What this PR does / why we need it:
@vasu1124 suggested various changes after this blog was already merged, based on similar changes in another blog published under ApeiroRA. This update brings this blog in line with the new facts and statements.

Special notes for your reviewer:
@vasu1124 Please take over from here, make modifications or not, and either close or merge this PR as you see fit.

@vlerenc vlerenc requested a review from a team as a code owner November 12, 2025 11:07
@gardener-robot gardener-robot added needs/review Needs review size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Nov 12, 2025
@vlerenc vlerenc requested a review from vasu1124 November 12, 2025 11:13
Contributor

@n-boshnakov n-boshnakov left a comment

The changes look good to me, just a minor nit.

## What is Kubernetes AI Conformance?

As AI/ML applications become more prevalent, the need for a standardized environment to run them on Kubernetes has become critical. The CNCF's Kubernetes AI Conformance Working Group was established to address this need. It aims to define a clear, verifiable set of requirements that a Kubernetes distribution must meet to be considered "AI Conformant."
As AI/ML applications become more prevalent and crucial for business, the need for standardized environments [1] to run them has become critical. The CNCF's Kubernetes AI Conformance Working Group was established to address this need. It aims to define a clear, verifiable set of requirements that a Kubernetes distribution must meet to be considered "AI Conformant". In fact, equipped with these requirements CNCF established the [**Certified** Kubernetes AI Conformance Program](https://github.com/cncf/k8s-ai-conformance).
Contributor

Suggested change
As AI/ML applications become more prevalent and crucial for business, the need for standardized environments [1] to run them has become critical. The CNCF's Kubernetes AI Conformance Working Group was established to address this need. It aims to define a clear, verifiable set of requirements that a Kubernetes distribution must meet to be considered "AI Conformant". In fact, equipped with these requirements CNCF established the [**Certified** Kubernetes AI Conformance Program](https://github.com/cncf/k8s-ai-conformance).
As AI/ML applications become more prevalent and crucial for business, the need for standardized environments [1] to run them has become critical. The CNCF's Kubernetes AI Conformance Working Group was established to address this need. It aims to define a clear, verifiable set of requirements that a Kubernetes distribution must meet to be considered "AI Conformant". In fact, equipped with these requirements, CNCF established the [**Certified** Kubernetes AI Conformance Program](https://github.com/cncf/k8s-ai-conformance).

@gardener-robot gardener-robot added the needs/changes Needs (more) changes label Nov 13, 2025
* **Workload Execution:** By passing the conformance tests, Gardener proves that it can reliably run sample AI/ML workloads that utilize GPU acceleration, confirming that the entire stack—from the operating system to the Kubernetes control plane—is functioning correctly.
* **GPU Discovery and Allocation:** Gardener-managed clusters correctly identify available GPUs on worker nodes and make them schedulable resources within Kubernetes. This allows users to simply request, for example, `nvidia.com/gpu` resources in their pod specifications.
* **Driver and Runtime Integrity:** The conformance verifies that the correct drivers and container runtimes are in place to expose GPUs to containers. Gardener’s managed approach guarantees that these components are correctly installed and versioned.
* **Workload Execution:** By passing the conformance tests, Gardener proves that it can reliably run sample AI/ML workloads that utilize GPU acceleration, confirming that the entire stack - from the operating system to the Kubernetes control plane - is functioning correctly.
Member

Mega-nit: Let's use en-dashes for breaking the sentence

Suggested change
* **Workload Execution:** By passing the conformance tests, Gardener proves that it can reliably run sample AI/ML workloads that utilize GPU acceleration, confirming that the entire stack - from the operating system to the Kubernetes control plane - is functioning correctly.
* **Workload Execution:** By passing the conformance tests, Gardener proves that it can reliably run sample AI/ML workloads that utilize GPU acceleration, confirming that the entire stack – from the operating system to the Kubernetes control plane – is functioning correctly.

#### Meeting the Conformance Requirements
This ensures the correct drivers are installed and configured for your GPU nodes. Users no longer have to manually handle driver installations, version mismatches, or kernel module compilations. When you request a worker node with a GPU, the Operator ensures that it is ready for your AI workloads with the necessary drivers, software assets, and libraries, making the powerful hardware directly accessible to your Kubernetes pods.

[3]: In Apeiro, we enabled the [NVIDIA GPU Operator](./2025-08-25-garden-linux-enabling-ai-workloads-with-nvidia-gpus). Other GPU hardware will be supported in a similar fashion. Our goal is to extend this powerful, hands-off approach to a broader range of hardware accelerators, further strengthening the hardware sovereignty of our users.
Member

I think we don't have the blog entry present in our own repository. In any case, as this is the first appearance of the word "Apeiro" on this page, we need to ensure that we link out to the Apeiro website/docs.

Suggested change
[3]: In Apeiro, we enabled the [NVIDIA GPU Operator](./2025-08-25-garden-linux-enabling-ai-workloads-with-nvidia-gpus). Other GPU hardware will be supported in a similar fashion. Our goal is to extend this powerful, hands-off approach to a broader range of hardware accelerators, further strengthening the hardware sovereignty of our users.
[3]: In Apeiro, we enabled the [NVIDIA GPU Operator](https://documentation.apeirora.eu/blog/2025-08-25-garden-linux-enabling-ai-workloads-with-nvidia-gpus). Other GPU hardware will be supported in a similar fashion. Our goal is to extend this powerful, hands-off approach to a broader range of hardware accelerators, further strengthening the hardware sovereignty of our users.

@gardener-robot

@vasu1124 You have an open pull request review invitation, please check.

@n-boshnakov
Contributor

@vasu1124 Hello, Vasu. Could you please take a look at the suggestions?

@gardener-ci-robot

The Gardener project currently lacks enough active contributors to adequately respond to all PRs.
This bot triages PRs according to the following rules:

  • After 15d of inactivity, lifecycle/stale is applied
  • After 15d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 7d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Mark this PR as rotten with /lifecycle rotten
  • Close this PR with /close

/lifecycle stale

@gardener-robot gardener-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 20, 2025
Labels

lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs/changes Needs (more) changes needs/review Needs review size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
