Support --gpus (GPU-accelerated container) option in rkl. #593

Open
Cosh-y wants to merge 1 commit into rk8s-dev:main from Cosh-y:support-gpu-accel-co-rkl

@Cosh-y Cosh-y commented Apr 18, 2026

  • Use nvidia-container-runtime-hook to map the GPU devices into the container.
  • The hook runs at youki's "createRuntime" stage.
  • To make the hook's execution timing precise and unambiguous, I upgraded the youki dependency (libcontainer).
  • GPU device mapping is specified at container creation as follows:
apiVersion: v1
kind: Pod
metadata:
  name: simple-container-task  
  labels:
    app: my-app 
    bundle: /home/ckd/chy/rk8s/project/test/bundles/pause
spec:
  containers:
    - name: main-container1    
      image: /home/ckd/chy/rk8s/project/test/bundles/cuda-test/bundle
      args:             
        - sh          
      tty: true
      gpus:  # new gpus fields
        enabled: true
        deviceIds: ["0"]
      ports:
        - containerPort: 80
      resources:
        limits:
          cpu: "500m"
          memory: "512Mi"

The --gpus option is exposed as the gpus fields of ContainerSpec in pod.yaml rather than as a command-line flag.
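As a rough illustration of the description above, here is a minimal sketch of how the gpus fields could be turned into the inputs the NVIDIA hook expects. The names GpuSpec, HookEntry, visible_devices and create_runtime_hook are hypothetical and are not taken from this PR's diff. The sketch assumes the hook reads NVIDIA_VISIBLE_DEVICES from the container's process environment and is invoked with the "prestart" argument, and it assumes (purely for illustration) that an empty deviceIds list means "all".

```rust
/// Hypothetical mirror of the `gpus` block in pod.yaml (not the PR's actual type).
#[derive(Debug, Clone)]
struct GpuSpec {
    enabled: bool,
    device_ids: Vec<String>,
}

/// Hypothetical, simplified stand-in for one entry under `hooks.createRuntime`
/// in the OCI runtime config.
#[derive(Debug, PartialEq)]
struct HookEntry {
    path: String,
    args: Vec<String>,
}

/// Value for NVIDIA_VISIBLE_DEVICES, or None when GPUs are not requested.
/// Assumption: an empty device list selects all GPUs.
fn visible_devices(spec: &GpuSpec) -> Option<String> {
    if !spec.enabled {
        return None;
    }
    if spec.device_ids.is_empty() {
        Some("all".to_string())
    } else {
        Some(spec.device_ids.join(","))
    }
}

/// Build the createRuntime hook entry for a hook binary found at `hook_path`.
fn create_runtime_hook(hook_path: &str) -> HookEntry {
    HookEntry {
        path: hook_path.to_string(),
        args: vec![
            "nvidia-container-runtime-hook".to_string(),
            "prestart".to_string(),
        ],
    }
}
```

With the pod.yaml above (enabled: true, deviceIds: ["0"]), visible_devices would yield "0", which would be placed in the container's environment before the hook runs.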

Signed-off-by: harry <ambition0316@126.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2d72d8cee2

Comment thread project/Cargo.toml
 slayerfs = { path = "slayerfs" }
 libcgroups = { version = "0.5.7", default-features = false }
-libcontainer = "0.5.7"
+libcontainer = "0.6.0"
P1: Sync Buck targets with upgraded OCI/libcontainer versions

This change bumps workspace dependencies to libcontainer = "0.6.0" and oci-spec = "0.9.0", but the Buck manifests still pin libcontainer/0.5.7 and oci-spec/0.8.4 (for example in project/libruntime/BUCK, project/rkl/BUCK, and project/rkforge/BUCK), so Cargo and Buck builds now compile against different runtime stacks. I checked .github/workflows/base.yml and CI runs buck2 build //project/..., so this drift can break or silently bypass the new GPU hook/runtime behavior in the Buck path; the BUCK dependencies should be regenerated/updated in the same commit.

Comment on lines +327 to +328
fn is_executable_file(path: &Path) -> bool {
    path.is_file()
P2: Check execute permission for selected NVIDIA hook path

is_executable_file currently returns true for any regular file, so find_nvidia_container_runtime_hook() can select a non-executable nvidia-container-runtime-hook from PATH and pass it into OCI hooks. In that case container creation fails later with a hook execution error instead of falling back to a valid binary; this helper should verify execute permission, not just file existence.
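One possible shape for the suggested fix, as a Unix-only sketch: it checks the permission bits with std::os::unix::fs::PermissionsExt in addition to the regular-file test. The helper find_hook_in_dirs and the constant HOOK_NAME are hypothetical stand-ins for the PATH search in find_nvidia_container_runtime_hook(), whose real body is not shown here. Note that testing for any execute bit (0o111) does not prove the current user may execute the file; a stricter check would need something like access(2).

```rust
use std::fs;
use std::os::unix::fs::PermissionsExt;
use std::path::{Path, PathBuf};

/// Hypothetical constant naming the binary to search for.
const HOOK_NAME: &str = "nvidia-container-runtime-hook";

/// True only for a regular file with at least one execute bit set.
/// Like `Path::is_file`, `fs::metadata` follows symlinks.
fn is_executable_file(path: &Path) -> bool {
    match fs::metadata(path) {
        Ok(meta) => meta.is_file() && (meta.permissions().mode() & 0o111) != 0,
        Err(_) => false, // missing or unreadable paths are not candidates
    }
}

/// Hypothetical search helper: return the first directory entry that passes
/// the executable check, skipping non-executable files of the same name.
fn find_hook_in_dirs(dirs: &[PathBuf]) -> Option<PathBuf> {
    dirs.iter()
        .map(|dir| dir.join(HOOK_NAME))
        .find(|candidate| is_executable_file(candidate))
}
```

With this check, a non-executable file earlier in the search order is skipped, so the lookup can fall back to a valid binary later in the list instead of failing at hook execution time.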
