-
Notifications
You must be signed in to change notification settings - Fork 70
[WIP] add th04 image konflux and update base image to latest #531
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
WalkthroughThis PR introduces two new Tekton PipelineRun configurations for orchestrating multi-arch training image builds (th04-cuda128-torch29-py312-rhel9) and updates training image dependencies. The pipelines define comprehensive workflows with repository cloning, dependency prefetching, multi-platform builds, security scanning, and artifact management. Associated image dependencies and documentation are also updated. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
.tekton/odh-training-th04-cuda128-torch29-py312-rhel9-push.yaml (1)
1-18: Ensure push and pull-request pipelines are kept in sync.Both
.tekton/odh-training-th04-cuda128-torch29-py312-rhel9-push.yamland.tekton/odh-training-th04-cuda128-torch29-py312-rhel9-pull-request.yamldefine nearly identical task pipelines. When updating task versions or parameters, both files must be kept synchronized.Consider documenting this dependency or using a shared base definition (e.g., via Tekton Bundles or a shared PipelineSpec) to reduce maintenance burden and prevent divergence between the two manifests.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
.tekton/odh-training-th04-cuda128-torch29-py312-rhel9-pull-request.yaml(1 hunks).tekton/odh-training-th04-cuda128-torch29-py312-rhel9-push.yaml(1 hunks)images/universal/training/README.md(2 hunks)images/universal/training/th04-cuda128-torch290-py312/Dockerfile(1 hunks)images/universal/training/th04-cuda128-torch290-py312/pylock.toml(1 hunks)images/universal/training/th04-cuda128-torch290-py312/pyproject.toml(1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
images/universal/training/README.md
59-59: Link text should be descriptive
(MD059, descriptive-link-text)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Red Hat Konflux / odh-training-th04-cuda128-torch29-py312-rhel9-on-pull-request
🔇 Additional comments (7)
images/universal/training/th04-cuda128-torch290-py312/pyproject.toml (1)
14-14: I'll help you verify the coordinated updates across the image build files. Let me run verification scripts to confirm the jupyterlab version consistency and base image tag alignment.
<function_calls>
#!/bin/bashVerify consistent jupyterlab version across related files in the th04 image directory
IMAGE_DIR="images/universal/training/th04-cuda128-torch290-py312"
echo "=== Checking jupyterlab version in pyproject.toml ==="
rg -n 'jupyterlab==' "$IMAGE_DIR/pyproject.toml" || echo "Not found"echo -e "\n=== Checking jupyterlab version in pylock.toml ==="
if [ -f "$IMAGE_DIR/pylock.toml" ]; then
rg -n 'name = "jupyterlab"' -A 3 "$IMAGE_DIR/pylock.toml" | head -20 || echo "Not found"
else
echo "File not found: $IMAGE_DIR/pylock.toml"
fiecho -e "\n=== Checking base image reference in Dockerfile ==="
if [ -f "$IMAGE_DIR/Dockerfile" ]; then
rg -n 'FROM.*odh-workbench' "$IMAGE_DIR/Dockerfile" || rg -n 'FROM ' "$IMAGE_DIR/Dockerfile" | head -5
else
echo "File not found: $IMAGE_DIR/Dockerfile"
fiecho -e "\n=== Checking README.md for jupyterlab or base image mentions ==="
if [ -f "$IMAGE_DIR/README.md" ]; then
rg -n -i 'jupyterlab|odh-workbench' "$IMAGE_DIR/README.md" || echo "No matches found"
else
echo "File not found: $IMAGE_DIR/README.md"
fi
</function_calls><function_calls>
#!/bin/bashList files in the image directory to understand structure
IMAGE_DIR="images/universal/training/th04-cuda128-torch290-py312"
echo "=== Files in image directory ==="
ls -la "$IMAGE_DIR/" || echo "Directory not found"
</function_calls>images/universal/training/th04-cuda128-torch290-py312/pylock.toml (1)
1391-1395: jupyterlab version bump is properly reflected with updated hashes.The patch version bump from 4.4.4 to 4.4.9 includes correctly updated source URLs and cryptographic hashes for both the sdist and wheels artifacts. The changes are consistent with the coordinated update mentioned in the AI summary for
pyproject.toml.Verify that this version bump aligns with the PR objective of updating to the latest base image and that the build and integration tests confirm compatibility with the full stack (CUDA 12.8, PyTorch 2.9.0, Python 3.12, RHEL 9).
.tekton/odh-training-th04-cuda128-torch29-py312-rhel9-pull-request.yaml (3)
627-627: Unfortunately, I'm unable to complete the verification due to a repository cloning failure. The sandbox environment encountered an error accessing theopendatahub-io/distributed-workloadsrepository.Manual verification required for service account existence.
The review comment references
serviceAccountName: build-pipeline-odh-training-th04-cuda128-torch29-py312-rhel9and requests confirmation that this service account is created in theopen-data-hub-tenantnamespace. Without repository access, I cannot verify:
- Whether this service account is defined within the repository
- How/where it's provisioned (e.g., via separate manifests, operator, or pre-existing infrastructure)
- Whether the PipelineRun configuration properly assumes it exists before execution
Please confirm that the service account
build-pipeline-odh-training-th04-cuda128-torch29-py312-rhel9is created in theopen-data-hub-tenantnamespace prior to running this pipeline.
11-12: CEL expression syntax is valid and follows Tekton Pipelines as Code conventions correctly.The trigger condition uses proper syntax:
event == "pull_request"andtarget_branch == "main"are valid CEL conditions.pathChanged()is the correct suffix function for glob-based file change detection- Multiple paths are properly combined with OR operators (
||) and grouped with parentheses- All string literals are properly quoted
No issues found.
139-147: Verify that task bundle SHA256 digests are pinned correctly in both pull-request and push manifests for reproducibility and security.The pipeline references Tekton task bundles from quay.io/konflux-ci registries with SHA256 digests. Confirm that:
- A corresponding push.yaml file exists with identical task bundle references
- All SHA256 digests at lines 139-147, 156-167, 188-196, and 242-250 are current and accessible in the Konflux registries
To validate current digests locally:
skopeo inspect --no-tags docker://quay.io/konflux-ci/tekton-catalog/task-init:0.2 | jq -r '.Digest'images/universal/training/th04-cuda128-torch290-py312/Dockerfile (1)
12-12: Base image tag could not be verified in public registry.The BASE_IMAGE tag
odh-workbench-jupyter-minimal-cuda-py312-ubi9:2025b-v1.39could not be found in the public quay.io/opendatahub repository inventory. Before merging, confirm this tag exists in the internal or staging registry, or verify the correct tag name if this is a new release that has not yet been published publicly..tekton/odh-training-th04-cuda128-torch29-py312-rhel9-push.yaml (1)
26-29: Based on my verification attempts, I've encountered infrastructure limitations and discovered important context about theimage-expires-afterparameter:Key Finding:
image-expires-afteris not a native Tekton Pipeline/PipelineRun field. The Tekton Buildah task has no built-in image expiration parameter. Image expiration/retention is typically a registry-level concern, not a task-level concern. This suggests the repository uses a custom parameter or implementation (possibly Konflux CI or a custom extension).Unable to Verify: Due to repository access issues, I cannot inspect:
- The actual task definitions and their default behaviors
- How this custom parameter is implemented and handled
- Whether the omission in the push pipeline is intentional
- Registry-level defaults applied by the custom implementation
Confirm the intended behavior for main-branch image retention. The push pipeline omits
image-expires-after(unlike PR builds with 5-day expiry). Without seeing the custom task implementation, it's unclear if this allows indefinite persistence or if a different default applies. Verify: (1) the task's default when the parameter is omitted, and (2) whether indefinite retention is intentional for production images.
| 1. **Base Image Dependencies** (from `workbench-images`) | ||
| - JupyterLab, notebook extensions, authentication | ||
| - Pre-installed in the base image via its own `pylock.toml` | ||
| - Latest image can be found [here](https://github.com/opendatahub-io/notebooks/blob/main/manifests/base/params-latest.env#L2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use descriptive link text instead of "here".
The link text should be descriptive for accessibility and SEO. Replace "here" with text that describes the link destination.
Apply this diff:
- - Latest image can be found [here](https://github.com/opendatahub-io/notebooks/blob/main/manifests/base/params-latest.env#L2)
+ - Latest image can be found at [opendatahub-io/notebooks latest base image](https://github.com/opendatahub-io/notebooks/blob/main/manifests/base/params-latest.env#L2)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - Latest image can be found [here](https://github.com/opendatahub-io/notebooks/blob/main/manifests/base/params-latest.env#L2) | |
| - Latest image can be found at [opendatahub-io/notebooks latest base image](https://github.com/opendatahub-io/notebooks/blob/main/manifests/base/params-latest.env#L2) |
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
59-59: Link text should be descriptive
(MD059, descriptive-link-text)
🤖 Prompt for AI Agents
In images/universal/training/README.md around line 59, the markdown uses
non-descriptive link text "here"; replace that with descriptive link text that
explains the destination (for example "params-latest.env" or "latest base image
parameters") so the link reads like "Latest image can be found in
params-latest.env" (or similar descriptive phrasing) and update the link target
unchanged.
|
/retest |
Description
Provide Konflux config for universal image with training hub 0.4.0
How Has This Been Tested?
Merge criteria:
Summary by CodeRabbit
Documentation
Chores
✏️ Tip: You can customize this high-level summary in your review settings.