@MStokluska
Contributor

@MStokluska MStokluska commented Nov 28, 2025

Description

Provide Konflux config for universal image with training hub 0.4.0

How Has This Been Tested?

  • builds locally

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.
  • Konflux build passes

Summary by CodeRabbit

  • Documentation

    • Updated training image documentation to reflect PyTorch 2.9.0 and more generic feature descriptions.
  • Chores

    • Added new CI/CD pipeline configurations for multi-architecture training image builds.
    • Updated training environment dependencies, including JupyterLab to v4.4.9 and base image reference.


@openshift-ci

openshift-ci bot commented Nov 28, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign pawelpaszki for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai bot commented Nov 28, 2025

Walkthrough

This PR introduces two new Tekton PipelineRun configurations for orchestrating multi-arch training image builds (th04-cuda128-torch29-py312-rhel9) and updates training image dependencies. The pipelines define comprehensive workflows with repository cloning, dependency prefetching, multi-platform builds, security scanning, and artifact management. Associated image dependencies and documentation are also updated.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Tekton Pipeline Configuration: .tekton/odh-training-th04-cuda128-torch29-py312-rhel9-pull-request.yaml, .tekton/odh-training-th04-cuda128-torch29-py312-rhel9-push.yaml | Introduces two comprehensive PipelineRun manifests for CI/CD automation. Defines multi-task workflows including repository cloning, dependency prefetching, matrix-based multi-platform image builds (via build-platforms and image-platform), optional source image builds, deprecated image checks, security and scanning tasks (SAST, Clair, Coverity, Snyk, ClamAV, shell/unicode checks), tag application, and RPM signature scanning. Includes conditional execution logic, OCI artifact usage, workspace configuration, and service account setup. |
| Training Image Dependencies: images/universal/training/th04-cuda128-torch290-py312/Dockerfile, images/universal/training/th04-cuda128-torch290-py312/pylock.toml, images/universal/training/th04-cuda128-torch290-py312/pyproject.toml | Updates base image reference from cuda-jupyter-minimal-ubi9-python-3.12-2025a_20250903 to odh-workbench-jupyter-minimal-cuda-py312-ubi9:2025b-v1.39. Bumps JupyterLab dependency from 4.4.4 to 4.4.9 across lock and project configuration files. |
| Documentation: images/universal/training/README.md | Updates feature wording to be more generic, renames section to "Latest installed ML Packages", updates PyTorch version reference from 2.8.0 to 2.9.0, and adds link to latest image location. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Tekton PipelineRun complexity: Both pipeline manifests contain substantial task orchestration with conditional execution, matrix builds, and multiple security scanning integrations. Review should verify task dependencies, conditional logic (when clauses), workspace bindings, and artifact propagation between tasks.
  • Dependency version changes: Verify JupyterLab 4.4.4 → 4.4.9 compatibility and base image reference alignment across files.
  • Configuration consistency: Ensure pull-request and push pipelines follow consistent patterns and reference the same task bundles and digests.

Poem

🐰 A pipeline of tasks, both new and grand,
Multi-arch builds now orchestrated and planned,
JupyterLab hops up, version twenty-point-nine,
Security scans dance in a scanning design,
The training image blooms—oh, how it will shine! 🎯

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Title check | ✅ Passed | The title accurately captures the main changes: adding Konflux configuration for th04 image and updating to the latest base image, which aligns with the PR objectives and file changes. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
.tekton/odh-training-th04-cuda128-torch29-py312-rhel9-push.yaml (1)

1-18: Ensure push and pull-request pipelines are kept in sync.

Both .tekton/odh-training-th04-cuda128-torch29-py312-rhel9-push.yaml and .tekton/odh-training-th04-cuda128-torch29-py312-rhel9-pull-request.yaml define nearly identical task pipelines. When updating task versions or parameters, both files must be kept synchronized.

Consider documenting this dependency or using a shared base definition (e.g., via Tekton Bundles or a shared PipelineSpec) to reduce maintenance burden and prevent divergence between the two manifests.
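
Until a shared definition exists, drift between the two manifests can at least be detected mechanically. The following is a minimal sketch, assuming the task bundles are referenced in the usual `quay.io/...:<tag>@sha256:<digest>` form; the function names `list_bundle_refs` and `compare_bundle_refs` are hypothetical helpers, not part of this repository.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print every digest-pinned quay.io reference found in
# a manifest, one per line, sorted and de-duplicated.
list_bundle_refs() {
    grep -oE 'quay\.io/[A-Za-z0-9._/-]+:[A-Za-z0-9._-]+@sha256:[0-9a-f]{64}' "$1" | sort -u
}

# Hypothetical helper: return 0 when both manifests pin the same set of
# references, otherwise print a short message to stderr and return 1.
compare_bundle_refs() {
    refs_a="$(list_bundle_refs "$1")"
    refs_b="$(list_bundle_refs "$2")"
    if [ "$refs_a" = "$refs_b" ]; then
        return 0
    fi
    echo "Bundle references differ between $1 and $2" >&2
    return 1
}
```

It could be called as `compare_bundle_refs .tekton/odh-training-th04-cuda128-torch29-py312-rhel9-pull-request.yaml .tekton/odh-training-th04-cuda128-torch29-py312-rhel9-push.yaml`. The check only compares the sets of pinned references, so differences in parameters or task ordering would still need a manual review.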

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 65cc09e and c18d397.

📒 Files selected for processing (6)
  • .tekton/odh-training-th04-cuda128-torch29-py312-rhel9-pull-request.yaml (1 hunks)
  • .tekton/odh-training-th04-cuda128-torch29-py312-rhel9-push.yaml (1 hunks)
  • images/universal/training/README.md (2 hunks)
  • images/universal/training/th04-cuda128-torch290-py312/Dockerfile (1 hunks)
  • images/universal/training/th04-cuda128-torch290-py312/pylock.toml (1 hunks)
  • images/universal/training/th04-cuda128-torch290-py312/pyproject.toml (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
images/universal/training/README.md

59-59: Link text should be descriptive

(MD059, descriptive-link-text)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Red Hat Konflux / odh-training-th04-cuda128-torch29-py312-rhel9-on-pull-request
🔇 Additional comments (7)
images/universal/training/th04-cuda128-torch290-py312/pyproject.toml (1)

14-14: I'll help you verify the coordinated updates across the image build files. Let me run verification scripts to confirm the jupyterlab version consistency and base image tag alignment.

#!/bin/bash

# Verify consistent jupyterlab version across related files in the th04 image directory

IMAGE_DIR="images/universal/training/th04-cuda128-torch290-py312"

echo "=== Checking jupyterlab version in pyproject.toml ==="
rg -n 'jupyterlab==' "$IMAGE_DIR/pyproject.toml" || echo "Not found"

echo -e "\n=== Checking jupyterlab version in pylock.toml ==="
if [ -f "$IMAGE_DIR/pylock.toml" ]; then
    # Group the fallback with rg so "Not found" is printed when rg matches nothing
    { rg -n 'name = "jupyterlab"' -A 3 "$IMAGE_DIR/pylock.toml" || echo "Not found"; } | head -20
else
    echo "File not found: $IMAGE_DIR/pylock.toml"
fi

echo -e "\n=== Checking base image reference in Dockerfile ==="
if [ -f "$IMAGE_DIR/Dockerfile" ]; then
    rg -n 'FROM.*odh-workbench' "$IMAGE_DIR/Dockerfile" || rg -n 'FROM ' "$IMAGE_DIR/Dockerfile" | head -5
else
    echo "File not found: $IMAGE_DIR/Dockerfile"
fi

echo -e "\n=== Checking README.md for jupyterlab or base image mentions ==="
if [ -f "$IMAGE_DIR/README.md" ]; then
    rg -n -i 'jupyterlab|odh-workbench' "$IMAGE_DIR/README.md" || echo "No matches found"
else
    echo "File not found: $IMAGE_DIR/README.md"
fi

#!/bin/bash

# List files in the image directory to understand structure

IMAGE_DIR="images/universal/training/th04-cuda128-torch290-py312"
echo "=== Files in image directory ==="
ls -la "$IMAGE_DIR/" || echo "Directory not found"

images/universal/training/th04-cuda128-torch290-py312/pylock.toml (1)

1391-1395: jupyterlab version bump is properly reflected with updated hashes.

The patch version bump from 4.4.4 to 4.4.9 includes correctly updated source URLs and cryptographic hashes for both the sdist and wheels artifacts. The changes are consistent with the coordinated update mentioned in the AI summary for pyproject.toml.

Verify that this version bump aligns with the PR objective of updating to the latest base image and that the build and integration tests confirm compatibility with the full stack (CUDA 12.8, PyTorch 2.9.0, Python 3.12, RHEL 9).

.tekton/odh-training-th04-cuda128-torch29-py312-rhel9-pull-request.yaml (3)

627-627: Unfortunately, I'm unable to complete the verification due to a repository cloning failure. The sandbox environment encountered an error accessing the opendatahub-io/distributed-workloads repository.

Manual verification required for service account existence.

The review comment references serviceAccountName: build-pipeline-odh-training-th04-cuda128-torch29-py312-rhel9 and requests confirmation that this service account is created in the open-data-hub-tenant namespace. Without repository access, I cannot verify:

  • Whether this service account is defined within the repository
  • How/where it's provisioned (e.g., via separate manifests, operator, or pre-existing infrastructure)
  • Whether the PipelineRun configuration properly assumes it exists before execution

Please confirm that the service account build-pipeline-odh-training-th04-cuda128-torch29-py312-rhel9 is created in the open-data-hub-tenant namespace prior to running this pipeline.
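
One way to confirm this from a terminal with access to the cluster is sketched below. It assumes `kubectl` (or a compatible client such as `oc`) is installed and logged in; the `KUBECTL` variable and the `service_account_exists` function are hypothetical names introduced only for this example.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed when the named service account exists.
#   $1 = service account name
#   $2 = namespace
# KUBECTL is an assumed override so that a different client (for example oc)
# can be substituted; it defaults to kubectl.
service_account_exists() {
    "${KUBECTL:-kubectl}" get serviceaccount "$1" --namespace "$2" >/dev/null 2>&1
}
```

For this PR the call would be `service_account_exists build-pipeline-odh-training-th04-cuda128-torch29-py312-rhel9 open-data-hub-tenant`, with a zero exit status indicating that the account is present.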


11-12: CEL expression syntax is valid and follows Tekton Pipelines as Code conventions correctly.

The trigger condition uses proper syntax:

  • event == "pull_request" and target_branch == "main" are valid CEL conditions
  • .pathChanged() is the correct suffix function for glob-based file change detection
  • Multiple paths are properly combined with OR operators (||) and grouped with parentheses
  • All string literals are properly quoted

No issues found.


139-147: Verify that task bundle SHA256 digests are pinned correctly in both pull-request and push manifests for reproducibility and security.

The pipeline references Tekton task bundles from quay.io/konflux-ci registries with SHA256 digests. Confirm that:

  1. A corresponding push.yaml file exists with identical task bundle references
  2. All SHA256 digests at lines 139-147, 156-167, 188-196, and 242-250 are current and accessible in the Konflux registries

To validate current digests locally:

skopeo inspect --no-tags docker://quay.io/konflux-ci/tekton-catalog/task-init:0.2 | jq -r '.Digest'
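
Before checking individual digests against the registry, it may help to confirm that every bundle reference in a manifest is pinned at all. The sketch below assumes the bundles come from `quay.io/konflux-ci`, as described above; `list_unpinned_refs` and `all_refs_pinned` are hypothetical helper names.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print (with line numbers) every line that references
# quay.io/konflux-ci but carries no sha256 digest.
list_unpinned_refs() {
    grep -nE 'quay\.io/konflux-ci/' "$1" | grep -vE '@sha256:[0-9a-f]{64}'
}

# Hypothetical helper: return 0 when every quay.io/konflux-ci reference in
# the manifest is digest-pinned; otherwise list the offenders and return 1.
all_refs_pinned() {
    if list_unpinned_refs "$1"; then
        return 1
    fi
    return 0
}
```

This only checks that a digest is present. Whether each digest is current still has to be confirmed against the registry, for example with the `skopeo inspect` command shown earlier.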
images/universal/training/th04-cuda128-torch290-py312/Dockerfile (1)

12-12: Base image tag could not be verified in public registry.

The BASE_IMAGE tag odh-workbench-jupyter-minimal-cuda-py312-ubi9:2025b-v1.39 could not be found in the public quay.io/opendatahub repository inventory. Before merging, confirm this tag exists in the internal or staging registry, or verify the correct tag name if this is a new release that has not yet been published publicly.

.tekton/odh-training-th04-cuda128-torch29-py312-rhel9-push.yaml (1)

26-29: Based on my verification attempts, I've encountered infrastructure limitations and discovered important context about the image-expires-after parameter:

Key Finding: image-expires-after is not a native Tekton Pipeline/PipelineRun field. The Tekton Buildah task has no built-in image expiration parameter. Image expiration/retention is typically a registry-level concern, not a task-level concern. This suggests the repository uses a custom parameter or implementation (possibly Konflux CI or a custom extension).

Unable to Verify: Due to repository access issues, I cannot inspect:

  • The actual task definitions and their default behaviors
  • How this custom parameter is implemented and handled
  • Whether the omission in the push pipeline is intentional
  • Registry-level defaults applied by the custom implementation

Confirm the intended behavior for main-branch image retention. The push pipeline omits image-expires-after (unlike PR builds with 5-day expiry). Without seeing the custom task implementation, it's unclear if this allows indefinite persistence or if a different default applies. Verify: (1) the task's default when the parameter is omitted, and (2) whether indefinite retention is intentional for production images.
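
To see at a glance which manifests set the parameter, a small check along these lines could be used. It is a sketch that assumes the parameter appears as a `- name: image-expires-after` list entry in the PipelineRun; `has_image_expiry` and `report_image_expiry` are hypothetical helper names.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed when the manifest declares a parameter
# named image-expires-after.
has_image_expiry() {
    grep -qE '^[[:space:]]*-[[:space:]]*name:[[:space:]]*image-expires-after[[:space:]]*$' "$1"
}

# Hypothetical helper: print one status line per manifest given as argument.
report_image_expiry() {
    for manifest in "$@"; do
        if has_image_expiry "$manifest"; then
            echo "$manifest: image-expires-after is set"
        else
            echo "$manifest: image-expires-after is not set"
        fi
    done
}
```

Running `report_image_expiry .tekton/*.yaml` would list the status of each manifest. It does not say what default the build task applies when the parameter is omitted, which still needs to be confirmed from the task definition.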

1. **Base Image Dependencies** (from `workbench-images`)
- JupyterLab, notebook extensions, authentication
- Pre-installed in the base image via its own `pylock.toml`
- Latest image can be found [here](https://github.com/opendatahub-io/notebooks/blob/main/manifests/base/params-latest.env#L2)

⚠️ Potential issue | 🟡 Minor

Use descriptive link text instead of "here".

The link text should be descriptive for accessibility and SEO. Replace "here" with text that describes the link destination.

Apply this diff:

-   - Latest image can be found [here](https://github.com/opendatahub-io/notebooks/blob/main/manifests/base/params-latest.env#L2)
+   - Latest image can be found at [opendatahub-io/notebooks latest base image](https://github.com/opendatahub-io/notebooks/blob/main/manifests/base/params-latest.env#L2)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- Latest image can be found [here](https://github.com/opendatahub-io/notebooks/blob/main/manifests/base/params-latest.env#L2)
+ Latest image can be found at [opendatahub-io/notebooks latest base image](https://github.com/opendatahub-io/notebooks/blob/main/manifests/base/params-latest.env#L2)
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

59-59: Link text should be descriptive

(MD059, descriptive-link-text)

🤖 Prompt for AI Agents
In images/universal/training/README.md around line 59, the markdown uses
non-descriptive link text "here"; replace that with descriptive link text that
explains the destination (for example "params-latest.env" or "latest base image
parameters") so the link reads like "Latest image can be found in
params-latest.env" (or similar descriptive phrasing) and update the link target
unchanged.

@MStokluska
Contributor Author

/retest
Temp build failure.

@MStokluska MStokluska changed the title add th04 image konflux and update base image to latest [WIP] add th04 image konflux and update base image to latest Nov 28, 2025