7 changes: 7 additions & 0 deletions .gitignore
@@ -173,3 +173,10 @@ cython_debug/

# PyPI configuration file
.pypirc

# Local Hermeto prefetch (scripts/run_hermeto_fetch_deps.sh)
.hermeto-output/
.hermeto-output-verify-cpu/
.hermeto-output-verify-cuda/
# Local hermetic build simulation (scripts/simulate_hermetic_build.sh)
.hermetic-staging/
117 changes: 117 additions & 0 deletions AGENTS.md
@@ -0,0 +1,117 @@
# Agent notes: Konflux, Hermeto, and hermetic Python lockfiles

This file captures lessons from debugging **prefetch-dependencies / Hermeto** failures (checksum mismatch, “No wheels found”, `pybuild-deps` errors) and aligning **CPU** and **CUDA** requirement generators. Use it when changing `pyproject.toml`, RHOAI indices, or `.tekton` prefetch inputs.

## Hermeto behavior that breaks naive lockfiles

- **Konflux invocation** (shape to mirror locally):
`hermeto --log-level debug --mode strict fetch-deps '<json>' --sbom-output-type spdx --source <repo> --output <dir>`
The pip slice of `<json>` matches `.tekton` `prefetch-input` (see `scripts/hermeto/*.json` and `scripts/run_hermeto_fetch_deps.sh`).

- **PyPI intersection for wheels**
For many binary packages, Hermeto logs lines like:
`using intersection of requirements-file and PyPI-reported checksums`
It then **drops** any wheel whose digest is not in that intersection.

- **Implication for RHOAI**
If a line uses **`package==version`** with **`--index-url`** pointing at RHOAI but the **actual wheel file** is a RHOAI rebuild (e.g. `torch-2.9.0-7-cp312-cp312-linux_x86_64.whl`), PyPI usually **does not** expose the same filename/digest. **Every** candidate can be filtered → `PackageRejected: No wheels found`.

- **What fixes it**
- Prefer **`name @ https://…/exact-file.whl`** plus a **single `--hash=sha256:…`** for bytes you control (pulp or `files.pythonhosted.org`), so Hermeto fetches by URL and verifies the hash **without** requiring a PyPI match for that RHOAI rebuild name.
- For **CPU multi-arch**, the prefetch lists **both** arch-specific requirement fragments; **`Containerfile`** installs **one** fragment according to `TARGETARCH` (see below).
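
  A direct-URL stanza has this shape (the host, path, and hash below are placeholders for illustration; only the filename is taken from the real RHOAI rebuild):

  ```
  torch @ https://pulp.example.invalid/rhoai/torch-2.9.0-7-cp312-cp312-linux_x86_64.whl \
      --hash=sha256:0000000000000000000000000000000000000000000000000000000000000000
  ```

  Hermeto downloads the named file and verifies the single hash against its bytes, so no PyPI filename/digest match is needed.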

- **Local Hermeto**
**`full-cpu` / `full-cuda`** may fail without **RHSM** client cert paths—Konflux injects `options.ssl` on the **rpm** input; laptops typically do not have `/etc/pki/entitlement/…`. Prefer the **pip slice** for routine validation (see below).

## Local Hermeto validation (pip slice)

**CPU green is not CUDA green.** Konflux runs **separate** prefetch inputs for CPU vs CUDA images. The committed files differ (`requirements.hashes.*` vs `requirements.hashes.*.cuda*`, **`requirements.overrides.txt`** vs **`requirements.overrides.cuda.txt`**), and the generator can put the same package in **wheel** lists for one pipeline and **source** for the other. Validating only **`pip-cpu`** misses failures that appear only in **`pip-cuda`** (and the reverse is possible in principle).

**Hermeto runs `cargo vendor --locked` on extracted Python sdists** that contain Rust (e.g. under `deps/pip/<name>-<ver>/`). That step is **independent of whether you build a CUDA image**—it triggers when the **pip prefetch** delivers an sdist with a `Cargo.toml`/`Cargo.lock` mismatch (example: historical **`jiter` 0.12.x** → `PackageWithCorruptLockfileRejected`). A package resolved as a **manylinux wheel** in CPU can still be an **sdist** on the CUDA requirement split if you do not run **`pip-cuda`**.

**What to run before pushing prefetch or hashed-requirement changes**

| Change touched | Run |
|----------------|-----|
| CPU only (`konflux_requirements.sh`, `requirements.hashes.*` not `.cuda`) | `make hermeto-verify-pip-cpu` |
| CUDA only (`konflux_requirements_cuda.sh`, `*.cuda*`, `requirements.overrides.cuda.txt`) | `make hermeto-verify-pip-cuda` |
| Shared: `pyproject.toml`, `uv.lock`, both overrides files, or both generators | `make hermeto-verify-pip` (CPU **and** CUDA) |

Commands use **strict** Hermeto (same shape as Konflux) and separate output dirs so results are not overwritten:

- `make hermeto-verify-pip-cpu` → **`.hermeto-output-verify-cpu/`**
- `make hermeto-verify-pip-cuda` → **`.hermeto-output-verify-cuda/`**

Ad-hoc: `HERMETO_OUT=/path ./scripts/run_hermeto_fetch_deps.sh pip-cuda`. Generic **`make hermeto-fetch-deps`** still defaults to **`HERMETO_MODE=pip-cpu`**—do not treat that alone as sufficient when CUDA inputs changed.

## CPU pipeline (`scripts/konflux_requirements.sh`)

- **Regenerate; do not hand-edit** hashed requirements. Use `make konflux-requirements` (or the script).

- **`--extra-index-url` in lockfiles**
Hermeto does not support that line in committed files. The script passes PyPI as an extra index **during** `uv pip compile`, then **`sed`** removes `--extra-index-url` from the generated file.
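
A minimal sketch of that cleanup step (the committed generator does more; this only demonstrates the `sed` pass on a sample compiled file):

```shell
# Simulate a compiled requirements file that still carries the directive
# Hermeto rejects in committed lockfiles (index host is a placeholder).
cat > /tmp/req.demo.txt <<'EOF'
--index-url https://rhoai.example.invalid/simple
--extra-index-url https://pypi.org/simple
numpy==2.1.0 \
    --hash=sha256:0000000000000000000000000000000000000000000000000000000000000000
EOF

# Strip --extra-index-url after `uv pip compile` has finished resolving.
sed -i '/^--extra-index-url/d' /tmp/req.demo.txt
```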

- **Torch / triton (RHOAI) vs Hermeto**
`torch==` / `triton==` under the RHOAI simple index **fail** Hermeto’s PyPI intersection. The generator **strips** those stanzas from `requirements.hashes.wheel.txt` and writes:
- `requirements.hashes.wheel.cpu.x86_64.txt` — pulp URLs + hashes for **torch** and **triton**; **torchvision** from **PyPI manylinux** URLs (so PyPI intersection succeeds).
- `requirements.hashes.wheel.cpu.aarch64.txt` — same pattern for aarch64.
**`Containerfile`** selects one of these via `TARGETARCH` (`amd64` → `x86_64`, `arm64` → `aarch64`).
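
  The arch selection amounts to this mapping (a sketch; `Containerfile` encodes the same logic in its `RUN` steps):

  ```shell
  # Map Docker/Podman's TARGETARCH to the suffix used by the generated fragments.
  TARGETARCH="${TARGETARCH:-amd64}"
  case "$TARGETARCH" in
    amd64) ARCH=x86_64 ;;
    arm64) ARCH=aarch64 ;;
    *) echo "unsupported TARGETARCH: $TARGETARCH" >&2; exit 1 ;;
  esac
  # The fragment pip would install for this architecture:
  echo "requirements.hashes.wheel.cpu.${ARCH}.txt" > /tmp/cpu-fragment.demo
  ```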

- **`pylatexenc` on the PyPI wheel file**
Same intersection issue: PyPI digest ≠ RHOAI rebuild `*-8-py3-none-any.whl`. After the PyPI-wheel compile, the script rewrites **`pylatexenc==…`** to a **pulp direct URL** (currently the **cuda12.9-ubi9** artifact; the **3.2/cpu-ubi9** pulp path returned 404 for that filename when checked).

- **`pybuild-deps`**
It cannot use sdists for **`nvidia-*`**, **`torch` / `torchvision` / `triton`**, **`faiss-cpu`** in this layout. The script feeds a filtered temp file to `pybuild-deps` (see script comments).

- **CPU `requirements.hashes.source.txt` vs transitive CUDA wheels**
The resolver can still list **`nvidia-*`** (and related pins) on “PyPI source” lines even when the image installs **CPU** torch from RHOAI. Those packages are **wheel-only / not fetchable as sdists** for Hermeto’s pip input → **`No distributions found`**. **`konflux_requirements.sh`** filters the same set out **before** the `uv pip compile` that writes **`requirements.hashes.source.txt`**, not only before `pybuild-deps`.
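
  The filter is a simple exclusion pass over the resolver output (illustrative package list; the script maintains the real set):

  ```shell
  # A resolver output mixing wheel-only CUDA packages with normal pins.
  cat > /tmp/resolved.demo.txt <<'EOF'
  nvidia-cublas-cu12==12.4.5.8
  nvidia-cudnn-cu12==9.1.0.70
  torch==2.9.0
  requests==2.32.3
  EOF

  # Drop packages with no fetchable sdist before the source-requirements compile.
  grep -Ev '^(nvidia-|torch==|torchvision==|triton==)' /tmp/resolved.demo.txt \
    > /tmp/source-input.demo.txt
  ```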

- **Tekton / JSON sync**
When `prefetch-input` **`requirements_files`** or **`binary.packages`** change, update **`.tekton/rag-tool-*.yaml`**, **`.tekton/lightspeed-core-rag-content-cpu-f176b-*.yaml`**, and **`scripts/hermeto/prefetch-*.json`** together. The konflux script only rewrites the **`"packages"`** string in some Tekton files via `sed`; it does **not** auto-insert new requirement filenames everywhere.

## CUDA pipeline (`scripts/konflux_requirements_cuda.sh`)

- **Regenerate** with `make konflux-requirements-cuda`.

- **Policy**
RHOAI **CUDA** `torch` is canonical; do not assume PyPI CUDA `torch`’s `nvidia-*` graph applies. See `README.md` (CUDA / RHOAI / `list_wheel_requires_dist.py`).

- **`hf-xet` (CUDA x86 and hermetic install)**
**`hf-xet` 1.3+ / 1.4.x** sdists use **Rust `edition2024`**, which **Cargo ~1.84** in UBI-based build images does not support. If the CUDA wheel requirement file resolves **`hf-xet>=1.2.0`** to **1.4.x** and pip ever uses the **sdist**, metadata/build fails with *“feature `edition2024` is required”*. **Do not try to build it:** pin **`hf-xet==1.2.0`** in **`requirements.overrides.cuda.txt`**, keep **`hf-xet`** in **`PYPI_WHEELS`**, and run the same **force `1.2.0`** step before the PyPI **`--only-binary`** compile as in **`konflux_requirements.sh`** (`requirements.hashes.wheel.pypi.cuda.base.txt` must only carry **1.2.0** wheel hashes). **`huggingface_hub`** remains usable with the older wheel.

- **Never install PyPI `nvidia-*` packages in the CUDA image**
Do **not** add **`nvidia-cublas-cu12`**, **`nvidia-cudnn-cu12`**, or any other **`nvidia-*`** wheels from PyPI to hashed requirements or prefetch. RHOAI **`torch`** already ships the CUDA stack it expects; pulling the separate PyPI **`nvidia-*`** graph causes **version skew, duplicate libraries, broken `torch`, and Hermeto/prefetch failures** (`No distributions found` when strict mode disagrees with the lockfile shape). The CUDA generator strips every **`nvidia-*==…`** stanza from **`requirements.hashes.wheel.pypi.cuda.base.txt`** after the PyPI wheel compile (even when `uv` expands PyPI CUDA **`torch`** and injects those lines).

- **CUDA-specific mechanics**
- Strip **`torch` / `torchvision` / `triton`** from the **second** PyPI-only compile input so PyPI CUDA torch does not pull **`nvidia-*`** into that file; any **`nvidia-*`** that still appear in the compiled wheel hash file are removed before emitting **`.base.txt` / arch fragments**.
- **Sdist-only on PyPI**: pins are moved back to the RHOAI wheel file so `--only-binary :all:` can still run.
- **`antlr4-python3-runtime`**: PyPI has no usable wheel for omegaconf’s constraints; inject **pulp URL** + fixed stanza/hash.
- **`pylatexenc`**: pulp URL + hash on the appropriate file (same Hermeto intersection issue as CPU).
- **`pybuild-deps`**: filtered input excludes wheel-only / problematic packages (see script).
- **Wheel layout**: `requirements.hashes.wheel.pypi.cuda.base.txt` plus **`.x86_64.txt` / `.aarch64.txt`** — **not** a single `requirements.hashes.wheel.pypi.cuda.txt` (some older Tekton snippets may still be wrong; **c0ec3** YAMLs are the reference).
- **`jiter` / Hermeto `cargo vendor --locked`**: older **`jiter==0.12.x`** sdists shipped a **`Cargo.lock`** out of sync with **`Cargo.toml`**, so prefetch fails with **`PackageWithCorruptLockfileRejected`**. Fix: pin **`jiter==0.13.0`** in **`requirements.overrides.txt`** and **`requirements.overrides.cuda.txt`**, and list **`jiter`** in **`PYPI_WHEELS`** (CPU and CUDA scripts) so resolver emits **manylinux wheels** instead of sdists.
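
  The override shape (check the committed files for the current pin):

  ```
  # requirements.overrides.txt / requirements.overrides.cuda.txt
  jiter==0.13.0
  ```

  `jiter` also stays listed in each generator's `PYPI_WHEELS` so the resolver emits manylinux wheels rather than the sdist.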

- **rag-tool-cuda Tekton**
If prefetch lists the wrong CUDA wheel filenames, fix them to match **`Containerfile-cuda`** and **`lightspeed-core-rag-content-c0ec3-*`**.

## Operational checklist after dependency changes

1. Run **`make konflux-requirements`** and/or **`make konflux-requirements-cuda`**.
2. Commit generated **`requirements.hashes.*`**, **`requirements-build*.txt`**, new **`requirements.hashes.wheel.cpu.*.txt`** when CPU script emits them, **`Containerfile`** if install paths change, and **`.tekton` / `scripts/hermeto`** if prefetch inputs change.
3. Run Hermeto pip slice checks per the **Local Hermeto validation** table (**`make hermeto-verify-pip`** when both CPU and CUDA inputs may have moved).
4. If RHOAI **rebuilds** wheels (new `-*-` segment in filenames), update **pulp URLs / hashes** in the generator constants (and re-run Hermeto).

## Local hermetic container simulation

Konflux mounts prefetched content at **`/cachi2`** and applies Hermeto **`project_files`** (substituting **`file://${output_dir}/…`** paths) before the image build. To approximate that with **`Containerfile`** / **`Containerfile-cuda`**:

1. Run Hermeto (**`make hermeto-verify-pip-cpu`** / **`hermeto-verify-pip-cuda`**, or set **`HERMETO_OUT`**) so outputs land in **`.hermeto-output-verify-*`** or a directory of your choice.
2. **`./scripts/stage_hermetic_build_context.sh`** — copies **`deps/`** to **`.hermetic-staging/cachi2/output/`**, writes **`cachi2.env`** (**`PIP_FIND_LINKS=/cachi2/output/deps/pip`**), and writes **`.hermetic-staging/patched-requirements/`** from the **`.build-config.json`** **`project_files`** entries, with output-dir paths rewritten to **`/cachi2/output`**.
3. **`./scripts/simulate_hermetic_build.sh cpu`** (or **`cuda`**) — generates **`.hermetic-staging/Containerfile.sim.*`** (early **`COPY`** of **`cachi2`**, overlay **`COPY`** of patched requirements) and runs **`podman`/`docker` `build`**.
4. Set **`NETWORK_MODE=none`** only if the early **`RUN`** layers are already cached or RPMs/gems are prefetched as on Konflux. **`pip-*`** Hermeto runs do not ship **`deps/generic/model.safetensors`**; use **`full-cpu`**, **`--model`**, **`embeddings_model/`**, or **`ALLOW_PLACEHOLDER_HERMETIC_MODEL=1`** for a build-only stub.
5. For **CUDA**, point **`HERMETO_OUT`** at a directory produced by **`pip-cuda`** before **`simulate_hermetic_build.sh cuda`**.

## References

- [Hermeto](https://github.com/hermetoproject/hermeto) — prefetch CLI and container image.
- In-repo: `scripts/konflux_requirements.sh`, `scripts/konflux_requirements_cuda.sh`, `scripts/run_hermeto_fetch_deps.sh`, **`Makefile`** targets **`hermeto-verify-pip-*`**, `scripts/stage_hermetic_build_context.sh`, `scripts/simulate_hermetic_build.sh`, `scripts/hermeto/*.json`, `README.md` (Konflux / CUDA sections).
14 changes: 14 additions & 0 deletions Makefile
@@ -105,6 +105,20 @@ konflux-requirements: ## generate hermetic requirements.*.txt file and gemfile.l
./scripts/konflux_requirements.sh
bundle _2.2.33_ lock --add-platform aarch64-linux

HERMETO_MODE ?= pip-cpu
hermeto-fetch-deps: ## run Hermeto prefetch (HERMETO_MODE=pip-cpu|pip-cuda|full-cpu|full-cuda); podman/docker + network
@./scripts/run_hermeto_fetch_deps.sh "$(HERMETO_MODE)"
Comment on lines +109 to +110 (coderabbitai[bot]):

⚠️ Potential issue | 🟡 Minor

Declare hermeto-fetch-deps as phony to avoid accidental no-op.

If a file/directory named hermeto-fetch-deps exists, make can skip this recipe.

Suggested patch
-.PHONY: hermeto-verify-pip-cpu hermeto-verify-pip-cuda hermeto-verify-pip
+.PHONY: hermeto-fetch-deps hermeto-verify-pip-cpu hermeto-verify-pip-cuda hermeto-verify-pip

Also applies to: 115-115


# Pip-slice checks against committed lockfiles (mirrors Konflux prefetch-dependencies for Python).
# CPU success does not imply CUDA: different requirements.*.cuda.txt, overrides.cuda.txt, and wheel/source splits
# (e.g. Hermeto runs cargo vendor --locked on some sdists only seen in one pipeline).
.PHONY: hermeto-verify-pip-cpu hermeto-verify-pip-cuda hermeto-verify-pip
hermeto-verify-pip-cpu: ## Hermeto strict fetch-deps pip-cpu → .hermeto-output-verify-cpu
HERMETO_OUT="$(CURDIR)/.hermeto-output-verify-cpu" ./scripts/run_hermeto_fetch_deps.sh pip-cpu
hermeto-verify-pip-cuda: ## Hermeto strict fetch-deps pip-cuda → .hermeto-output-verify-cuda
HERMETO_OUT="$(CURDIR)/.hermeto-output-verify-cuda" ./scripts/run_hermeto_fetch_deps.sh pip-cuda
hermeto-verify-pip: hermeto-verify-pip-cpu hermeto-verify-pip-cuda ## both pip slices (run before pushing Konflux prefetch changes)

konflux-rpm-lock: ## generate rpm.lock.yaml file for konflux build
./scripts/generate-rpm-lock.sh

42 changes: 42 additions & 0 deletions scripts/gen_containerfile_hermetic_sim.sh
@@ -0,0 +1,42 @@
#!/usr/bin/env bash
# Generate a Containerfile that injects staged /cachi2 and Hermeto-patched requirement fragments.
#
# Usage:
# ./scripts/gen_containerfile_hermetic_sim.sh cpu|cuda > .hermetic-staging/Containerfile.sim
#
set -euo pipefail

ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
# Paths must be relative to build context (repo root); keep staging under the repo.
STAGING_REL=".hermetic-staging"

[[ "${1:-}" == cpu || "${1:-}" == cuda ]] || {
echo "usage: $0 cpu|cuda" >&2
exit 1
}

if [[ "$1" == cpu ]]; then
base="$ROOT/Containerfile"
else
base="$ROOT/Containerfile-cuda"
fi

awk -v staging="$STAGING_REL" '
/^USER root$/ {
print
print ""
print "# Injected by gen_containerfile_hermetic_sim.sh (local hermetic simulation)"
print "COPY " staging "/cachi2 /cachi2"
next
}
{ print }
' "$base" | awk -v staging="$STAGING_REL" '
$0 ~ /^COPY LICENSE \/licenses\/LICENSE$/ {
print
print ""
print "# Hermeto project_files (file:// wheel paths) overlay"
print "COPY " staging "/patched-requirements/ ./"
next
}
{ print }
'
Comment on lines +24 to +42 (coderabbitai[bot]):

⚠️ Potential issue | 🟠 Major


Add validation to fail fast when injection anchors are missing.

The awk pipeline silently succeeds even if the anchor patterns (lines 25 and 34) are never matched, potentially producing non-hermetic Containerfiles without warning.

Additionally, there's a filename bug on line 21: the script references Containerfile-cuda but the actual file is Containerfile-gpu. The GPU variant is also missing both required anchors (USER root and COPY LICENSE /licenses/LICENSE), which would need to be added before this works.

The proposed grep validation checks address the fail-fast concern:

Proposed additions
+grep -q '^USER root$' "$base" || {
+  echo "error: missing anchor 'USER root' in $base" >&2
+  exit 1
+}
+grep -q '^COPY LICENSE /licenses/LICENSE$' "$base" || {
+  echo "error: missing anchor 'COPY LICENSE /licenses/LICENSE' in $base" >&2
+  exit 1
+}
+
 awk -v staging="$STAGING_REL" '

34 changes: 34 additions & 0 deletions scripts/hermeto/prefetch-konflux-cpu.json
@@ -0,0 +1,34 @@
[
{
"type": "generic",
"path": ".",
"lockfile": "artifacts.lock.yaml"
},
{
"type": "rpm",
"path": ".",
"include_summary_in_sbom": true
},
{
"type": "pip",
"path": ".",
"requirements_files": [
"requirements.hashes.wheel.txt",
"requirements.hashes.wheel.cpu.x86_64.txt",
"requirements.hashes.wheel.cpu.aarch64.txt",
"requirements.hashes.wheel.pypi.txt",
"requirements.hashes.source.txt",
"requirements.hermetic.txt"
],
"requirements_build_files": ["requirements-build.txt"],
"binary": {
"packages": "torch,torchvision,triton,uv-build,uv,pip,maturin,opencv-python,omegaconf,rapidocr,sqlite-vec,griffe,griffecli,griffelib,pyclipper,tree-sitter-typescript,hf-xet,psycopg2-binary,docling-parse,pypdf,pypdfium2,aiohappyeyeballs,aiohttp,aiosignal,aiosqlite,annotated-doc,annotated-types,anyio,asyncpg,beautifulsoup4,cffi,click,colorama,cryptography,dataclasses-json,defusedxml,distro,et-xmlfile,faiss-cpu,filetype,fire,frozenlist,greenlet,h11,httpcore,httpx,httpx-sse,idna,jinja2,jiter,joblib,jsonlines,jsonref,jsonschema-specifications,lxml,markdown-it-py,markupsafe,mdurl,mpire,mpmath,mypy-extensions,nest-asyncio,networkx,numpy,openpyxl,packaging,pandas,pillow,pluggy,prompt-toolkit,propcache,pydantic,pydantic-core,pygments,pylatexenc,python-dateutil,python-docx,python-pptx,pyyaml,referencing,requests,rpds-py,rtree,safetensors,scikit-learn,scipy,setuptools,shapely,shellingham,six,sniffio,sympy,threadpoolctl,tiktoken,tokenizers,transformers,typing-extensions,typing-inspect,typing-inspection,tzdata,xlsxwriter,zipp",
"os": "linux",
"arch": "x86_64,aarch64",
"py_version": 312
}
},
{
"type": "bundler"
}
]
33 changes: 33 additions & 0 deletions scripts/hermeto/prefetch-konflux-cuda.json
@@ -0,0 +1,33 @@
[
{
"type": "generic",
"path": ".",
"lockfile": "artifacts.lock.yaml"
},
{
"type": "rpm",
"path": "cuda"
},
Comment on lines +8 to +10 (coderabbitai[bot]):

⚠️ Potential issue | 🔴 Critical


CUDA manifest has blocking path and file mismatches that will cause prefetch to fail.

Line 9 uses path: "cuda" (which doesn't exist) and Line 22 uses requirements-build.cuda.txt (which doesn't exist). The CPU manifest pattern (Lines 9 and 23) correctly uses repo root "." with requirements-build.txt. Additionally, the CUDA manifest is missing the include_summary_in_sbom: true field present in the CPU manifest.

Required fix
   {
     "type": "rpm",
-    "path": "cuda"
+    "path": ".",
+    "include_summary_in_sbom": true
   },
@@
-    "requirements_build_files": ["requirements-build.cuda.txt"],
+    "requirements_build_files": ["requirements-build.txt"],

{
"type": "pip",
"path": ".",
"requirements_files": [
"requirements.hashes.wheel.cuda.txt",
"requirements.hashes.wheel.pypi.cuda.base.txt",
"requirements.hashes.wheel.pypi.cuda.x86_64.txt",
"requirements.hashes.wheel.pypi.cuda.aarch64.txt",
"requirements.hashes.source.cuda.txt",
"requirements.hermetic.txt"
],
"requirements_build_files": ["requirements-build.cuda.txt"],
"binary": {
"packages": "triton,pylatexenc,uv-build,uv,pip,maturin,cmake,opencv-python,omegaconf,rapidocr,sqlite-vec,griffe,griffecli,griffelib,pyclipper,tree-sitter-typescript,hf-xet,docling-parse,torch,torchvision,psycopg2-binary,faiss-cpu,llama-index-vector-stores-faiss,pypdf,pypdfium2,jiter,aiohappyeyeballs,aiohttp,aiosignal,beautifulsoup4,click,defusedxml,distro,filetype,frozenlist,h11,httpx,idna,jinja2,jsonschema,lxml,marko,networkx,numpy,openpyxl,pandas,pillow,pluggy,prompt-toolkit,propcache,pydantic,python-docx,python-pptx,pyyaml,requests,rtree,scipy,setuptools,sniffio,sympy,termcolor,tiktoken,tomlkit,typing-extensions,urllib3",
"os": "linux",
"arch": "x86_64,aarch64",
"py_version": 312
}
},
{
"type": "bundler"
}
]
21 changes: 21 additions & 0 deletions scripts/hermeto/prefetch-pip-cpu.json
@@ -0,0 +1,21 @@
[
{
"type": "pip",
"path": ".",
"requirements_files": [
"requirements.hashes.wheel.txt",
"requirements.hashes.wheel.cpu.x86_64.txt",
"requirements.hashes.wheel.cpu.aarch64.txt",
"requirements.hashes.wheel.pypi.txt",
"requirements.hashes.source.txt",
"requirements.hermetic.txt"
],
"requirements_build_files": ["requirements-build.txt"],
"binary": {
"packages": "torch,torchvision,triton,uv-build,uv,pip,maturin,opencv-python,omegaconf,rapidocr,sqlite-vec,griffe,griffecli,griffelib,pyclipper,tree-sitter-typescript,hf-xet,psycopg2-binary,docling-parse,pypdf,pypdfium2,aiohappyeyeballs,aiohttp,aiosignal,aiosqlite,annotated-doc,annotated-types,anyio,asyncpg,beautifulsoup4,cffi,click,colorama,cryptography,dataclasses-json,defusedxml,distro,et-xmlfile,faiss-cpu,filetype,fire,frozenlist,greenlet,h11,httpcore,httpx,httpx-sse,idna,jinja2,jiter,joblib,jsonlines,jsonref,jsonschema-specifications,lxml,markdown-it-py,markupsafe,mdurl,mpire,mpmath,mypy-extensions,nest-asyncio,networkx,numpy,openpyxl,packaging,pandas,pillow,pluggy,prompt-toolkit,propcache,pydantic,pydantic-core,pygments,pylatexenc,python-dateutil,python-docx,python-pptx,pyyaml,referencing,requests,rpds-py,rtree,safetensors,scikit-learn,scipy,setuptools,shapely,shellingham,six,sniffio,sympy,threadpoolctl,tiktoken,tokenizers,transformers,typing-extensions,typing-inspect,typing-inspection,tzdata,xlsxwriter,zipp",
"os": "linux",
"arch": "x86_64,aarch64",
"py_version": 312
}
}
]