Merged
18 commits
340c005
Bump rhoai image version in prow tests
Mar 27, 2026
a53c2e1
Fix missing closing brace in LLAMA_STACK_IMAGE parameter expansion
are-ces Mar 28, 2026
9105daf
Fix prow pipeline: in-cluster image build, RAG config, port-forward fix
are-ces Mar 29, 2026
dc88586
Add namespace diagnostic logging to prow pipeline
are-ces Mar 30, 2026
8aaee73
Remove namespace from cluster-scoped DataScienceCluster CR
are-ces Apr 9, 2026
6bc8998
Add model/provider override env vars to prow pipeline
are-ces Apr 9, 2026
1f2bebb
Fix prow pipeline: run bootstrap before namespace creation
are-ces Apr 10, 2026
c317499
Hardcode NAMESPACE to avoid Prow env override
are-ces Apr 10, 2026
4e2a4bd
Re-enable secret exports in prow pipeline
are-ces Apr 10, 2026
f032f4a
Add enrichment and RAG restore to prow llama-stack manifest
are-ces Apr 17, 2026
a33aff6
Fix prow e2e pipeline: secret ordering, config path, and test runner
are-ces Apr 23, 2026
3422717
Fix e2e-ops restart failures and mock-jwks port-forward
are-ces Apr 24, 2026
56c70f5
Improve port-forward resilience for Prow E2E tests
are-ces Apr 24, 2026
986c6f3
Fix update-configmap cascade failure and surface oc errors
are-ces Apr 24, 2026
54c289b
Fix verify_connectivity for auth-enabled Prow environments
are-ces Apr 27, 2026
90da0d1
Fix Llama Stack disruption cascade and pipeline port-forward coordina…
are-ces Apr 27, 2026
b033f92
Skip TLS and proxy e2e tests in Prow (no Docker Compose services)
are-ces Apr 28, 2026
aa6ab2e
Fix after_feature AttributeError on hostname_llama
are-ces Apr 28, 2026
@@ -22,13 +22,14 @@ spec:
secretKeyRef:
name: llama-stack-ip-secret
key: key
# Same vars as docker-compose / server-mode YAML (${env.FAISS_VECTOR_STORE_ID} in byok_rag).
- name: FAISS_VECTOR_STORE_ID
valueFrom:
secretKeyRef:
name: faiss-vector-store-secret
key: id
optional: true
- name: KV_RAG_PATH
value: "/app-root/src/.llama/storage/rag/kv_store.db"
image: ${LIGHTSPEED_STACK_IMAGE}
ports:
- containerPort: 8080
205 changes: 205 additions & 0 deletions tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-prow.yaml
@@ -0,0 +1,205 @@
# Llama Stack pod for Prow: uses pre-built image with enrichment + RAG restore.
#
# Requires: ConfigMap llama-stack-config (run.yaml), ConfigMap rag-data (kv_store.db.gz),
# ConfigMap lightspeed-stack-config (lightspeed-stack.yaml).
# Requires: Image built as ${LLAMA_STACK_IMAGE} (set by pipeline.sh).
#
apiVersion: v1
kind: Pod
metadata:
name: llama-stack-service
labels:
pod: llama-stack-service
spec:
securityContext:
seccompProfile:
type: RuntimeDefault
initContainers:
- name: setup-rag-data
image: busybox:latest
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop: ["ALL"]
runAsNonRoot: true
runAsUser: 65534
seccompProfile:
type: RuntimeDefault
command:
- /bin/sh
- -c
- |
set -e
mkdir -p /data/src/.llama/storage/rag /data/src/.llama/storage/files /data/.e2e-rag-seed
if [ ! -f /rag-data/kv_store.db.gz ]; then
echo "FATAL: missing /rag-data/kv_store.db.gz"
ls -la /rag-data || true
exit 1
fi
gunzip -c /rag-data/kv_store.db.gz > /data/.e2e-rag-seed/kv_store.db
cp -f /data/.e2e-rag-seed/kv_store.db /data/src/.llama/storage/rag/kv_store.db
chmod -R 777 /data/src /data/.e2e-rag-seed
Comment on lines +33 to +41
⚠️ Potential issue | 🟡 Minor

Seed the shared volume at its mounted root, not under /data/src/.llama/storage.

rag-storage is mounted as /data in this init container but as /opt/app-root/src/.llama/storage in the main container. Writing to /data/src/.llama/storage/... therefore lands at /opt/app-root/src/.llama/storage/src/.llama/storage/..., so the init container never prepopulates the path that KV_RAG_PATH points to.

Suggested fix
-          mkdir -p /data/src/.llama/storage/rag /data/src/.llama/storage/files /data/.e2e-rag-seed
+          mkdir -p /data/rag /data/files /data/.e2e-rag-seed
...
-          cp -f /data/.e2e-rag-seed/kv_store.db /data/src/.llama/storage/rag/kv_store.db
-          chmod -R 777 /data/src /data/.e2e-rag-seed
+          cp -f /data/.e2e-rag-seed/kv_store.db /data/rag/kv_store.db
+          chmod -R 777 /data/rag /data/files /data/.e2e-rag-seed

echo "RAG data extracted successfully"
volumeMounts:
- name: rag-storage
mountPath: /data
- name: rag-data
mountPath: /rag-data
- name: materialize-run-yaml
image: busybox:latest
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop: ["ALL"]
runAsNonRoot: true
runAsUser: 65534
seccompProfile:
type: RuntimeDefault
command:
- /bin/sh
- -c
- |
set -e
cp /cm/run.yaml /work/run.yaml
chmod 664 /work/run.yaml
volumeMounts:
- name: config-cm
mountPath: /cm
readOnly: true
- name: rag-storage
mountPath: /work
containers:
- name: llama-stack-container
image: ${LLAMA_STACK_IMAGE}
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop: ["ALL"]
runAsNonRoot: true
runAsUser: 1001
seccompProfile:
type: RuntimeDefault
workingDir: /opt/app-root
env:
- name: PYTHONPATH
value: "/opt/app-root/src"
- name: HOME
value: "/opt/app-root/src"
- name: KV_STORE_PATH
value: "/opt/app-root/src/.llama/storage/kv_store.db"
- name: KV_RAG_PATH
value: "/opt/app-root/src/.llama/storage/rag/kv_store.db"
- name: SQL_STORE_PATH
value: "/opt/app-root/src/.llama/storage/sql_store.db"
- name: KSVC_URL
valueFrom:
secretKeyRef:
name: api-url-secret
key: key
- name: VLLM_API_KEY
valueFrom:
secretKeyRef:
name: vllm-api-key-secret
key: key
- name: INFERENCE_MODEL
value: "meta-llama/Llama-3.1-8B-Instruct"
- name: OPENAI_API_KEY
valueFrom:
secretKeyRef:
name: openai-api-key-secret
key: key
optional: true
- name: E2E_OPENAI_MODEL
value: "gpt-4o-mini"
- name: LLAMA_STACK_CONFIG
value: "/opt/app-root/src/.llama/storage/run.yaml"
- name: FAISS_VECTOR_STORE_ID
valueFrom:
secretKeyRef:
name: faiss-vector-store-secret
key: id
- name: E2E_LLAMA_HOSTNAME
valueFrom:
secretKeyRef:
name: llama-stack-ip-secret
key: key
command:
- /bin/bash
- -c
- |
set -e
RAG_SEED="/opt/app-root/src/.llama/storage/.e2e-rag-seed/kv_store.db"
RAG_CM_GZ="/opt/app-root/rag-data-cm/kv_store.db.gz"
RAG_WORK="${KV_RAG_PATH:-/opt/app-root/src/.llama/storage/rag/kv_store.db}"
restore_rag_seed() {
mkdir -p "$(dirname "$RAG_WORK")"
if [[ -f "$RAG_CM_GZ" ]]; then
RAG_WORK="$RAG_WORK" RAG_CM_GZ="$RAG_CM_GZ" python3 -c 'import gzip, os, shutil, sys; r, g = os.environ["RAG_WORK"], os.environ["RAG_CM_GZ"]; t = r + ".tmp"; i = gzip.open(g, "rb"); o = open(t, "wb"); shutil.copyfileobj(i, o); i.close(); o.close(); sz = os.path.getsize(t); (sz >= 1048576) or (print("FATAL: RAG from ConfigMap too small:", sz, file=sys.stderr) or sys.exit(1)); os.replace(t, r); os.chmod(r, 0o664)' || exit 1
elif [[ -f "$RAG_SEED" ]]; then
cp -f "$RAG_SEED" "$RAG_WORK"
chmod 664 "$RAG_WORK" 2>/dev/null || true
fi
}
restore_rag_seed
INPUT_CONFIG="${LLAMA_STACK_CONFIG:-/opt/app-root/run.yaml}"
ENRICHED_CONFIG="/opt/app-root/run.yaml"
LIGHTSPEED_CONFIG="${LIGHTSPEED_CONFIG:-/opt/app-root/lightspeed-stack.yaml}"
ENV_FILE="/opt/app-root/.env"
if [[ -f "$LIGHTSPEED_CONFIG" ]]; then
echo "Enriching llama-stack config..."
ENRICHMENT_FAILED=0
python3 /opt/app-root/src/llama_stack_configuration.py \
-c "$LIGHTSPEED_CONFIG" \
-i "$INPUT_CONFIG" \
-o "$ENRICHED_CONFIG" \
-e "$ENV_FILE" 2>&1 || ENRICHMENT_FAILED=1
if [[ -f "$ENV_FILE" ]]; then
set -a && . "$ENV_FILE" && set +a
fi
if [[ -f "$ENRICHED_CONFIG" ]] && [[ "$ENRICHMENT_FAILED" -eq 0 ]]; then
echo "Using enriched config: $ENRICHED_CONFIG"
restore_rag_seed
exec llama stack run "$ENRICHED_CONFIG"
fi
fi
echo "Using original config: $INPUT_CONFIG"
restore_rag_seed
exec llama stack run "$INPUT_CONFIG"
ports:
- containerPort: 8321
readinessProbe:
httpGet:
path: /v1/health
port: 8321
initialDelaySeconds: 20
periodSeconds: 5
failureThreshold: 36
livenessProbe:
httpGet:
path: /v1/health
port: 8321
initialDelaySeconds: 120
periodSeconds: 20
failureThreshold: 3
volumeMounts:
- name: rag-storage
mountPath: /opt/app-root/src/.llama/storage
- name: lightspeed-config
mountPath: /opt/app-root/lightspeed-stack.yaml
subPath: lightspeed-stack.yaml
readOnly: true
- name: rag-data
mountPath: /opt/app-root/rag-data-cm
readOnly: true
volumes:
- name: rag-storage
emptyDir: {}
- name: config-cm
configMap:
name: llama-stack-config
- name: lightspeed-config
configMap:
name: lightspeed-stack-config
- name: rag-data
configMap:
name: rag-data
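The startup script in the manifest above restores the RAG seed with an inline `python3 -c` one-liner that is hard to read in the diff. Below is a minimal standalone sketch of the same idea in plain shell: decompress the gzipped kv_store.db to a temp file, reject implausibly small output, then move it into place so a half-written DB is never left at the destination. The 16-byte threshold and throwaway paths are demo-only assumptions; the manifest uses a 1 MiB (1048576-byte) minimum and the `.llama/storage` paths.

```shell
#!/bin/sh
set -eu

# Decompress $1 (a .gz seed), require at least $3 bytes, install at $2.
restore_rag_seed() {
  src_gz="$1"; dest="$2"; min_bytes="$3"
  tmp="${dest}.tmp"
  mkdir -p "$(dirname "$dest")"
  gunzip -c "$src_gz" > "$tmp"
  size=$(wc -c < "$tmp" | tr -d ' ')
  if [ "$size" -lt "$min_bytes" ]; then
    echo "FATAL: RAG seed too small: $size bytes" >&2
    rm -f "$tmp"        # never leave a truncated DB behind
    return 1
  fi
  mv "$tmp" "$dest"     # rename, so readers see either old or complete file
  chmod 664 "$dest"
}

# Demo against throwaway files:
work=$(mktemp -d)
printf 'fake kv_store payload' > "$work/kv_store.db"
gzip -c "$work/kv_store.db" > "$work/kv_store.db.gz"
restore_rag_seed "$work/kv_store.db.gz" "$work/rag/kv_store.db" 16
cmp "$work/kv_store.db" "$work/rag/kv_store.db" && echo "seed restored intact"
rm -rf "$work"
```

The temp-file-plus-rename step mirrors the `os.replace(t, r)` in the manifest's one-liner; both avoid serving a partially decompressed database if the pod is killed mid-restore.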
62 changes: 0 additions & 62 deletions tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack.yaml

This file was deleted.

1 change: 0 additions & 1 deletion tests/e2e-prow/rhoai/manifests/operators/ds-cluster.yaml
@@ -2,7 +2,6 @@ apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
name: default-dsc
namespace: e2e-rhoai-dsc
spec:
serviceMesh:
managementState: Managed
2 changes: 1 addition & 1 deletion tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-cpu.yaml
@@ -24,7 +24,7 @@ spec:
- --port
- "8080"
- --max-model-len
- "2048"
- "32768"
image: quay.io/rh-ee-cpompeia/vllm-cpu:latest
name: kserve-container
env:
2 changes: 1 addition & 1 deletion tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-gpu.yaml
@@ -24,7 +24,7 @@ spec:
- --port
- "8080"
- --max-model-len
- "2048"
- "32768"
- --gpu-memory-utilization
- "0.9"
image: ${VLLM_IMAGE}
31 changes: 17 additions & 14 deletions tests/e2e-prow/rhoai/pipeline-services.sh
@@ -1,27 +1,30 @@
#!/bin/bash

BASE_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
NAMESPACE="${NAMESPACE:-e2e-rhoai-dsc}"

# Deploy llama-stack
envsubst < "$BASE_DIR/manifests/lightspeed/llama-stack.yaml" | oc apply -f -
# Create llama-stack-ip-secret before deploying the pod (it references the secret as an env var)
export E2E_LLAMA_HOSTNAME="llama-stack-service-svc.${NAMESPACE}.svc.cluster.local"
oc create secret generic llama-stack-ip-secret \
--from-literal=key="$E2E_LLAMA_HOSTNAME" \
-n "$NAMESPACE" 2>/dev/null || echo "Secret llama-stack-ip-secret exists"

# Deploy llama-stack (substitute only LLAMA_STACK_IMAGE, leave other ${} intact)
envsubst '${LLAMA_STACK_IMAGE}' < "$BASE_DIR/manifests/lightspeed/llama-stack-prow.yaml" | oc apply -n "$NAMESPACE" -f -

oc wait pod/llama-stack-service \
-n e2e-rhoai-dsc --for=condition=Ready --timeout=600s
-n "$NAMESPACE" --for=condition=Ready --timeout=600s

# Get url address of llama-stack pod
oc label pod llama-stack-service pod=llama-stack-service -n e2e-rhoai-dsc
# Expose llama-stack service
oc label pod llama-stack-service pod=llama-stack-service -n "$NAMESPACE"

oc expose pod llama-stack-service \
--name=llama-stack-service-svc \
--port=8321 \
--type=ClusterIP \
-n e2e-rhoai-dsc

export E2E_LLAMA_HOSTNAME="llama-stack-service-svc.e2e-rhoai-dsc.svc.cluster.local"

oc create secret generic llama-stack-ip-secret \
--from-literal=key="$E2E_LLAMA_HOSTNAME" \
-n e2e-rhoai-dsc || echo "Secret exists"
-n "$NAMESPACE"

# Deploy lightspeed-stack
oc apply -f "$BASE_DIR/manifests/lightspeed/lightspeed-stack.yaml"
# Deploy lightspeed-stack (substitute only LIGHTSPEED_STACK_IMAGE, leave other ${} intact)
LIGHTSPEED_STACK_IMAGE="${LIGHTSPEED_STACK_IMAGE:-quay.io/lightspeed-core/lightspeed-stack:dev-latest}"
export LIGHTSPEED_STACK_IMAGE
envsubst '${LIGHTSPEED_STACK_IMAGE}' < "$BASE_DIR/manifests/lightspeed/lightspeed-stack.yaml" | oc apply -n "$NAMESPACE" -f -
Comment on lines +4 to +30
⚠️ Potential issue | 🟠 Major

Make this script fail fast on deployment errors.

This file still runs without set -euo pipefail, so a timed-out oc wait, a failed oc expose, or a failed manifest apply can be masked by a later successful command, and pipeline.sh will continue with a half-deployed environment.

Suggested fix
 #!/bin/bash
+set -euo pipefail
 
 BASE_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 NAMESPACE="${NAMESPACE:-e2e-rhoai-dsc}"
+: "${LLAMA_STACK_IMAGE:?LLAMA_STACK_IMAGE must be set}"
In addition, stop swallowing errors on secret creation: rather than silently redirecting stderr and echoing "Secret exists", try oc create and, on failure, verify that oc get secret llama-stack-ip-secret succeeds — otherwise exit. That way only the genuine "already exists" case is tolerated and unexpected failures surface immediately.
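The "tolerate only already-exists" behavior the review asks for can be factored into a small helper. This is a sketch under stated assumptions: `create_if_absent` is a hypothetical name, and the two command strings stand in for `oc create secret ...` and `oc get secret ...` so the pattern can be demonstrated without a cluster.

```shell
#!/bin/bash
set -euo pipefail

# Run $1 (a create command); if it fails, succeed only when $2 (an existence
# check) confirms the resource is already present. Any other failure aborts.
create_if_absent() {
  local try_create="$1" check_exists="$2"
  if ! eval "$try_create"; then
    if eval "$check_exists"; then
      echo "resource already exists; reusing it"
    else
      echo "creation failed and resource is absent" >&2
      return 1
    fi
  fi
}

# Demo with plain shell commands in place of oc:
marker=$(mktemp -u)
create_if_absent "touch '$marker'" "test -e '$marker'"  # first create succeeds
create_if_absent "false" "test -e '$marker'"            # exists -> tolerated
rm -f "$marker"
```

In pipeline-services.sh this would replace the `2>/dev/null || echo "Secret ... exists"` idiom, so that under `set -euo pipefail` a genuine oc failure (auth error, bad namespace) stops the pipeline instead of being echoed away.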
