test(evals): expand eval_data Q&A coverage #279

ItzikEzra-rh · 2026-01-14T11:45:10Z

Added 51 evaluation queries across 47 conversation groups.
In five local runs, the suite achieved a 96–100% pass rate.

Summary by CodeRabbit

Tests
- Added a large set of new conversational test cases covering cluster creation follow-ups, operator handling (including virtualization), platform/architecture queries (including s390x), VIP/configuration scenarios, Day‑1/Day‑2 operations, cloud boundaries, safety/refusal edge cases, and numerous boundary tests.
- Added an automated verification script to query the service and determine the latest stable OpenShift release.


openshift-ci · 2026-01-14T11:45:16Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ItzikEzra-rh
Once this PR has been reviewed and has the lgtm label, please assign maorfr for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci · 2026-01-14T11:45:21Z

Hi @ItzikEzra-rh. Thanks for your PR.

I'm waiting for a rh-ecosystem-edge member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

coderabbitai · 2026-01-14T11:45:22Z

Walkthrough

Adds many new evaluation conversation groups to test/evals/eval_data.yaml (cluster flows, operators, platform/version/boundary cases, post‑install checks) and a new script test/scripts/verify_latest_version.sh that queries Assisted Service for the latest stable OpenShift version.

Changes

Cohort / File(s)	Summary
Test evaluation data `test/evals/eval_data.yaml`	Added ~396 lines of new `conversation_group` entries covering cluster creation follow-ups, operator add/validation, platform/CPU/version queries (including s390x), VIP/auth/API boundaries, Day‑1/Day‑2 ops, destructive-action checks, expected intents, tool calls, verification & cleanup scripts. Review expected_intent/tool call accuracy and embedded scripts.
Verification script `test/scripts/verify_latest_version.sh`	New Bash script (sources `test/scripts/common.sh`) that fetches Assisted Service OpenShift versions, filters pre-releases (-fc, -rc, -ec, -nightly, -ci), semver-sorts, and prints the latest stable version; includes basic error handling. Review shell robustness and API/response parsing assumptions.
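
As a rough illustration of the filter-and-sort step described above, the sketch below drops pre-release tags and version-sorts what remains. It is not the script from this PR: the function name `latest_stable` and the sample version strings are hypothetical, and it assumes GNU `sort -V` is available.

```shell
#!/bin/sh
# Hypothetical helper (not the PR's actual script): read newline-separated
# version strings on stdin and print the highest stable one.
latest_stable() {
    # Drop pre-release builds (-fc, -rc, -ec, -nightly, -ci), then
    # version-sort and keep the last entry. `sort -V` is a GNU extension.
    grep -Ev -- '-(fc|rc|ec|nightly|ci)' | sort -V | tail -n 1
}

# Example with made-up version strings.
printf '%s\n' 4.9.12 4.18.3 4.18.10 4.19.0-rc.1 4.19.0-ec.2 | latest_stable
```

With the sample input the pipeline prints 4.18.10. A plain lexical sort would have ranked 4.9.12 highest, which is why a version-aware sort matters here.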

Sequence Diagram(s)

(No sequence diagrams generated — changes are primarily test data additions and a small utility script without substantial multi-component control flow requiring visualization.)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

eval: data structure as per multi-turn datamodel #72 — Adds many new conversation_group entries to test/evals/eval_data.yaml using the same multi-turn structure.
MGMT-21240: improve eval test coverage #182 — Expands conversation_group-based eval tests including tool-driven flows and version/listing evals.
MGMT-21690: chatbot installation asks again for host and cluster IDs #195 — Adds evaluation test cases related to latest-version resolution and host-discovery scenarios in test/evals/eval_data.yaml.

Suggested labels

lgtm

Suggested reviewers

omertuc
carbonin

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: expanding evaluation data with new Q&A test cases for the eval_data.yaml file.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


eliajahshan · 2026-01-15T13:34:07Z

/lgtm

openshift-ci · 2026-01-15T13:34:10Z

@eliajahshan: changing LGTM is restricted to collaborators

Details

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

eranco74 · 2026-01-15T13:44:25Z

/ok-to-test

test/evals/eval_data.yaml

- Add empty lines between conversation_groups for consistency
- Create multi-turn conversation for cluster creation + follow-up queries
- Update ask_about_platform_none with VIP differences question
- Add list_versions_for_env follow-up query
- Add verify_latest_version.sh script for version validation

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@test/evals/eval_data.yaml`:
- Around line 749-752: The test's eval_query contains an incorrectly formatted
UUID (last segment has 10 hex chars) which can make the test fail for format
validation instead of intent handling; update the cluster ID in the eval_query
string to a syntactically valid UUID (e.g., change
12345678-1234-1234-1234-1234567890 to a 36-char UUID such as
12345678-1234-1234-1234-1234567890ab) so the test exercises the intended
non-existent-cluster boundary case rather than a malformed-ID error.

🧹 Nitpick comments (2)

test/evals/eval_data.yaml (2)
579-593: Consider adding a verification script for version filtering.

The test expects the response to list versions "compatible with the s390x CPU architecture," but there's no verification script to confirm the returned versions actually support s390x. A past review comment suggested adding validation to check that the version list fits the CPU architecture.

595-608: Consider integrating verify_latest_version.sh for validation.

The PR adds test/scripts/verify_latest_version.sh which appears designed to support this test case. However, the test doesn't use eval_verify_script to validate that the returned version is actually the latest stable version. Consider adding:
eval_verify_script: ../scripts/verify_latest_version.sh
This would ensure the response is validated against the actual latest stable version from the API, rather than relying solely on intent matching.

coderabbitai · 2026-01-18T13:42:47Z

test/evals/eval_data.yaml

+    eval_query: Can you add a worker node to my already installed cluster 12345678-1234-1234-1234-1234567890?
+    eval_types:
+    - response_eval:intent
+    expected_intent: Declining the request and explaining that the Assisted Installer is used for initial cluster deployment (Day 1) and cannot be used to modify a running cluster (Day 2 operations).


⚠️ Potential issue | 🟡 Minor

Verify cluster ID format in test query.

The cluster ID 12345678-1234-1234-1234-1234567890 in the boundary test query is missing two characters: the last segment of a UUID should have 12 hex characters, but this one has only 10. The test intentionally targets a non-existent cluster, but an invalid UUID format might cause it to fail for the wrong reason.
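
A quick way to check an ID's shape before using it in a test query is a regular-expression match on the 8-4-4-4-12 hex layout. This is a minimal sketch: the helper name `is_uuid` is hypothetical and is not part of the repository's scripts.

```shell
#!/bin/sh
# Hypothetical helper: succeed only when the argument has the canonical
# 8-4-4-4-12 hexadecimal UUID layout.
is_uuid() {
    printf '%s\n' "$1" | grep -Eq \
        '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
}

# Compare the ID from the test query with a corrected one.
for id in 12345678-1234-1234-1234-1234567890 \
          12345678-1234-1234-1234-123456789012; do
    if is_uuid "$id"; then
        echo "$id: well-formed"
    else
        echo "$id: malformed"
    fi
done
```

The first ID is reported as malformed and the second as well-formed.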

Suggested fix

-    eval_query: Can you add a worker node to my already installed cluster 12345678-1234-1234-1234-1234567890?
+    eval_query: Can you add a worker node to my already installed cluster 12345678-1234-1234-1234-123456789012?


coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@test/scripts/verify_latest_version.sh`:
- Around line 12-14: The fetch_versions function uses curl without timeouts,
which can hang CI; update the curl invocation in fetch_versions to add
connection and total timeouts (e.g., --connect-timeout and --max-time) and a
small retry (e.g., --retry and --retry-delay) so network stalls fail fast and
deterministically when calling "${ASSISTED_SERVICE_URL}/openshift-versions" with
Authorization: Bearer ${OCM_TOKEN}; keep the same output piping to jq and
existing empty-response handling.

coderabbitai · 2026-01-18T14:03:12Z

test/scripts/verify_latest_version.sh

+fetch_versions() {
+    curl -sSf -H "Authorization: Bearer ${OCM_TOKEN}" \
+        "${ASSISTED_SERVICE_URL}/openshift-versions" | jq -r 'keys[]'


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's find and examine the file in question
find . -name "verify_latest_version.sh" -type f 2>/dev/null | head -5

Repository: rh-ecosystem-edge/assisted-chat

Length of output: 114

🏁 Script executed:

# Get the actual content around the mentioned lines
if [ -f "test/scripts/verify_latest_version.sh" ]; then
  cat -n "test/scripts/verify_latest_version.sh" | head -30
fi

Repository: rh-ecosystem-edge/assisted-chat

Length of output: 1067

🏁 Script executed:

# Search for other curl usages in test scripts to see if there's a pattern
rg "curl" test/scripts/ -n --context 2 | head -50

Repository: rh-ecosystem-edge/assisted-chat

Length of output: 3343

🏁 Script executed:

# Check if this script is used in CI/CD pipelines
find . -name "*.yml" -o -name "*.yaml" -o -name ".github" -type d 2>/dev/null | xargs grep -l "verify_latest_version" 2>/dev/null || echo "No CI references found"

Repository: rh-ecosystem-edge/assisted-chat

Length of output: 97

🏁 Script executed:

# Look for any timeout configurations in similar test scripts
rg "(timeout|--connect-timeout|--max-time)" test/scripts/ -n || echo "No timeouts found in test scripts"

Repository: rh-ecosystem-edge/assisted-chat

Length of output: 108

🏁 Script executed:

# Check the complete verify_latest_version.sh file
cat -n "test/scripts/verify_latest_version.sh"

Repository: rh-ecosystem-edge/assisted-chat

Length of output: 1790

🏁 Script executed:

# Look for CI/CD configuration files
find . -type d \( -name ".github" -o -name ".gitlab-ci" \) 2>/dev/null
ls -la | grep -E "\.(yml|yaml)$|\.github"

Repository: rh-ecosystem-edge/assisted-chat

Length of output: 213

🏁 Script executed:

# Check if there's a Makefile or similar with timeout settings
find . -maxdepth 2 -name "Makefile" -o -name "*.mk" | xargs cat 2>/dev/null | head -50

Repository: rh-ecosystem-edge/assisted-chat

Length of output: 2116

🏁 Script executed:

# Look at common.sh to see if there are any utility functions for timeouts
cat -n "test/scripts/common.sh" | head -60

Repository: rh-ecosystem-edge/assisted-chat

Length of output: 2538

Add curl timeouts to prevent CI hangs.

A stalled network call (DNS timeout, connection hang, or slow/unresponsive server) would block the job indefinitely. The current curl lacks timeout protections, and error handling on lines 34-37 only catches empty responses, not hangs. Add connect and max-time timeouts with a small retry to make failures deterministic and improve test reliability.

🔧 Suggested change

 fetch_versions() {
-    curl -sSf -H "Authorization: Bearer ${OCM_TOKEN}" \
+    curl -sSf --connect-timeout 10 --max-time 30 --retry 3 --retry-delay 2 \
+        -H "Authorization: Bearer ${OCM_TOKEN}" \
         "${ASSISTED_SERVICE_URL}/openshift-versions" | jq -r 'keys[]'
 }


coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@test/evals/eval_data.yaml`:
- Around line 552-559: The YAML has incorrect indentation causing `conversation`
and its nested items (`- eval_id`, `eval_query`, `eval_types`,
`expected_intent`) to be parsed as top-level entries; fix by indenting the list
under `conversation:` and the `eval_types:` list items so they are children of
the `conversation` mapping (e.g., make `- eval_id: add_operator_virtualization`
and its `eval_query`, `eval_types` and `expected_intent` lines indented one
level under `conversation:`), and apply the same indentation pattern to all
subsequent blocks that follow this structure.

♻️ Duplicate comments (1)

test/evals/eval_data.yaml (1)
749-751: Malformed UUID in boundary test query.

The cluster ID ends with only 10 hex characters, so it may fail format validation rather than the intended boundary behavior.
Suggested fix
-  eval_query: Can you add a worker node to my already installed cluster 12345678-1234-1234-1234-1234567890?
+  eval_query: Can you add a worker node to my already installed cluster 12345678-1234-1234-1234-1234567890ab?

coderabbitai · 2026-01-18T14:11:56Z

test/evals/eval_data.yaml

+- conversation_group: add_operator_virtualization
+  description: Tests adding the OpenShift Virtualization operator.
+  conversation:
+  - eval_id: add_operator_virtualization
+    eval_query: I want to run VMs on my cluster. How do I enable that?
+    eval_types:
+    - response_eval:intent
+    expected_intent: Recommending the OpenShift Virtualization operator and offering to add it to a cluster if the cluster ID is provided.


⚠️ Potential issue | 🔴 Critical

Fix YAML indentation for conversation and nested lists (currently parsed as top-level).

As written, the list item under conversation: and the eval_types list are not indented, which makes the YAML invalid for this group and likely many of the following groups. Please indent list items under their parent keys and apply the same fix to the subsequent newly added blocks that follow this pattern.

✅ Proposed fix (apply same pattern to following groups)

-  conversation:
-  - eval_id: add_operator_virtualization
-    eval_query: I want to run VMs on my cluster. How do I enable that?
-    eval_types:
-    - response_eval:intent
-    expected_intent: Recommending the OpenShift Virtualization operator and offering to add it to a cluster if the cluster ID is provided.
+  conversation:
+    - eval_id: add_operator_virtualization
+      eval_query: I want to run VMs on my cluster. How do I enable that?
+      eval_types:
+        - response_eval:intent
+      expected_intent: Recommending the OpenShift Virtualization operator and offering to add it to a cluster if the cluster ID is provided.


openshift-ci · 2026-01-18T14:35:13Z

@ItzikEzra-rh: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/local-dev-test	`f9f69a5`	link	false	`/test local-dev-test`
ci/prow/eval-test	`f9f69a5`	link	false	`/test eval-test`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

test(evals): expand eval_data Q&A coverage

895ccf2

openshift-ci bot requested review from carbonin and omertuc January 14, 2026 11:45

openshift-ci bot added the needs-ok-to-test label Jan 14, 2026

test(evals): normalize eval_id naming

73be4ca

openshift-ci bot added ok-to-test and removed needs-ok-to-test labels Jan 15, 2026

eranco74 reviewed Jan 15, 2026


ItzikEzra-rh force-pushed the evals/eval-data-diversity branch from 50b8b49 to 631ee14 Compare January 18, 2026 13:36

coderabbitai bot reviewed Jan 18, 2026


ItzikEzra-rh force-pushed the evals/eval-data-diversity branch from 631ee14 to 11706f5 Compare January 18, 2026 13:57

coderabbitai bot reviewed Jan 18, 2026


ItzikEzra-rh force-pushed the evals/eval-data-diversity branch from 11706f5 to f9f69a5 Compare January 18, 2026 14:05

coderabbitai bot reviewed Jan 18, 2026


ItzikEzra-rh requested a review from eranco74 January 18, 2026 15:02
