Skip to content

Conversation

@ItzikEzra-rh
Copy link
Contributor

@ItzikEzra-rh ItzikEzra-rh commented Jan 14, 2026

Added 51 evaluation queries across 47 conversation groups.
In five local runs, the suite achieved a 96–100% pass rate.

Summary by CodeRabbit

  • Tests
    • Added a large set of new conversational test cases covering cluster creation follow-ups, operator handling (including virtualization), platform/architecture queries (including s390x), VIP/configuration scenarios, Day‑1/Day‑2 operations, cloud boundaries, safety/refusal edge cases, and numerous boundary tests.
    • Added an automated verification script to query the service and determine the latest stable OpenShift release.

✏️ Tip: You can customize this high-level summary in your review settings.

@openshift-ci openshift-ci bot requested review from carbonin and omertuc January 14, 2026 11:45
@openshift-ci
Copy link

openshift-ci bot commented Jan 14, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ItzikEzra-rh
Once this PR has been reviewed and has the lgtm label, please assign maorfr for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Copy link

openshift-ci bot commented Jan 14, 2026

Hi @ItzikEzra-rh. Thanks for your PR.

I'm waiting for a rh-ecosystem-edge member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@coderabbitai
Copy link

coderabbitai bot commented Jan 14, 2026

Walkthrough

Adds many new evaluation conversation groups to test/evals/eval_data.yaml (cluster flows, operators, platform/version/boundary cases, post‑install checks) and a new script test/scripts/verify_latest_version.sh that queries Assisted Service for the latest stable OpenShift version.

Changes

Cohort / File(s) Summary
Test evaluation data
test/evals/eval_data.yaml
Added ~396 lines of new conversation_group entries covering cluster creation follow-ups, operator add/validation, platform/CPU/version queries (including s390x), VIP/auth/API boundaries, Day‑1/Day‑2 ops, destructive-action checks, expected intents, tool calls, verification & cleanup scripts. Review expected_intent/tool call accuracy and embedded scripts.
Verification script
test/scripts/verify_latest_version.sh
New Bash script (sources test/scripts/common.sh) that fetches Assisted Service OpenShift versions, filters pre-releases (-fc, -rc, -ec, -nightly, -ci), semver-sorts, and prints the latest stable version; includes basic error handling. Review shell robustness and API/response parsing assumptions.

Sequence Diagram(s)

(No sequence diagrams generated — changes are primarily test data additions and a small utility script without substantial multi-component control flow requiring visualization.)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

lgtm

Suggested reviewers

  • omertuc
  • carbonin
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: expanding evaluation data with new Q&A test cases for the eval_data.yaml file.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@eliajahshan
Copy link

/lgtm

@openshift-ci
Copy link

openshift-ci bot commented Jan 15, 2026

@eliajahshan: changing LGTM is restricted to collaborators

Details

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@eranco74
Copy link
Collaborator

/ok-to-test

ItzikEzra-rh added a commit to ItzikEzra-rh/assisted-chat that referenced this pull request Jan 18, 2026
- Add empty lines between conversation_groups for consistency
- Create multi-turn conversation for cluster creation + follow-up queries
- Update ask_about_platform_none with VIP differences question
- Add list_versions_for_env follow-up query
- Add verify_latest_version.sh script for version validation
ItzikEzra-rh added a commit to ItzikEzra-rh/assisted-chat that referenced this pull request Jan 18, 2026
- Add empty lines between conversation_groups for consistency
- Create multi-turn conversation for cluster creation + follow-up queries
- Update ask_about_platform_none with VIP differences question
- Add list_versions_for_env follow-up query
- Add verify_latest_version.sh script for version validation
@ItzikEzra-rh ItzikEzra-rh force-pushed the evals/eval-data-diversity branch from 50b8b49 to 631ee14 Compare January 18, 2026 13:36
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@test/evals/eval_data.yaml`:
- Around line 749-752: The test's eval_query contains an incorrectly formatted
UUID (last segment has 10 hex chars) which can make the test fail for format
validation instead of intent handling; update the cluster ID in the eval_query
string to a syntactically valid UUID (e.g., change
12345678-1234-1234-1234-1234567890 to a 36-char UUID such as
12345678-1234-1234-1234-1234567890ab) so the test exercises the intended
non-existent-cluster boundary case rather than a malformed-ID error.
🧹 Nitpick comments (2)
test/evals/eval_data.yaml (2)

579-593: Consider adding a verification script for version filtering.

The test expects the response to list versions "compatible with the s390x CPU architecture," but there's no verification script to confirm the returned versions actually support s390x. A past review comment suggested adding validation to check that the version list fits the CPU architecture.


595-608: Consider integrating verify_latest_version.sh for validation.

The PR adds test/scripts/verify_latest_version.sh which appears designed to support this test case. However, the test doesn't use eval_verify_script to validate that the returned version is actually the latest stable version. Consider adding:

eval_verify_script: ../scripts/verify_latest_version.sh

This would ensure the response is validated against the actual latest stable version from the API, rather than relying solely on intent matching.

Comment on lines +749 to +753
eval_query: Can you add a worker node to my already installed cluster 12345678-1234-1234-1234-1234567890?
eval_types:
- response_eval:intent
expected_intent: Declining the request and explaining that the Assisted Installer is used for initial cluster deployment (Day 1) and cannot be used to modify a running cluster (Day 2 operations).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Verify cluster ID format in test query.

The cluster ID 12345678-1234-1234-1234-1234567890 in the boundary test query is missing a digit (should be 12 hex characters in the last segment, but has only 10). While this is intentionally testing a boundary case with a non-existent cluster, an invalid UUID format might cause the test to fail for the wrong reason.

Suggested fix
-    eval_query: Can you add a worker node to my already installed cluster 12345678-1234-1234-1234-1234567890?
+    eval_query: Can you add a worker node to my already installed cluster 12345678-1234-1234-1234-123456789012?
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
eval_query: Can you add a worker node to my already installed cluster 12345678-1234-1234-1234-1234567890?
eval_types:
- response_eval:intent
expected_intent: Declining the request and explaining that the Assisted Installer is used for initial cluster deployment (Day 1) and cannot be used to modify a running cluster (Day 2 operations).
eval_query: Can you add a worker node to my already installed cluster 12345678-1234-1234-1234-123456789012?
eval_types:
- response_eval:intent
expected_intent: Declining the request and explaining that the Assisted Installer is used for initial cluster deployment (Day 1) and cannot be used to modify a running cluster (Day 2 operations).
🤖 Prompt for AI Agents
In `@test/evals/eval_data.yaml` around lines 749 - 752, The test's eval_query
contains an incorrectly formatted UUID (last segment has 10 hex chars) which can
make the test fail for format validation instead of intent handling; update the
cluster ID in the eval_query string to a syntactically valid UUID (e.g., change
12345678-1234-1234-1234-1234567890 to a 36-char UUID such as
12345678-1234-1234-1234-1234567890ab) so the test exercises the intended
non-existent-cluster boundary case rather than a malformed-ID error.

ItzikEzra-rh added a commit to ItzikEzra-rh/assisted-chat that referenced this pull request Jan 18, 2026
- Add empty lines between conversation_groups for consistency
- Create multi-turn conversation for cluster creation + follow-up queries
- Update ask_about_platform_none with VIP differences question
- Add list_versions_for_env follow-up query
- Add verify_latest_version.sh script for version validation
@ItzikEzra-rh ItzikEzra-rh force-pushed the evals/eval-data-diversity branch from 631ee14 to 11706f5 Compare January 18, 2026 13:57
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@test/scripts/verify_latest_version.sh`:
- Around line 12-14: The fetch_versions function uses curl without timeouts,
which can hang CI; update the curl invocation in fetch_versions to add
connection and total timeouts (e.g., --connect-timeout and --max-time) and a
small retry (e.g., --retry and --retry-delay) so network stalls fail fast and
deterministically when calling "${ASSISTED_SERVICE_URL}/openshift-versions" with
Authorization: Bearer ${OCM_TOKEN}; keep the same output piping to jq and
existing empty-response handling.

Comment on lines +12 to +14
fetch_versions() {
curl -sSf -H "Authorization: Bearer ${OCM_TOKEN}" \
"${ASSISTED_SERVICE_URL}/openshift-versions" | jq -r 'keys[]'
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's find and examine the file in question
find . -name "verify_latest_version.sh" -type f 2>/dev/null | head -5

Repository: rh-ecosystem-edge/assisted-chat

Length of output: 114


🏁 Script executed:

# Get the actual content around the mentioned lines
if [ -f "test/scripts/verify_latest_version.sh" ]; then
  cat -n "test/scripts/verify_latest_version.sh" | head -30
fi

Repository: rh-ecosystem-edge/assisted-chat

Length of output: 1067


🏁 Script executed:

# Search for other curl usages in test scripts to see if there's a pattern
rg "curl" test/scripts/ -n --context 2 | head -50

Repository: rh-ecosystem-edge/assisted-chat

Length of output: 3343


🏁 Script executed:

# Check if this script is used in CI/CD pipelines
find . -name "*.yml" -o -name "*.yaml" -o -name ".github" -type d 2>/dev/null | xargs grep -l "verify_latest_version" 2>/dev/null || echo "No CI references found"

Repository: rh-ecosystem-edge/assisted-chat

Length of output: 97


🏁 Script executed:

# Look for any timeout configurations in similar test scripts
rg "(timeout|--connect-timeout|--max-time)" test/scripts/ -n || echo "No timeouts found in test scripts"

Repository: rh-ecosystem-edge/assisted-chat

Length of output: 108


🏁 Script executed:

# Check the complete verify_latest_version.sh file
cat -n "test/scripts/verify_latest_version.sh"

Repository: rh-ecosystem-edge/assisted-chat

Length of output: 1790


🏁 Script executed:

# Look for CI/CD configuration files
find . -type d \( -name ".github" -o -name ".gitlab-ci" \) 2>/dev/null
ls -la | grep -E "\.(yml|yaml)$|\.github"

Repository: rh-ecosystem-edge/assisted-chat

Length of output: 213


🏁 Script executed:

# Check if there's a Makefile or similar with timeout settings
find . -maxdepth 2 -name "Makefile" -o -name "*.mk" | xargs cat 2>/dev/null | head -50

Repository: rh-ecosystem-edge/assisted-chat

Length of output: 2116


🏁 Script executed:

# Look at common.sh to see if there are any utility functions for timeouts
cat -n "test/scripts/common.sh" | head -60

Repository: rh-ecosystem-edge/assisted-chat

Length of output: 2538


Add curl timeouts to prevent CI hangs.

A stalled network call (DNS timeout, connection hang, or slow/unresponsive server) would block the job indefinitely. The current curl lacks timeout protections, and error handling on lines 34-37 only catches empty responses, not hangs. Add connect and max-time timeouts with a small retry to make failures deterministic and improve test reliability.

🔧 Suggested change
 fetch_versions() {
-    curl -sSf -H "Authorization: Bearer ${OCM_TOKEN}" \
+    curl -sSf --connect-timeout 10 --max-time 30 --retry 3 --retry-delay 2 \
+        -H "Authorization: Bearer ${OCM_TOKEN}" \
         "${ASSISTED_SERVICE_URL}/openshift-versions" | jq -r 'keys[]'
 }
🤖 Prompt for AI Agents
In `@test/scripts/verify_latest_version.sh` around lines 12 - 14, The
fetch_versions function uses curl without timeouts, which can hang CI; update
the curl invocation in fetch_versions to add connection and total timeouts
(e.g., --connect-timeout and --max-time) and a small retry (e.g., --retry and
--retry-delay) so network stalls fail fast and deterministically when calling
"${ASSISTED_SERVICE_URL}/openshift-versions" with Authorization: Bearer
${OCM_TOKEN}; keep the same output piping to jq and existing empty-response
handling.

- Add empty lines between conversation_groups for consistency
- Create multi-turn conversation for cluster creation + follow-up queries
- Update ask_about_platform_none with VIP differences question
- Add list_versions_for_env follow-up query
- Add verify_latest_version.sh script for version validation
@ItzikEzra-rh ItzikEzra-rh force-pushed the evals/eval-data-diversity branch from 11706f5 to f9f69a5 Compare January 18, 2026 14:05
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@test/evals/eval_data.yaml`:
- Around line 552-559: The YAML has incorrect indentation causing `conversation`
and its nested items (`- eval_id`, `eval_query`, `eval_types`,
`expected_intent`) to be parsed as top-level entries; fix by indenting the list
under `conversation:` and the `eval_types:` list items so they are children of
the `conversation` mapping (e.g., make `- eval_id: add_operator_virtualization`
and its `eval_query`, `eval_types` and `expected_intent` lines indented one
level under `conversation:`), and apply the same indentation pattern to all
subsequent blocks that follow this structure.
♻️ Duplicate comments (1)
test/evals/eval_data.yaml (1)

749-751: Malformed UUID in boundary test query.

The cluster ID ends with only 10 hex characters, so it may fail format validation rather than the intended boundary behavior.

Suggested fix
-  eval_query: Can you add a worker node to my already installed cluster 12345678-1234-1234-1234-1234567890?
+  eval_query: Can you add a worker node to my already installed cluster 12345678-1234-1234-1234-1234567890ab?

Comment on lines +552 to +559
- conversation_group: add_operator_virtualization
description: Tests adding the OpenShift Virtualization operator.
conversation:
- eval_id: add_operator_virtualization
eval_query: I want to run VMs on my cluster. How do I enable that?
eval_types:
- response_eval:intent
expected_intent: Recommending the OpenShift Virtualization operator and offering to add it to a cluster if the cluster ID is provided.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fix YAML indentation for conversation and nested lists (currently parsed as top-level).

As written, the list item under conversation: and the eval_types list are not indented, which makes the YAML invalid for this group and likely many of the following groups. Please indent list items under their parent keys and apply the same fix to the subsequent newly added blocks that follow this pattern.

✅ Proposed fix (apply same pattern to following groups)
-  conversation:
-  - eval_id: add_operator_virtualization
-    eval_query: I want to run VMs on my cluster. How do I enable that?
-    eval_types:
-    - response_eval:intent
-    expected_intent: Recommending the OpenShift Virtualization operator and offering to add it to a cluster if the cluster ID is provided.
+  conversation:
+    - eval_id: add_operator_virtualization
+      eval_query: I want to run VMs on my cluster. How do I enable that?
+      eval_types:
+        - response_eval:intent
+      expected_intent: Recommending the OpenShift Virtualization operator and offering to add it to a cluster if the cluster ID is provided.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- conversation_group: add_operator_virtualization
description: Tests adding the OpenShift Virtualization operator.
conversation:
- eval_id: add_operator_virtualization
eval_query: I want to run VMs on my cluster. How do I enable that?
eval_types:
- response_eval:intent
expected_intent: Recommending the OpenShift Virtualization operator and offering to add it to a cluster if the cluster ID is provided.
- conversation_group: add_operator_virtualization
description: Tests adding the OpenShift Virtualization operator.
conversation:
- eval_id: add_operator_virtualization
eval_query: I want to run VMs on my cluster. How do I enable that?
eval_types:
- response_eval:intent
expected_intent: Recommending the OpenShift Virtualization operator and offering to add it to a cluster if the cluster ID is provided.
🤖 Prompt for AI Agents
In `@test/evals/eval_data.yaml` around lines 552 - 559, The YAML has incorrect
indentation causing `conversation` and its nested items (`- eval_id`,
`eval_query`, `eval_types`, `expected_intent`) to be parsed as top-level
entries; fix by indenting the list under `conversation:` and the `eval_types:`
list items so they are children of the `conversation` mapping (e.g., make `-
eval_id: add_operator_virtualization` and its `eval_query`, `eval_types` and
`expected_intent` lines indented one level under `conversation:`), and apply the
same indentation pattern to all subsequent blocks that follow this structure.

@openshift-ci
Copy link

openshift-ci bot commented Jan 18, 2026

@ItzikEzra-rh: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/local-dev-test f9f69a5 link false /test local-dev-test
ci/prow/eval-test f9f69a5 link false /test eval-test

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@ItzikEzra-rh ItzikEzra-rh requested a review from eranco74 January 18, 2026 15:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants